NC Bench
Run: 2026-03-06-01KK1HGA8XVB2XW8E6BTF9TXBG
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 25.4s
Cost: $0.014130
Provider: OpenAI
Evaluation summary
Score      Evaluator                      Details Link
0.00%      Matches word count             (details)
28.02%     Dialogue to Total Word Ratio   (details)
14.0086%   Overall
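The Overall figure looks consistent with an unweighted mean of the two evaluator scores. That is an inference from the reported numbers, not something this page states, and the function name overall_score is hypothetical. A minimal Python sketch under that assumption:

```python
from statistics import mean


def overall_score(evaluator_scores):
    """Unweighted mean of per-evaluator scores, in percent.

    Assumed aggregation rule; the page does not document how Overall is computed.
    """
    return mean(evaluator_scores)


# Scores as displayed in the summary (rounded to two decimals).
displayed_scores = [0.00, 28.02]
print(f"{overall_score(displayed_scores):.2f}%")
```

Under the same assumption, the displayed 14.0086% would correspond to an unrounded dialogue score of about 28.0172%, which the summary rounds to 28.02%.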
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge with a notebook balanced on one knee and a juice box someone had dropped in the alley balanced on the other. Across the street, balloons bobbed over the garden gate like brightly colored surveillance devices. Bill lay beside her in a trench coat too dramatic for daylight, peering through opera glasses he had once stolen from a woman who used them exclusively for horse races and judging neighbors. “Tell me again,” Bill whispered, “why we’re robbing a birthday party and not, say, a bank with fewer clowns?” “Because,” said Molly, flipping a page, “banks have alarms, guards, dye packs, and consequences. Children’s parties have sponge cake, distracted adults, and presents piled in one convenient corner.” “That sentence alone should put us both on a watch list.” “You say that like we aren’t already on several.” Bill lowered the glasses. “I count twelve children, three mothers, one father in sandals, and a magician whose rabbit looks unionized.” “Good. The magician is our weather vane. When he starts making balloon swords, attention shifts left.” “And our objective shifts right,” Bill said. “Through the side gate, across the patio, seize the gift mountain, and vanish before anyone notices little Clementine’s educational ukulele has entered the criminal economy.” Molly nodded briskly. “Exactly. We’re not monsters, Bill. We’ll leave the handmade cards.” “That is reassuringly ethical.” Inside the garden, shrieks rose as a child in a pirate hat struck a piñata with the strategic grace of an invading army. Bill watched the candy scatter and sighed. “You know,” he said, “there was a time I dreamed bigger. Crown jewels. Casino vaults. A Fabergé egg.” “And now?” “Now I’m about to steal a remote-control dinosaur from a six-year-old named Theo.” “Adaptability is the soul of professionalism.” “It’s also the soul of moral collapse.” Molly snapped the notebook shut. “Listen carefully. Entry point is the trellis by the hydrangeas. 
You go in posing as an uncle.” “I can’t pose as an uncle. I have the face of a man not allowed near a bounce house.” “Then carry confidence and a casserole dish. People trust casseroles.” Bill held up an empty aluminum tray. “Fine. And you?” “I’ll be in the caterer’s apron. I walk straight to the present table, assess volume, and signal.” “What’s the signal?” “I say, ‘Who wants more lemonade?’” “That’s not a signal. That’s a sentence likely to start a riot.” “Exactly. Chaos is cover.” Bill looked again through the hedge. “Small complication. There’s a dog.” Molly leaned in. “Size?” “Medium. Hair bow. Expression of a retired detective.” “Bribable?” “With sausage, probably. With principles, no.” She produced a cocktail sausage from her pocket. Bill stared. “You brought dog rations?” “I plan for every contingency.” “You once forgot your shoes during a museum job.” “And yet I remembered the dog rations. Growth.” A burst of applause came from the yard. The magician raised a wand; the rabbit looked deeply tired. “This is it,” Molly said. “Balloon swords in three, two—” Children screamed with delight. “Go,” she hissed. Bill stood, smoothed his coat, and adopted the bland smile of a suburban relative. “If I’m caught,” he murmured, “tell the papers I died doing what I hated.” “You’ll be fine.” “If there’s a face painter, I’m finished.” They slipped through the gate, Molly gliding toward the gift table beneath a banner that read HAPPY BIRTHDAY DAISY in letters large enough to shame hesitation. Bill lifted the casserole tray like a passport. “Afternoon,” he said to no one in particular. “Terrible parking. Lovely bunting.” Molly reached the presents and inhaled. Boxes, bags, ribbons, glorious abundance. “Who wants more lemonade?” she called. The garden erupted. A dozen children turned. Three mothers panicked. The father in sandals shouted, “We’re out of cups!” Bill grabbed two armfuls of presents. Molly scooped the rest into a plastic tablecloth and knotted the corners. 
Then the dog appeared, silent and judgmental, sausage already somehow in its mouth. Bill froze. “It knows,” he whispered. “Everything knows,” Molly said, hefting the bundle. “Run.”
created_at
"2026-03-06T12:21:06.468Z"
3/6/2026, 12:21:06 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
470
moderation_latency
(null)
generation_time
25338
tokens_prompt
77
tokens_completion
1221
native_tokens_prompt
66
native_tokens_completion
931
native_tokens_completion_images
0
native_tokens_reasoning
0
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01413
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_0ef7d8a6cc28ed3c0169aac6b288a08197b623cb9663092a1d"
is_byok
false
latency
148
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1772799666-Z4OIUDnjZ6F6IAc6byQA"
upstream_id
"resp_0ef7d8a6cc28ed3c0169aac6b288a08197b623cb9663092a1d"
total_cost
0.01413
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result     Evaluator                      Details                           Meta Data
0.00%      Matches word count             n/a                               neededClean: false, words: 671
28.02%     Dialogue to Total Word Ratio   Ratio: 55.04%, Deviation: 5.04%   neededClean: false, wordsTotal: 674, wordsDialogue: 371
14.0086%   Overall
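The Ratio and Deviation values can be reproduced from the reported counts (wordsDialogue 371, wordsTotal 674) and the scenario's 50% target. The Python sketch below does that arithmetic. It also includes count_dialogue_words, a hypothetical counter that treats text inside curly double quotes as dialogue; the benchmark's actual tokenization and quote handling are not shown on this page, so that function is an assumption. The mapping from a 5.04% deviation to the 28.02% score is also not documented here and is not modeled.

```python
import re


def dialogue_ratio(words_dialogue, words_total):
    """Fraction of all words that are dialogue."""
    return words_dialogue / words_total


def deviation_from_target(ratio, target=0.5):
    """Absolute distance from the target ratio (50% in this scenario)."""
    return abs(ratio - target)


def count_dialogue_words(text):
    """Hypothetical counter: whitespace-separated words inside curly double quotes.

    This is an assumed approach, not the benchmark's documented method.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)


# Counts taken from the Meta Data column of this run.
ratio = dialogue_ratio(371, 674)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation_from_target(ratio):.2%}")
# prints "Ratio: 55.04%, Deviation: 5.04%"
```

Note that the two evaluators report different totals (671 and 674 words), which suggests they count words slightly differently; the sketch uses the total reported by the ratio evaluator.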