NC Bench
Run: 2026-03-18-01KM0A5BGZ2GRD88K7CW19E4X4
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 5.5s
Cost: $0.004621
Provider: OpenAI
Evaluation summary
Score     Evaluator
0.00%     Matches word count
0.00%     Dialogue to Total Word Ratio
0.0024%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crumpled invitation across the dashboard like a general studying enemy terrain. The paper was decorated with balloons, cartoon rockets, and a smiling bear that looked far too trusting. Outside, rain tapped softly on the windscreen, making the van feel like the inside of a kettle. Bill leaned over from the passenger seat and squinted. “You’re telling me the birthday boy is six, the party starts at three, and the loot is stacked in the sunroom?” “According to our source,” Molly said, using the solemn tone of someone who did not want to admit the source was the cleaner’s nephew, “yes.” Bill snorted. “A cleaner’s nephew is not a source. That’s a lad with opinions.” Molly folded the invitation. “He has excellent opinions. He said the present table is near the kitchen, the helium balloons are clustered by the stairs, and the parents are distracted by a magician who keeps losing his rabbit.” Bill brightened. “A magician with a missing rabbit. Now there’s a professional who’s really fallen apart.” “Focus,” Molly said. “We go in as party entertainers.” Bill stared at her. “You said that with a straight face, which is a disturbing skill.” “I brought the clown noses.” “You brought clown noses,” he repeated. “I brought tasteful clown noses.” “There is no such thing.” Molly reached into the back and produced two red foam noses, each attached to a thin elastic. “These say ‘fun’ without screaming ‘police investigation.’” Bill held one up, offended. “This one says ‘fun’ if fun has been sleeping in a damp cellar.” “Put it on.” He did. It sat crooked on his face. “I look like a recently divorced tomato.” “You look anonymous.” Bill peered out at the rainbow bunting fluttering above the garden gate. Children’s laughter drifted over the hedge in bright bursts, along with a shriek that suggested either delight or someone had opened a snake-shaped cake. “So,” he said, “what’s the plan beyond ‘dress like regret and wander in’?” Molly checked her watch. 
“We enter with a tray of cupcakes.” “We have cupcakes?” “We have a box labeled cupcakes.” Bill frowned. “What’s actually in the box?” Molly hesitated just long enough. “Bill.” “It’s pastries.” Bill closed his eyes. “Molly.” “They’re savory pastries.” “The kind with meat?” “Allegedly.” He opened his eyes again and sighed. “I hate that I trust you.” “You shouldn’t. It’s a terrible habit.” She slid a pair of cheap lanyards from her coat pocket and handed him one. Printed on the plastic badge were the words EVENT SUPPORT. She had written them herself with a black marker so the E looked like a chair and the P looked like it had fallen down a flight of stairs. Bill examined it. “This is either genius or evidence.” “It’s bold.” “It’s misspelled.” “That adds confidence.” A car rolled past outside, full of balloons tied to the roof, and Molly watched it go with the calm concentration of a cat eyeing a cream jug. “We get in through the side gate. You distract the parents with the pastry tray. I find the gifts. We load them into the laundry hampers.” Bill blinked. “Laundry hampers.” “Professional, lightweight, morally versatile.” “And if anyone asks?” “We say we’re doing a surprise relay.” Bill stared at her. “A relay of what?” Molly smiled. “Joy.” He laughed despite himself, then shook his head. “You are the worst kind of criminal.” “I know.” “You’re the kind who says ‘joy’ with a straight face.” “I do my best.” The rain eased. In the house across the hedge, a burst of children’s singing rose and fell like an argument between sparrows. Molly picked up the pastry box, and Bill straightened his ridiculous badge, suddenly looking far more serious than the foam nose deserved. “Well?” she said. Bill opened the van door and took a breath of wet grass and anticipation. “Let’s go rob the party.” Molly grinned. 
“That’s the spirit.” They stepped into the garden, carrying their doomed little tray of pastries toward the blue front door, where laughter, music, and the glittering promise of unattended presents waited for them like a trap set by someone who believed in fairy tales.
created_at: "2026-03-18T11:09:14.668Z" (3/18/2026, 11:09:14 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1256
moderation_latency: (null)
generation_time: 4929
tokens_prompt: 77
tokens_completion: 1207
native_tokens_prompt: 66
native_tokens_completion: 1016
native_tokens_completion_images: 0
native_tokens_reasoning: 100
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0046215
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_04b68d698cd0cb640169ba87dab8588196a36f380c0a3a52f2"
    is_byok: false
    latency: 197
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773832154-LvmxPr70plNfh8Xq4NvO"
upstream_id: "resp_04b68d698cd0cb640169ba87dab8588196a36f380c0a3a52f2"
total_cost: 0.0046215
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false; words: 693

Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 44.09%, Deviation: 14.09%
  Meta Data: neededClean: false; wordsTotal: 694; wordsDialogue: 306

Overall
  Result: 0.0024%
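
The reported ratio and deviation can be reproduced from the metadata counts: 306 dialogue words out of 694 total, measured against the 30% target in the scenario. The sketch below is illustrative only. The function names are hypothetical, and the counting rule (whitespace-separated words, dialogue taken as anything between curly quotes) is an assumption that may differ from what the NC Bench evaluator actually does.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a passage.

    Assumption: dialogue is whatever sits between curly quotes, and a
    word is any whitespace-separated token. Quotes are assumed to be
    balanced and not nested.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r"“([^”]*)”", text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.30) -> float:
    """Absolute distance between the measured ratio and the target ratio."""
    return abs(ratio - target)


# Counts taken from the evaluation metadata: wordsDialogue and wordsTotal.
reported_ratio = 306 / 694
print(
    f"Ratio: {reported_ratio:.2%}, "
    f"Deviation: {deviation_from_target(reported_ratio):.2%}"
)
# prints "Ratio: 44.09%, Deviation: 14.09%"
```

Running `dialogue_ratio` on the full model response may not return exactly 306 and 694, since the evaluator's tokenization rules are not shown on this page.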