NC Bench
Run: 2026-03-09-01KKABZEABQ6JQWAAX35PRPM8Z
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 24.2s
Cost: $0.013845
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.01%    Dialogue to Total Word Ratio
0.0030%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the hand-drawn map across the wobbling café table, pinning one corner with a salt shaker and the other with Bill’s untouched custard tart. Outside, rain glazed the windows. Inside, she looked like a general planning a siege on a principality made of balloons. “Right,” she said, tapping the page, “this is the house. Semi-detached, blue door, inflatable dinosaur in the garden. The party starts at two. Peak chaos by half past.” Bill leaned in until his nose nearly touched the map. “I still can’t believe we’re robbing a seven-year-old.” “We are not robbing a seven-year-old,” Molly said. “We are relieving a roomful of children of excess gift inventory. It’s practically logistics.” “It feels morally lopsided.” “You stole a violin from a nun.” “A lapsed nun,” Bill said. “And she chased me with a soup ladle. Complex situation.” Molly drew a line around the back of the sketched house. “Kitchen window here. Usually left on the latch because of sausage-roll ventilation.” Bill blinked. “You’ve done surveillance on pastry airflow?” “I take my work seriously.” He looked around the café, lowered his voice, and said, “What if they’ve hired entertainment? Clowns are observant. Magicians are worse. You can’t trust a man who keeps doves in his sleeves.” “Relax. I checked. It’s a budget operation. One mother with face paints and an uncle who knows three chords on a ukulele.” “That’s somehow more dangerous. Family members fight for free.” Molly sat back as the waitress passed, then resumed in a whisper. “The presents will be stacked in the dining room under a banner saying HAPPY BIRTHDAY, JASPER. Children never guard the presents. They assume civilization will hold.” Bill folded his arms. “And our exit?” “Through the side gate, across the alley, into the florist’s van.” “You’ve acquired a florist’s van?” “I borrowed a florist’s van.” Bill stared. “Does the florist know?” “He will eventually. Ideally tomorrow.” A child at the next table shrieked with laughter. 
Bill flinched as if hearing artillery. “I don’t like children,” he muttered. “You don’t have to like them. You merely have to avoid being sticky.” “That’s the bit I’m worried about.” Molly pulled two items from her satchel: a fake invitation with a cartoon dragon on it, and a coil of ribbon. “Disguises.” Bill held up the invitation. “This says ‘Come celebrate Jasper turning 7!!!’ with three exclamation marks. Nobody respectable uses three exclamation marks.” “They do when glitter pens are involved.” “And the ribbon?” “You’re helping decorate. Invisible people holding ribbon are called useful.” Bill draped it over his shoulder unhappily. “What’s my cover story?” “You’re Uncle Steve from Reading.” “I’ve never been to Reading.” “Perfect. No one asks follow-up questions about Reading.” Molly smiled, quick and sharp. “Listen to the rhythm. Doorbell. Apologetic grin. Hand over invitation. Enter with confidence and mild resentment. You head for the lounge, identify present density, and signal me.” “How?” “You say, ‘Lovely balloons.’ If the coast is clear, add, ‘very vibrant.’” “And if it isn’t?” “Say, ‘Do I smell egg sandwiches?’ Then I abort.” Bill nodded solemnly. “This is the most British criminal code we’ve ever used.” “It’s elegant.” “It’s lunchtime.” Rain rattled the glass. Molly rolled up the map and slid it into her coat. Bill finally took a bite of the custard tart and looked wounded by how good it was. “One more concern,” he said through a crumb. “What if the birthday child sees us?” Molly stood, buttoning her coat. “Then smile warmly and ask where the toilet is. Nobody denies directions to a confused adult.” “And if he’s suspicious?” She tossed a coin on the table for the tart. “Then, Bill, you do what you always do.” He rose with her, gathering the ribbon like a condemned maypole dancer. “Panic?” “No,” Molly said. “Juggle.”
created_at: "2026-03-09T22:37:40.564Z" (3/9/2026, 10:37:40 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 3102
moderation_latency: (null)
generation_time: 24117
tokens_prompt: 77
tokens_completion: 1274
native_tokens_prompt: 66
native_tokens_completion: 912
native_tokens_completion_images: 0
native_tokens_reasoning: 23
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.013845
router: (null)
provider_responses: 0
endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
id: "resp_0041b9bd277b72f90169af4bb4a8648195b319213f24f60b3e"
is_byok: false
latency: 185
model_permaslug: "openai/gpt-5.4-20260305"
provider_name: "OpenAI"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773095860-FmDQmyxAmkTju5YeMc3X"
upstream_id: "resp_0041b9bd277b72f90169af4bb4a8648195b319213f24f60b3e"
total_cost: 0.013845
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 630

Result: 0.01%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 63.95%, Deviation: 13.95%
Meta Data: neededClean: false; wordsTotal: 638; wordsDialogue: 408

Overall: 0.0030%
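
The reported "Ratio: 63.95%, Deviation: 13.95%" can be reproduced from the wordsDialogue and wordsTotal values, assuming the deviation is the absolute distance from the 50% target in the scenario. The sketch below is only an illustration: the function names are hypothetical, and the quote-based word counter is an assumed approach (whitespace-separated words inside double quotation marks), not the benchmark's actual evaluator, whose counting rules are not shown in this run.

```python
import re


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Absolute distance, in percentage points, from the target ratio (assumed 50%)."""
    return abs(ratio_pct - target_pct)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words) for a piece of prose.

    Assumption: dialogue is whatever sits between curly or straight
    double quotes, and a word is a whitespace-separated token.
    """
    quoted = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    dialogue = sum(len((curly or straight).split()) for curly, straight in quoted)
    total = len(text.split())
    return dialogue, total


# Values taken from the Meta Data of the Dialogue to Total Word Ratio row.
ratio = dialogue_ratio(408, 638)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With the values from this run, the script prints "Ratio: 63.95%, Deviation: 13.95%", matching the Details column. How the evaluator maps that deviation to the 0.01% result is not stated here, so the sketch stops at the deviation.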