NC Bench
Run: 2026-03-06-01KK1J43CPEK4BACWRH5TT4WV3
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 31.0s
Cost: $0.015420
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.02% | Dialogue to Total Word Ratio
0.0103% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
“Tell me again,” Molly whispered, flattening the crinkled neighborhood map on the café table, “why we are robbing a seven-year-old.” Bill adjusted his cap with offended dignity. “We are not robbing a seven-year-old. We are intercepting luxury goods before they enter the sticky-finger economy.” “Luxury goods,” Molly repeated. “You mean six plastic dinosaurs, a foam rocket, and whatever nightmare noise machine Aunt Cheryl bought because she hates the parents.” Bill tapped a red circle on the map. “You’re forgetting volume. Twenty children. Two presents each, minimum. That’s forty units. Birthdays are supply drops with balloons.” Molly squinted. “This says ‘Princess Laser Unicorn Party.’” “Exactly. High-end branding.” “It’s in a church hall.” “Low-security venue. Ample exits. Emotional chaos built in.” Molly leaned back. “Bill, we used to steal sapphires from men who wore capes indoors. We once lifted bearer bonds off a yacht while impersonating accordionists.” “And tonight,” Bill said, “we evolve.” “To what?” “To specialists.” “In disappointment?” “In opportunity. You’re sentimental because there’ll be cake.” “There will be cake,” Molly said, pointing a finger. “And I know you. This is about the cake.” Bill put a hand to his chest. “Madam, I am insulted. This operation is about precision, timing, and if possible, buttercream.” She sighed. “Fine. Walk me through this disgrace.” “Gladly. The party begins at two. Children arrive in waves, each carrying brightly wrapped treasure. Gifts go on the designated table near the bouncy castle.” “The bouncy castle is a complication.” “The bouncy castle is cover. Nothing says invisibility like a room full of screaming nobles in paper crowns.” “And our disguises?” Bill grinned. “Entertainers.” Molly stared. “No.” “Yes.” “No, Bill. Last time you made balloon animals, yours looked medically concerning.” “I’ve improved. My giraffe no longer suggests litigation.” “I’m not wearing a clown nose.” “Magician, then.” “You’re worse at magic. 
You say ‘abracadabra’ like you’re filing taxes.” Bill lowered his voice. “Face painter.” Molly considered it. “That… has potential.” “Of course it does. You set up by the door. I mingle near the gift table with the easy authority of a man holding a clipboard.” “You don’t have a clipboard.” “I’ll get one. A clipboard turns any fool into infrastructure.” Molly snorted. “And then?” “And then, at peak cake, when all eyes are on the child and the ceremonial fire hazard, we execute the transfer.” “The transfer.” “We move the presents into black contractor bags.” “Subtle.” “Contractor bags say professional. Also, no one questions a person carrying garbage during a party. They assume juice has happened.” Molly pinched the bridge of her nose. “How do we explain ourselves if caught?” Bill answered instantly. “I say, ‘The parents asked us to load gifts into the car before the piñata melee.’” “And if they ask which parents?” “The tired-looking ones.” “There are always tired-looking ones.” “Exactly. This plan is built on demographics.” Molly looked at the map again. “What about the host?” “Mother named Dana. Efficient. Overcommitted. Wearing the smile of a woman one glitter spill from exile.” “And the father?” “Trying to assemble something with an Allen key while pretending that’s his whole contribution.” “That is detailed surveillance.” Bill shrugged. “I sat in a bush for forty minutes.” “For this.” “For art.” Molly laughed despite herself. “Suppose we get the haul. Then what? Fence a sack of toy ponies and educational puzzles to the underworld?” “Seasonally desperate grandparents. Online marketplaces. Boutique resale.” “You’ve thought this through.” “I have categories. Premium, battery-operated, suspiciously expensive wooden.” She drummed her fingers. “Any chance at all this is the stupidest thing we’ve ever done?” Bill smiled. 
“Molly, if we had any standards left, we wouldn’t be discussing theft over tea and a laminated flyer for ‘Bobo the Science Wizard.’” She took the flyer, read it, and sighed. “We’re not stealing from Bobo.” “Of course not. Professional courtesy.” “Fine,” she said, folding the map. “Face paint, clipboards, contractor bags. In and out before the piñata.” Bill raised his teacup. “To elegance.” Molly clinked his cup with hers. “To cake.” “And crime.” “And if one child cries,” Molly added, “you’re giving back at least one unicorn.” Bill considered that. “A small unicorn.” “The nicest one.” He groaned. “This is why partnerships fail.”
created_at: "2026-03-06T12:31:54.784Z" (3/6/2026, 12:31:54 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 434
moderation_latency: (null)
generation_time: 30897
tokens_prompt: 77
tokens_completion: 1378
native_tokens_prompt: 66
native_tokens_completion: 1017
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01542
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_09afc48ed6fa40e40169aac93aded08193bfe1dd3e96cc57de"
    is_byok: false
    latency: 170
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800314-CH6oThKR07iGGQEumAJI"
upstream_id: "resp_09afc48ed6fa40e40169aac93aded08193bfe1dd3e96cc57de"
total_cost: 0.01542
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
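The timing and token fields above can be combined into a rough generation throughput. The sketch below is illustrative only: it assumes `generation_time` is in milliseconds (which agrees with the 31.0s duration shown in the run details) and uses `native_tokens_completion` as the token count. The helper name `tokens_per_second` is hypothetical and not part of NC Bench.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Approximate output throughput; assumes generation_time is in milliseconds."""
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000)


# Values copied from the run metadata above.
rate = tokens_per_second(completion_tokens=1017, generation_time_ms=30897)
print(f"{rate:.1f} tokens/s")  # prints "32.9 tokens/s"
```

Using `tokens_completion` (1378) instead would give a higher figure. The two counts differ, presumably because they come from different tokenizers.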
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 698
0.02% | Dialogue to Total Word Ratio | Ratio: 83.03%, Deviation: 13.03% | neededClean: false; wordsTotal: 707; wordsDialogue: 587
0.0103% | Overall
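The reported ratio and deviation can be reproduced from the word counts in the evaluation details. The sketch below is a minimal illustration, not the evaluator's actual implementation. It assumes dialogue is any text between double quotes (curly or straight), that words are whitespace-separated, and that deviation is the absolute difference from the 70% target in the scenario. The function names `dialogue_stats` and `ratio_from_counts` are hypothetical. How the evaluator maps a deviation to a score is not shown on this page, so it is not modeled here.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% words of dialogue


def ratio_from_counts(words_dialogue: int, words_total: int,
                      target: float = TARGET_RATIO) -> tuple[float, float]:
    """Return (ratio, deviation); deviation is assumed to be an absolute difference."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def dialogue_stats(text: str, target: float = TARGET_RATIO) -> dict:
    """Count quoted words versus all words (assumed tokenization: whitespace split)."""
    # Each match is a tuple: (curly-quoted text, straight-quoted text).
    spans = re.findall(r'“([^”]*)”|"([^"]*)"', text)
    words_dialogue = sum(len((curly or straight).split()) for curly, straight in spans)
    words_total = len(text.split())
    ratio, deviation = ratio_from_counts(words_dialogue, words_total, target)
    return {
        "words_dialogue": words_dialogue,
        "words_total": words_total,
        "ratio": ratio,
        "deviation": deviation,
    }


# Counts copied from the evaluation details above.
ratio, deviation = ratio_from_counts(words_dialogue=587, words_total=707)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 83.03%, Deviation: 13.03%"
```

Running `dialogue_stats` on the model response may not return exactly 587 and 707, since the evaluator's tokenization rules are not documented on this page.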