NC Bench
Run: 2026-03-18-01KM08QK24PR2MHWVC793BGE62
Run Details
Model:
GPT-5.4 Mini (Reasoning, Low)
Test:
Dialogue tags
Scenario:
Write 500 words with 30% dialogue
Duration:
6.3s
Cost:
$0.004518
Provider:
OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
0.0000% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly flattened the wrinkled party invitation against the steering wheel and glared at the picture of a clown holding a helium balloon. “I hate clown theme parties,” she said. “They make every crime look like an accident.” Bill adjusted the tiny wireless earpiece in his ear, though nothing was connected to it except his sense of style and a great deal of overconfidence. Across the street, the suburban house glowed with paper lanterns and the muffled shrieks of children. A banner over the driveway read HAPPY BIRTHDAY, CAPTAIN WOBBLETOES. “You worry too much,” he said. “It’s a children’s party. No one expects sophistication.” “That’s your whole life philosophy, isn’t it?” “Worked on the bank in Dover.” Molly snorted and leaned forward to study the house again. “The presents are in the sunroom, according to our source. Three tables, one giant pile, and a suspiciously large teddy bear guarding the door.” “That teddy bear is not suspicious,” Bill said. “It’s stuffed.” “It has the eyes of a man who’s seen things.” “He’s plastic.” “Exactly.” Bill flicked open a notebook filled with neat columns and diagrams that looked as though a bored accountant had briefly turned criminal. “Right. We go in as entertainers. You’re the balloon artist.” “I can’t make balloon animals.” “You can make one balloon animal.” “It was a swan and it popped before it had legs.” “A swan is enough. I’m the magician.” “You’re not a magician.” “I own a cape.” Molly stared at him. “That cape is a bathrobe.” “It’s dramatic.” “It’s damp.” He ignored that. “We mingle, charm the parents, distract the children, and while the adults are busy pretending they enjoy sugar coma diplomacy, I roll the gifts out through the side gate.” “Roll them on what?” Bill smiled. “A trolley.” “Where did you get a trolley?” He frowned. 
“Borrowed it.” “From where?” “There’s a supermarket nearby.” “So you stole a trolley from a supermarket to steal presents from a party.” “Technically, I’m recycling.” Molly pressed a hand to her forehead. “Bill, I need you to say the words ‘I have thought this through’ without lying.” He looked hurt. “Of course I’ve thought it through. There are seven steps.” “Seven?” “Yes. Step one: enter confidently. Step two: locate the gifts. Step three: create a diversion. Step four: load trolley. Step five: exit.” “That’s five.” “Steps six and seven are very private.” She turned to him slowly. “Tell me those aren’t the words ‘hope for the best’ and ‘run.’” “Not precisely.” “What are they?” Bill grinned. “Step six: athleticism. Step seven: character development.” Molly laughed despite herself, then pointed at the house. “Look, the parents are opening the front door for guests. That gives us ten minutes before cake, then twenty before everyone starts looking for their shoes. We need to be invisible.” “I can do invisible,” Bill said. “I once passed a whole afternoon in a hedge.” “You were hiding from the police.” “The hedge was excellent company.” She pulled a black knit cap from her pocket and handed it to him. “Put this on. You look less like a failed pirate when you cover the bathrobe.” He tugged it over his hair. “How do I look?” “Like a failed burglar.” “Better.” They got out of the car and crossed the street with the solemnity of undertakers arriving late. Inside, the party erupted in a chorus of singing and squealing. Molly paused at the side gate and listened. “Do you hear that?” she whispered. Bill nodded. “Children. Joy. The faint sound of money on shelves.” “That’s not money.” “It is in spirit.” She lifted the latch. “Remember: no heroics.” “I never do heroics.” “You once slid down a bannister holding a chandelier.” “That was logistics.” Molly opened the gate just enough for them to slip through. The sunroom windows flashed with ribbon and glossy paper. 
The teddy bear by the door seemed to watch them in grim plastic silence. Bill whispered, “See? No one suspects anything.” At that moment a small voice from inside shouted, “Mum! A balloon man and a wizard have arrived!” Molly closed her eyes. “Bill.” He straightened his cape. “On the bright side, we’re in costume.”
created_at
"2026-03-18T10:44:15.056Z"
model
"openai/gpt-5.4-mini-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
2627
moderation_latency
(null)
generation_time
6144
tokens_prompt
77
tokens_completion
1348
native_tokens_prompt
66
native_tokens_completion
993
native_tokens_completion_images
0
native_tokens_reasoning
43
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.004518
router
(null)
provider_responses
0
endpoint_id
"9ee065b2-3d1c-43bc-bdd7-28af3b148282"
id
"resp_021395ce5f5a2c730169ba81ff1f508193a8cab8dccc2b301a"
is_byok
false
latency
184
model_permaslug
"openai/gpt-5.4-mini-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773830655-EnSnMAXoivUF2G8PRXu0"
upstream_id
"resp_021395ce5f5a2c730169ba81ff1f508193a8cab8dccc2b301a"
total_cost
0.004518
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 694
0.00% | Dialogue to Total Word Ratio | Ratio: 61.38%, Deviation: 31.38% | neededClean: false, wordsTotal: 694, wordsDialogue: 426
0.0000%
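The ratio and deviation reported above can be reproduced from the two word counts. The following is a minimal sketch, not the benchmark's actual evaluator: the function names are hypothetical, the 30% target is taken from the scenario, and the assumption that dialogue means text between curly double quotes is ours. How the deviation is mapped to the 0.00% score is not stated on this page, so it is not modeled here.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Count words inside curly double quotes.

    Assumption: dialogue is delimited by “ and ”. The real evaluator
    may use a different rule.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue share of the text, as a percentage."""
    return 100.0 * words_dialogue / words_total


def deviation(ratio_pct: float, target_pct: float) -> float:
    """Absolute distance from the target share, in percentage points."""
    return abs(ratio_pct - target_pct)


# Figures reported for this run: wordsDialogue = 426, wordsTotal = 694.
ratio = dialogue_ratio(426, 694)
dev = deviation(ratio, 30.0)  # 30% target from the scenario
print(f"Ratio: {ratio:.2f}%, Deviation: {dev:.2f}%")
# prints "Ratio: 61.38%, Deviation: 31.38%"
```

With 694 words against a 500-word target and roughly twice the requested dialogue share, the response misses both criteria, which is consistent with the two 0.00% results.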