NC Bench
Run: 2026-03-18-01KM09Q49GJ0W5G1QX2E14AAAE
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 6.4s
Cost: $0.004788
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.00%    Dialogue to Total Word Ratio
0.0005%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crumpled map across the dashboard of the van, where it immediately tried to curl itself back into a ball like it had no interest in being involved in a felony. The map showed a cheerful house with balloons tied to the fence and, in handwriting so bubbly it looked legally enforced, the words HAPPY BIRTHDAY, NELLIE! as if the whole neighborhood had conspired to become smug. Bill leaned over her shoulder and squinted. “That’s a lot of pink.” “It’s a children’s party,” Molly said. “Apparently they’ve embraced a color palette.” Bill tapped the front door with one ink-stained finger. “We are still doing this, right? We go in, we take the presents, and we leave before anyone notices?” Molly glanced at him. “Bill, darling, if anyone notices, then our entire training, talent, and emotional commitment to thievery has been a waste.” “You make it sound so noble.” “It is noble. We are liberating gifts from an overindulgent social event.” Bill snorted. “You say that like we’re Robin Hood and not two middle-aged disasters in a rented van.” Molly opened the glove compartment and withdrew two walkie-talkies, a length of ribbon, and a plastic clown nose. “Disasters can be polished.” Bill stared at the clown nose. “Why is that here?” “Disguise?” “It’s a red nose.” “Yes,” said Molly. “A classic marker of anonymity.” “Only if your anonymity is ‘extremely alarming clown.’” Outside, the house pulsed with the shrieks of children and the low, cheerful chaos of adults pretending to enjoy themselves. A banner over the doorway flapped in the wind. Somewhere in the backyard, a magician was probably failing at rabbits. Molly checked her watch. “The entertainer does balloon animals until three-thirty. Cake at three-forty. Presents after that. We have a very small window before the birthday child starts opening things and making it emotionally impossible.” Bill nodded slowly. “So we distract the grown-ups, slip in through the side gate, and grab the loot.” “Not loot,” Molly said. 
“Presents.” “Right. The sentimental loot.” Molly reached into the back and hauled out two oversized supermarket bags. “We’ll look like parents collecting party favors.” Bill lifted one. It made a loud plastic rustle, like a duck being stepped on. “These are not stealth bags.” “They’re eco-friendly,” Molly said. “The world loves that.” “The world loves not being robbed by idiots in a van.” Molly smiled with the grim patience of a woman who had outlasted three husbands, four safecrackers, and a German shepherd named Trevor. “Bill, if we wanted subtle, we would have become accountants.” He gave her a look. “You say that every time.” “Because every time, it remains true.” They watched as a woman in a glittery apron carried a tray of cupcakes toward the patio. Two toddlers thundered past, one trailing a paper crown, the other screaming with the authority of a tiny emperor. The front door opened and closed and opened again. Bill swallowed. “I hate children’s parties.” “You hate any gathering where someone might ask you to sing.” “I hate being perceived.” “That’s why you’re excellent at crime.” He looked at the house, at the balloons, at the gift table visible through the front window, stacked high with boxes wrapped in dinosaurs and rainbows and one suspiciously shiny parcel shaped like a guitar. “What if there’s a video camera?” “There will be,” Molly said. “Modern parenting has made every childhood into evidence.” Bill rubbed his hands together. “Fine. We do it quick.” Molly folded the map and tucked it into her coat. “Quick, quiet, and no improvisation.” Bill’s mouth twitched. “You know that’s the part that always ruins us.” She opened her door and stepped out, smoothing her jacket as if she were about to attend a lecture rather than relieve a six-year-old of her birthday haul. “Then try to keep up.” Bill followed, muttering, “I’m saying this now: if there’s a clown, I’m out.” Molly glanced back with a grin as bright as a switchblade. 
“Bill, if there’s a clown, we’re already late.”
created_at: "2026-03-18T11:01:28.508Z" (3/18/2026, 11:01:28 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 1981
moderation_latency: (null)
generation_time: 6107
tokens_prompt: 77
tokens_completion: 1147
native_tokens_prompt: 66
native_tokens_completion: 1053
native_tokens_completion_images: 0
native_tokens_reasoning: 148
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004788
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_04067e802c8194d50169ba8608907081978e2048190c643f6e"
    is_byok: false
    latency: 220
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773831688-O4aKTps0hsnIXLBl9Fuk"
upstream_id: "resp_04067e802c8194d50169ba8608907081978e2048190c643f6e"
total_cost: 0.004788
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result   Evaluator                     Details                           Meta Data
0.00%    Matches word count            n/a                               neededClean: false; words: 669
0.00%    Dialogue to Total Word Ratio  Ratio: 45.13%, Deviation: 15.13%  neededClean: false; wordsTotal: 678; wordsDialogue: 306
0.0005%  Overall
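
The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal counts. The sketch below is only an illustration, not the benchmark's own evaluator: the function names, the quote-matching regular expression, and the whitespace-based word splitting are all assumptions, so its word counts may differ from the ones NC Bench reports. The 30% target is taken from the scenario description.

```python
import re


def dialogue_word_counts(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is any span enclosed in curly or straight
    double quotes, and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'[“"]([^”"]*)[”"]', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words, total_words


def ratio_and_deviation(dialogue_words: int, total_words: int,
                        target_pct: float = 30.0) -> tuple[float, float]:
    """Return (ratio %, absolute deviation from target in percentage points)."""
    ratio = 100.0 * dialogue_words / total_words
    return round(ratio, 2), round(abs(ratio - target_pct), 2)


# Counts taken from the evaluation metadata for this run.
print(ratio_and_deviation(306, 678))
```

With the reported counts (306 dialogue words out of 678), this gives a ratio of 45.13% and a deviation of 15.13 percentage points from the 30% target, matching the Details column.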