NC Bench
Run: 2026-03-18-01KM08VXDJ0FB0VZ8R4HMPPRRN
Run Details
Model: GPT-5.4 Mini
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 7.4s
Cost: $0.003893
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.00%    Dialogue to Total Word Ratio
0.0000%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the party invitation across the kitchen table like a general laying out a battle plan, though the table itself was wobbling under the weight of tools, disguises, and one suspiciously festive cupcake. Outside, rain tapped at the window in a rhythm that sounded, to Bill at least, faintly judgmental. “We are absolutely not dressed enough for this,” Bill said, looking down at his black turtleneck and black trousers. “I feel underprepared for crime and overprepared for a funeral.” Molly adjusted a pair of glittery novelty glasses on her nose. “That’s because you’re thinking like a burglar. We’re not breaking in. We’re attending.” Bill stared at her. “Attending the children’s party wearing a fake moustache and carrying a sack?” “A gift sack,” Molly said. “Language matters.” “It’s still a sack.” “It’s a reusable festive storage solution.” Bill peered at the invitation. “There are balloons. A clown. A bouncy castle. This is not a normal environment for professional property relocation.” Molly nodded with grave sympathy. “Exactly. No one expects thievery amid balloons. It’s the perfect cover. They’ll be distracted by sugar, noise, and a clown with emotional problems.” “That last part is oddly specific.” “Life is specific.” She reached into the sack and pulled out a laminated map of the house. It had arrows, annotations, and a tiny drawing of the family dog with the words DO NOT TRUST beneath it. Bill leaned closer. “You’ve mapped the place already?” “I had time.” “How?” “I skipped lunch.” Bill looked horrified. “Molly, that’s not preparation. That’s a cry for help.” “It’s called discipline.” “It’s called low blood sugar.” She jabbed a finger at the back door. “Here’s the plan. The parents are idiots. They’ll be in the kitchen arguing about whether the cake is enough or too much. The magician will be too busy forcing a rabbit to disappear while the children scream at him. 
We enter through the side gate, blend in with the delivery staff, and move straight to the gift table.” Bill scratched his chin. “And then?” “And then we take everything.” “Everything?” “All of it.” “The scooters?” “Yes.” “The board games?” “Yes.” “The gigantic teddy bear wearing a sash that says Birthday King?” “Especially that. It’s probably expensive.” Bill let out a slow breath. “I admire your ambition, but this feels morally flexible even by our standards.” Molly gave him a look. “We’re thieves.” “Yes, but I like to think we’re selective thieves.” “We are selective. We’re selecting all the presents.” He opened his mouth, closed it, and then pointed at the cupcake on the table. “Why is that there?” Molly glanced at it. “Bribe from the neighbor. She said it was a thank-you for ‘helping with the bins.’” “You mean you lifted her recycling to make the alley look like a blocked entrance.” “It was helpful.” “It was ecological sabotage.” She slid the cupcake toward him. “Eat. You’re cranky.” Bill hesitated, then took a bite. “This is really good.” “See? Crime is rewarding.” “It’s a cupcake. Not a manifesto.” At that moment, from somewhere outside, came a sudden burst of children’s singing, off-key and ferocious, followed by the sound of a trumpet failing under pressure. Bill flinched. “They’ve begun.” Molly slipped the sack over one shoulder and pulled on a pair of white gloves. “Then so have we.” He took the fake moustache from the table and held it up, dubious. “This thing looks criminal.” “Perfect.” “I still think the clown is a risk.” Molly smiled, all teeth and enthusiasm. “Bill, every great heist has a risk.” “And every terrible one has a clown.” “Exactly.” She opened the door, rain and music rushing in together, and the two thieves stepped into the night with the solemn confidence of people who had no business being invited anywhere at all.
created_at: "2026-03-18T10:46:36.733Z" (3/18/2026, 10:46:36 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 439
moderation_latency: (null)
generation_time: 7174
tokens_prompt: 77
tokens_completion: 1133
native_tokens_prompt: 66
native_tokens_completion: 854
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0038925
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_06cb74942031314a0169ba828cc9b88197be837893ac9b48ee"
    is_byok: false
    latency: 207
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830796-3qwMXZjmcAInU8vMXKHB"
upstream_id: "resp_06cb74942031314a0169ba828cc9b88197be837893ac9b48ee"
total_cost: 0.0038925
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result   Evaluator                      Details                            Meta Data
0.00%    Matches word count             n/a                                neededClean: false, words: 631
0.00%    Dialogue to Total Word Ratio   Ratio: 55.92%, Deviation: 25.92%   neededClean: false, wordsTotal: 633, wordsDialogue: 354
0.0000%
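
The reported ratio and deviation can be reproduced from the recorded counts. The sketch below is only an illustration, not the benchmark's actual evaluator: the function names are hypothetical, the 500-word and 30% targets are taken from the scenario text, and the rule that turns a deviation into a 0.00% score is not shown in this report, so it is not modelled. The `count_dialogue_words` helper assumes dialogue is whatever sits between curly double quotes, which may differ from how NC Bench tokenizes (the two evaluators report 631 and 633 total words for the same response).

```python
import re

# Targets taken from the scenario: "Write 500 words with 30% dialogue".
TARGET_WORDS = 500
TARGET_DIALOGUE_PERCENT = 30.0

# Counts copied from the evaluation details.
words = 631            # "Matches word count" evaluator
words_total = 633      # "Dialogue to Total Word Ratio" evaluator
words_dialogue = 354


def dialogue_ratio(dialogue_words: int, total_words: int) -> float:
    """Return dialogue words as a percentage of all words."""
    return 100.0 * dialogue_words / total_words


def count_dialogue_words(text: str) -> int:
    """Hypothetical counter: whitespace-separated words inside curly double quotes."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)


ratio = dialogue_ratio(words_dialogue, words_total)
deviation = ratio - TARGET_DIALOGUE_PERCENT
overshoot = words - TARGET_WORDS
overshoot_percent = 100.0 * overshoot / TARGET_WORDS

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
print(f"Word count: {words} ({overshoot:+d} words, {overshoot_percent:+.1f}% vs target)")
```

Under these assumptions the response misses both targets by roughly 26 percentage points: it is 131 words over the 500-word target, and its dialogue share is 25.92 points above the requested 30%.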