NC Bench
Run: 2026-03-06-01KK1J5W5KYHBWHJSCAF0FFBFX
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 23.7s
Cost: $0.012330
Provider: OpenAI
Evaluation summary
Score    Evaluator
0.00%    Matches word count
0.00%    Dialogue to Total Word Ratio
0.0002%  Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge with a notebook balanced on one knee, staring at the church hall across the lane. Balloons bobbed over the gate like brightly colored warning beacons, and a cardboard dragon guarded the entrance with a grin so cheerful it felt accusatory. Inside, music thumped, children shrieked, and somewhere a bubble machine wheezed like an asthmatic foghorn. “Tell me again,” Bill whispered, peering through the leaves, “why we’re robbing a birthday party instead of, say, a jeweler with poor lighting?” “Because,” Molly said, “jewelers have alarms, shutters, and insurance investigators. Children’s parties have sponge cake, inattentive parents, and gifts stacked in one convenient pile.” Bill adjusted the tiny paper cone hat Molly had insisted was a disguise. “I feel this hat makes me more memorable to witnesses.” “It makes you festive. Festive people are invisible.” “That is not true. Clowns are festive. People remember clowns forever, often in court.” Molly clicked her pencil. “Focus. Target is the present table, right side of the hall, under the banner that says HAPPY EIGHTH BIRTHDAY, OLIVER! Exclamation point included, which I find vulgar.” “You object to punctuation now?” “I object to enthusiasm. It leads to mistakes.” Bill squinted toward the windows. “That’s a lot of parents.” “Not parents. Chaperones. Parents in the wild are useless after the first tray of prosecco.” “And the children?” “Sticky, loud, and structurally disorganized. We use that.” Bill nodded uncertainly. “I was hoping for a plan with fewer minors and more darkness.” Molly tore a page from the notebook and showed him a sketch. “We go in as entertainers.” Bill stared at it. “That’s not a plan. That’s a doodle of me next to a balloon dog that looks like a tax problem.” “It’s conceptual.” “I can’t do balloon animals.” “You don’t need to. You carry the pump and look disappointed in yourself. That’s authentic.” “And you?” “I’ll be the magician.” Bill blinked. 
“You can do magic?” “I can make unattended property disappear.” “That’s theft, not magic.” “Depends how good I am.” A child’s scream rose from the hall, followed by thunderous applause. Bill flinched. “What was that?” “Probably a game involving sugar and emotional damage.” He looked back at the gate. “Suppose someone asks which company we’re from?” Molly smiled. “Larks & Sparkles Party Solutions.” “That sounds fake.” “All company names sound fake. That’s capitalism.” “And if they ask for identification?” She produced a laminated badge from her coat. It featured a rainbow, a rabbit in a waistcoat, and the name Professor Twinkle. Bill stared at it in silence. “You made that.” “On a very good printer.” “You gave yourself a doctorate in glitter.” “Earned it.” Bill rubbed his face. “I hate this already.” “Excellent. Caution is the handrail of crime.” From inside came the muffled chant of children counting badly. Bill whispered, “What’s our exit?” “Kitchen doors, across the back car park, into the van.” “The van says Premium Drainage Services.” “No one questions a drainage van. People avert their eyes on principle.” Bill considered that. “Fair.” Molly shut the notebook. “Timing is everything. During cake, adults crowd forward with cameras. During candles, visibility drops. During the birthday song, nobody can hear conscience.” Bill took a breath. “And if we’re caught?” She straightened his paper hat. “Then smile warmly and say we were hired by Grandma.” “Which grandma?” “Exactly,” Molly said, rising from the hedge. “Chaos is the skeleton key. Now pick up the balloon pump and look underqualified.”
created_at: "2026-03-06T12:32:52.923Z" (3/6/2026, 12:32:52 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 424
moderation_latency: (null)
generation_time: 23577
tokens_prompt: 77
tokens_completion: 1073
native_tokens_prompt: 66
native_tokens_completion: 811
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.01233
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_06516e646499e4fb0169aac974f9f081958a2f0b433066957b"
    is_byok: false
    latency: 142
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800372-NdEz3Cx0n2njkea2TA8Z"
upstream_id: "resp_06516e646499e4fb0169aac974f9f081958a2f0b433066957b"
total_cost: 0.01233
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
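The metadata above is enough to derive two quick sanity figures: generation throughput and an approximate cost. The sketch below is illustrative only. The helper names are hypothetical, and the per-million-token prices (2.50 USD for prompt tokens, 15.00 USD for completion tokens) are assumptions that happen to reproduce the reported total_cost of 0.01233; the page itself does not state any rates.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Throughput from a completion token count and a generation time in milliseconds."""
    return completion_tokens / (generation_time_ms / 1000)


def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_usd_per_million: float,
                  completion_usd_per_million: float) -> float:
    """Linear cost model: token count times price per million tokens, summed."""
    total = (prompt_tokens * prompt_usd_per_million
             + completion_tokens * completion_usd_per_million)
    return total / 1_000_000


# Values copied from the metadata: native_tokens_completion, generation_time,
# native_tokens_prompt.
throughput = tokens_per_second(811, 23577)

# ASSUMPTION: the two prices below are not in the source; they are placeholders
# chosen because they are consistent with the reported total_cost.
cost = estimate_cost(66, 811, prompt_usd_per_million=2.50,
                     completion_usd_per_million=15.00)

print(f"{throughput:.1f} tokens/s, estimated cost ${cost:.5f}")
```

Under these assumptions the native token counts, rather than the normalized tokens_prompt and tokens_completion values, are the ones that line up with the billed amount.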
Evaluation details
Result: 0.00%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false, words: 576

Result: 0.00%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 65.63%, Deviation: 15.63%
Meta Data: neededClean: false, wordsTotal: 576, wordsDialogue: 378

Overall: 0.0002%
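The reported ratio follows directly from the two counts: 378 / 576 = 65.625%, shown as 65.63%, which sits 15.625 percentage points above the 50% target. The sketch below shows one way such a ratio could be measured. It is not the benchmark's evaluator: the function names are hypothetical, and it assumes that dialogue means any text enclosed in double quotation marks (curly or straight) and that words are whitespace-separated tokens, so its counts may differ from the 378 and 576 reported here.

```python
import re

# ASSUMPTION: dialogue is whatever sits between a pair of curly or straight
# double quotes. The real evaluator's rule is not documented on this page.
QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.5) -> dict:
    """Return total words, dialogue words, their ratio, and distance from the target."""
    total = count_words(text)
    dialogue = sum(count_words(curly or straight)
                   for curly, straight in QUOTED.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


sample = "“Hello there,” said Bill. “Run!”"
stats = dialogue_stats(sample)
print(stats)
```

Counting speaker tags such as "said Bill" as narration, as this sketch does, is one design choice among several; an evaluator that treats whole dialogue paragraphs as dialogue would report a higher ratio.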