NC Bench
Run: 2026-03-09-01KK9PZM86ZAE1STABPT22EBYA
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 32.3s
Cost: $0.016590
Provider: OpenAI
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
23.28% | Dialogue to Total Word Ratio | (details)
11.6391% | Overall
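The overall figure is consistent with an unweighted mean of the two evaluator scores. The page does not state how the overall is computed, so the averaging rule in this minimal sketch is an assumption, and the names `evaluator_scores` and `overall_score` are hypothetical. Because the 23.28% input is the rounded value as displayed, the mean comes out at 11.64% rather than exactly 11.6391%.

```python
# Assumption: the overall score is the unweighted mean of the evaluator scores.
# The page shows the numbers but not the aggregation rule.
evaluator_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 23.28,  # rounded value as displayed
}


def overall_score(scores):
    """Return the arithmetic mean of the per-evaluator percentages."""
    return sum(scores.values()) / len(scores)


overall = overall_score(evaluator_scores)
print(f"{overall:.2f}%")
```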
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly flattened the crinkled invitation on the café table with two fingers and a look of reverence usually reserved for treasure maps and discounted pastry. “Look at it, Bill. Balloons. A magician. A rented pony named Cinnamon. This isn’t a party. It’s a warehouse with frosting.” Bill adjusted his cap and squinted. “I feel it must be said, Mol, that robbing children has a public relations problem.” “We’re not robbing children,” Molly said. “We’re relieving affluent aunties of their excessive ribbon work.” “That is robbing children with prettier grammar.” She tapped the invitation. “Thirty-seven presents. Maybe thirty-eight if divorced parents are competing. We go in, we take the loot, we vanish before anyone notices little Oliver has received only emotional growth.” Bill leaned back. “I dislike jobs with ponies. Ponies have eyes like tax inspectors.” “It’s one pony.” “One pony too many. Last time an animal got involved, I spent a night in a duck pond.” “That goose was personally offended by you.” “It was a hostile bird in a waistcoat district.” Molly sipped her tea. “Focus. We need disguises.” “Clowns are out,” Bill said. “I won’t do clowning again. The shoes slow the getaway, and people remember a man weeping in greasepaint.” “Fair. We could go as entertainers.” “I play no instrument that doesn’t fold.” “Face painters?” “I can barely paint a wall.” “Then catering staff.” Bill brightened. “Aprons have authority. No one questions a tray.” “Exactly. We carry in cupcakes, carry out presents.” “With what cover story?” Molly smiled. “Dietary confusion. We arrive flustered, apologetic, and impossible to challenge. ‘Sorry, traffic on the ring road, where do you want the gluten-free pirate muffins?’” Bill nodded despite himself. “That’s good. Nobody understands dietary language. 
It creates panic.” “While they’re arguing over lactose, we locate the gift table.” “And then?” “And then,” Molly said, lowering her voice as if the sugar packets might testify, “we transfer the presents into catering crates.” Bill frowned. “Crates clatter.” “Then hampers.” “Hampers rustle.” “Sacks?” “That’s too seasonal. Alarming in March.” She drummed her nails. “Laundry baskets.” Bill pointed at her. “Now you’re thinking like a professional.” “Thank you.” “A terrible professional. But one with flair.” The waiter drifted by, and they both smiled the innocent smiles of people discussing weather instead of felony. When he was gone, Bill leaned in. “There is still the problem of children. They notice things. They’re basically sticky detectives.” “Children are distractible.” “Yes, but they’re also loud. If one spots me with a hamper full of parcels, I’ll have twelve of them chanting ‘thief’ like it’s a playground hymn.” “Not if we redirect them.” “With what?” “Cake.” Bill considered this with solemn respect. “Cake does have tactical value.” “Massive tactical value. At the crucial moment, the magician starts, the cake appears, every child turns toward sugar and low-grade sorcery, and we make our move.” “What about the adults?” “Half filming, half pretending the pony is safe.” Bill glanced again at the invitation. “This is wicked, Molly.” “Very.” “Petty.” “Extremely.” “Morally swampy.” “Ankle-deep.” He sighed. “I’m in.” “I knew you would be.” “I’m not proud of it.” “You never are. That’s part of your charm.” Bill folded the invitation and slipped it into his pocket. “If we’re doing this, we need names.” “Aliases?” “Caterers need names. Respectable ones.” Molly stood, buttoning her coat. “I’ll be Denise.” “Why Denise?” “Denise sounds overworked and competent.” Bill rose too. “Then I’m Graham.” “Graham sounds like he knows where napkins live.” They headed for the door, two professionals setting off toward an extremely stupid idea. Bill paused on the threshold. 
“Molly?” “Yes?” “If that pony looks at me funny, I’m leaving you.” She smiled. “If the pony looks at you funny, Bill, it means he’s recognized quality.”
created_at
"2026-03-09T16:30:46.545Z"
3/9/2026, 4:30:46 PM
model
"openai/gpt-5.4-20260305"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4750
moderation_latency
(null)
generation_time
32204
tokens_prompt
77
tokens_completion
1229
native_tokens_prompt
66
native_tokens_completion
1095
native_tokens_completion_images
0
native_tokens_reasoning
181
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.01659
router
(null)
provider_responses
0
endpoint_id
"9ff5625c-403f-4d7f-b895-58ac7295062c"
id
"resp_095bbc9ca43e6ed00169aef5b6a32c8195a754bf02f0d8f888"
is_byok
false
latency
210
model_permaslug
"openai/gpt-5.4-20260305"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773073846-DBc9iLexlk27kQTwUzDQ"
upstream_id
"resp_095bbc9ca43e6ed00169aef5b6a32c8195a754bf02f0d8f888"
total_cost
0.01659
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 623
23.28% | Dialogue to Total Word Ratio | Ratio: 75.40%, Deviation: 5.40% | neededClean: false, wordsTotal: 626, wordsDialogue: 472
11.6391% | Overall
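The reported ratio and deviation can be reproduced from the metadata counts. The sketch below assumes the ratio is dialogue words divided by total words, and that the deviation is the absolute gap, in percentage points, from the 70% target named in the scenario. The function name `dialogue_ratio_pct` is hypothetical, and the mapping from deviation to the 23.28% score is not shown on the page, so it is not modeled here.

```python
TARGET_DIALOGUE_PCT = 70.0  # from the scenario: "Write 500 words with 70% dialogue"


def dialogue_ratio_pct(words_dialogue, words_total):
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


# Counts taken from the evaluator metadata (wordsDialogue, wordsTotal).
ratio = dialogue_ratio_pct(472, 626)

# Assumption: deviation is the absolute difference from the target, in points.
deviation = abs(ratio - TARGET_DIALOGUE_PCT)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 75.40%, Deviation: 5.40%"
```

Note that the two evaluators report slightly different totals (623 and 626 words), which suggests they tokenize the text differently; the sketch uses the 626 figure because it belongs to the ratio evaluator.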