NC Bench
Run: 2026-03-18-01KM082QT7PC6BXXQZHAYY1SNX
Run Details
Model: GPT-5.4 Nano (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 27.2s
Cost: $0.004087
Provider: OpenAI
Evaluation summary
Score | Evaluator
99.38% | Matches word count
100.00% | Dialogue to Total Word Ratio
99.6882% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Moonlight skimmed the hedges while Molly and Bill squirmed behind a collapsing inflatable dinosaur. Molly adjusted her headlamp until it glowed like a polite firefly. Bill held a clipboard made of cardboard, because he believed confidence was a key. “Are you sure this is the right house?” Molly asked. “Relax, I scoped the driveway; the car seats match the mailbox decals,” Bill replied. They crept toward the doorway, stepping over a trail of glitter that sparkled like evidence. “What’s the plan if the kids notice?” Molly whispered. “We blame the magician,” Bill said, and flipped open a rubber rabbit that squeaked “surprise.” Molly snorted. “Too loud. Borrow your inside voice.” Bill pointed at the present pile. “I’ll grab the big ones—coincidentally the ones with the loudest tape. You take the smaller loot.” Molly nodded, then lifted her cupcake-napkin map. “Then we exit through the snack table,” she said. Bill paused, listening to distant laughter. “Do you think they’ll offer us juice boxes?” Molly blinked. “Only if we confess to being very tiny thieves.” Doorbell chirped nearby, mercifully. “If we get caught, we’ll say we’re just evaluating gift wrap for safety,” Molly said. “Perfect—then I’ll negotiate with the clown’s rabbit and request batteries,” Bill replied.
created_at
"2026-03-18T10:32:51.793Z"
3/18/2026, 10:32:51 AM
model
"openai/gpt-5.4-nano-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
26584
moderation_latency
(null)
generation_time
26942
tokens_prompt
77
tokens_completion
1171
native_tokens_prompt
66
native_tokens_completion
3259
native_tokens_completion_images
0
native_tokens_reasoning
2963
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00408695
router
(null)
provider_responses
0
endpoint_id
"0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id
"resp_0626d731227f32240169ba7f53dd20819492ac2ab83d3f6a5d"
is_byok
false
latency
192
model_permaslug
"openai/gpt-5.4-nano-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773829971-Z3iZxMOvqfY9JjMTIoon"
upstream_id
"resp_0626d731227f32240169ba7f53dd20819492ac2ab83d3f6a5d"
total_cost
0.00408695
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
99.38% | Matches word count | n/a | neededClean: false, words: 205
100.00% | Dialogue to Total Word Ratio | Ratio: 49.51%, Deviation: 0.49% | neededClean: false, wordsTotal: 206, wordsDialogue: 102
99.6882% | Overall | |
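The ratio and deviation reported for the dialogue evaluator can be reproduced from the recorded counts (wordsDialogue 102, wordsTotal 206) against the 50% target in the scenario. The sketch below is an illustration only: the function name `dialogue_ratio` and the absolute-difference definition of deviation are assumptions, not the benchmark's actual evaluator code, and it does not attempt to reproduce how the deviation is mapped to a score.

```python
def dialogue_ratio(words_dialogue: int, words_total: int, target_pct: float = 50.0):
    """Return (ratio, deviation) in percent.

    Hypothetical helper: deviation is assumed to be the absolute
    difference between the measured ratio and the target percentage.
    """
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(target_pct - ratio_pct)
    return ratio_pct, deviation_pct


# Counts taken from the evaluation metadata of this run.
ratio_pct, deviation_pct = dialogue_ratio(102, 206)
print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
```

With these inputs the printed values agree with the "Ratio: 49.51%, Deviation: 0.49%" entry in the details column.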