NC Bench
Run: 2026-03-18-01KM09H9PP5ATFD4CHSSSW05PQ
Run Details
Model:
GPT-5.4 Nano (Reasoning)
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
19.2s
Cost:
$0.004214
Provider:
OpenAI
Evaluation summary
Matches word count:
99.92%
Dialogue to Total Word Ratio:
100.00%
Overall:
99.9595%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the balloon bouquet outside Mrs. Denton's playhouse, studying the security camera like a jealous hamster. Bill slid a toolbox from his coat, but it immediately belched glitter. “Bill, the guest of honor is cake—no, presents. Step softly.” Bill pointed at the motion sensor blinking beneath a sticker of a smiling dinosaur. Molly brushed her finger along the sticker’s edges; the plastic flexed, begging to be lifted like a cheap Band-Aid. “Relax, I brought the silent sneakers and your best distraction.” Inside, inflatable slides hummed and children’s laughter ricocheted through the walls, pure frosting. Molly lifted her earpiece and listened for the present ritual. At exactly seven, a clown would insist that everyone open the brave new boxes. The plan was to blend into the chaos, swap gift tags with decoy ribbons, and leave with a wagon full of stuffed unicorns and board games that smelled like batteries. Bill rehearsed his smile in the window: all teeth, no guilt. Molly adjusted her disguise—a librarian hat and an expression of innocent curiosity—and nodded toward the patio door. When it clicked, the party would never know what stole its thunder; it would only feel, briefly, that everyone had gotten taller.
created_at
"2026-03-18T10:58:17.444Z"
3/18/2026, 10:58:17 AM
model
"openai/gpt-5.4-nano-20260317"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
18706
moderation_latency
(null)
generation_time
19035
tokens_prompt
77
tokens_completion
901
native_tokens_prompt
66
native_tokens_completion
3361
native_tokens_completion_images
0
native_tokens_reasoning
3097
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.00421445
router
(null)
provider_responses
0
endpoint_id
"0c835f2e-c18d-4e8c-b245-e1e3bd08b97f"
id
"resp_0ebd7262a62fed640169ba8549c6a48193a4ed83f6cdb17301"
is_byok
false
latency
804
model_permaslug
"openai/gpt-5.4-nano-20260317"
provider_name
"OpenAI"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer
(null)
api_type
"completions"
id
"gen-1773831497-cj6HpxIsv9jcCz9ro8rq"
upstream_id
"resp_0ebd7262a62fed640169ba8549c6a48193a4ed83f6cdb17301"
total_cost
0.00421445
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Matches word count:
99.92% (Details: n/a; neededClean: false; words: 203)
Dialogue to Total Word Ratio:
100.00% (Ratio: 10.29%, Deviation: 0.29%; neededClean: false; wordsTotal: 204; wordsDialogue: 21)
Overall:
99.9595%
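The reported ratio and deviation for the Dialogue to Total Word Ratio evaluator can be reproduced from the two word counts. The following is a minimal sketch, not the evaluator's actual code: the function names are hypothetical, and the 10% target is taken from the scenario description. How the evaluator tokenizes words, and how it maps a deviation to a score, is not shown in this report.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float) -> float:
    """Return the absolute gap, in percentage points, from the target."""
    return abs(ratio_pct - target_pct)


# Counts reported by the evaluator; target from the scenario (10% dialogue).
ratio = dialogue_ratio(words_dialogue=21, words_total=204)
deviation = deviation_from_target(ratio, target_pct=10.0)

print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# Ratio: 10.29%, Deviation: 0.29%
```

Note that the two evaluators report different totals (203 and 204 words), so they presumably count words slightly differently.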