NC Bench
Run: 2026-03-06-01KK1GAMWFC3C4SKYPP8RH04J3
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 9.0s
Cost: $0.004425
Provider: OpenAI
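The reported duration appears consistent with the latency and generation_time fields listed in the response metadata for this run (390 ms and 8634 ms). The minimal Python sketch below checks that arithmetic and derives an approximate completion throughput from native_tokens_completion. Treating Duration as the sum of those two fields is an assumption; the page does not state how Duration is computed.

```python
# Values copied from this run's response metadata.
latency_ms = 390                 # "latency"
generation_time_ms = 8634        # "generation_time"
native_tokens_completion = 284   # "native_tokens_completion"

# Assumption: displayed Duration = latency + generation time.
total_seconds = (latency_ms + generation_time_ms) / 1000

# Approximate throughput over the generation phase only.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"Total: {total_seconds:.1f}s")
print(f"Throughput: {tokens_per_second:.1f} tokens/s")
```

Under this assumption the sum is 9.024 s, which rounds to the 9.0s shown in the run details.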
Evaluation summary
Score      Evaluator
43.38%     Matches word count
99.10%     Dialogue to Total Word Ratio
71.2368%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedges with the gravity of a general and the hat of a woman who had lost a bet with a scarecrow. Across the garden, balloons bobbed over the fence like cheerful witnesses, and a magician in a sequined waistcoat was sawing at a sponge rabbit for no clear reason. On the patio, a mountain of wrapped presents shimmered beneath a banner that read HAPPY SEVENTH, OLIVER, as if this were a perfectly sensible place to leave treasure unattended. Bill adjusted his black gloves, which were technically oven mitts, and peered through a gap in the leaves. “We go in smiling.” “That’s your terrifying face,” Molly hissed. Children shrieked around a rented bouncy castle that wheezed like an asthmatic dragon. A clown on stilts lurched past the gate carrying a tray of lemonade with the melancholy dignity of a doomed butler. Molly unfolded the plan, which was drawn on the back of a pizza menu and improved by crumbs. “Cake table, then presents.” Bill nodded solemnly. “In and out.” They rose together, only for Molly’s foot to sink into a flowerbed and Bill’s mitt to catch in the hedge, leaving him attached to a rosebush and swearing softly while the garden gate drifted open by itself, as if the afternoon were inviting disaster in.
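As an illustration of how a dialogue-to-total word ratio could be measured for a response like this one, the sketch below counts whitespace-separated words that fall inside curly double quotation marks. The function names and the tokenization rule are hypothetical; the benchmark's actual evaluator is not shown on this page and may count words differently.

```python
import re

# Hypothetical rule: dialogue is any text between curly double quotes.
DIALOGUE_PATTERN = re.compile(r"“([^”]*)”")

def count_words(text: str) -> int:
    """Count whitespace-separated tokens in the whole text."""
    return len(text.split())

def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated tokens inside quoted spans only."""
    return sum(len(span.split()) for span in DIALOGUE_PATTERN.findall(text))

# Short excerpt built from the quoted lines of the response above.
excerpt = (
    "“We go in smiling.” “That’s your terrifying face,” Molly hissed. "
    "“Cake table, then presents.” Bill nodded solemnly. “In and out.”"
)

dialogue = count_dialogue_words(excerpt)
total = count_words(excerpt)
print(dialogue, total)
```

Applied to the four quoted lines of the response, this rule yields 15 dialogue words, which agrees with the wordsDialogue value the evaluator reports for this run.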
created_at: "2026-03-06T12:00:32.366Z" (3/6/2026, 12:00:32 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 390
moderation_latency: (null)
generation_time: 8634
tokens_prompt: 77
tokens_completion: 332
native_tokens_prompt: 66
native_tokens_completion: 284
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004425
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_010db6d4427a0c210169aac1e08f8c8196912fdf10d488312b"
    is_byok: false
    latency: 180
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772798432-IrokSlmmswGuQLNQ3cQD"
upstream_id: "resp_010db6d4427a0c210169aac1e08f8c8196912fdf10d488312b"
total_cost: 0.004425
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result     Evaluator                      Details                          Meta Data
43.38%     Matches word count             n/a                              neededClean: false; words: 217
99.10%     Dialogue to Total Word Ratio   Ratio: 6.91%, Deviation: 3.09%   neededClean: false; wordsTotal: 217; wordsDialogue: 15
71.2368%   Overall
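The ratio and deviation figures follow directly from the reported word counts and the 10% dialogue target in the scenario. The short Python sketch below reproduces them, and also checks that the overall score is consistent with a plain average of the two evaluator scores. That averaging rule is an assumption inferred from the numbers; the formulas that turn word count and deviation into the 43.38% and 99.10% scores are not given on this page and are not reconstructed here.

```python
# Counts reported in the evaluator meta data.
words_total = 217
words_dialogue = 15
target_ratio = 0.10  # "10% dialogue" from the scenario

ratio = words_dialogue / words_total
deviation = abs(target_ratio - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: the overall score is the unweighted mean of the two
# evaluator scores (shown on the page rounded to two decimals).
scores = [43.38, 99.10]
overall_estimate = sum(scores) / len(scores)
print(f"Overall (estimated): {overall_estimate:.2f}%")
```

The estimate of 71.24% differs from the displayed 71.2368% only by the rounding of the two component scores.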