NC Bench
Run: 2026-03-09-01KKABZ7F0T20609SMS1F1HB45
Run Details
Model: GPT-5.4 (Reasoning, Low)
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 37.7s
Cost: $0.042870
Provider: OpenAI
Evaluation summary
Matches word count: 100.00%
Dialogue to Total Word Ratio: 100.00%
Overall: 100.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the garden shed, dressed in black except for the unicorn socks she insisted were lucky. Bill, beside her, wore a burglar’s cap and a birthday hat stolen from the hedge. Beyond the fence, the children’s party raged with chaos: balloons bobbed like surveillance drones, a clown made balloon swords, and a magician was losing an argument with a rabbit. “In, grab gifts, avoid cake, and don’t make eye contact,” Molly whispered. “Fine, but if the clown sees me, I’m taking hostages,” Bill murmured. They studied the target: a trestle table groaning under wrapped boxes, gift bags, and one large stuffed giraffe. The parents were distracted by cake logistics, diplomacy, and a dispute over who had licked the inflatable castle. Bill produced a crayon drawing of the yard. His arrows were wild; his labels included SNACK ZONE, DOG??, and POSSIBLE NANA. Molly nodded like she was briefing commandos instead of a man who had been defeated by a revolving door. Then the dog appeared, tiny and furious, wearing a bow tie. Bill blanched. Molly grinned. Every crime, she thought, began with underestimating the security. Unfortunately, security was now barking at his ankles and summoning every witness nearby loudly.
Response metadata
created_at: "2026-03-09T22:37:33.547Z" (3/9/2026, 10:37:33 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 35341
moderation_latency: (null)
generation_time: 37563
tokens_prompt: 77
tokens_completion: 452
native_tokens_prompt: 66
native_tokens_completion: 2847
native_tokens_completion_images: 0
native_tokens_reasoning: 2570
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.04287
router: (null)
provider_responses:
  [0]
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_06f8887b0465051d0169af4bad9de48195a4217c9f3fe59b5d"
    is_byok: false
    latency: 182
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773095853-s4E43oe3ToeAasptcFDP"
upstream_id: "resp_06f8887b0465051d0169af4bad9de48195a4217c9f3fe59b5d"
total_cost: 0.04287
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
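Most of the completion budget in this run went to reasoning rather than to the visible scene. Assuming, as with OpenAI's usage reporting, that native_tokens_completion already includes native_tokens_reasoning, the visible answer used 2847 - 2570 = 277 tokens, and roughly 90% of completion tokens were reasoning. The minimal sketch below shows that arithmetic; the helper name `visible_completion_tokens` is hypothetical and not part of NC Bench or any provider API.

```python
# Figures copied from the response metadata above.
NATIVE_TOKENS_COMPLETION = 2847
NATIVE_TOKENS_REASONING = 2570


def visible_completion_tokens(completion: int, reasoning: int) -> int:
    """Return completion tokens not spent on reasoning.

    Assumes (not confirmed by this page) that the completion count
    already includes the reasoning tokens, as OpenAI's usage data does.
    """
    if reasoning > completion:
        raise ValueError("reasoning tokens cannot exceed completion tokens")
    return completion - reasoning


visible = visible_completion_tokens(NATIVE_TOKENS_COMPLETION, NATIVE_TOKENS_REASONING)
reasoning_share = NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION
print(f"visible tokens: {visible}, reasoning share: {reasoning_share:.1%}")
```

If the provider instead reported reasoning separately from completion, the visible count would simply be native_tokens_completion, so the assumption matters.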
Evaluation details
Matches word count: 100.00%
  Details: n/a
  Metadata: neededClean = false, words = 200
Dialogue to Total Word Ratio: 100.00%
  Details: Ratio: 10.00%, Deviation: 0.00%
  Metadata: neededClean = false, wordsTotal = 200, wordsDialogue = 20
Overall: 100.0000%
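The ratio evaluator's figures follow from simple arithmetic: 20 dialogue words out of 200 total is 10.00%, exactly the requested share, so the deviation is zero. The sketch below reproduces that calculation under assumptions this page does not document: words are whitespace-separated tokens, dialogue is any text between straight or curly double quotes, and deviation is the absolute difference from the 10% target. The names `count_words` and `dialogue_stats` are hypothetical; this is not NC Bench's evaluator code.

```python
import re

# Assumption: dialogue is text between straight (") or curly (“ ”) double quotes.
# Curly single quotes (’) inside words such as "don’t" are left alone.
QUOTE_PATTERN = re.compile(r'[“"]([^”"]*)[”"]')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (assumed word definition)."""
    return len(text.split())


def dialogue_stats(text: str, target_ratio: float = 0.10) -> dict:
    """Return total words, dialogue words, their ratio, and deviation from target."""
    words_total = count_words(text)
    words_dialogue = sum(count_words(span) for span in QUOTE_PATTERN.findall(text))
    ratio = words_dialogue / words_total if words_total else 0.0
    return {
        "wordsTotal": words_total,
        "wordsDialogue": words_dialogue,
        "ratio": ratio,
        # Assumed definition: absolute difference from the target share.
        "deviation": abs(ratio - target_ratio),
    }


if __name__ == "__main__":
    lines = (
        "“In, grab gifts, avoid cake, and don’t make eye contact,” Molly whispered. "
        "“Fine, but if the clown sees me, I’m taking hostages,” Bill murmured."
    )
    print(dialogue_stats(lines))
```

Applied to the two quoted lines from the response, the sketch counts 10 dialogue words in each, which matches the 20 dialogue words reported above; the remaining 180 words of the scene are narration.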