NC Bench
Run: 2026-03-06-01KK1GX6AQGC1XRP48ETB3CRJF
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 8.8s
Cost: $0.004470
Provider: OpenAI
Evaluation summary
Matches word count: 35.00%
Dialogue to Total Word Ratio: 88.25%
Overall: 61.6263%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly crouched behind the community-center hedge, eyeing the inflatable castle bobbing above the fence like a cheerful surveillance balloon. The yard beyond throbbed with the lawless energy of seven-year-olds on frosting. Paper hats flashed. A magician lost control of a dove. Somewhere, a parent laughed with the doomed confidence of the poorly informed. Beside her, Bill unfolded a hand-drawn map of the party grounds, though it was really just a napkin covered in arrows, cake stains, and the word LOOT written three times. “Presents by the gazebo,” Molly whispered. “Excellent,” Bill whispered back. “Tiny citizens, tiny defenses.” A clown on stilts wobbled past the gate, and both thieves instinctively ducked lower, as if the oversized shoes possessed elite detection skills. Molly adjusted her black gloves, which made her look less like a master criminal than a burglar attending a winter funeral. The presents sat in a mountain of glossy paper near the punch bowl, guarded only by a sleepy spaniel in a bow tie. Bill studied the dog with professional unease. He feared animals in formalwear. Molly produced two juice boxes, three party hats, and a roll of stickers from her satchel. Bill blinked. Within moments they were disguised as exhausted entertainers, walking through the gate with the solemn authority of people who absolutely belonged near a piñata.
created_at: "2026-03-06T12:10:39.839Z" (3/6/2026, 12:10:39 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 385
moderation_latency: (null)
generation_time: 8766
tokens_prompt: 77
tokens_completion: 360
native_tokens_prompt: 66
native_tokens_completion: 287
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.00447
router: (null)
provider_responses[0]:
  endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
  id: "resp_032cb0f7db1a94310169aac43fe51c8194bf953fd805e2abe6"
  is_byok: false
  latency: 142
  model_permaslug: "openai/gpt-5.4-20260305"
  provider_name: "OpenAI"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799039-08CYSo32UkWQejvv41Du"
upstream_id: "resp_032cb0f7db1a94310169aac43fe51c8194bf953fd805e2abe6"
total_cost: 0.00447
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Matches word count: 35.00%
  Details: n/a
  Meta data: neededClean: false, words: 218
Dialogue to Total Word Ratio: 88.25%
  Details: Ratio: 4.05%, Deviation: 5.95%
  Meta data: neededClean: false, wordsTotal: 222, wordsDialogue: 9
Overall: 61.6263%
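The reported ratio is consistent with dialogue words divided by total words (9 / 222 ≈ 4.05%), and the deviation with the distance from the 10% target (10% minus 4.05% = 5.95%). A minimal Python sketch of that calculation follows. It assumes dialogue is any text inside curly double quotes and that words are whitespace-separated tokens; `count_words` and `dialogue_ratio` are hypothetical helpers, not NC Bench's actual evaluator. The page does not show how a 5.95% deviation maps to the 88.25% score, so the sketch stops at the deviation.

```python
import re

# Assumption: dialogue is any text between curly double quotes (“ ... ”), as in
# the response above. NC Bench's real quote handling is not shown on this page.
DIALOGUE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenizer)."""
    return len(text.split())


def dialogue_ratio(text: str, target: float = 0.10) -> tuple[int, int, float, float]:
    """Hypothetical helper: return (dialogue words, total words, ratio, deviation).

    ratio = dialogue words / total words; deviation = |target - ratio|.
    """
    total = count_words(text)
    dialogue = sum(count_words(quote) for quote in DIALOGUE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio, abs(target - ratio)


if __name__ == "__main__":
    # The three quoted lines from the response contain 4 + 1 + 4 = 9 words.
    sample = ("“Presents by the gazebo,” Molly whispered. "
              "“Excellent,” Bill whispered back. "
              "“Tiny citizens, tiny defenses.”")
    dialogue, total, _, _ = dialogue_ratio(sample)
    print(dialogue, total)  # → 9 14

    # Re-derive the reported figures from the page's own counts (9 of 222 words).
    ratio = 9 / 222
    print(f"Ratio: {ratio:.2%}, Deviation: {abs(0.10 - ratio):.2%}")
    # → Ratio: 4.05%, Deviation: 5.95%
```

Because the tokenizer is a plain whitespace split, punctuation attached to a word does not create extra tokens, which is why the three quoted lines count as exactly nine dialogue words.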