NC Bench
Run: 2026-03-06-01KK1HA8EWB6D4CP1A4HMK6FZ3
Run Details
Model: GPT-5.4 (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 9.8s
Cost: $0.004875
Provider: OpenAI
Evaluation summary
Score | Evaluator
20.19% | Matches word count
3.56% | Dialogue to Total Word Ratio
11.8759% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the garden wall, peering over a hedge strung with balloons and a banner that sagged, threatening to spell HAPY BIRHDAY TIMMY. “Bill,” she whispered, “tell me again why we’re robbing a six-year-old.” “Because,” Bill said, adjusting a tiny paper crown as if it were tactical gear, “children possess two things in abundance: presents and weak perimeter security.” Inside the yard, a clown tripped over a sprinkler and saluted nobody. Parents laughed with the blind confidence of people who had never met professionals. Molly checked her watch. “We go in during cake. Maximum chaos, minimum vigilance.” “Classic frosting diversion,” Bill said. “I’ll blend with the dads. You mingle with the aunts. Try not to look criminal.” “I always look festive,” Molly said, straightening a cardigan covered in ducks. “What about the dog?” Bill glanced at the bulldog asleep beside the gift table, snoring like a faulty accordion. “Bribed. Half a sausage roll.” “And the children?” “Sugar-drunk and structurally unsound.” A shriek erupted as a magician produced a dove that immediately flew into the punch bowl. Molly grinned. “That’s our window.” Bill nodded, producing a large empty sack decorated with smiling dinosaurs. “Shall we commit a birthday?” “With dreadful manners,” Molly said, and together they strolled toward the gate, wearing the expressions of people prepared to clap for magic.
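The scenario asks for 50% of the words to be dialogue, so the response has to be split into quoted and unquoted words before it can be scored. The page does not show how the evaluator does this. The sketch below is one plausible approach, assuming dialogue is whatever sits between curly double quotes and that words are whitespace-separated; `dialogue_ratio` and `DIALOGUE_SPAN` are hypothetical names, and the counts it produces may differ from the benchmark's own.

```python
import re

# Hypothetical pattern: text between an opening and a closing curly double quote.
# The benchmark's real tokenisation rules are not shown on the page.
DIALOGUE_SPAN = re.compile(r"“([^”]*)”")


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, dialogue_words / total_words)."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in DIALOGUE_SPAN.findall(text))
    ratio = dialogue / total if total else 0.0
    return dialogue, total, ratio


if __name__ == "__main__":
    sample = "“Bill,” she whispered, “tell me again.”"
    dialogue, total, ratio = dialogue_ratio(sample)
    print(f"{dialogue} of {total} words are dialogue ({ratio:.2%})")
```

Splitting on whitespace keeps punctuation attached to words, which is a simplification; an evaluator that strips punctuation or handles straight quotes would need a different pattern.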
created_at: "2026-03-06T12:17:48.003Z" (3/6/2026, 12:17:48 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 435
moderation_latency: (null)
generation_time: 9693
tokens_prompt: 77
tokens_completion: 398
native_tokens_prompt: 66
native_tokens_completion: 314
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.004875
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0ec39d5ae00f8fee0169aac5ec102081968bb468e25c68bbb3"
    is_byok: false
    latency: 250
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772799468-T8Qa10DXemmXWRDknHFO"
upstream_id: "resp_0ec39d5ae00f8fee0169aac5ec102081968bb468e25c68bbb3"
total_cost: 0.004875
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
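The token counts, generation time and cost in the metadata above can be cross-checked against each other. The sketch below derives output throughput from `native_tokens_completion` and `generation_time`, treating the latter as milliseconds. It also estimates the cost from the native token counts. The per-million-token rates are assumptions, not values from the page: they are simply one pair of rates that reproduces the reported `total_cost`.

```python
# Values copied from the run metadata.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 314
GENERATION_TIME_MS = 9693  # assumed to be milliseconds
REPORTED_TOTAL_COST = 0.004875

# ASSUMED rates in USD per million tokens; the page does not state pricing.
ASSUMED_INPUT_USD_PER_M = 2.50
ASSUMED_OUTPUT_USD_PER_M = 15.00


def tokens_per_second(tokens: int, millis: int) -> float:
    """Average generation throughput over the whole request."""
    return tokens / (millis / 1000)


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD under the assumed per-million-token rates."""
    return (
        prompt_tokens * ASSUMED_INPUT_USD_PER_M
        + completion_tokens * ASSUMED_OUTPUT_USD_PER_M
    ) / 1_000_000


if __name__ == "__main__":
    tps = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)
    cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
    print(f"throughput: {tps:.1f} tokens/s")
    print(f"estimated cost: ${cost:.6f} (reported: ${REPORTED_TOTAL_COST})")
```

The estimate uses the `native_tokens_*` fields rather than `tokens_prompt` and `tokens_completion`; that choice is also an assumption about how the request was billed.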
Evaluation details
Result | Evaluator | Details | Meta Data
20.19% | Matches word count | n/a | neededClean: false, words: 220
3.56% | Dialogue to Total Word Ratio | Ratio: 36.49%, Deviation: 13.51% | neededClean: false, wordsTotal: 222, wordsDialogue: 81
11.8759% | Overall
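The ratio and deviation reported for the dialogue evaluator follow directly from its word counts and the 50% target in the scenario. The sketch below reproduces them. It also checks that the overall figure is consistent with an unweighted mean of the two displayed scores, up to rounding; the page does not state the aggregation formula, so that part is an assumption. How the 20.19% and 3.56% scores are derived from the raw counts is not shown on the page and is not attempted here.

```python
# Values copied from the evaluation details.
WORDS_DIALOGUE = 81
WORDS_TOTAL = 222
TARGET_RATIO = 0.50  # "50% dialogue" from the scenario

ratio = WORDS_DIALOGUE / WORDS_TOTAL
deviation = abs(TARGET_RATIO - ratio)

# ASSUMPTION: the overall score is the plain average of the evaluator scores.
displayed_scores = [20.19, 3.56]
mean_score = sum(displayed_scores) / len(displayed_scores)

if __name__ == "__main__":
    print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
    print(f"Mean of displayed scores: {mean_score:.4f}% (page shows 11.8759%)")
```

The small gap between the computed mean and the displayed overall value is what would be expected if the per-evaluator scores are rounded to two decimals for display.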