NC Bench
Run: 2026-03-18-01KM08QN2ECBZ72EVFE3J26E3E
Run Details
Model: GPT-5.4 Mini (Reasoning)
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 31.0s
Cost: $0.023612
Provider: OpenAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
23.13% | Dialogue to Total Word Ratio
61.5643% | Overall
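The overall figure is numerically consistent with an unweighted mean of the two evaluator scores. That is an inference from the numbers on this page, not a documented rule of the benchmark. A minimal Python sketch of that arithmetic, where the function name `overall_score` is hypothetical:

```python
def overall_score(scores: list[float]) -> float:
    """Unweighted mean of per-evaluator scores, in percent.

    Assumption: the benchmark averages evaluator scores with equal weight.
    """
    if not scores:
        raise ValueError("need at least one evaluator score")
    return sum(scores) / len(scores)


# Scores as displayed on this page; the second one is rounded to two decimals,
# so the result only approximates the reported 61.5643%.
print(overall_score([100.00, 23.13]))
```

The small gap between this result and the reported overall score would be explained by the page rounding the second score for display.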
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the hedge, watching the house glow with balloons, streamers, and the kind of optimism that frightened professionals. "That's the target," Molly whispered. "Lots of soft loot tonight." "Maybe," Bill said, "we should also steal some cake too." Bill tugged his tie, striped like a warning sign, and stared at the window where children shrieked beside wrapping paper. "We're monsters," he said. "Excellent disguise for party work, apparently." "Keep voice down," Molly said. "Professionals never sound cheerful, ever." Molly pulled out a map drawn on a napkin and a piece of licorice, her preferred tools for excellence today. "You still think the clown cover is necessary?" Bill asked. "Absolutely," said Molly. "Everyone trusts a clown with organizational skills." They slipped toward the side gate as a disco version of Happy Birthday thumped through the walls like nervous thunder. "I don't trust clowns," Bill muttered. "That's why you'll survive." "If anyone asks, we're entertainers," Molly said. "Terrible ones, clearly." Inside, the party erupted in confetti and applause for a magician who looked tragically aware of his own limitations already. "See?" Molly breathed. "They're distracted by actual incompetence, beautifully, finally." "Did we rob a party?" Bill asked. "Delightfully," Molly said.
created_at: "2026-03-18T10:44:17.116Z" (3/18/2026, 10:44:17 AM)
model: "openai/gpt-5.4-mini-20260317"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 30260
moderation_latency: (null)
generation_time: 30766
tokens_prompt: 77
tokens_completion: 1297
native_tokens_prompt: 66
native_tokens_completion: 5236
native_tokens_completion_images: 0
native_tokens_reasoning: 4946
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0236115
router: (null)
provider_responses:
  0:
    endpoint_id: "9ee065b2-3d1c-43bc-bdd7-28af3b148282"
    id: "resp_00d698c2581924240169ba82012f388195a0565db196860f0f"
    is_byok: false
    latency: 245
    model_permaslug: "openai/gpt-5.4-mini-20260317"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1773830657-y23jlmE2pJCgUSKjsixm"
upstream_id: "resp_00d698c2581924240169ba82012f388195a0565db196860f0f"
total_cost: 0.0236115
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 200
23.13% | Dialogue to Total Word Ratio | Ratio: 39.00%, Deviation: 11.00% | neededClean: false, wordsTotal: 200, wordsDialogue: 78
61.5643% | Overall
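The ratio and deviation reported above follow directly from the word counts: 78 dialogue words out of 200 gives 39%, which is 11 percentage points from the 50% target. The sketch below reproduces that arithmetic in Python and shows one plausible way to count dialogue words. The function names, the whitespace-based word split, and the rule that dialogue is any text between straight double quotes are all assumptions; the page does not show the evaluator's actual counting rules, nor how the deviation is mapped to the 23.13% score.

```python
import re


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = 0.5) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from target) as fractions."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)


def count_dialogue_words(text: str) -> tuple[int, int]:
    """Return (dialogue words, total words) for a passage.

    Assumptions: words are whitespace-separated tokens, and dialogue is
    whatever sits between a pair of straight double quotes.
    """
    total = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue = sum(len(span.split()) for span in quoted_spans)
    return dialogue, total


# Counts taken from the evaluation details on this page.
ratio, deviation = ratio_and_deviation(78, 200)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Counting on a short sample sentence from the model response.
print(count_dialogue_words('"Keep voice down," Molly said.'))
```

A real evaluator may treat curly quotes, nested quotes, or hyphenated words differently, so counts from this sketch are not guaranteed to match the reported `wordsDialogue` value on arbitrary text.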