NC Bench
Run: 2026-02-16-01KHKAKKWEDCM255M3P52JA4HC
Run Details
Model:
GPT-5 Mini
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
36.0s
Cost:
$0.006294
Provider:
OpenAI
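The Cost and Duration figures above can be cross-checked against the token counts recorded in this run's response metadata (`native_tokens_prompt`, `native_tokens_completion`, `native_tokens_reasoning`, `generation_time`). The sketch below is not part of NC Bench. It assumes rates of $0.25 per million input tokens and $2.00 per million output tokens, which the run record does not state, and it assumes `generation_time` is in milliseconds. Under those assumptions the native token counts reproduce the reported `usage` value of 0.0062945.

```python
# Values copied from the response metadata of this run.
NATIVE_TOKENS_PROMPT = 66
NATIVE_TOKENS_COMPLETION = 3139
NATIVE_TOKENS_REASONING = 2752
GENERATION_TIME_MS = 35680  # assumption: the unit is milliseconds
REPORTED_USAGE_USD = 0.0062945

# Assumed rates in USD per million tokens; not stated in the run record.
ASSUMED_INPUT_RATE = 0.25
ASSUMED_OUTPUT_RATE = 2.00


def estimate_cost(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Estimate cost in USD from token counts and per-million-token rates."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


estimated_cost = estimate_cost(
    NATIVE_TOKENS_PROMPT,
    NATIVE_TOKENS_COMPLETION,
    ASSUMED_INPUT_RATE,
    ASSUMED_OUTPUT_RATE,
)

# Completion tokens left over once reasoning tokens are subtracted.
visible_tokens = NATIVE_TOKENS_COMPLETION - NATIVE_TOKENS_REASONING
reasoning_share = NATIVE_TOKENS_REASONING / NATIVE_TOKENS_COMPLETION
tokens_per_second = NATIVE_TOKENS_COMPLETION / (GENERATION_TIME_MS / 1000)

print(f"estimated cost: ${estimated_cost:.7f} (reported: ${REPORTED_USAGE_USD})")
print(f"non-reasoning completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Most of the completion budget in this run went to reasoning tokens, so the cost is driven far more by the hidden reasoning than by the roughly 200-word scene itself.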
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly: "It's simple, Bill. We go in, blend, bag the loot, go." Bill: "That sounds like a cartoon and also like my résumé." Molly: "We dress as entertainers. Balloons, bad jokes, sticky fingers." Bill: "Entertainers? You ever met a three-year-old? They're tiny lawyers." Molly: "We charm them. Offer magic. Reveal present-summoning trick." Bill: "Present-summoning? That's not a thing." Molly: "We will invent it. You pull scarves, I pull guilt." Bill: "Guilt is heavy. Do we have a bag strong enough?" Molly: "We have a tote, sequins, and a plan that probably fails." Bill: "Probably fails? That's our brand. Improbable, slightly damp success." Molly: "Also, we leave a tiny toy as apology." Bill: "Who apologizes after stealing? You getting soft?" Molly: "Soft is cunning. Soft distracts, then pockets replace." Bill: "Pockets replace? Are you making verbs?" Molly: "Language evolves. Also cake is slippery; eat some first?" Bill: "Eat cake to case the joint. That's either genius or sugar." Molly: "Both. After sugar we sing, then we gracefully disappear." Bill: "Disappearing gracefully requires practice. I trip on confetti." Molly: "Then improvise, Bill." A papier-mâché unicorn watched them from the punch table as they practiced smiles and negotiated moral elasticity with glee quietly.
created_at
"2026-02-16T13:36:07.461619+00:00"
model
"openai/gpt-5-mini-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
7248
moderation_latency
(null)
generation_time
35680
tokens_prompt
77
tokens_completion
1108
native_tokens_prompt
66
native_tokens_completion
3139
native_tokens_completion_images
0
native_tokens_reasoning
2752
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0062945
router
(null)
provider_responses
0
id
"resp_0eae3321794c57cb0169931d22d0f08190a505e54bffa9c212"
status
200
is_byok
false
latency
157
endpoint_id
"c4f66d01-20b0-4c27-a225-438ea22fda43"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-mini-2025-08-07"
api_type
"completions"
id
"gen-1771248930-30B9dtTsodpQ6c449rPM"
upstream_id
"resp_0eae3321794c57cb0169931d22d0f08190a505e54bffa9c212"
total_cost
0.0062945
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details

| Result | Evaluator | Details | Meta Data |
|---|---|---|---|
| 99.3769% | Matches word count | n/a | neededClean: false; words: 205 |
| 44.1006% | Dialogue to Total Word Ratio | Ratio: 80.49%, Deviation: 9.51% | neededClean: false; wordsTotal: 205; wordsDialogue: 165 |

71.7388%
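
The dialogue-ratio figures follow directly from the reported word counts: 165 dialogue words out of 205 total is about 80.49%, which is 9.51 percentage points short of the 90% target in the scenario. The sketch below shows one way such a count could be made. It treats text between straight double quotes as dialogue and splits words on whitespace; both rules are assumptions, and `dialogue_ratio` and `deviation_points` are hypothetical helper names. The benchmark's actual tokenization, and the mapping from a 9.51-point deviation to the 44.1006% result, are not shown in this record.

```python
import re

TARGET_RATIO = 0.90  # from the scenario: 90% dialogue


def dialogue_ratio(text):
    """Return (dialogue_words, total_words, ratio).

    Assumes dialogue is whatever sits between straight double quotes and
    that words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_points(ratio, target=TARGET_RATIO):
    """Absolute gap between ratio and target, in percentage points."""
    return abs(target - ratio) * 100


# Small illustrative sample, not the full model response.
sample = 'Molly: "Then improvise, Bill." A unicorn watched them.'
print(dialogue_ratio(sample))

# Reproduce the reported figures from the recorded counts.
reported_ratio = 165 / 205
print(f"{reported_ratio:.2%}")                    # 80.49%
print(f"{deviation_points(reported_ratio):.2f}")  # 9.51
```

The final 71.7388% figure is consistent with the mean of the two evaluator results, (99.3769 + 44.1006) / 2 = 71.73875.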