NC Bench
Run: 2026-02-16-01KHKAVHFA329FQM7T73C569GV
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 37.8s
Cost: $0.025823
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
"Molly, is this the first time we’ve targeted a bouncy-castle stronghold?" "Bill, we once liberated cupcakes from a magician’s hat; this is merely adjacent." "Scope the perimeter: balloons, juice boxes, and a roaming dinosaur with frosting on its claws." "That’s the cake, Molly; the dinosaur is a cake." "Good; cakes never file reports." "Mission: borrow every present forever; cover story: we’re the Gift Quality Inspectors." "Uniforms?" "High-visibility smiles, clipboards, and an air of bureaucratic whimsy." "Do we salute the piñata or negotiate?" "Negotiate; I speak fluent papier-mâché." "Extraction plan?" "Wagon parade; we form a solemn procession of responsibility, wheeling gifts to the Imaginary Testing Facility." "And if someone challenges us?" "I deploy the Certificate of Unreadable Authority and you nod like a wise squirrel." "What about the birthday kid?" "We present a heroic ribbon for Outstanding Generosity Toward Traveling Inspectors; applause will lubricate logistics." "You brought the decoy gift?" "Yes, a box of existential advice and bubble wrap; it hums with plausible importance." "Final concern: frosting footprints." "We walk backwards; it confuses crumbs and historians." "Bill, this is either genius or a parade of impractical nouns." "Molly, hold your clipboard like destiny and remember: presents fear confidence." "Into festive legend."
created_at: "2026-02-16T13:40:28.791531+00:00" (2/16/2026, 1:40:28 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 10672
moderation_latency: (null)
generation_time: 37566
tokens_prompt: 77
tokens_completion: 1082
native_tokens_prompt: 66
native_tokens_completion: 2574
native_tokens_completion_images: 0
native_tokens_reasoning: 2240
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0258225
router: (null)
provider_responses:
  0:
    id: "resp_05b1b7aac1d5a80d0169931e26748c8193b5abb42406e699eb"
    status: 200
    is_byok: false
    latency: 155
    endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771249190-4ExOZurAMHRyfB2xgnfu"
upstream_id: "resp_05b1b7aac1d5a80d0169931e26748c8193b5abb42406e699eb"
total_cost: 0.0258225
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
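The token and cost fields in the Meta listing above can be cross-checked against each other. The following is a minimal sketch, not the billing logic of the provider or of this site. It assumes list prices of $1.25 per million input tokens and $10.00 per million output tokens (these rates do not appear on this page), that reasoning tokens are counted inside native_tokens_completion, and that generation_time is in milliseconds. The helper name estimate_cost is hypothetical.

```python
# Values copied from the Meta listing above.
native_tokens_prompt = 66
native_tokens_completion = 2574   # assumed to include reasoning tokens
native_tokens_reasoning = 2240
generation_time_ms = 37566        # ASSUMPTION: unit is milliseconds

# ASSUMED prices in USD per million tokens; not stated on this page.
PRICE_IN_PER_M = 1.25
PRICE_OUT_PER_M = 10.00


def estimate_cost(prompt_tokens, completion_tokens):
    """Estimate cost in USD from native token counts and the assumed prices."""
    weighted = prompt_tokens * PRICE_IN_PER_M + completion_tokens * PRICE_OUT_PER_M
    return weighted / 1_000_000


cost = estimate_cost(native_tokens_prompt, native_tokens_completion)
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${cost:.7f}")
print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate agrees with the reported usage of 0.0258225, and most of the billed completion tokens are reasoning tokens rather than visible output.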
Evaluation details
Result | Evaluator | Details | Meta Data
99.9190% | Matches word count | n/a | neededClean: false; words: 203
36.7879% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false; wordsTotal: 203; wordsDialogue: 203
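The dialogue evaluator reports a ratio and its deviation from the 90% target named in the scenario. The sketch below shows one way such a ratio could be measured. It is an illustration only: the benchmark's actual tokenization and quote handling are not shown on this page, so the function dialogue_ratio, the rule that dialogue is text between straight double quotes, and the whitespace word split are all assumptions.

```python
import re


def dialogue_ratio(text):
    """Return (total_words, dialogue_words, ratio) for a scene.

    ASSUMPTIONS: dialogue is any text between straight double quotes, and
    a word is any whitespace-separated token. The benchmark's real rules
    may differ.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return total_words, dialogue_words, ratio


TARGET_RATIO = 0.90  # from the scenario: 90% dialogue

# Hypothetical sample containing a short dialogue tag between two quotes.
sample = '"Uniforms?" Bill asked. "High-visibility smiles and clipboards."'
total, spoken, ratio = dialogue_ratio(sample)
deviation = abs(ratio - TARGET_RATIO)
print(total, spoken, f"{ratio:.2%}", f"{deviation:.2%}")
```

A response made up entirely of quoted speech, like the one recorded for this run, has equal dialogue and total word counts, which corresponds to the reported ratio of 100.00% and a deviation of 10.00% from the target.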
68.3535%
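The final percentage is unlabeled on this page. It is numerically consistent with an unweighted mean of the two evaluator results above; the sketch below checks that arithmetic. That the site actually aggregates scores this way is an assumption, not something the page states.

```python
# Per-evaluator results copied from the evaluation results above, in percent.
evaluator_scores = {
    "Matches word count": 99.9190,
    "Dialogue to Total Word Ratio": 36.7879,
}

# ASSUMPTION: the final figure is an unweighted mean of the evaluator results.
overall = sum(evaluator_scores.values()) / len(evaluator_scores)
reported = 68.3535

print(f"mean of evaluator results: {overall:.3f}%")
print(f"difference from reported value: {abs(overall - reported):.4f}")
```

The difference is below the precision of the displayed inputs, so the mean is a plausible reading, though a weighted scheme that happens to give the same value cannot be ruled out from this page alone.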