NC Bench
Run: 2026-02-16-01KHKAXG89VCZPYPW9HCECWXMD
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 49.5s
Cost: $0.033903
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly: Recon status, Bill? Bill: Clowns everywhere, frosting on the floor, and a bouncy castle that violates several treaties. Molly: Excellent. Our disguises? Bill: I brought two paper crowns and an air of misplaced confidence. Molly: Confidence is my specialty and paper is my size. Entry plan? Bill: We stroll in singing Happy Birthday, slightly off-key, then blend like poorly whisked batter. Molly: I can be the human kazoo. Present extraction? Bill: We compliment the wrapping, claim we’re the Gift Safety Commission, and relocate the loot to our “inspection facility.” Molly: Does the facility rhyme with trunk? Bill: It rhymes aggressively with trunk. Molly: Contingencies? Bill: If challenged, we deploy balloon animals as legal counsel. Molly: I’ll cross-examine with a giraffe. Bill: Also, we avoid cake; frosting prints are a forensics nightmare. Molly: Wise. I once left a buttercream alibi on a doorknob. Bill: Alarm systems? Molly: Eight-year-olds sugar-charged to DEFCON Sprinkles. Handle with compliments and stickers. Bill: I brought dinosaur stickers and counterfeit napkins. Molly: Counterfeit? Bill: They claim to be cloth but crumble under interrogation. Molly: Alright, on my signal—wink, nod, jazz hands. Bill: That’s three signals. Molly: Redundancy saves thieves. Bill: And birthdays. Molly: Ready? Bill: Absolutely.
created_at: "2026-02-16T13:41:45.004379+00:00" (2/16/2026, 1:41:45 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7916
moderation_latency: (null)
generation_time: 49247
tokens_prompt: 77
tokens_completion: 1187
native_tokens_prompt: 66
native_tokens_completion: 3382
native_tokens_completion_images: 0
native_tokens_reasoning: 3008
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0339025
router: (null)
provider_responses:
  0:
    id: "resp_0241f2259d9160600169931e66c02881969d1d0e5715ca1fea"
    status: 200
    is_byok: false
    latency: 169
    endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771249254-uF4jwaGeLKeN2FtB63aa"
upstream_id: "resp_0241f2259d9160600169931e66c02881969d1d0e5715ca1fea"
total_cost: 0.0339025
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
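The reported total_cost can be cross-checked against the native token counts above. The following is a minimal sketch, assuming list prices of $1.25 per million input tokens and $10.00 per million output tokens; those rates are an assumption and are not stated on this page. It also assumes that native_tokens_completion already includes the reasoning tokens, which the counts (3382 versus 3008) suggest.

```python
# Assumed prices in USD per million tokens; NOT taken from this page.
PRICE_IN_PER_M = 1.25
PRICE_OUT_PER_M = 10.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from native token counts."""
    micro_usd = prompt_tokens * PRICE_IN_PER_M + completion_tokens * PRICE_OUT_PER_M
    return micro_usd / 1_000_000


# Values copied from the run metadata.
native_prompt = 66
native_completion = 3382  # assumed to include reasoning tokens
native_reasoning = 3008

cost = estimate_cost(native_prompt, native_completion)
visible_tokens = native_completion - native_reasoning  # tokens in the visible answer

print(f"estimated cost: ${cost:.7f}")
print(f"visible output tokens: {visible_tokens}")
```

Under the assumed rates the estimate agrees with the reported total_cost of 0.0339025, and it indicates that most of the completion tokens were spent on reasoning rather than on the visible scene.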
Evaluation details
Result | Evaluator | Details | Meta Data
98.7124% | Matches word count | n/a | neededClean: false; words: 206
0.0000% | Dialogue to Total Word Ratio | Ratio: 0.97%, Deviation: 89.03% | neededClean: false; wordsTotal: 206; wordsDialogue: 2
49.3562%
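The dialogue evaluator's figures can be reproduced arithmetically from wordsDialogue and wordsTotal. The sketch below also illustrates one plausible reason the count is so low: it assumes, hypothetically, that the evaluator counts only words inside double quotation marks. The response uses script-style speaker labels ("Molly:", "Bill:") and contains a single quoted phrase, which would be consistent with a dialogue count of 2. The function names and the quote-matching rule are illustrative assumptions, not the benchmark's actual implementation.

```python
import re

# Assumption: "dialogue" means text between double quotes, straight or curly.
QUOTED = re.compile(r'["\u201c]([^"\u201c\u201d]*)["\u201d]')


def count_dialogue(text: str) -> tuple[int, int]:
    """Return (words inside quotes, total words) using whitespace splitting."""
    total = len(text.split())
    dialogue = sum(len(span.split()) for span in QUOTED.findall(text))
    return dialogue, total


# Short excerpt in the same script style as the model response.
sample = 'Bill: We relocate the loot to our “inspection facility.” Molly: Does the facility rhyme with trunk?'
sample_dialogue, sample_total = count_dialogue(sample)

# Reproduce the reported figures from the evaluator's own counts.
words_dialogue, words_total, target_pct = 2, 206, 90.0
ratio_pct = round(100 * words_dialogue / words_total, 2)
deviation_pct = round(target_pct - 100 * words_dialogue / words_total, 2)

# The final score is consistent with a plain average of the two results.
overall_pct = (98.7124 + 0.0) / 2

print(sample_dialogue, sample_total)
print(ratio_pct, deviation_pct, overall_pct)
```

If the quote-based assumption holds, a response that follows the prompt's intent but formats lines as a script would score near zero on this evaluator regardless of how much of it is spoken text.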