NC Bench
Run: 2026-02-16-01KHKAYNK8E5NHV0TNXB628BB6
Run Details
Model:
GPT-5 Mini
Test:
Dialogue tags
Scenario:
Write 200 words with 50% dialogue
Duration:
39.2s
Cost:
$0.009021
Provider:
OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly and Bill crouched behind a toppled lawn flamingo, whispering like conspirators rehearsing a play. They catalogued absurd contingencies: runaway clowns, sugar-fueled confessions, a chorus of toddlers demanding return of stuffed unicorns. Molly practiced innocent smiles; Bill perfected a panicked face for when a parent accused them of stealing cookies instead of gifts. They imagined triumphant escapes involving ridiculous costumes and improbable distractions. Both stalled at the moral weight of glitter and spoiled cake, then shrugged, choosing mischief over malice. Outside, party music swelled and the living room glowed with ribbons and crinkled paper. Neither truly trusted their own restraint. "Molly: In cake chaos we slip, pocket presents, exit left." "Bill: What about tiny ninjas who might notice missing balloons?" "Molly: Distract with piñata shaped like suspiciously loud goose now." "Bill: I will wear a tutu and offer balloon dancing." "Molly: How explain a van full of glitter and stickers?" "Bill: Claim to be surprise entertainers, instantly beloved, then exit." "Molly: What about the cake? We cannot eat evidence, sadly." "Bill: We'll stage a frenzy with confetti, blame the dog." "Molly: Dress tiny suspiciously, return gifts labeled Found At Party." "Bill: We'll split proceeds for pizza, therapy, fair and final."
created_at
"2026-02-16T13:42:12.744919+00:00"
2/16/2026, 1:42:12 PM
model
"openai/gpt-5-mini-2025-08-07"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
4771
moderation_latency
(null)
generation_time
39008
tokens_prompt
77
tokens_completion
1107
native_tokens_prompt
66
native_tokens_completion
4502
native_tokens_completion_images
0
native_tokens_reasoning
4160
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0090205
router
(null)
provider_responses
0
id
"resp_0238c38f9653914f0169931e8cf9f08194bea7a8123209d368"
status
200
is_byok
false
latency
129
endpoint_id
"c4f66d01-20b0-4c27-a225-438ea22fda43"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5-mini-2025-08-07"
api_type
"completions"
id
"gen-1771249292-YeNqy9KdbGfWFODUlqQw"
upstream_id
"resp_0238c38f9653914f0169931e8cf9f08194bea7a8123209d368"
total_cost
0.0090205
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
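The token and cost fields above can be cross-checked with a little arithmetic. The sketch below is only illustrative: the token counts are copied from the metadata, but the per-million-token rates are an assumption (they are not stated on this page), and the function name `estimate_cost` is hypothetical.

```python
# Token counts copied from the run metadata above.
NATIVE_PROMPT_TOKENS = 66
NATIVE_COMPLETION_TOKENS = 4502
NATIVE_REASONING_TOKENS = 4160

# ASSUMPTION: USD rates per one million tokens; not given on this page.
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost in USD from token counts and the assumed rates."""
    return (
        prompt_tokens * INPUT_USD_PER_MTOK
        + completion_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Completion tokens that were not spent on reasoning.
visible_tokens = NATIVE_COMPLETION_TOKENS - NATIVE_REASONING_TOKENS

cost = estimate_cost(NATIVE_PROMPT_TOKENS, NATIVE_COMPLETION_TOKENS)
print(f"visible completion tokens: {visible_tokens}")
print(f"estimated cost: ${cost:.7f}")
```

Under those assumed rates the estimate agrees with the reported `total_cost` of 0.0090205, and the subtraction shows that most of the completion tokens were reasoning tokens rather than visible output.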
Evaluation details
| Result | Evaluator | Details | Meta Data |
| --- | --- | --- | --- |
| 99.9990% | Matches word count | n/a | neededClean: false; words: 201 |
| 100.0000% | Dialogue to Total Word Ratio | Ratio: 49.75%, Deviation: 0.25% | neededClean: false; wordsTotal: 201; wordsDialogue: 100 |
99.9995%
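The Dialogue to Total Word Ratio figures can be reproduced from the two reported counts. The sketch below is a minimal illustration, not the evaluator's actual code: how NC Bench splits words or detects dialogue is not shown on this page, so the quote-matching regex, the whitespace word split, and the function name `dialogue_ratio` are all assumptions.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    ASSUMPTION: dialogue is any span between double quotes (straight or
    curly), and a word is any whitespace-separated token.
    """
    quoted_spans = re.findall(r'["“]([^"”]*)["”]', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    total_words = len(text.split())
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Small illustrative input (not taken from the run).
sample = 'Bill nodded. "Go now," he said.'
print(dialogue_ratio(sample))

# Reported counts from this run: wordsDialogue = 100, wordsTotal = 201,
# against the scenario's 50% target.
ratio_pct = 100 / 201 * 100
deviation_pct = abs(50 - ratio_pct)
print(f"Ratio: {ratio_pct:.2f}%, Deviation: {deviation_pct:.2f}%")
```

The last line prints the same ratio and deviation as the Details column. Under these counting assumptions, each quoted line in the response contributes ten words with the speaker label included, which is how ten lines add up to the reported 100 dialogue words.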