NC Bench
Run: 2025-04-16-01JRZYTRFHJ74AKPJ55AM5CDQW
Run Details
Model: o4 Mini
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 35.5s
Cost: $0.012859
Provider: OpenAI
Evaluation summary
Score | Evaluator
100.00% | Matches word count
0.00% | Dialogue to Total Word Ratio
50.0000% | Overall
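The overall score shown here is consistent with an unweighted mean of the two evaluator scores. The sketch below reproduces that arithmetic; the averaging rule is an assumption inferred from the numbers on this page, not a documented NC Bench formula, and the variable names are hypothetical.

```python
from statistics import mean

# Evaluator scores copied from the summary above (percent).
evaluator_scores = {
    "Matches word count": 100.0,
    "Dialogue to Total Word Ratio": 0.0,
}

# Assumption: the overall score is the plain, unweighted average.
overall = mean(evaluator_scores.values())

print(f"{overall:.4f}%")
```

With the two scores above this prints 50.0000%, matching the Overall row.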
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
AI
The thieves crouched in the dimly lit hallway outside a suburban home hosting an exuberant celebration. A birthday banner hung crooked above the entrance door. Faint laughter drifted through the walls and balloons bobbed behind a window. Molly’s heart pounded with exhilaration as she checked her lockpick. Bill adjusted the small jammer strapped to his belt and glanced at a wristwatch. Both shared a glance that spoke volumes of adrenaline and nerves. They had rehearsed this absurd plan countless times. Tonight’s payoff was a mountain of glittery paper bundles stuffed with fragile treasures destined for giggling children. Mischief awaited now.
Molly: Are you really sure the coast is clear tonight?
Bill: Ten minutes until the party ends, kids start crying.
Molly: Those wrapped boxes are begging for our sticky fingers.
Bill: The birthday clown hides behind a full sheet cake.
Molly: I’ll disable the surveillance feed with this signal jammer.
Bill: After that, we grab loot, vanish into the night.
Molly: Don’t underestimate those frosting booby traps near the table.
Bill: Sticky fingers are a hazard with kids’ cake frosting.
Molly: Wristwatch timer synced; we move at eight sharp. Ready?
Bill: Ready. Let’s make this the sweetest heist ever executed.
Meta
id: "gen-1744829244-j7z1sxsfq99FxadvcOwn"
upstream_id: "chatcmpl-BN256xCH4jVJUDmXJiJjOKpDYP4dh"
total_cost: 0.012859
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-16T18:47:59.921811+00:00" (4/16/2025, 6:47:59 PM)
model: "openai/o4-mini-2025-04-16"
app_id: 182717
streamed: true
cancelled: false
latency: 31841
moderation_latency: 115
generation_time: 35001
tokens_prompt: 68
tokens_completion: 2887
native_tokens_prompt: 66
native_tokens_completion: 2906
native_tokens_reasoning: 2624
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.012859
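The token fields above suggest that most of the completion was spent on reasoning rather than on the visible 200-word scene. The sketch below works through that arithmetic under two assumptions that this page does not confirm: that native_tokens_reasoning is included in native_tokens_completion, and that generation_time is in milliseconds (which is roughly consistent with the 35.5s duration). The variable names are hypothetical.

```python
# Values copied from the metadata above.
native_tokens_completion = 2906
native_tokens_reasoning = 2624
generation_time_ms = 35001  # assumption: milliseconds

# Assumption: reasoning tokens are counted inside the completion total,
# so the remainder is the visible response.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share_pct = 100.0 * native_tokens_reasoning / native_tokens_completion

# Throughput over the whole completion, reasoning included.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000.0)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share_pct:.1f}%")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, 282 tokens remain for the visible scene and roughly nine tenths of the completion is reasoning.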
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 200
0.00% | Dialogue to Total Word Ratio | Ratio: 1.50%, Deviation: 48.50% | neededClean: false, wordsTotal: 200, wordsDialogue: 3
50.0000%
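The ratio and deviation reported for the dialogue evaluator follow directly from the word counts in its metadata. The sketch below reproduces them, assuming the deviation is the absolute difference from the 50% target in the scenario. The function name is hypothetical, and how the evaluator turns the deviation into its 0.00% score is not shown on this page, so that step is left out.

```python
def dialogue_ratio_report(words_dialogue: int, words_total: int,
                          target_pct: float = 50.0) -> dict:
    """Return the dialogue ratio and its deviation from the target, in percent."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    ratio_pct = 100.0 * words_dialogue / words_total
    # Assumption: deviation is the absolute gap to the target ratio.
    deviation_pct = abs(target_pct - ratio_pct)
    return {"ratio_pct": round(ratio_pct, 2),
            "deviation_pct": round(deviation_pct, 2)}


# wordsDialogue and wordsTotal as reported in the evaluation details.
report = dialogue_ratio_report(words_dialogue=3, words_total=200)
print(f"Ratio: {report['ratio_pct']:.2f}%, "
      f"Deviation: {report['deviation_pct']:.2f}%")
```

For 3 dialogue words out of 200 this prints "Ratio: 1.50%, Deviation: 48.50%", the same figures as the Details column.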