NC Bench
Run: 2026-04-24-01KQ08C8Y692TPWHD365DZ8WYY
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 8s
Cost: $0.001693
Provider: Io Net
Evaluation summary
Matches word count: 95.99%
Dialogue to Total Word Ratio: 0.17%
Overall: 48.0778%
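The Overall figure appears to be the unweighted mean of the two evaluator scores: (95.99 + 0.17) / 2 = 48.08, which agrees with 48.0778% once the rounding of the displayed scores is taken into account. The sketch below checks this under that assumption. The averaging rule and the name evaluator_scores_pct are illustrative, not NC Bench's documented scoring method.

```python
# Assumption: Overall is the unweighted mean of the per-evaluator scores.
# The displayed scores are rounded, so the mean matches Overall only to about 0.01 points.
from statistics import mean

# Hypothetical container for the scores shown in the evaluation summary.
evaluator_scores_pct = {
    "Matches word count": 95.99,
    "Dialogue to Total Word Ratio": 0.17,
}

overall_pct = mean(evaluator_scores_pct.values())
print(f"Overall (recomputed): {overall_pct:.2f}%")  # reported Overall: 48.0778%
```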
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The van idled in the cul‑de‑sac like a nervous cat, rain drumming on the cracked windshield. Inside, Molly adjusted night‑vision goggles while balancing a sack of glittering confetti. Bill fidgeted with a rubber chicken, eyeing the brightly colored balloons tied to the rear window. The children’s party they were about to crash was a two‑story suburban house with a banner reading “Superhero Spectacular!” and a table stacked with gift‑wrapped boxes. Both wore matching black jumpsuits, Molly’s with a tiny embroidered cape—because even thieves liked drama. "We’ve got the perfect plan—sneak in, nab the presents, vanish before the cake gets frosting." "Speaking of frosting, did you bring the sugar‑coated distraction?" "Of course. I stole a whole tub of whipped cream from a bakery downtown. We’ll spray it on the hallway floor." "Sticky floors are great for slipping up the adults. But what about the kids?" "Kids love whipped cream. They’ll think it’s a surprise party effect." "And the balloons?" "I rigged the red ones with tiny speakers. When we pop them, they’ll blare the ‘Mission Impossible’ theme." "Brilliant. The parents will be too busy dancing to the music to notice the missing toys." "Exactly. Plus, I disabled the house alarm using my grandma’s old fax machine hack." "Your grandma still has a fax machine?" "She uses it as a fancy paperweight. It’s perfect for blocking signal." "Alright, what’s the entry point?" "The doggy door. The golden retriever is asleep; we’ll slip through with the quietest of sneaks." "And the gifts?" "We’ll dump them into this sack and later swap them for…" "…for a thousand donuts? You know I can’t resist a donut heist." "Donuts are the side gig. The real loot is the limited‑edition LEGO set on the top shelf." "The one with the glow‑in‑the‑dark dragons? Got it. How do we get past the laser security?" "I’ve painted the laser beams with glow‑in‑the‑dark paint. They’ll look like a rave." "You’re a genius. Any backup plan if the kids notice?" 
"We’ll tell them we’re part of the superhero show. I have the cape for proof." "Do we have a superhero name?" "Captain Confetti. And you’re…" "The Great Glitter." "Together, we’re unstoppable." "Do we need masks?" "I already stole a mask from a mime. It’s invisible." "Invisible? That’s…perfectly absurd." "Absurdity is our specialty. Also, I packed a decoy—" "—a box of crayon‑colored rocks?" "Exactly. When the cops arrive, they’ll think we’re arts‑and‑crafts enthusiasts." "And the kids will think we left a surprise art installation." "Now, let’s rehearse the entrance." "One, two, three…" "Shhh! The dog is waking up!" "Abort! Abort! The dog is barking!" "Quick, throw the whipped cream, create a slip‑n‑slide." "It’s working! The adults are slipping, the kids are laughing." "Now the sack—" "—full of presents! We did it!" "Time to vanish before the birthday song starts." "And maybe grab a donut on the way out." "Deal. Let’s make this the most spectacular heist ever." "We should leave a calling card—maybe a glitter bomb." "Perfect. The kids will think the superheroes left a gift."
created_at: "2026-04-24T17:25:09.713Z" (4/24/2026, 5:25:09 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 408
moderation_latency: (null)
generation_time: 68251
tokens_prompt: 77
tokens_completion: 4628
native_tokens_prompt: 125
native_tokens_completion: 4781
native_tokens_completion_images: (null)
native_tokens_reasoning: 3775
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.001692634
router: (null)
provider_responses[0]:
  endpoint_id: "aa7799a4-4935-4de1-9cb3-8ebf9c1cf4e3"
  id: "chatcmpl-2c046a38-4a8c-425c-a9cd-a69c52cca181"
  is_byok: false
  latency: 408
  model_permaslug: "openai/gpt-oss-120b"
  provider_name: "Io Net"
  status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777051509-vPtRt0yl2GyOPXcWvHCL"
session_id: (null)
api_type: "completions"
id: "gen-1777051509-dLuzjiVOrAiOhKFejJYM"
upstream_id: "chatcmpl-2c046a38-4a8c-425c-a9cd-a69c52cca181"
total_cost: 0.001692634
cache_discount: 0.00000576
upstream_inference_cost: 0
provider_name: "Io Net"
response_cache_source_id: (null)
Evaluation details
Matches word count: 95.99%
  Details: n/a
  Meta data: neededClean: false, words: 508
Dialogue to Total Word Ratio: 0.17%
  Details: Ratio: 81.30%, Deviation: 11.30%
  Meta data: neededClean: false, wordsTotal: 508, wordsDialogue: 413
Overall: 48.0778%
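The ratio evaluator's details can be reproduced from its metadata: 413 dialogue words out of 508 total gives 81.30%, which is 11.30 percentage points above the 70% target set in the scenario. The minimal Python sketch below assumes the deviation is the absolute difference from that target. The function name dialogue_ratio is hypothetical, and because this page does not show how an 11.30% deviation maps to the 0.17% score, that step is not modeled.

```python
# Recompute the Dialogue to Total Word Ratio details from the evaluator metadata.
# Assumption: deviation = |measured ratio - target ratio|, in percentage points.

TARGET_DIALOGUE_PCT = 70.0  # from the scenario "Write 500 words with 70% dialogue"


def dialogue_ratio(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Return (ratio_pct, deviation_pct), each rounded to two decimals."""
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(ratio_pct - TARGET_DIALOGUE_PCT)
    return round(ratio_pct, 2), round(deviation_pct, 2)


# Counts taken from the metadata: wordsDialogue = 413, wordsTotal = 508.
ratio, deviation = dialogue_ratio(words_dialogue=413, words_total=508)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With these counts the printed line matches the evaluator's details string, "Ratio: 81.30%, Deviation: 11.30%".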