NC Bench
Run: 2026-04-24-01KQ02YMDMF7W896M7KBTDHGW2
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 37s
Cost: $0.000931
Provider: DeepInfra
Evaluation summary
Score    | Evaluator                    | Details Link
0.00%    | Matches word count           | (details)
47.94%   | Dialogue to Total Word Ratio | (details)
23.9718% | Overall                      |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the inflatable castle, its rainbow arches swaying in the summer breeze. The backyard was a chaotic jungle of balloons, streamers, and a jittery clown who kept checking his watch like a nervous accountant. Bill, wearing a fake moustache that kept slipping, peeked from behind a palm‑tree shaped cake stand, his eyes scanning the candy table for strategic entry points. They whispered like conspirators at a PTA meeting, their plans absurdly elaborate for a stash of plastic toys and glittery stickers. Molly tapped her finger against the side of a piñata shaped like a unicorn, wondering if the kids' parents would notice a slight wobble. Bill adjusted his oversized sunglasses, despite the night being bright enough to read a children’s book without a flashlight. They were determined: tonight, the only thing louder than the giggles would be the sound of their silent escape with the mountain of presents for the ultimate loot. Molly: "Alright, Bill, the kids are about to open the cake. That's our window." Bill: "Got it. I’ll disable the clown’s squeaky horn with a well‑placed marshmallow." Molly: "Marshmallow? Are you sure that's not a snack for the guests?" Bill: "Trust me, the horn is louder than a marching band; a marshmallow will mute it nice and sticky." Molly: "Fine. What about the gift table? How do we get past the bubble‑wrap moat?" Bill: "I brought the ultimate weapon: a pair of garden shears forged from a retired toy dinosaur." Molly: "That sounds terrifyingly effective." Bill: "Exactly. I’ll snip the wrap, you scoop the presents into the sack." Molly: "What’s in the sack?" Bill: "A giant tote made from a repurposed piñata; it’s cheap, colorful, and unsuspecting." Molly: "Brilliant. Any alarms?" Bill: "Only one: the motion‑sensor that lights up the disco ball. I’ll distract it with a flash‑dance." Molly: "Do we have any dance moves?" Bill: "I practiced the sprinkler while listening to a children's lullaby on repeat." Molly: "Perfect. 
I’ll time the entry with the balloon pop." Bill: "The pop happens at exactly 7:42 PM when the fire‑cracker cake ignites." Molly: "Fire‑cracker cake? That’s a pyrotechnic nightmare." Bill: "It’s actually a cake filled with pop rocks, so the kids think it’s a fireworks show." Molly: "And when they scream, we’ll be in the shadows." Bill: "Shadows, yes, but we need a getaway. I’ve arranged a getaway scooter painted like a dinosaur." Molly: "A dinosaur scooter? Who would chase that?" Bill: "Nobody. Parents think it’s a party favor." Molly: "And the police?" Bill: "They’ll be too busy counting the missing cupcakes." Molly: "Speaking of cupcakes, any backup plan if the clown notices?" Bill: "I’ll swap his red nose for a squeaky toy that sounds like a fart." Molly: "That’ll definitely divert attention." Bill: "Now, about the loot. Should we prioritize the plush unicorns or the LEGO sets?" Molly: "The unicorns are soft, so they’re easy to hide. The LEGOs are heavy but valuable." Bill: "We can split the haul: you take the unicorns, I’ll lug the LEGOs." Molly: "Deal. And after we escape, we meet at the abandoned merry‑go‑round for the split." Bill: "Sounds like a plan. Ready?" Molly: "Ready as a rubber chicken at a convention now."
created_at: "2026-04-24T15:50:19.838Z" (4/24/2026, 3:50:19 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 309
moderation_latency: (null)
generation_time: 96816
tokens_prompt: 77
tokens_completion: 4843
native_tokens_prompt: 114
native_tokens_completion: 4875
native_tokens_completion_images: (null)
native_tokens_reasoning: 3955
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000930696
router: (null)
provider_responses:
  0:
    endpoint_id: "85835306-dba3-4334-87bc-ab9ffa5c9c62"
    id: "chatcmpl-REDtOtwDPng2EcVLFpJ8VAyF"
    is_byok: false
    latency: 126
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777045819-ngKC6Yyflf7zqsyeF1la"
session_id: (null)
api_type: "completions"
id: "gen-1777045819-3on2RVOUgx7x5DUPcmti"
upstream_id: "chatcmpl-REDtOtwDPng2EcVLFpJ8VAyF"
total_cost: 0.000930696
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
Evaluation details
Result   | Evaluator                    | Details                          | Meta Data
0.00%    | Matches word count           | n/a                              | neededClean: false, words: 532
47.94%   | Dialogue to Total Word Ratio | Ratio: 66.17%, Deviation: 3.83%  | neededClean: false, wordsTotal: 532, wordsDialogue: 352
23.9718% | Overall                      |                                  |
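
The ratio and deviation reported in the evaluation details can be reproduced from the word counts with a little arithmetic. The Python sketch below is illustrative only and is not NC Bench's evaluator. The function names are hypothetical. Counting dialogue as the words inside straight double quotes is an assumption. The page does not show how a 3.83% deviation maps to the 47.94% score, so that step is left out. Treating the overall score as the mean of the two evaluator scores is also an assumption, although it agrees with the reported figures to two decimal places.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words (assumed tokenisation)."""
    return len(text.split())


def dialogue_word_count(text: str) -> int:
    """Count words inside straight double quotes (assumed definition of dialogue)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int, target_pct: float):
    """Return the dialogue ratio and its absolute distance from the target, in percent."""
    ratio_pct = 100.0 * words_dialogue / words_total
    return ratio_pct, abs(target_pct - ratio_pct)


# Figures taken from the evaluation details: 352 dialogue words out of 532.
ratio, deviation = ratio_and_deviation(352, 532, 70.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")

# Assumption: the overall score is the plain mean of the evaluator scores.
overall = (0.0 + 47.94) / 2
print(f"Overall (assumed mean): {overall:.2f}%")
```

On a short sample such as `Molly: "Any alarms?" Bill: "Only one."`, `dialogue_word_count` returns 4 and `count_words` returns 6, because the speaker tags count toward the total but not toward the dialogue.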