NC Bench
Run: 2026-04-24-01KQ06VWE211ZB53EA6RB9C1CH
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 2.7s
Cost: $0.000590
Provider: Groq
Evaluation summary
Score | Evaluator
68.10% | Matches word count
0.01% | Dialogue to Total Word Ratio
34.0554% | Overall
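The Overall figure appears to be the unweighted mean of the two evaluator scores. This is an inference from the numbers shown, not something the page states, and `overall_score` is a hypothetical helper name used only for illustration. A minimal Python sketch under that assumption:

```python
def overall_score(scores: list[float]) -> float:
    """Return the unweighted mean of per-evaluator scores (assumed aggregation rule)."""
    if not scores:
        raise ValueError("at least one evaluator score is required")
    return sum(scores) / len(scores)


# Scores as displayed on the page, in percent.
evaluator_scores = [68.10, 0.01]
overall = overall_score(evaluator_scores)
print(f"Overall (from rounded inputs): {overall:.4f}%")
```

The result from the rounded inputs differs from the displayed 34.0554% only in the last digit, which would be consistent with the page averaging unrounded scores.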
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly slunk behind the overgrown hedge that bordered the community center, her sneakers barely making a sound on the damp grass. The night air smelled of damp leaves and cheap hot dog vapor drifting from the nearby grill. She glanced at her watch—12:03 am. The party lights flickered inside, a rainbow of balloons bobbing like lazy jellyfish in a bathtub. "Bill, you see that security guard?" she whispered, her breath forming a mist. "I'm hiding behind the dumpster, just where we planned. He hasn't noticed the rubber chicken we left as a distraction," Bill replied, his voice muffled behind a cardboard box. Molly nudged the fence, peeking at the hall's glass doors. A chorus of giggles and a faint pop song leaked out, incongruous with the midnight hush. The presents were stacked in a pyramid on the far side of the room, each wrapped in glittery paper that promised candy, toys, and the occasional mystery item that would end up in the trash. "Okay, the guard will take his coffee break in five minutes. After that, we slip in through the service door. I already disabled the alarm with my trusty—" "My trusty what? A rubber chicken?" Bill snorted, trying not to laugh. "Exactly. It squeaks when the guard steps on it, sending a false fire alarm. He'll run out screaming, and the kids will think it's part of the entertainment." Molly raised an eyebrow. "And the kids?" "They'll think it's a clown show. We’re just adding to the chaos." She slipped a lock-pick from her pocket and held it up. "You ready to crack the pantry door?" "Born ready. I’ve practiced on my grandma’s jam jars." A distant shout echoed from the street. The guard, apparently not a fan of midnight coffee, shouted, "Hey! Who's out there?" "Hold that thought," Molly hissed, covering her mouth with a hand. She slipped out of the hedge, crouching low, and slipped through the side gate. Inside, the room was a sea of plastic chairs and a cake that looked suspiciously like a volcano. 
The presents towered like a mountain of hope. Bill slipped behind a table, pulling out a small, sleek sack. "This is the perfect size for the Lego set and that squeaky rubber duck," he said, his eyes gleaming. Molly crouched next to the birthday banner, whispering, "Remember, we only take the gifts, leave the party favors. No one needs an extra plastic spoon." "Got it," Bill replied, stuffing a plush unicorn into his sack. The guard's footsteps faded. The fire alarm screeched, a shrill wail that made the children gasp. The lights flickered, and a clown—an actual clown hired for the event—spun in a circle, juggling balloons. Molly grinned. "Now, the real show begins." "On the count of three?" Bill asked, checking his watch. "One, two—" "—three!" they shouted in unison, bolting toward the gift mountain, their sacks bulging with stolen treasures. The party erupted in confused applause, unaware that the biggest surprise was a pair of thieves making a clean getaway, laughing as they vanished into the night.
created_at: "2026-04-24T16:58:44.242Z" (4/24/2026, 4:58:44 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 164
moderation_latency: (null)
generation_time: 829
tokens_prompt: 77
tokens_completion: 1093
native_tokens_prompt: 131
native_tokens_completion: 951
native_tokens_completion_images: (null)
native_tokens_reasoning: 242
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00059025
router: (null)
provider_responses:
  0:
    endpoint_id: "900c5273-4931-4e9f-bb52-57935dd154b4"
    id: "chatcmpl-33e713b1-9aaf-460a-8cef-ca735a1b21a3"
    is_byok: false
    latency: 158
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Groq"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777049924-PGBYJSziSOTaCjldutn7"
session_id: (null)
api_type: "completions"
id: "gen-1777049924-lx5gVmftwtNhtfWBvS5E"
upstream_id: "chatcmpl-33e713b1-9aaf-460a-8cef-ca735a1b21a3"
total_cost: 0.00059025
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Groq"
response_cache_source_id: (null)
Evaluation details
Result | Evaluator | Details | Meta Data
68.10% | Matches word count | n/a | neededClean: false, words: 514
0.01% | Dialogue to Total Word Ratio | Ratio: 36.31%, Deviation: 13.69% | neededClean: false, wordsTotal: 515, wordsDialogue: 187
34.0554% | Overall
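The Ratio and Deviation values in the details can be reproduced from `wordsDialogue` and `wordsTotal`, taking the 50% target from the scenario. The sketch below is a hedged illustration, not the evaluator's actual code: `count_dialogue_words` and `ratio_and_deviation` are hypothetical names, and counting whitespace-separated words inside straight double quotes is an assumed rule. How the 13.69% deviation maps to the 0.01% score is not shown on the page and is not modeled here.

```python
import re

TARGET_DIALOGUE_PERCENT = 50.0  # from the scenario: "50% dialogue"


def count_dialogue_words(text: str) -> int:
    """Count whitespace-separated words inside straight double quotes (assumed rule)."""
    spans = re.findall(r'"([^"]*)"', text)
    return sum(len(span.split()) for span in spans)


def ratio_and_deviation(words_dialogue: int, words_total: int,
                        target: float = TARGET_DIALOGUE_PERCENT) -> tuple[float, float]:
    """Return (dialogue ratio in percent, absolute distance from the target in percent)."""
    ratio = 100.0 * words_dialogue / words_total
    return ratio, abs(target - ratio)


# Counts reported in the evaluation details.
ratio, deviation = ratio_and_deviation(187, 515)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 36.31%, Deviation: 13.69%"
```

As a usage example, `count_dialogue_words('Molly grinned. "Now, the real show begins."')` returns 5 under this counting rule. The two evaluators report slightly different totals (514 and 515 words), so their tokenization evidently differs from each other and may differ from this sketch.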