NC Bench
Run: 2026-04-24-01KQ093P1JVJ7012880898BWYE
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 4m 31s
Cost: $0.001944
Provider: Novita
Evaluation summary
Score | Evaluator
97.63% | Matches word count
0.00% | Dialogue to Total Word Ratio
48.8138% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Morning light filtered through the cracked blinds of the abandoned bakery, casting a striped pattern on the dusty floor. Molly crouched beside a rusted toolbox, her gloved fingers tapping a rhythm on the lid as if it were a drum. Bill lounged against the brick wall, munching on a stale bagel while eyeing the crumpled blueprint spread on the table. The plan was absurd, glorious, and entirely unsuitable for anyone with a conscience. A children's birthday party at the upscale community center—balloons, clowns, a cake taller than a toddler—was the perfect target for their latest caper. The gifts, piled high in glossy wrapping paper, glittered like stolen treasure in the mind of a professional thief who once tried to steal a museum’s prized dinosaur bone. Molly's eyes glittered as she traced the route from the service entrance to the pantry where the presents would be stacked. Bill's brow furrowed when he imagined the security guard's night vision goggles, a relic from a precinct budget cut that still managed to spot a thief in a tutu. The duo had repurposed a magician's invisible box for the job, a contraption that would let them slip in unnoticed, as long as they could keep the kids entertained with a fake rabbit. Their rehearsals had been a disaster of squeaky toys and spilled punch, but they were determined. The clock ticked, the coffee grew cold, and a stray balloon drifted down, popping with a soft hiss that sounded like a warning. They whispered, eyes sparkling with mischief, silently together. Molly: 'Alright, Bill, the party starts at two. When do we slip in quickly?' Bill: 'We roll through the service door at eleven, blend in with the clown parade, then duck behind the cake table.' Molly: 'Sounds good, but the security guard is armed with a laser pointer that tracks movement.' Bill: 'I have a laser mirror tucked in my jacket to bounce the beam into the ceiling tiles.' Molly: 'Great, and the presents are stacked on the high table near the balloon arch.' 
Bill: 'We’ll use the invisible box to glide past the kids, then grab the gifts swiftly quick.' Molly: 'Quick? Like a ninja, or like a raccoon with sticky paws?' Bill: 'Both, actually. I wore the raccoon costume for the last gig, so it’s already on.' Molly: 'Perfect. The kids’ attention will be on the magician’s rabbit, not on us.' Bill: 'And the rabbit is a stuffed sock with a spring, so when I pull the lever, it will hop out, drawing screams and chaos, perfect cover.' Molly: 'Chaos is our ally. Once the room is a mess, we sprint to the gift table, stuff the sack, and vanish.' Bill: 'What about the parents’ return? They’ll be too busy with cake crumbs to notice a missing present.' Molly: 'If they do, we have the excuse: “The raccoon stole it for purposes!”' Bill: 'Brilliant. I’ll record a confession, then delete it.' Molly: 'Deal. Meet at the dumpster behind the gym before the cake cutting.' Bill: 'Got it, boss.'
created_at: "2026-04-24T17:37:56.798Z" (4/24/2026, 5:37:56 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 482
moderation_latency: (null)
generation_time: 270490
tokens_prompt: 77
tokens_completion: 7432
native_tokens_prompt: 129
native_tokens_completion: 7749
native_tokens_completion_images: (null)
native_tokens_reasoning: 6673
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0019437
router: (null)
provider_responses:
0:
endpoint_id: "3dc95f88-3960-4c91-b2bb-bbd2f5069c48"
id: "40a9896cfde83056b60634380a8643b5"
is_byok: false
latency: 389
model_permaslug: "openai/gpt-oss-120b"
provider_name: "Novita"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777052276-2tStxyjNPEHAytuPwmKh"
session_id: (null)
api_type: "completions"
id: "gen-1777052276-wH0akRjPpAgYpV03xNXA"
upstream_id: "40a9896cfde83056b60634380a8643b5"
total_cost: 0.0019437
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Novita"
response_cache_source_id: (null)
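The metadata fields above can be cross-checked against the run details with a little arithmetic. The sketch below is illustrative only and rests on assumptions the page does not state: that latency and generation_time are in milliseconds, that the reported Duration is roughly latency plus generation_time, and that native_tokens_reasoning is a subset of native_tokens_completion. The variable names are hypothetical.

```python
# Values copied from the metadata above.
latency_ms = 482                 # first "latency" field (assumed milliseconds)
generation_time_ms = 270490      # assumed milliseconds
native_tokens_completion = 7749
native_tokens_reasoning = 6673   # assumed to be included in the completion count
usage = 0.0019437

# Tokens left for the visible answer once reasoning tokens are set aside.
visible_tokens = native_tokens_completion - native_tokens_reasoning
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Rough throughput over the generation window.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# Assumption: wall-clock duration is latency plus generation time.
total_seconds = (latency_ms + generation_time_ms) / 1000
minutes, seconds = divmod(round(total_seconds), 60)

print(f"visible tokens: {visible_tokens}")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"duration: {minutes}m {seconds}s")
print(f"cost: ${usage:.6f}")
```

Under these assumptions the derived duration and cost agree with the "Duration" and "Cost" values reported in the run details, and most of the completion tokens were spent on reasoning rather than on the visible scene.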
Evaluation details
Result | Evaluator | Details | Meta Data
97.63% | Matches word count | n/a | neededClean: false, words: 507
0.00% | Dialogue to Total Word Ratio | Ratio: 31.56%, Deviation: 18.44% | neededClean: false, wordsTotal: 507, wordsDialogue: 160
48.8138% | Overall
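The ratio and deviation reported for the "Dialogue to Total Word Ratio" evaluator can be reproduced from the word counts in its metadata. The sketch below is a minimal check, not the benchmark's actual evaluator: the 50% target is taken from the scenario text, and treating the overall score as an unweighted mean of the two evaluator scores is an assumption that happens to fit the displayed numbers. How the 18.44% deviation maps to a 0.00% score is not stated on the page and is not modeled here.

```python
# Word counts taken from the evaluation details above.
words_total = 507
words_dialogue = 160
target_ratio = 0.50  # assumption: the scenario's "50% dialogue" target

# Share of words that are dialogue, and its distance from the target.
ratio = words_dialogue / words_total
deviation = abs(target_ratio - ratio)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumption: the overall score is the unweighted mean of the evaluator scores.
# The inputs are the rounded percentages shown on the page, so the result
# only matches the reported 48.8138% to within rounding.
scores = [97.63, 0.00]
overall = sum(scores) / len(scores)

print(f"Overall (approx.): {overall:.2f}%")
```

The first line of output matches the "Ratio: 31.56%, Deviation: 18.44%" detail string. The approximate overall value differs from the reported figure only in the third decimal place, which is consistent with the page averaging unrounded scores.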