NC Bench
Run: 2026-04-24-01KQ0A82A0WBNMPS6HXFFX3YX6
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 1m 13s
Cost: $0.000653
Provider: DeepInfra
Evaluation summary
Score      Evaluator
0.00%      Matches word count
100.00%    Dialogue to Total Word Ratio
50.0000%   Overall
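The Overall score is consistent with an unweighted mean of the two evaluator scores: (0.00% + 100.00%) / 2 = 50%. The sketch below reproduces that arithmetic. The averaging rule is an assumption inferred from these numbers, not NC Bench's documented aggregation method; the dictionary keys only reuse the evaluator names from the summary.

```python
from statistics import mean

# Assumption: Overall is the unweighted mean of the per-evaluator scores.
# NC Bench's real aggregation may differ (e.g. weights per evaluator).
evaluator_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 100.00,
}

overall = mean(list(evaluator_scores.values()))
print(f"Overall: {overall:.4f}%")  # Overall: 50.0000%
```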
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The backyard was a mess of balloons and plastic cups, lights strung like a nervous fairy. Molly crouched behind the garden gnome, peeking over the fence. Bill slipped glittery confetti into his pocket, his grin as wide as the clown nose on the snack table. They'd scoped the party for weeks, and now the scent of cake and crayons hung in the night.
Molly: "Alright, Bill, the birthday kid's favorite superhero—Captain Sparkle—has a secret stash behind the piñata. We hit it when the clown's juggling flaming marshmallows."
Bill: "Flaming marshmallows? That's a new one. I thought they'd distract the kids, not burn the cake."
Molly: "Exactly, it's a win-win. While they're gawking at the fire, we slip in, grab the gifts, and disappear before the cake cooling fan starts whirring."
Bill: "Do we have a map of the gift tables? The ones with the glittery wrapping are near the unicorn inflatable, right?"
Molly: "Yep. The biggest pile is under the rainbow banner, beside the pinata that looks like a dragon. The kids' parents will be too busy taking selfies to notice."
Bill: "What about security? The adult volunteers are armed with foam swords and a clipboard. We need a distraction."
Molly: "I've got it. I’ll unleash the inflatable dinosaur to chase the kids. While they scream, you’ll tip the cake stand and create a frosting avalanche."
Bill: "Sounds messy, but effective. And the frosting will mask our footprints. Do we have a getaway vehicle?"
Molly: "The clown's tiny pink convertible parked in the driveway. It's barely big enough for a clown nose, but it'll fit two grown thieves with bags of loot."
Bill: "Perfect. And the kids' party favors—those cheap plastic sunglasses—can double as our disguises."
Molly: "Alright, timeline. At 8:15, the clown starts the magic show. At 8:17, the dinosaur inflates. At 8:20, we move in."
Bill: "Got it. I'll bring the oversized grappling hook shaped like a candy cane—looks innocent, works great."
Molly: "And I’ll wear the giant clown shoes. Nobody suspects a thief in oversized clown shoes."
Bill: "One last thing—what do we do with the presents after we grab them?"
Molly: "We dump them in the neighbor's trash bin. They'll think the kids got a surprise recycling party."
Bill: "Brilliant. Kids love recycling, and parents love not having to deal with extra wrapping."
The moon hung low, casting silver over pink balloons as the thieves slipped into shadows. Molly adjusted her oversized shoes, Bill checked the candy‑cane grappling hook. Children's laughter rose in gentle waves, oblivious to mischief beyond pastel arches. In a heartbeat, the party would become a treasure trove, and they the only ones leaving with loot.
Meta
created_at: "2026-04-24T17:57:49.003Z" (4/24/2026, 5:57:49 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 840
moderation_latency: (null)
generation_time: 73159
tokens_prompt: 77
tokens_completion: 3422
native_tokens_prompt: 114
native_tokens_completion: 3415
native_tokens_completion_images: (null)
native_tokens_reasoning: 2715
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.000653296
router: (null)
provider_responses:
  0:
    endpoint_id: "85835306-dba3-4334-87bc-ab9ffa5c9c62"
    id: "chatcmpl-RdKVdtRWddeGxYx6AgEVcMb3"
    is_byok: false
    latency: 103
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "DeepInfra"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777053469-PbTxP4azTPdosRlvrvN8"
session_id: (null)
api_type: "completions"
id: "gen-1777053469-OowGaWHxfjelHa3SqPAz"
upstream_id: "chatcmpl-RdKVdtRWddeGxYx6AgEVcMb3"
total_cost: 0.000653296
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "DeepInfra"
response_cache_source_id: (null)
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean = false, words = 449
Dialogue to Total Word Ratio
  Result: 100.00%
  Details: Ratio: 70.00%, Deviation: 0.00%
  Meta Data: neededClean = false, wordsTotal = 450, wordsDialogue = 315
Overall: 50.0000%
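The ratio evaluator's fields are consistent with dividing wordsDialogue by wordsTotal and comparing the result with the 70% target: 315 / 450 = 0.70, so the deviation is zero. The Python sketch below reproduces that Details line from the reported counts. How NC Bench actually decides which words are dialogue is not stated here; the quote-matching helper `dialogue_words` and the definition of deviation as an absolute difference from the target are assumptions for illustration only.

```python
import re

# Target taken from the scenario ("70% dialogue").
TARGET_DIALOGUE_RATIO = 0.70

# Hypothetical rule: dialogue is any text inside straight "..." or curly \u201c...\u201d quotes.
QUOTED = re.compile('"([^"]*)"|\u201c([^\u201d]*)\u201d')


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def dialogue_words(text: str) -> int:
    """Hypothetical helper: count words that appear inside double quotes."""
    # findall returns one (straight, curly) tuple per match; one side is empty.
    return sum(count_words(a or b) for a, b in QUOTED.findall(text))


def dialogue_ratio_report(words_dialogue: int, words_total: int,
                          target: float = TARGET_DIALOGUE_RATIO) -> str:
    """Format the ratio and its assumed absolute deviation from the target."""
    ratio = words_dialogue / words_total
    deviation = abs(ratio - target)
    return f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}"


print(dialogue_ratio_report(315, 450))  # Ratio: 70.00%, Deviation: 0.00%
print(dialogue_words('Molly: "Alright, Bill." Bill: "Got it."'))  # 4
```

Feeding the full response text to `dialogue_words` will not necessarily reproduce 315, because the evaluator's tokenization is unknown; the two evaluators above already report slightly different totals (449 and 450).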