NC Bench
Run: 2026-04-24-01KQ09G94P3J3QWRAXXVYGG2M2
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 3m 13s
Cost: $0.001167
Provider: Novita
Evaluation summary
Score | Evaluator | Details Link
0.00% | Matches word count | (details)
0.00% | Dialogue to Total Word Ratio | (details)
0.0000% | Overall |
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain pattered against the cracked window of the abandoned warehouse, drumming a nervous beat that matched the twitch of Bill’s cufflinks. Molly perched on a dusty barrel, her cap tilted like a crooked crown, and stared at the flyer taped to the wall: “Little Heroes Birthday Bash – 2 pm, 12‑year‑old Timmy’s Jungle Safari.” The colors screamed plastic dinosaur and glitter. “Kid‑friendly chaos,” Bill said, tapping his lock‑pick with a thumb. “Perfect camouflage for a heist.” Molly folded the flyer with reverent sighs. “You mean a party where the cake is shaped like a volcano and the piñata is a bribed giraffe?” Bill chuckled, his breath smelling of stale popcorn. “Exactly. No security cameras, just a line of adults with mismatched socks and a DJ who thinks ‘Baa Baa Black Sheep’ is a remix.” Molly leaned in, eyes glinting. “So the plan?” “First, we crash the gift table before the kids get a chance to pinch it themselves,” Bill replied, tracing a mental map over the crayon‑drawn floor plan he’d sketched on a napkin. “Then we swap the real presents with… I don’t know, inflatable dinosaurs that squeak when you pull the string?” Molly laughed, the sound echoing off the concrete. “Inflatable dinosaurs, huh? I thought about a decoy: a mountain of bubble wrap that looks like a stack of toys. Kids love bubble wrap.” “Bubble‑wrap mountain,” Bill repeated, nodding. “And a distraction—” “—a karaoke showdown,” Molly cut in, “We’ll get the DJ to play ‘Let It Go’ on loop. Parents will be too busy pretending to sing to notice us.” “Fine, but we need a way out. The exit is a pop‑up tent that looks like a treasure chest. We’ll hide in it after the loot.” Molly paced, her shoes squeaking on the linoleum. “And the gift bag? We can’t just grab them in the open. We need a diversion, maybe a fake fire drill.” “—I’ve got a smoke‑bomb shaped like a dragon. When it blows, everyone thinks the party’s a ‘dragon‑breath’ theme and runs for the exits. 
Meanwhile we slip in, swap the gifts, and vanish.” Molly clapped her hands. “Brilliant! And I’ve already recruited my cousin, Tiny Tim, who’s eight and can crawl under the table without being seen.” Bill raised an eyebrow. “He’s a kid. Won’t that blow our cover?” Molly shrugged. “He’s the ‘gift inspector.’ He’ll pretend to check each present for quality. Parents love that.” Bill paced, counting on his fingers. “Alright, timeline: 1 pm – we arrive in clown disguise, 1:20 – smoke‑dragon, 1:30 – swap, 1:45 – exit through treasure tent. That gives us fifteen minutes of pure loot.” Molly tapped the flyer. “And after we leave, we’ll donate the presents to a charity that doesn’t know they’re stolen. Good karma, right?” Bill laughed. “Just make sure the charity isn’t in the same neighborhood. Otherwise the kids will spot their missing toys and start a neighborhood watch.” Molly smirked. “Then we’ll get a new identity—‘Molly the Magician’ and ‘Bill the Balloon Artist.’ Both perfect covers for a future heist.” Bill checked his watch, the second hand spinning like a tiny carnival ride. “We’re set. Let’s meet at the party in fifteen minutes, bring the dragon smoke‑bomb, and remember: no kids, no cry.” Molly tipped her cap. “And no tears, only giggles. Let’s go steal a birthday.”
created_at: "2026-04-24T17:44:49.568Z" (4/24/2026, 5:44:49 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 532
moderation_latency: (null)
generation_time: 193392
tokens_prompt: 77
tokens_completion: 4353
native_tokens_prompt: 129
native_tokens_completion: 4642
native_tokens_completion_images: (null)
native_tokens_reasoning: 3416
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00116695
router: (null)
provider_responses:
  0:
    endpoint_id: "3dc95f88-3960-4c91-b2bb-bbd2f5069c48"
    id: "5e508e7da73900929a0f47cce32beab7"
    is_byok: false
    latency: 411
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Novita"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777052689-yt57ynp2NT5jYYYjYoAO"
session_id: (null)
api_type: "completions"
id: "gen-1777052689-73Gv1wZlZtl80xjh4BMM"
upstream_id: "5e508e7da73900929a0f47cce32beab7"
total_cost: 0.00116695
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Novita"
response_cache_source_id: (null)
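The Duration shown under Run Details (3m 13s) appears to be generation_time rendered as minutes and seconds. The sketch below shows that conversion, assuming generation_time is in milliseconds (the page does not state the unit). It also computes the share of native completion tokens reported as reasoning tokens. The helper names format_duration and reasoning_share are hypothetical and are not part of NC Bench or the provider API.

```python
# Values copied from the metadata above.
GENERATION_TIME_MS = 193392      # assumed to be milliseconds
NATIVE_TOKENS_COMPLETION = 4642
NATIVE_TOKENS_REASONING = 3416


def format_duration(ms: int) -> str:
    """Render a millisecond count as 'Xm Ys', dropping fractional seconds."""
    total_seconds = ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds}s"


def reasoning_share(reasoning_tokens: int, completion_tokens: int) -> float:
    """Return reasoning tokens as a percentage of completion tokens."""
    if completion_tokens <= 0:
        raise ValueError("completion_tokens must be positive")
    return 100.0 * reasoning_tokens / completion_tokens


duration = format_duration(GENERATION_TIME_MS)
share = reasoning_share(NATIVE_TOKENS_REASONING, NATIVE_TOKENS_COMPLETION)
print(duration)                      # 3m 13s
print(f"{share:.1f}% of completion tokens were reasoning")
```

Under these assumptions, roughly three quarters of the completion tokens went to reasoning rather than to the visible scene.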
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 549
0.00% | Dialogue to Total Word Ratio | Ratio: 71.40%, Deviation: 41.40% | neededClean: false, wordsTotal: 549, wordsDialogue: 392
0.0000% | Overall | |
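The reported ratio and deviation can be reproduced from the word counts in the metadata. The sketch below assumes the deviation is the absolute difference between the measured ratio and the 30% target named in the scenario. The function names are hypothetical. The page does not show how the evaluator counts dialogue words or how it turns the deviation into a 0.00% score, so the sketch models neither.

```python
TARGET_DIALOGUE_PCT = 30.0   # from the scenario: "30% dialogue"
WORDS_TOTAL = 549            # wordsTotal reported by the evaluator
WORDS_DIALOGUE = 392         # wordsDialogue reported by the evaluator


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of total words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float) -> float:
    """Assumed definition: absolute difference in percentage points."""
    return abs(ratio_pct - target_pct)


ratio = dialogue_ratio(WORDS_DIALOGUE, WORDS_TOTAL)
deviation = deviation_from_target(ratio, TARGET_DIALOGUE_PCT)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# Ratio: 71.40%, Deviation: 41.40%
```

The response also ran to 549 words against the requested 500, which is consistent with the 0.00% result on the word count evaluator.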