NC Bench
Run: 2026-04-24-01KQ04MMYRAEASJZNQJ7NYF2J3
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 5m 12s
Cost: $0.002955
Provider: Phala
Evaluation summary
Score       Evaluator
0.03%       Matches word count
44.60%      Dialogue to Total Word Ratio
22.3139%    Overall
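The page does not state how the Overall figure is derived. It looks consistent with an unweighted mean of the two evaluator scores, so the sketch below checks that reading. The function name `overall_score` is hypothetical, and the inputs are the rounded percentages displayed above, so the result can only agree with 22.3139% to within rounding.

```python
# Assumption: Overall is the unweighted mean of the evaluator scores.
# The page does not confirm this; it is only numerically consistent.
def overall_score(scores):
    """Return the arithmetic mean of a list of percentage scores."""
    return sum(scores) / len(scores)

# Rounded scores as displayed in the evaluation summary.
displayed_scores = [0.03, 44.60]

overall = overall_score(displayed_scores)
print(f"Mean of displayed scores: {overall:.4f}%")
print(f"Difference from reported 22.3139%: {abs(overall - 22.3139):.4f}")
```

The small residual difference is what one would expect if the site averages unrounded scores and rounds only for display.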
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The van idled outside the suburban cul-de-sac like a nervous cat. Inside, Molly adjusted the oversized rubber glove she’d borrowed from a costume shop, while Bill polished a glittering plastic keychain that looked suspiciously like a toy wand. The mission was simple: infiltrate the neighborhood’s biggest birthday bash, a pink balloon extravaganza for eight-year-old Tommy, and walk away with every neatly wrapped gift before the cake was even cut. The only thing more ridiculous than the plan was the fact that they’d both signed up for a “Kids Party Safety” course last month, just to scout the layout. Molly flipped through a crinkled flyer titled “Tommy’s Superhero Spectacular” and pointed at a doodle of a cape. “Two entrances, three exits, and a chocolate fountain that’s basically a sticky trap,” she whispered, her voice a mix of excitement and mischief. Bill chuckled, tapping his earpiece, which was actually a broken walkie-talkie he’d repurposed as a candy dispenser. He had already mapped the garden gnomes’ positions on a napkin, noting which ones were mere decorations and which concealed motion sensors. The plan involved a distraction—Bill would unleash a horde of squeaky ducks from the trash can, while Molly slipped through the pet door behind the inflatable dinosaur. Their only concern was the security guard, a lanky teenage boy named Chad who took his job after watching too many spy movies. They imagined him being terrified by a sudden burst of glitter confetti. They grinned, ready for the chaos. Nothing could stop them now. Tonight. Molly: "The gift table sits under the clown tent, guarded by squeaky toys." Bill: "Squeaky toys are perfect; they’ll think we’re part of the act." Molly: "We need a distraction louder than a kazoo orchestra." Bill: "I’ve loaded a speaker with a marching band remix for extra chaos." Molly: "I’ll rig the balloon arch to pop on cue, spraying glitter everywhere." 
Bill: "Make sure the glitter doesn’t stick to our gloves; we need traction." Molly: "Anti‑static spray is ready, bought from the hardware store yesterday." Bill: "Chad the guard uses a night‑vision headset; we’ll blind him with slime." Molly: "A glow‑in‑the‑dark packet on his visor will look like a bug." Bill: "He’ll panic, call backup, and we’ll already have the loot." Molly: "The presents are in color‑coded bags; gold ones hide the biggest toys." Bill: "Prioritize gold bags, especially the LEGO set under the pink cake." Molly: "I’ll swap the cake with a decoy so no one notices the missing bricks." Bill: "Cake cutting is at 7:15; we hit the back door at 7:10." Molly: "Escape through the doghouse tunnel if anything goes wrong." Bill: "I’ve cleared the tunnel of cats; they’re terrified of our masks." Molly: "Our superhero capes will make us look like part of the party." Bill: "No one will suspect a superhero stealing candy and LEGO bricks." Molly: "After we’re out, we split the loot at the abandoned laundromat." Bill: "Sounds perfect. I’ll bring the portable safe to stash the LEGO sets." Molly: "Just remember, no laughing until we’re far away from the balloons." Bill: "Deal. Let’s make this the most legendary birthday‑party heist ever." Molly: "We’ll vanish before the last balloon pops."
created_at: "2026-04-24T16:19:49.86Z" (4/24/2026, 4:19:49 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 4670
moderation_latency: (null)
generation_time: 310915
tokens_prompt: 77
tokens_completion: 5915
native_tokens_prompt: 129
native_tokens_completion: 6005
native_tokens_completion_images: (null)
native_tokens_reasoning: 5038
native_tokens_cached: 64
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00295535
router: (null)
provider_responses: 0
endpoint_id: "909fdbc8-1b02-46ea-a9fc-cc3cfe65d992"
id: "chatcmpl-ae16e56ffdf08b84"
is_byok: false
latency: 4595
model_permaslug: "openai/gpt-oss-120b"
provider_name: "Phala"
status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777047589-bNkQzdFXmZsfz8pL7mEL"
session_id: (null)
api_type: "completions"
id: "gen-1777047589-Es0QN6ZNukgdnFMd5Vp4"
upstream_id: "chatcmpl-ae16e56ffdf08b84"
total_cost: 0.00295535
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Phala"
response_cache_source_id: (null)
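The token and timing fields above can be combined into a few derived figures. The sketch below rests on two assumptions that the page does not confirm: that `generation_time` is in milliseconds (310915 ms is about 5m 11s, close to the reported duration of 5m 12s), and that `native_tokens_completion` includes the reasoning tokens.

```python
# Values copied from the run metadata.
native_tokens_completion = 6005
native_tokens_reasoning = 5038
generation_time_ms = 310915  # assumption: unit is milliseconds

# Assumption: the completion count includes reasoning tokens, so the
# remainder is the visible (non-reasoning) output.
visible_tokens = native_tokens_completion - native_tokens_reasoning

# Fraction of generated tokens spent on reasoning.
reasoning_share = native_tokens_reasoning / native_tokens_completion

# Average generation speed over the whole completion.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"Visible output tokens: {visible_tokens}")
print(f"Reasoning share: {reasoning_share:.1%}")
print(f"Throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions, most of the generated tokens went to reasoning rather than to the scene itself.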
Evaluation details
Result      Evaluator                       Details                            Meta Data
0.03%       Matches word count              n/a                                neededClean: false; words: 530
44.60%      Dialogue to Total Word Ratio    Ratio: 54.02%, Deviation: 4.02%    neededClean: false; wordsTotal: 535; wordsDialogue: 289
22.3139%    Overall
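The Ratio and Deviation figures for the dialogue evaluator can be reproduced from `wordsDialogue` and `wordsTotal`. This is a minimal sketch, assuming the deviation is the absolute difference from the 50% target named in the scenario. The function name `dialogue_ratio` is hypothetical, and the step that turns the deviation into the 44.60% score is not shown on the page, so it is not modelled here.

```python
# Target taken from the scenario: "Write 500 words with 50% dialogue".
TARGET_RATIO = 0.50

def dialogue_ratio(words_dialogue, words_total):
    """Return the fraction of all words that fall inside dialogue."""
    return words_dialogue / words_total

# Counts reported in the evaluator metadata.
ratio = dialogue_ratio(289, 535)

# Assumption: deviation is the absolute distance from the target ratio.
deviation = abs(ratio - TARGET_RATIO)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 54.02%, Deviation: 4.02%"
```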