NC Bench
Run: 2026-04-24-01KQ078AZ7VDYZVWBWH39Q6RXE
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 1m 58s
Cost: $0.005136
Provider: Together
Evaluation summary
Score      Evaluator
51.93%     Matches word count
0.00%      Dialogue to Total Word Ratio
25.9628%   Overall
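The Overall figure looks consistent with an unweighted mean of the two evaluator scores. The sketch below checks that reading; the averaging rule and the function name `overall_score` are assumptions for illustration, not something the page states. The small gap between the recomputed value and the displayed 25.9628% would be explained if the word-count score is stored with more digits than the rounded 51.93% shown.

```python
# Hypothetical helper: assumes Overall is the plain mean of evaluator scores.
def overall_score(scores):
    """Return the unweighted mean of the evaluator scores (in percent)."""
    return sum(scores.values()) / len(scores)

# Scores as displayed in the evaluation summary (rounded by the page).
scores = {
    "Matches word count": 51.93,
    "Dialogue to Total Word Ratio": 0.00,
}

overall = overall_score(scores)
print(f"{overall:.4f}%")  # close to the displayed 25.9628%, within rounding
```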
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Night draped over the suburban cul‑de‑sac like a cheap blanket, and the faint scent of cake frosting lingered from the party across the street. Molly crouched behind a hedge, adjusting the strap of her oversized duffel bag, labeled 'Kid‑Loot' in glittery marker. She peered over the low wall, counting balloons—nine orange, seven blue, a rogue unicorn float that looked stolen from a daycare graduation. Behind her, Bill flexed his fingers, cheap sunglasses reflecting the porch lights. The kids' laughter was a thin, sugary soundtrack to their caper. Molly glanced at the schedule scavenged from a discarded invitation: '2:00 pm – magic show, 2:30 pm – clown, 3:00 pm – cake, 3:30 pm – open gifts.' The optimal window for a clean swipe was clear: the moment the clown vanished and before parents realized the presents were missing. She pulled a battered walkie‑talkie, testing the static before slipping it into her blazer. The house was a fortress of pastel plastic—cardboard towers, streamers hanging like a nervous system. Yet the backdoor, a forgotten side entrance left ajar for the garbage truck, offered a quiet path. Bill, ever the opportunist, had already scoped the kitchen where the gift table would be loaded, noting the bright green wrap and the oversized teddy bear guarding the pile. A rustle of leaves announced a stray cat darting across the lawn, a perfect distraction. Molly smirked, her mind racing through logistics: disable the motion sensor, divert the clown with a fake balloon pop, and—most importantly—avoid the inevitable showdown with the birthday girl, a six‑year‑old known for crying at the slightest disruption. Molly: All right, Bill, the clock hits two, we slip in through the side door; think the motion sensor will bite? Bill: If it chirps, I’ll just dance; the clown’s squeaky shoe can be swapped for a decoy to trigger a false alarm. Molly: The kids are glued to the magician’s rabbit; we’ll be the rabbit’s unwelcome backup. 
Bill: I’ll handle the ground level, pocket the green‑wrapped boxes while you climb the ladder for the top tier. Molly: The security camera on the porch has a blinking red eye—what’s our plan there? Bill: I’ve got the old floppy‑disk magnet; one swipe and it goes static like a dead TV. Molly: I need a replacement squeaky shoe for the clown; do we have one? Bill: Stole one from a circus supply shop; it squeaks louder than a mouse in a broom closet. Molly: We hit the cake table at three, grab the frosting, and use it as camouflage for our gloves. Bill: Frosting‑covered gloves mean no fingerprints; kids love frosting, so it’s perfect cover. Molly: If the birthday girl sees us, we’ll claim we’re part of the entertainment and do a “magic trick” to make the presents vanish. Bill: The cat will be our alibi—‘the cat knocked over the table’; fluffy witnesses never testify. Molly: Remember the backup exit through the kitchen window; it’s narrow but enough for us and the loot. Bill: Sync our watches—zero‑zero‑one to two‑zero‑three; once we’re in, we’ll make this the most epic birthday heist ever now.
created_at: "2026-04-24T17:05:32.146Z" (4/24/2026, 5:05:32 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 202
moderation_latency: (null)
generation_time: 117735
tokens_prompt: 77
tokens_completion: 7438
native_tokens_prompt: 127
native_tokens_completion: 8528
native_tokens_completion_images: (null)
native_tokens_reasoning: 6609
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00513585
router: (null)
provider_responses:
  0:
    endpoint_id: "c7cc0654-cfe3-4531-bd8a-68cd162ba020"
    id: "ogSrTGR-z1gNr-9f16b180065390c8"
    is_byok: false
    latency: 174
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Together"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777050332-hMOhkdFCXcgaBAFjNvfh"
session_id: (null)
api_type: "completions"
id: "gen-1777050332-kXw7esGZ7hLKw79LktPY"
upstream_id: "ogSrTGR-z1gNr-9f16b180065390c8"
total_cost: 0.00513585
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Together"
response_cache_source_id: (null)
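The headline Duration and Cost in the run details can plausibly be derived from this metadata. The sketch below shows one way to do it, under three assumptions that the page does not confirm: `generation_time` is in milliseconds, `usage` is in US dollars, and `native_tokens_reasoning` is already included in `native_tokens_completion`. The helper name `format_duration` is hypothetical.

```python
# Values copied from the metadata above.
meta = {
    "generation_time": 117735,          # assumed to be milliseconds
    "native_tokens_completion": 8528,
    "native_tokens_reasoning": 6609,
    "usage": 0.00513585,                # assumed to be US dollars
}

def format_duration(ms):
    """Format a millisecond count as 'Xm Ys', rounded to the nearest second."""
    minutes, seconds = divmod(round(ms / 1000), 60)
    return f"{minutes}m {seconds}s"

duration = format_duration(meta["generation_time"])
cost = f"${meta['usage']:.6f}"

# Tokens left for the visible answer if reasoning tokens are part of the
# completion count (an assumption, see the lead-in).
visible_tokens = meta["native_tokens_completion"] - meta["native_tokens_reasoning"]

print(duration, cost, visible_tokens)
```

Under those assumptions, roughly three quarters of the completion tokens went to reasoning rather than to the scene itself.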
Evaluation details
Result     Evaluator                      Details                            Meta Data
51.93%     Matches word count             n/a                                neededClean: false, words: 516
0.00%      Dialogue to Total Word Ratio   Ratio: 11.05%, Deviation: 38.95%   neededClean: false, wordsTotal: 516, wordsDialogue: 57
25.9628%   Overall
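The reported Ratio and Deviation can be reproduced from `wordsDialogue` and `wordsTotal`. This is a minimal sketch, assuming the deviation is the absolute difference between the 50% target in the scenario and the measured ratio. The function name `dialogue_ratio_report` is hypothetical, and the sketch does not attempt to reproduce how the evaluator maps the deviation to a score or how it decides which words count as dialogue.

```python
def dialogue_ratio_report(words_dialogue, words_total, target_pct=50.0):
    """Return (ratio, deviation) in percent, each rounded to two decimals."""
    ratio_pct = 100.0 * words_dialogue / words_total
    deviation_pct = abs(target_pct - ratio_pct)
    return round(ratio_pct, 2), round(deviation_pct, 2)

# Counts taken from the evaluation details above.
ratio, deviation = dialogue_ratio_report(words_dialogue=57, words_total=516)
print(f"Ratio: {ratio}%, Deviation: {deviation}%")
```

With 57 dialogue words out of 516, the recomputed values agree with the Ratio of 11.05% and Deviation of 38.95% shown in the details.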