NC Bench
Run: 2026-04-24-01KQ01QA4R1N7JRA6SYPRFZPW3
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 49.8s
Cost: $0.000263
Provider: AtlasCloud
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 95.95%
Overall: 47.9760%
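The overall figure is close to the plain average of the two evaluator scores. The sketch below checks that arithmetic. The unweighted mean is an assumption, since the report does not state how NC Bench aggregates scores, and the function name `overall_score` is hypothetical.

```python
from statistics import mean

# Scores copied from the evaluation summary (percent).
evaluator_scores = {
    "Matches word count": 0.00,
    "Dialogue to Total Word Ratio": 95.95,
}

def overall_score(scores):
    """Unweighted mean of the evaluator scores (an assumed aggregation rule)."""
    return mean(scores.values())

overall = overall_score(evaluator_scores)
reported = 47.9760

# The small gap is consistent with the displayed 95.95% being rounded.
print(f"computed {overall:.3f}% vs reported {reported:.4f}%")
```

If the aggregation is in fact weighted, the two numbers would only agree by coincidence, so treat this as a consistency check rather than a description of the scoring rule.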
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind a stack of cardboard boxes, tapping a pencil against the edge of a dented metal tray. The loft smelled of stale coffee and rusted ambition. “You really think a kids’ party is worth the risk?” Bill asked, adjusting the oversized clown nose perched on his face. “Exactly,” Molly replied, smirking. “Nobody expects grown‑ups in a balloon arch, and the presents are basically mini‑treasure chests. Think of the loot—designer toys, plastic swords, a mountain of chocolate. It’s a sweet gig.” Bill chuckled, his voice muffled by the squeaky red shoes. “Sweet, huh? You know the host is a PTA president, right? She probably has a security system that could spot a raccoon in a tutu.” “I’ve got a plan for that,” Molly said, flipping a sheet of crumpled schematics. “We’ll pose as entertainers. I’ll be the ‘Magic Molly’ who pulls rabbits out of hats. You’ll be ‘Bouncy Bill’ the inflatable clown.” “So I’m supposed to jump on a bouncy castle and steal presents while kids are distracted?” Bill asked, eyebrows raising. “Yes! The kids will be squealing, the parents will be snapping photos, and we’ll be sliding down the candy‑cane slide with a sack of loot. Plus, we can hide a bag in the ‘party favor’ station—no one will suspect a thing.” Bill tapped the table
created_at: "2026-04-24T15:28:51.364Z"
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 10034
moderation_latency: (null)
generation_time: 49704
tokens_prompt: 77
tokens_completion: 638
native_tokens_prompt: 77
native_tokens_completion: 638
native_tokens_completion_images: (null)
native_tokens_reasoning: 280
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: (null)
native_finish_reason: (null)
usage: 0.0002629
router: (null)
provider_responses:
  0:
    endpoint_id: "8b516897-c13e-4925-afc3-44fa86608fe9"
    id: "981736e693834298aed708a13b2ac11e"
    is_byok: false
    latency: 9916
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "AtlasCloud"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777044531-RpOczy6xjdD4FFGEcDah"
session_id: (null)
api_type: "completions"
id: "gen-1777044531-aMNOxpBQhF9OGDSydLah"
upstream_id: "981736e693834298aed708a13b2ac11e"
total_cost: 0.0002629
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "AtlasCloud"
response_cache_source_id: (null)
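A few derived figures can help when reading the Meta fields above. The sketch below rests on two assumptions that the report does not confirm: that `generation_time` is in milliseconds (49704 ms is close to the 49.8s duration shown in Run Details), and that the 638 completion tokens include the 280 reasoning tokens. The variable names `visible_tokens` and `tokens_per_second` are illustrative only.

```python
# Values copied from the Meta fields.
tokens_completion = 638
native_tokens_reasoning = 280
generation_time_ms = 49704  # assumed unit: milliseconds

# Assumption: reasoning tokens are counted inside the completion total,
# so the remainder approximates the tokens that reached the visible response.
visible_tokens = tokens_completion - native_tokens_reasoning

# Average generation speed over the whole completion.
tokens_per_second = tokens_completion / (generation_time_ms / 1000)

print(f"visible tokens: {visible_tokens}")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under those assumptions, only a little over half of the completion budget went to the scene itself, which is worth keeping in mind when comparing the 220-word response against the 500-word target.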
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean: false, words: 220
Dialogue to Total Word Ratio
  Result: 95.95%
  Details: Ratio: 69.09%, Deviation: 0.91%
  Meta Data: neededClean: false, wordsTotal: 220, wordsDialogue: 152
Overall: 47.9760%
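The ratio and deviation in the details follow directly from the two word counts and the 70% target in the scenario. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words, by taking the text inside curly double quotes. The counting rule and the helper names (`count_words`, `dialogue_words`, `ratio_and_deviation`) are assumptions for illustration, not the evaluator's actual implementation, and the mapping from deviation to the 95.95% score is not given in the report, so it is not modelled here.

```python
import re

TARGET_RATIO = 0.70  # from the scenario: 70% dialogue

def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())

def dialogue_words(text):
    """Count words inside curly double quotes (an assumed definition of dialogue)."""
    spans = re.findall(r"“([^”]*)”", text)
    return sum(count_words(span) for span in spans)

def ratio_and_deviation(words_dialogue, words_total, target=TARGET_RATIO):
    """Return the dialogue ratio and its absolute distance from the target."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - target)

# Counts copied from the evaluator's Meta Data.
ratio, deviation = ratio_and_deviation(152, 220)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
# prints "Ratio: 69.09%, Deviation: 0.91%"
```

For example, `dialogue_words('“Hi there,” she said. “Go.”')` counts three dialogue words out of five in total. Note that the dialogue ratio scored well even though the response stopped at 220 words, which is why the word count evaluator reports 0.00% independently.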