NC Bench
Run: 2026-04-24-01KQ09VNRW1HWSV6WAEFM5YNJT
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 12.8s
Cost: $0.001347
Provider: Nebius
Evaluation summary
Score      Evaluator
100.00%    Matches word count
88.44%     Dialogue to Total Word Ratio
94.2190%   Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The dusty attic echoed with muffled giggles as Molly unfurled a crumpled map of the party venue for the loot. Molly: "We hit the kid zone at midnight, when the clown's nap time overlaps the cake frosting." Bill: "Midnight? The party ends at eleven; that's a two‑hour window for nap‑time mayhem." Molly: "Exactly, the parents will be snoozing, the kids will be asleep, and the candy will be unguarded." Bill: "What about the balloon arch? It could trigger the motion sensor." Molly: "I'll bring the anti‑balloon deflater; a tiny pinprick and poof—no more floating alarms." Bill: "And the gift table? It's a fortress of wrapping paper." Molly: "I've got a heat gun to melt tape without ripping the boxes, perfect for a silent swap." Bill: "Do we need disguises? I can dress as a giant dinosaur, it scares kids and gives us cover." Molly: "A dinosaur is cliché; I prefer a giant inflatable unicorn, double‑rainbow effect distracts the security cameras." Bill: "What about the toy robot that patrols the hall? It's programmed to squeal at intruders." Molly: "We'll jam its firmware with a homemade EMP; a forty‑second blackout and it's a tin can." Bill: "How many presents are we talking?"
created_at: "2026-04-24T17:51:02.954Z" (4/24/2026, 5:51:02 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 266
moderation_latency: (null)
generation_time: 12159
tokens_prompt: 77
tokens_completion: 2176
native_tokens_prompt: 125
native_tokens_completion: 2214
native_tokens_completion_images: (null)
native_tokens_reasoning: 1859
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.00134715
router: (null)
provider_responses:
  0:
    endpoint_id: "de9b9344-2116-44ea-b30a-fd9ec94b1157"
    id: "chatcmpl-9cbd5afbe1f6933f"
    is_byok: false
    latency: 265
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Nebius"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777053062-w2LfCs76AHqHaId0e4F6"
session_id: (null)
api_type: "completions"
id: "gen-1777053062-7DTXdRhmLs2ci1ltXYWb"
upstream_id: "chatcmpl-9cbd5afbe1f6933f"
total_cost: 0.00134715
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Nebius"
response_cache_source_id: (null)
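The token and timing fields above can be combined into a rough throughput figure and a reasoning-token share. This is only a sketch: it assumes generation_time is in milliseconds and that native_tokens_reasoning is counted within native_tokens_completion, neither of which the page states. The variable names are illustrative.

```python
# Values copied from the metadata above.
native_tokens_completion = 2214
native_tokens_reasoning = 1859
generation_time_ms = 12159  # ASSUMPTION: unit is milliseconds

# Completion tokens generated per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

# ASSUMPTION: reasoning tokens are a subset of completion tokens.
reasoning_share = native_tokens_reasoning / native_tokens_completion
visible_tokens = native_tokens_completion - native_tokens_reasoning

print(f"throughput: {tokens_per_second:.1f} tokens/s")
print(f"reasoning share: {reasoning_share:.1%}")
print(f"non-reasoning completion tokens: {visible_tokens}")
```

Under those assumptions, most of the completion budget went to reasoning rather than to the roughly 200-word scene itself.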
Evaluation details
Result     Evaluator                      Details                            Meta Data
100.00%    Matches word count             n/a                                neededClean: false, words: 201
88.44%     Dialogue to Total Word Ratio   Ratio: 84.08%, Deviation: 5.92%    neededClean: false, wordsTotal: 201, wordsDialogue: 169
94.2190%   Overall
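The reported ratio and deviation can be reproduced from the word counts in the Meta Data column. The sketch below assumes the deviation is the absolute difference from the 90% target named in the scenario. The page does not state how the 5.92% deviation is mapped to the 88.44% score, so that step is left out. The function name dialogue_ratio is hypothetical.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Fraction of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return words_dialogue / words_total

TARGET_RATIO = 0.90  # from the scenario: "90% dialogue"

ratio = dialogue_ratio(words_dialogue=169, words_total=201)
deviation = abs(TARGET_RATIO - ratio)  # ASSUMPTION: absolute difference from target

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
```

The overall figure is consistent with an unweighted mean of the two evaluator scores, (100.00% + 88.44%) / 2 ≈ 94.22%, though the page does not confirm that this is how it is computed.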