NC Bench
Run: 2026-04-24-01KQ06HKSFQE51PRTAFMGG2KGF
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write unattributed dialogue
Duration: 16.3s
Cost: $0.000223
Provider: AtlasCloud
Evaluation summary
Score: 0.03%
Evaluator: Count dialogue tags
Model input
User/Human
You are an expert novel writer in the cozy mystery genre. Write a scene of about 200 words, with dialogue between two characters. The dialogue should be unattributed, meaning the speaker is not explicitly mentioned. The scene should be set in a small town bakery. Only return the story, nothing else.
Model response
AI
The bell over the bakery door jingled, scattering crumbs of sugar onto the polished floor. Sunlight filtered through the lace curtains, painting amber patterns on the wooden tables where a half‑eaten scone waited beside a steaming mug of tea. A faint hum of the mixer blended with the distant murmur of Main Street. “Did you hear about the old Miller’s heirloom clock they found in the pantry?” the voice asked, low and curious. “It disappeared the night before the town fair—right when the bakery was closed for cleaning,” the reply came, a hint of concern threading through the words. “The back door was left ajar, and the only thing missing was the brass key that locks the pantry,” the first voice continued, eyes flicking toward the rack of fresh croissants. “Someone must have known the code. The lock’s not complicated, but you’d need the key or a precise feel for the tumbler,” the other answered, tapping a fingertip against a warm cinnamon roll. “A clue, perhaps, in the flour sack? I recall seeing a tiny silver pin tucked in there last week,” the first mused, leaning closer to the counter. “It’s worth a look. If we find the pin, we might just turn the tide on this little mystery,” the second said, a smile curling at the edge of the mouth.
Meta
created_at: "2026-04-24T16:53:07.515Z" (4/24/2026, 4:53:07 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 550
moderation_latency: (null)
generation_time: 16115
tokens_prompt: 81
tokens_completion: 646
native_tokens_prompt: 128
native_tokens_completion: 525
native_tokens_completion_images: (null)
native_tokens_reasoning: 306
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0002228
router: (null)
provider_responses:
  0:
    endpoint_id: "8b516897-c13e-4925-afc3-44fa86608fe9"
    id: "5ae7f2b22ed34c6d92adb12b70c95058"
    is_byok: false
    latency: 513
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "AtlasCloud"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777049587-HQfdzgX5ltbZisNNiLBa"
session_id: (null)
api_type: "completions"
id: "gen-1777049587-4oXhOrTDqYUzQC3UZHO6"
upstream_id: "5ae7f2b22ed34c6d92adb12b70c95058"
total_cost: 0.0002228
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "AtlasCloud"
response_cache_source_id: (null)
Evaluation details
Result: 0.03%
Evaluator: Count dialogue tags
Details: n/a
Meta Data
neededClean: false
dialogueTags:
  0: "the voice asked"
  1: "voice continued"
  2: "the other answered"
  3: "the second said"