NC Bench
Run: 2026-04-24-01KQ02NJVWFS6954YD4F59XWSF
Run Details
Model:
GPT-OSS 120B
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
1m 56s
Cost:
$0.000869
Provider:
DeepInfra
Evaluation summary
Score | Evaluator
100.00% | Matches word count
97.47% | Dialogue to Total Word Ratio
98.7362% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the garage, Molly and Bill huddled over the party invitation, whispering. Molly: "The party is at 3 p.m., kids will be distracted by balloons." Bill: "We slip in through the back, grab the gift table, and vanish before the clown arrives." Molly: "What about the piñata? Should we swing it or just take the candy?" Bill: "Leave the piñata, it's messy; the presents are all in the gift bags." Molly: "The gift table is under a pink tarp. We need a ladder." Bill: "I brought one from the construction site, hidden in my van." Molly: "Great, we knock on the door, pretend to be entertainers." Bill: "I'll wear the oversized rabbit suit, you’ll be the magician." Molly: "Kids love magicians, they’ll gather around, we’ll distract them with tricks." Bill: "Meanwhile I’ll scoop up the bags, quick as a bunny." Molly: "We need to time it with the cake cutting, that’s the peak chaos." Bill: "Exactly, the sirens of frosting drown out any alarm." Molly: "After we’re out, we split the loot at the dock." Bill: "And we never speak of it again." Bill: "We'll vanish like ghosts, no footprints left." Molly: "And the kids will think it was magic."
created_at
"2026-04-24T15:45:23.337Z"
4/24/2026, 3:45:23 PM
model
"openai/gpt-oss-120b"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
286
moderation_latency
(null)
generation_time
115512
tokens_prompt
77
tokens_completion
3984
native_tokens_prompt
114
native_tokens_completion
4550
native_tokens_completion_images
(null)
native_tokens_reasoning
3673
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.000868946
router
(null)
provider_responses
0
endpoint_id
"85835306-dba3-4334-87bc-ab9ffa5c9c62"
id
"chatcmpl-RTudefsYHspwUGEVJwLEQHe8"
is_byok
false
latency
83
model_permaslug
"openai/gpt-oss-120b"
provider_name
"DeepInfra"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777045523-iUaulbQeSftftrhGjiXv"
session_id
(null)
api_type
"completions"
id
"gen-1777045523-pDUosV7LOX2AooUZz2U9"
upstream_id
"chatcmpl-RTudefsYHspwUGEVJwLEQHe8"
total_cost
0.000868946
cache_discount
(null)
upstream_inference_cost
0
provider_name
"DeepInfra"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false; words: 200
97.47% | Dialogue to Total Word Ratio | Ratio: 86.00%, Deviation: 4.00% | neededClean: false; wordsTotal: 200; wordsDialogue: 172
98.7362% | Overall | |
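The ratio and deviation reported above can be reproduced from the listed word counts. The sketch below is illustrative only and is not the benchmark's evaluator: the function name dialogue_ratio is hypothetical, and it assumes that dialogue means text inside double quotes and that words are whitespace-separated tokens. The 0.90 target is taken from the scenario ("90% dialogue"). The page does not show how the 4.00% deviation maps to the 97.47% score, so that step is not reproduced. The unweighted mean of the two rounded evaluator scores is close to the reported overall value, which suggests, but does not confirm, simple averaging.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio) for a scene.

    Assumption: dialogue is any text between straight or curly double
    quotes, and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'["“]([^"”]*)["”]', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


# Figures reported on this page; the target comes from the scenario text.
words_dialogue, words_total, target = 172, 200, 0.90
ratio = words_dialogue / words_total
deviation = abs(target - ratio)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# Assumed aggregation: unweighted mean of the two rounded evaluator scores.
overall = (100.00 + 97.47) / 2
print(f"Overall from rounded scores: {overall:.3f}%")
```

In the recorded response, the opening narration and the "Molly:" / "Bill:" speaker labels fall outside the quotation marks, which is why the dialogue count (172) is lower than the total (200).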