NC Bench
Run: 2026-04-24-01KQ045MCF1TD7QM6XQVYCYS7E
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 200 words with 50% dialogue
Duration: 8m 38s
Cost: $0.002407
Provider: Amazon Bedrock
Evaluation summary
Matches word count: 100.00%
Dialogue to Total Word Ratio: 99.44%
Overall: 99.7200%
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Rain drummed on the roof of the center, where the planning session hid behind castles. Molly crouched beside a cracked cookie tin, studying the floor plan of the birthday bash. Bill perched on a wobbling chair, fiddling with a toy grenade. The room smelled of cake frosting and latex. Outside, giggling children shouted “Pin the Tail on the Donkey!” while parents struggled with party hats. The target: a room packed with wrapped gifts, each promising cash value to the buyer. Their mission was simple—sneak in, swipe every present, vanish before the cake was cut. Molly: "We'll slip through the piñata opening, it's the biggest gap after they dump the candy." Bill: "The aunt with the oversized clown nose watches the gift table, but she can't see behind the balloon arch." Molly: "We disable the noisy DJ console, then the kids think it's a game and keep dancing." Bill: "What about the chocolate fountain? It's sticky, but we can use it to slide under the table." Molly: "After we grab the presents, we dump them into the laundry basket and exit through the staff door." Bill: "Got it. Meet at the cupcake stand at midnight, and no one mentions word 'pirates'."
created_at: "2026-04-24T16:20:04.742Z" (4/24/2026, 4:20:04 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 10929
moderation_latency: (null)
generation_time: 11277
tokens_prompt: 77
tokens_completion: 4172
native_tokens_prompt: 127
native_tokens_completion: 3980
native_tokens_completion_images: (null)
native_tokens_reasoning: 3870
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
usage: 0.00240705
router: (null)
provider_responses:
  0:
    endpoint_id: "b3f80c05-e31a-44f6-b63f-414467b87553"
    id: "3cdc40b2-26f2-4a36-b364-35367bf7b610"
    is_byok: false
    latency: 325
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Amazon Bedrock"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777047604-JkjoacX6kQErsZGLs8x1"
session_id: (null)
api_type: "completions"
id: "gen-1777047604-YRZpg7fh1C3EzG3eYK6G"
upstream_id: "3cdc40b2-26f2-4a36-b364-35367bf7b610"
total_cost: 0.00240705
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
response_cache_source_id: (null)
Evaluation details
Matches word count: 100.00%
Details: n/a
Meta Data: neededClean = false, words = 201
Dialogue to Total Word Ratio: 99.44%
Details: Ratio: 52.74%, Deviation: 2.74%
Meta Data: neededClean = false, wordsTotal = 201, wordsDialogue = 106
Overall: 99.7200%
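The reported ratio, deviation, and overall figure can be checked by hand from the counts above. The following is a minimal sketch, not the evaluator's own code: it assumes the ratio is wordsDialogue divided by wordsTotal, that the deviation is the absolute distance from the 50% target in the prompt, and that the overall score is the unweighted mean of the two evaluator scores. The variable names are illustrative. How the 99.44% score is derived from the 2.74% deviation is not shown in the report, so that step is not reproduced here.

```python
# Figures copied from the evaluation details above.
words_total = 201
words_dialogue = 106
target_ratio = 0.50  # the prompt asks for 50% dialogue

# Assumption: ratio is dialogue words over total words,
# and deviation is the absolute distance from the target.
ratio = words_dialogue / words_total
deviation = abs(ratio - target_ratio)

# Assumption: the overall score is the unweighted mean of the evaluator scores.
evaluator_scores = [100.00, 99.44]
overall = sum(evaluator_scores) / len(evaluator_scores)

print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")
print(f"Overall: {overall:.4f}%")
```

Under these assumptions the printed values agree with the report: a ratio of 52.74%, a deviation of 2.74%, and an overall score of 99.7200%.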