NC Bench
Run: 2026-04-24-01KQ0296BCH170S43Q2CQEYFG1
Run Details
Model:
GPT-OSS 120B
Test:
Dialogue tags
Scenario:
Write 200 words with 90% dialogue
Duration:
8.9s
Cost:
$0.001641
Provider:
Amazon Bedrock
Evaluation summary
Score | Evaluator
100.00% | Matches word count
98.39% | Dialogue to Total Word Ratio
99.1958% | Overall
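The Overall figure appears consistent with an unweighted mean of the two evaluator scores. That aggregation rule is an assumption; the page does not state it. A minimal Python sketch using the rounded scores as displayed (the function name `overall_score` is hypothetical):

```python
from statistics import mean

def overall_score(evaluator_scores):
    """Return the unweighted mean of evaluator scores, in percent.

    Assumption: every evaluator carries equal weight.
    """
    return mean(evaluator_scores)

# Rounded scores as displayed in the evaluation summary.
scores = {
    "Matches word count": 100.00,
    "Dialogue to Total Word Ratio": 98.39,
}
overall = overall_score(list(scores.values()))
print(f"{overall:.3f}%")
```

The result, about 99.195%, differs from the displayed 99.1958% only in the fourth decimal place, which would be explained by the second score being rounded for display.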
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The dim garage smelled of pizza and old plans; Molly and Bill crouched over a crumpled invitation. Molly: “We slip through the pet door, same as before, kids distracted by the clown.” Bill: “Clown's pins cover us; they’ll think the noise is part of the act.” Molly: “I’ll disable the lights with the portable dimmer, then we grab the gift table before the piñata explodes.” Bill: “What about the security camera? I brought the signal jammer from the last job.” Molly: “Plug it into the power strip behind the cake stand, and the footage will loop a five‑minute clip of balloons.” Bill: “Got the zip‑ties for the presents, right? We’ll bag them fast, no torn wrapping.” Molly: “Remember, we leave the candy untouched; kids will think their treats are a miracle and not suspect a thing.” Bill: “All right, on the count of three, we bust in, grab the loot, and disappear before the parent‑teacher meeting starts.” Bill: “When the clown tosses the last balloon, I’ll swing the rope, slide under the table, and snatch the biggest gift box marked ‘Superhero’ before anyone notices.” Molly: “If any kid cries, we’ll blame the overdue nap time and slip out quietly.”
created_at
"2026-04-24T15:38:37.317Z"
4/24/2026, 3:38:37 PM
model
"openai/gpt-oss-120b"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
8359
moderation_latency
(null)
generation_time
8854
tokens_prompt
77
tokens_completion
2706
native_tokens_prompt
127
native_tokens_completion
2704
native_tokens_completion_images
(null)
native_tokens_reasoning
2380
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
num_fetches
(null)
web_search_engine
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"end_turn"
usage
0.00164145
router
(null)
provider_responses
0
endpoint_id
"b3f80c05-e31a-44f6-b63f-414467b87553"
id
"2fce495c-c972-4da3-91ac-378e893608c9"
is_byok
false
latency
322
model_permaslug
"openai/gpt-oss-120b"
provider_name
"Amazon Bedrock"
status
200
user_agent
"langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer
(null)
request_id
"req-1777045117-M1UQrhvQrafzFmyWycUJ"
session_id
(null)
api_type
"completions"
id
"gen-1777045117-IqZG77G0PIQyTUXZJreY"
upstream_id
"2fce495c-c972-4da3-91ac-378e893608c9"
total_cost
0.00164145
cache_discount
(null)
upstream_inference_cost
0
provider_name
"Amazon Bedrock"
response_cache_source_id
(null)
Evaluation details
Result | Evaluator | Details | Meta Data
100.00% | Matches word count | n/a | neededClean: false, words: 199
98.39% | Dialogue to Total Word Ratio | Ratio: 86.43%, Deviation: 3.57% | neededClean: false, wordsTotal: 199, wordsDialogue: 172
99.1958% | Overall
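The ratio and deviation reported for the dialogue evaluator can be reproduced from the word counts in its metadata. The sketch below is an assumption about how such a check might work, not NC Bench's actual evaluator: it treats text inside curly double quotes as dialogue, splits words on whitespace, and takes the 90% target from the scenario. Both function names are hypothetical.

```python
import re

def count_dialogue_words(text):
    """Count whitespace-separated words inside curly double quotes.

    Assumption: dialogue is delimited by “ and ”; speaker labels such as
    "Molly:" sit outside the quotes and count as narration.
    """
    spans = re.findall(r"“([^”]*)”", text)
    return sum(len(span.split()) for span in spans)

def dialogue_ratio(words_dialogue, words_total):
    """Return dialogue words as a percentage of all words."""
    return 100.0 * words_dialogue / words_total

# Counts reported in the evaluation metadata for this run.
ratio = dialogue_ratio(172, 199)
deviation = abs(90.0 - ratio)  # 90% target taken from the scenario
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 86.43%, Deviation: 3.57%"

# Small illustrative sample (not taken from the run).
sample = "Molly sighed. “We go in at noon.” Bill nodded."
print(count_dialogue_words(sample), len(sample.split()))
# prints "5 9"
```

How the 3.57% deviation maps to the 98.39% score is not shown on the page, so the sketch stops at the deviation.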