NC Bench
Run: 2026-04-24-01KQ04DN00JRJ0N71Y4W5WGNB6
Run Details
Model: GPT-OSS 120B
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 34.6s
Cost: $0.003458
Provider: Amazon Bedrock
Evaluation summary
Matches word count: 99.92%
Dialogue to Total Word Ratio: 98.77%
Overall: 99.3457%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The warehouse smelled like stale pizza and ambition as Molly leaned against a dented metal crate, flipping a half‑eaten bagel. Bill was perched on a stack of glossy flyers from the community center, his oversized sunglasses making his eyes look like half‑open windows. Outside, the neighborhood was buzzing with preparations for Timmy’s fifth birthday, a party that promised a bouncy castle, dinosaur‑shaped balloons, and a cake that could double as a fireworks display. The gymnasium where the celebration would unfold was a sprawling arena of plastic chairs, glitter‑covered tables, and a wall of gift‑wrapped boxes stacked in neat, colorful rows. In the corner, an inflatable pool glistened under fluorescent lights, ready to be filled with squealing children. A sprinkler system hung overhead, its nozzles concealed behind a banner that read “Happy Birthday, Timmy!” The plan, still more fantasy than fact, hinged on exploiting the chaos of a water show that would distract parents and kids alike. Molly had drawn a crude schematic on a napkin, marking the entry points: the main door guarded by volunteers, the side door locked with a child‑proof latch, and the sprinkler valves that could be turned on remotely. Bill’s specialty was timing; his watch showed two minutes until the clown’s grand entrance, the moment the crowd would be most distracted. The loot they imagined consisted of Lego sets, action figures, plush animals, and enough candy to supply a small army. Molly: "We slip in through the sprinkler, pretend it’s a surprise water show for the kids." Bill: "Good. While they splash, we crawl under the table of presents and grab the biggest boxes first." Molly: "The biggest boxes hide the Lego Star Wars sets and the limited‑edition action figures." Bill: "Don’t forget the plush unicorns; they’re worth more on the black market than a handful of coupons." Molly: "I’ve mapped the sprinkler valves; one turn will flood the dance floor for exactly three minutes." 
Bill: "Three minutes is plenty to unload everything into the inflatable pool and paddle out the back door." Molly: "The back door is guarded by the clown’s assistant, but he’s terrified of water, so we’ll soak him." Bill: "If he screams, the kids will think it’s part of the entertainment and cheer louder." Molly: "Perfect. After we’re out, we’ll leave a note that says ‘Happy Birthday, enjoy your clean‑swept presents!’" Bill: "That’ll throw the parents off the scent while we disappear into the night with candy‑coated treasure." Molly: "One last thing—keep your phone on silent; the toddlers’ squeals can drown out any alarm." Bill: "Got it. I’ll set the timer for the sprinklers at exactly 2:58, then we’re golden." Molly: "Remember, no one ever suspects a clown with a water gun." Bill: "And if we get caught, we can always blame the magician’s disappearing act." Molly: "Deal. Let’s make this the most diabolical birthday ever." Bill: "Heroes, before we vanish with the loot in a giant inflatable dinosaur." Molly: "And we’ll leave a trail of foam so the police think it’s a prank."
Response metadata
created_at: "2026-04-24T16:16:00.523Z" (4/24/2026, 4:16:00 PM)
model: "openai/gpt-oss-120b"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 22990
moderation_latency: (null)
generation_time: 24506
tokens_prompt: 77
tokens_completion: 6063
native_tokens_prompt: 127
native_tokens_completion: 5732
native_tokens_completion_images: (null)
native_tokens_reasoning: 5239
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "end_turn"
usage: 0.00345825
router: (null)
provider_responses:
  0:
    endpoint_id: "ce118593-d9b2-4d2d-bcd4-33f0609b42f1"
    is_byok: false
    latency: 10077
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "SiliconFlow"
    status: 504
  1:
    endpoint_id: "b3f80c05-e31a-44f6-b63f-414467b87553"
    id: "85a60626-6948-4fc0-ac49-08fe46c81576"
    is_byok: false
    latency: 313
    model_permaslug: "openai/gpt-oss-120b"
    provider_name: "Amazon Bedrock"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777047360-SO5WaXQ0rOes74whFlvj"
session_id: (null)
api_type: "completions"
id: "gen-1777047360-gDQF15NsuoXiIL9nmZg9"
upstream_id: "85a60626-6948-4fc0-ac49-08fe46c81576"
total_cost: 0.00345825
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Amazon Bedrock"
response_cache_source_id: (null)
Evaluation details
Matches word count: 99.92%
  Details: n/a
  Metadata: neededClean = false, words = 503
Dialogue to Total Word Ratio: 98.77%
  Details: Ratio: 50.50%, Deviation: 0.50%
  Metadata: neededClean = false, wordsTotal = 503, wordsDialogue = 254
Overall: 99.3457%
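The ratio and deviation reported for the Dialogue to Total Word Ratio evaluator can be reproduced from its wordsDialogue and wordsTotal metadata. The sketch below is a hypothetical recomputation, not NC Bench's code: it assumes the ratio is wordsDialogue / wordsTotal, that the deviation is the absolute distance from the 50% target in percentage points, and that Overall is the unweighted mean of the two evaluator scores. None of these formulas is stated on the page, and the helper names (dialogue_ratio_pct, deviation_pct, overall_score) are made up for illustration. It does not try to reproduce the individual scores (99.92% and 98.77%), whose scoring functions are not given.

```python
# Hypothetical recomputation of figures reported on this NC Bench run page.
# Assumed (not documented by NC Bench):
#   ratio     = wordsDialogue / wordsTotal, as a percentage
#   deviation = |ratio - target| in percentage points
#   overall   = unweighted mean of the evaluator scores


def dialogue_ratio_pct(words_dialogue: int, words_total: int) -> float:
    """Percentage of all words that fall inside dialogue."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_pct(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio_pct - target_pct)


def overall_score(scores: list[float]) -> float:
    """Unweighted mean of the evaluator scores (assumed aggregation)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # wordsDialogue = 254 and wordsTotal = 503, from the evaluation metadata
    ratio = dialogue_ratio_pct(254, 503)
    # prints: Ratio: 50.50%, Deviation: 0.50%
    print(f"Ratio: {ratio:.2f}%, Deviation: {deviation_pct(ratio):.2f}%")
    # prints: Overall from rounded scores: 99.3450%
    print(f"Overall from rounded scores: {overall_score([99.92, 98.77]):.4f}%")
```

Because the two scores shown on the page are rounded to two decimals, their mean (99.345%) can only match the reported 99.3457% to within rounding. That agreement is consistent with the mean assumption but does not confirm it.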