NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Run ID: 2025-04-14-01JRTSYJP9SV8EANN7VNAVDB04
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
In the shadow of the inflatable unicorn archway, Molly crouched low, her eyes locked on a fortress of gift-wrapped boxes stacked precariously by the lemonade table. The scent of frosting drifted through the summer air, mingling with shrieks of sugar-fueled joy. “This is madness,” Bill hissed, flattening himself behind a lattice fence festooned with crepe paper. “We look suspicious.” He nudged his paper party hat, trying to balance stealth and celebration. Molly offered a sly grin. “If you want subtlety, lose the giraffe face paint.” Against reason, Bill’s cheeks had transformed into a menagerie. He scowled, rubbing at a blue blot. Nearby, a gaggle of balloon-sword wielding children thundered past, oblivious. Bill surveyed the mountain of gifts with greedy calculation. “So we just stroll up, grab Santa’s haul, and waltz out?” “Inconspicuous waltzing is an art form,” Molly replied, unspooling a length of pink streamer. “We need a diversion.” She nodded toward an adult distracted by a sunken bouncy castle—opportunity incarnate. Bill shuddered. “If that castle eats another toddler, we’re in trouble.” Molly filed that under ‘Plan B.’ Lifting a cupcake, she whispered, “We blend in. Grim determination. Sugar high.” Bill sighed. “This is the weirdest job yet.”
id: "gen-1744656354-SgZr3QDZEmvQyPVMOgJC"
upstream_id: "chatcmpl-BMJ6Yh5X0zp9wQRNUsNTYuyLisbl3"
total_cost: 0.002534
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:45:58.563455+00:00" (Monday, April 14th, 2025 at 6:45:58 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 354
moderation_latency: 126
generation_time: 2920
tokens_prompt: 68
tokens_completion: 299
native_tokens_prompt: 67
native_tokens_completion: 300
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.002534
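The cost and timing fields above can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that this page does not state: that `generation_time` is reported in milliseconds, and that the run was billed at $2.00 per million prompt tokens and $8.00 per million completion tokens. The variable and constant names are illustrative only.

```python
# Values copied from the run metadata above.
native_tokens_prompt = 67
native_tokens_completion = 300
generation_time_ms = 2920  # assumed unit: milliseconds
reported_total_cost = 0.002534

# Assumed per-token prices in USD (not stated in the source).
PROMPT_PRICE = 2.00 / 1_000_000
COMPLETION_PRICE = 8.00 / 1_000_000

# Estimated cost from native token counts under the assumed prices.
estimated_cost = (
    native_tokens_prompt * PROMPT_PRICE
    + native_tokens_completion * COMPLETION_PRICE
)

# Completion throughput, converting the assumed milliseconds to seconds.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: {estimated_cost:.6f} USD")        # prints "estimated cost: 0.002534 USD"
print(f"throughput: {tokens_per_second:.1f} tokens/s")    # prints "throughput: 102.7 tokens/s"
```

Under those assumptions the estimate matches the reported `total_cost`, which suggests the cost is derived from the native token counts rather than from `tokens_prompt` and `tokens_completion`.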
Evaluation details
Result | Evaluator | Details | Meta Data
99.9840% | Matches word count | n/a | neededClean: false; words: 202
0.0000% | Dialogue to Total Word Ratio | Ratio: 30.69%, Deviation: 20.69% | neededClean: false; wordsTotal: 202; wordsDialogue: 62
49.9920%
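The figures reported for the dialogue-ratio check follow directly from the word counts. The sketch below is a minimal illustration, assuming the deviation is the absolute difference between the measured ratio and the 10% target from the scenario. The function name is hypothetical, and the evaluator's own word-splitting and scoring rules are not shown on this page. The final 49.9920% figure is consistent with the unweighted mean of the two evaluator results, which is an inference rather than something the page states.

```python
def dialogue_ratio_report(words_dialogue: int, words_total: int, target_pct: float):
    """Return (ratio in %, absolute deviation from the target in percentage points)."""
    ratio_pct = 100 * words_dialogue / words_total
    return ratio_pct, abs(ratio_pct - target_pct)

# Counts taken from the evaluator's meta data; the 10% target comes from the scenario.
ratio, deviation = dialogue_ratio_report(62, 202, 10.0)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
# prints "Ratio: 30.69%, Deviation: 20.69%"

# Assumed aggregation: unweighted mean of the two evaluator results.
results = [99.9840, 0.0000]
overall = sum(results) / len(results)
print(f"{overall:.4f}%")  # prints "49.9920%"
```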