NC Bench
Run: 2026-02-16-01KHKB644MBJ5YF7NG7ERYY8AX
Run Details
Model: GPT-5
Test: Dialogue tags
Scenario: Write 500 words with 70% dialogue
Duration: 50.0s
Cost: $0.021952
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 70% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The alley behind the community center smelled like frosting and face paint, and the air pulsed with off-key renditions of Happy Birthday. Molly pressed her back to the mural of balloons and considered the ethics of villainy where juice boxes were involved. Bill cradled a grocery bag that rustled suspiciously. “We’re not monsters, Bill,” Molly whispered. “We’re entrepreneurs with unfortunate timing.” “Timing is precisely the issue,” Bill said. “How do you walk out with a mountain of presents while a conga line blocks the door?” “We don’t walk,” Molly said. “We glide. Elegance is our camouflage.” “Since when do we glide?” Bill asked. “Last time we tried elegance, you wore a cape that got sucked into a vacuum cleaner.” “That vacuum started it,” Molly said. “Focus. What do we know about children? They are unpredictable, sticky, and distracted by glitter.” “Which explains this,” Bill said, rustling the bag. “Glitter. Ethical distraction.” “Ethical is a stretch,” Molly said. “But sparkly chaos is chaos nonetheless. What else?” “We could pose as Gift Quality Control,” Bill said. “Clipboards, stern nodding, lots of humming.” “I can hum authoritatively,” Molly said. “But I can’t carry out a bicycle with a clipboard.” “Then we divide,” Bill said. “You charm the parents with elaborate forms. I shepherd the presents to the… temporary evaluation zone.” “Parents can smell a form from a mile away,” Molly said. “Fine. We’ll need lanyards. Lanyards say, ‘Don’t ask questions.’” “Do we have lanyards?” Bill asked. “We have ribbon,” Molly said. “Ribbons are lanyards that believe in themselves.” “Okay, but the cake,” Bill said. “There’s always a cake moment. Applause, candles, cameras. That’s a distraction and a minefield.” “Candles mean darkness for a dramatic twelve seconds,” Molly said. “We move during the wish. The birthday child’s wish will cover our sins.” “That is not how wishes work,” Bill said. 
“Also, what’s our cover phrase if someone asks why we’re hugging a stack of boxes?” “‘Voluntary Wrapping Recall,’” Molly said. “Say the tape is noncompliant.” “What does that mean?” Bill asked. “It means say it confidently,” Molly said. “Confidence is the duct tape of reality.” “Speaking of tape,” Bill said, “no tools. I made a vow after the piñata incident.” “That unicorn died a hero,” Molly said. “Fine. No tools. Just charisma and questionable badges.” “You brought badges?” Bill said. “I brought stickers that say HELLO MY NAME IS,” Molly said. “Names confer power. My name will be Supervisor Sparkle.” “I refuse to be Assistant Glitter,” Bill said. “I’ll be Logistics.” “Logistics is just a hat that says I have a plan,” Molly said. “Perfect.” “Ethical check,” Bill said. “We’re redistributing presents from the over-celebrated to the under-gifted?” “We’re temporarily relocating resources to a safe location where they will be appreciated by us,” Molly said. “It’s a pilot program.” “Pilot programs need metrics,” Bill said. “How many presents constitute success?” “Success is measured in bows per minute,” Molly said. “Now, visualize the room.” “I see balloons, a bouncy castle, and an aunt guarding the gift table like a dragon,” Bill said. “She smells fear.” “Then we’ll offer decoy fear,” Molly said. “I’ll accidentally mention gluten.” “That’s warfare,” Bill said. “What about the kids?” “We promise a magic trick,” Molly said. “We announce the Great Disappearing Pile. They clap. We bow. The pile leaves.” “They’ll expect it to reappear,” Bill said. “It will,” Molly said. “In our apartment, where applause is quieter.” “This is madness,” Bill said, smiling. “I love it.” “Remember the codeword,” Molly said. “If something goes wrong, say ‘cupcake emergency.’” “What if there is an actual cupcake emergency?” Bill asked. “Then the universe is conspiring to help,” Molly said. “Ready?” “As I’ll ever be,” Bill said. 
“Supervisor Sparkle, lead the way.” “Logistics, carry the confidence,” Molly said. “And maybe the unicorn helmet. It says ‘I mean business.’” “I thought no costumes,” Bill said. “It’s protective gear,” Molly said. “Against sprinkles and shame.” “Cupcake emergency,” Bill said, testing the phrase. “Save it for the candles,” Molly said, straightening her ribbon lanyard as the party cheered inside. “Let’s glide.”
created_at: "2026-02-16T13:46:27.944095+00:00" (displayed as 2/16/2026, 1:46:27 PM)
model: "openai/gpt-5-2025-08-07"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 19616
moderation_latency: (null)
generation_time: 49896
tokens_prompt: 77
tokens_completion: 1781
native_tokens_prompt: 66
native_tokens_completion: 2187
native_tokens_completion_images: 0
native_tokens_reasoning: 1152
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0219525
router: (null)
provider_responses[0]:
  id: "resp_088754446a14a4cc0169931f81411c8194b89251f70dfb5ed4"
  status: 200
  is_byok: false
  latency: 182
  endpoint_id: "7c2f859a-7890-4e8e-b1de-1cd1c0a800b4"
  provider_name: "OpenAI"
  model_permaslug: "openai/gpt-5-2025-08-07"
api_type: "completions"
id: "gen-1771249537-vuY18ZRBllSCdNvV3WyH"
upstream_id: "resp_088754446a14a4cc0169931f81411c8194b89251f70dfb5ed4"
total_cost: 0.0219525
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.0000% | Matches word count | n/a | neededClean: false; words: 678
9.6156% | Dialogue to Total Word Ratio | Ratio: 76.84%, Deviation: 6.84% | neededClean: false; wordsTotal: 678; wordsDialogue: 521
Mean of evaluator results: 4.8078%
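The dialogue-ratio figures can be checked by hand: 521 / 678 is about 76.84%, which sits 6.84 percentage points above the 70% target, and 4.8078% is the mean of 0.0000% and 9.6156%. The sketch below recomputes these numbers. The rule that dialogue is text inside curly double quotes, the whitespace word split, and all function names are assumptions for illustration, not NC Bench's actual evaluator.

```python
import re

TARGET_RATIO = 70.0  # percent; from the scenario "Write 500 words with 70% dialogue"

# Assumption: dialogue is any text between curly double quotes (“ ”),
# and words are whitespace-separated tokens. The real evaluator may differ.
QUOTE_RE = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Count tokens that fall inside curly-quoted spans."""
    return sum(count_words(span) for span in QUOTE_RE.findall(text))


def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Dialogue share of all words, as a percentage."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation(ratio: float, target: float = TARGET_RATIO) -> float:
    """Absolute distance from the target ratio, in percentage points."""
    return abs(ratio - target)


# Recompute the reported figures from the metadata counts.
ratio = dialogue_ratio(521, 678)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation(ratio):.2f}%")
# Ratio: 76.84%, Deviation: 6.84%

# The unlabeled 4.8078% equals the mean of the two evaluator results.
print(f"{(0.0000 + 9.6156) / 2:.4f}%")  # 4.8078%

# Apply the hypothetical counter to one line of the model response.
line = "“We don’t walk,” Molly said. “We glide. Elegance is our camouflage.”"
print(count_dialogue_words(line), count_words(line))  # 9 11
```

Treating the deviation as an absolute difference in percentage points reproduces the reported 6.84%. The page does not state how the 9.6156% result is derived from that deviation, or how the 0.0000% word-count result is scored, so the sketch does not attempt either.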