NC Bench
Model: GPT-4.1
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Run ID: 2025-04-14-01JRTSWEK530M7Q9FSHVQZX30X
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly squinted through the hedge, clutching the world’s stickiest glitter bomb. Across the yard, unicorns danced on napkins and cake toppers. Seven children, faces smeared with frosting, circled a tower of neon-wrapped presents atop a rickety picnic table. The air shimmered with the shrieks and giggles of sugar-fueled anarchy. “Bill, did you bring the decoy piñata?” Molly whispered, eyes never leaving her targets. Bill’s hoodie bristled as he produced a giant papier-mâché dragon. “Obviously. It’s stuffed with kale.” Molly grinned. “Your genius astounds me.” The sound of a balloon popping made Bill flinch. “We go in during Duck, Duck, Goose. Kids are distracted then. Follow my lead.” As a harried mother herded the kids into a squawking circle, Molly and Bill darted forward, the dragon wobbling precariously between them. Molly twisted the glitter bomb’s fuse, confidence blooming. Bill hissed, “Left, left! That kid’s looking suspicious.” “Relax,” Molly replied, sliding behind a stack of juice boxes. “I blend in. I was almost a birthday clown once.” A small hand tugged Molly’s sleeve. “Are you the magician?” Molly smiled. “Kid, you have no idea.” The real fun, she thought, was one birthday wish away.
id: "gen-1744656284-76E9nteTHQrKFp75QzjK"
upstream_id: "chatcmpl-BMJ5Rq1Pt2U8BhXLsai6RXrnscv6b"
total_cost: 0.002494
cache_discount: (null)
provider_name: "OpenAI"
created_at: "2025-04-14T18:44:51.75961+00:00" (Monday, April 14th, 2025 at 6:44:51 PM GMT+00:00)
model: "openai/gpt-4.1-2025-04-14"
app_id: 182717
streamed: true
cancelled: false
latency: 1124
moderation_latency: 105
generation_time: 5208
tokens_prompt: 68
tokens_completion: 294
native_tokens_prompt: 67
native_tokens_completion: 295
native_tokens_reasoning: 0
num_media_prompt: (null)
num_media_completion: (null)
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.002494
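The run metadata above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the page does not state: that `generation_time` is measured in milliseconds, and that the run was billed at list prices of 2.00 USD per million prompt tokens and 8.00 USD per million completion tokens. The function names `tokens_per_second` and `estimated_cost` are hypothetical helpers, not part of any NC Bench or provider API.

```python
# Values copied from the run metadata; only the fields used below are included.
META = {
    "generation_time": 5208,
    "native_tokens_prompt": 67,
    "native_tokens_completion": 295,
    "total_cost": 0.002494,
}


def tokens_per_second(meta: dict) -> float:
    """Completion throughput, ASSUMING generation_time is in milliseconds."""
    seconds = meta["generation_time"] / 1000
    return meta["native_tokens_completion"] / seconds


def estimated_cost(meta: dict, in_per_m: float = 2.00, out_per_m: float = 8.00) -> float:
    """Cost in USD from native token counts.

    The default per-million-token prices are an ASSUMPTION (not shown on the page).
    """
    micro_usd = (
        meta["native_tokens_prompt"] * in_per_m
        + meta["native_tokens_completion"] * out_per_m
    )
    return micro_usd / 1_000_000


if __name__ == "__main__":
    print(f"throughput: {tokens_per_second(META):.2f} tokens/s")
    print(f"estimated cost: {estimated_cost(META):.6f} USD")
    print(f"reported cost:  {META['total_cost']:.6f} USD")
```

Under those assumed prices, the estimate built from the native token counts agrees with the reported `total_cost`, which suggests that billing follows `native_tokens_*` rather than `tokens_*`. That reading is an inference from the numbers, not something the page confirms.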
Evaluation details
Result: 99.3769%
Evaluator: Matches word count
Details: n/a
Meta Data: neededClean: false; words: 195

Result: 0.0005%
Evaluator: Dialogue to Total Word Ratio
Details: Ratio: 28.72%, Deviation: 18.72%
Meta Data: neededClean: false; wordsTotal: 195; wordsDialogue: 56
49.6887%
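The ratio and deviation reported by the dialogue evaluator follow directly from the two word counts: 56 dialogue words out of 195 total is 28.72%, which sits 18.72 percentage points above the 10% target. The sketch below reproduces that arithmetic and shows one plausible way to count dialogue words. It is not the benchmark's actual evaluator: the quote-matching rule, the whitespace tokenizer, and the names `dialogue_stats` and `ratio_from_counts` are all assumptions made for illustration.

```python
import re

# ASSUMPTION: dialogue is any text enclosed in curly or straight double quotes.
DIALOGUE_RE = re.compile(r'“([^”]*)”|"([^"]*)"')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed tokenization)."""
    return len(text.split())


def ratio_from_counts(words_dialogue: int, words_total: int, target_pct: float = 10.0):
    """Return (ratio %, deviation from target in percentage points), rounded to 2 places."""
    ratio = 100.0 * words_dialogue / words_total
    return round(ratio, 2), round(abs(ratio - target_pct), 2)


def dialogue_stats(text: str, target_pct: float = 10.0) -> dict:
    """Count total and quoted words in a scene and derive the ratio."""
    total = count_words(text)
    spoken = sum(count_words(a or b) for a, b in DIALOGUE_RE.findall(text))
    ratio = 100.0 * spoken / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": spoken,
        "ratio": round(ratio, 2),
        "deviation": round(abs(ratio - target_pct), 2),
    }


if __name__ == "__main__":
    # Counts taken from the evaluation details for this run.
    print(ratio_from_counts(56, 195))
    # A tiny made-up scene, not the model response.
    print(dialogue_stats("“Run,” said Bill. Molly ran."))
    # Unweighted mean of the two evaluator results.
    print(round((99.3769 + 0.0005) / 2, 4))
```

The final figure of 49.6887% is consistent with an unweighted mean of the two evaluator results, although the page does not say how it is computed. The formula that maps an 18.72-point deviation to a result of 0.0005% is likewise not shown, so the sketch does not attempt to reproduce it.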