NC Bench
Run: 2026-02-16-01KHKB67CK0TFWQ9SCE54Q2ZBD
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 28.9s
Cost: $0.019252
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the inflatable castle, her black hoodie dusted with glitter from some earlier, less successful crime. Bill knelt beside her, clutching a crumpled party flyer like it was a blueprint to Fort Knox for toddlers. Beyond the plastic drawbridge, the backyard pulsed with shrieks, balloons, and frosting. Parents huddled near the grill, distracted by smoke and existential exhaustion. The gift table, however, stood majestically alone, groaning under neon wrapping paper and the naïve generosity of relatives who’d never met a sugar crash. "Focus, Bill, they're six-year-olds, not dragons," Molly whispered. Bill’s eyes stayed glued to a unicorn piñata swaying ominously from a tree branch. "Six-year-olds have stickiness," Bill muttered. He wiped imaginary jam from his gloves. Somewhere, a child began crying with the raw power of a malfunctioning fire alarm. "Relax; we wear black, we're invisible to sugar-highs," Molly said. She unfolded her own diagram, a lovingly annotated sketch of the backyard, including the sprinkler’s death zone and the clown’s probable blast radius. "That's your theory." Bill gulped. A cupcake flew past and splattered nearby. Molly blinked, recalculating. She silently corrected the plan; clearly, they were robbing a war zone with frosting artillery. The operation was only beginning.
created_at: "2026-02-16T13:46:09.655174+00:00" (2/16/2026, 1:46:09 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 6485
moderation_latency: (null)
generation_time: 28606
tokens_prompt: 77
tokens_completion: 990
native_tokens_prompt: 66
native_tokens_completion: 1917
native_tokens_completion_images: 0
native_tokens_reasoning: 1626
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0192525
router: (null)
provider_responses:
  0:
    id: "resp_084a3b5909d520520169931f84931c81968bb87dbe7a9c243d"
    status: 200
    is_byok: false
    latency: 136
    endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
    provider_name: "OpenAI"
    model_permaslug: "openai/gpt-5.1-20251113"
api_type: "completions"
id: "gen-1771249540-9jUQFqBY7OfyzwcJAGT4"
upstream_id: "resp_084a3b5909d520520169931f84931c81968bb87dbe7a9c243d"
total_cost: 0.0192525
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
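The reported total_cost can be cross-checked against the native token counts in the metadata. The sketch below is illustrative only: the per-million-token prices (1.25 USD input, 10.00 USD output) are an assumption not stated on this page, the helper name estimate_cost is hypothetical, and generation_time is assumed to be in milliseconds because it lines up with the 28.9s duration. Under those assumptions the estimate comes to 0.0192525, matching the reported figure, and roughly 85% of the completion tokens are reasoning tokens.

```python
# Hypothetical cross-check of the run metadata.
# The prices below are ASSUMED; they do not appear anywhere on this page.
INPUT_USD_PER_MTOK = 1.25    # assumption: USD per 1M prompt tokens
OUTPUT_USD_PER_MTOK = 10.00  # assumption: USD per 1M completion tokens


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD from native token counts and the assumed prices."""
    total_micro_usd = (
        prompt_tokens * INPUT_USD_PER_MTOK
        + completion_tokens * OUTPUT_USD_PER_MTOK
    )
    return total_micro_usd / 1_000_000


# Values copied from the metadata above.
native_prompt = 66
native_completion = 1917
native_reasoning = 1626
generation_time_ms = 28606  # assumption: the unit is milliseconds

cost = estimate_cost(native_prompt, native_completion)
reasoning_share = native_reasoning / native_completion
tokens_per_second = native_completion / (generation_time_ms / 1000)

print(f"estimated cost (USD): {cost:.7f}")
print(f"reasoning share of completion tokens: {reasoning_share:.1%}")
print(f"native completion tokens per second: {tokens_per_second:.1f}")
```

Billing appears to follow the native token counts (66 and 1917) rather than the normalized tokens_prompt and tokens_completion values (77 and 990), since only the native counts reproduce the reported cost under the assumed prices.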
Evaluation details
Result | Evaluator | Details | Meta Data
99.3769% | Matches word count | n/a | neededClean: false; words: 205
99.7681% | Dialogue to Total Word Ratio | Ratio: 12.20%, Deviation: 2.20% | neededClean: false; wordsTotal: 205; wordsDialogue: 25
99.5725%
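The ratio and deviation figures follow directly from the word counts: 25 dialogue words out of 205 total is 12.20%, which is 2.20 points above the 10% target. The evaluator's actual counting rules are not shown on this page, so the following is only a sketch under stated assumptions: dialogue is whatever sits between straight double quotes, and hyphens separate words (so "six-year-olds" counts as three). The names WORD_RE, QUOTE_RE, count_words, and dialogue_stats are hypothetical. Under that convention the four quoted lines in the response do come to 25 words, which agrees with wordsDialogue.

```python
import re

# Assumption: a word is a run of letters/digits, optionally joined by apostrophes.
# Hyphens are not part of a word, so "six-year-olds" counts as three words.
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")
# Assumption: dialogue is delimited by straight double quotes.
QUOTE_RE = re.compile(r'"([^"]*)"')


def count_words(text: str) -> int:
    """Count words in text using the assumed tokenization."""
    return len(WORD_RE.findall(text))


def dialogue_stats(text: str, target_ratio: float = 0.10) -> dict:
    """Return total words, dialogue words, their ratio, and deviation from target."""
    total = count_words(text)
    dialogue = sum(count_words(span) for span in QUOTE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target_ratio),
    }


# The four quoted lines from the model response above.
quoted_lines = (
    '"Focus, Bill, they\'re six-year-olds, not dragons," '
    '"Six-year-olds have stickiness," '
    '"Relax; we wear black, we\'re invisible to sugar-highs," '
    '"That\'s your theory."'
)
print(dialogue_stats(quoted_lines)["wordsDialogue"])

# Arithmetic on the reported counts.
reported_ratio = 25 / 205
print(f"ratio: {reported_ratio:.2%}, deviation: {abs(reported_ratio - 0.10):.2%}")
```

The unlabeled final figure, 99.5725%, equals the arithmetic mean of the two evaluator results, (99.3769 + 99.7681) / 2, which suggests it is an aggregate score for the run. How each individual percentage is derived from its deviation is not shown here.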