NC Bench
Run: 2026-02-16-01KHKAAY0B6F6063K2X6W1WCK9
Run Details
Model:
GPT-5.1
Test:
Dialogue tags
Scenario:
Write 200 words with 10% dialogue
Duration:
1m 10s
Cost:
$0.043202
Provider:
OpenAI
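The cost shown above can be cross-checked against the native token counts listed in the response metadata (66 prompt tokens, 4312 completion tokens, 4023 of them reasoning). The sketch below is only a sanity check: the per-million-token rates are assumptions that are not stated anywhere in this report, chosen because they happen to reproduce the reported total, and the split into visible and reasoning tokens assumes that reasoning tokens are counted inside native_tokens_completion.

```python
# Token counts copied from the run metadata in this report.
NATIVE_PROMPT_TOKENS = 66
NATIVE_COMPLETION_TOKENS = 4312
NATIVE_REASONING_TOKENS = 4023

# ASSUMPTION: USD rates per million tokens. These are not given in the report;
# they are illustrative values that reproduce the reported total cost.
ASSUMED_INPUT_USD_PER_M = 1.25
ASSUMED_OUTPUT_USD_PER_M = 10.00


def estimate_cost(prompt_tokens, completion_tokens, input_rate, output_rate):
    """Return an estimated cost in USD for one request."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


estimated_cost = estimate_cost(
    NATIVE_PROMPT_TOKENS,
    NATIVE_COMPLETION_TOKENS,
    ASSUMED_INPUT_USD_PER_M,
    ASSUMED_OUTPUT_USD_PER_M,
)

# ASSUMPTION: reasoning tokens are a subset of the completion tokens.
visible_tokens = NATIVE_COMPLETION_TOKENS - NATIVE_REASONING_TOKENS
reasoning_share = NATIVE_REASONING_TOKENS / NATIVE_COMPLETION_TOKENS

print(f"estimated cost: ${estimated_cost:.7f}")
print(f"visible completion tokens: {visible_tokens}")
print(f"reasoning share of completion tokens: {reasoning_share:.1%}")
```

Under these assumptions almost all of the cost comes from reasoning tokens rather than from the roughly 200-word scene itself.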
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly crouched behind the inflatable unicorn, covert in a raincoat patterned with ducklings. Inside the community hall, toddlers ricocheted off furniture, balloons drifted like rubbery asteroids, and the gift table sagged under mountains of aggressively cheerful wrapping paper. Icing already frosted the walls in optimistic handprints. "Perfect chaos for generous parents," Molly whispered. Bill adjusted his glittery clown wig, the elastic biting his neck. Confetti clung to his fake mustache like radioactive dandruff as he peered through the smeared window. Parents clustered near the coffee urn, clinging to paper cups as if they were flotation devices on a sea of shrieking children. A weary magician misfired doves into the ceiling fan. "Route, timing, exit, minimal screaming," Bill muttered. Molly produced a hand-drawn blueprint of the hall, executed in crayon, complete with stick-figure lookouts and tiny skulls over each security hazard. The bouncy castle was marked as a high-risk trampoline of doom. "We slip in during piñata," Molly said. Bill eyed the towering pile of presents and the wobbling cake. "Relax, kids never suspect clowns," Bill replied. He straightened his oversized shoes, practiced a wobbling pratfall in the gravel, and followed Molly toward the door as the music peaked inside, shrill.
created_at
"2026-02-16T13:31:57.536484+00:00"
2/16/2026, 1:31:57 PM
model
"openai/gpt-5.1-20251113"
app_id
182717
external_user
(null)
streamed
true
cancelled
false
latency
5010
moderation_latency
(null)
generation_time
69887
tokens_prompt
77
tokens_completion
1230
native_tokens_prompt
66
native_tokens_completion
4312
native_tokens_completion_images
0
native_tokens_reasoning
4023
native_tokens_cached
0
num_media_prompt
(null)
num_input_audio_prompt
(null)
num_media_completion
0
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"completed"
usage
0.0432025
router
(null)
provider_responses
0
id
"resp_0740c497ff1f54c10169931c0644f88195b72a9267a88e4836"
status
200
is_byok
false
latency
184
endpoint_id
"764eb97f-8bab-4326-b29b-7a8799b00a70"
provider_name
"OpenAI"
model_permaslug
"openai/gpt-5.1-20251113"
api_type
"completions"
id
"gen-1771248646-37kvuFutoXCMNmah8PIg"
upstream_id
"resp_0740c497ff1f54c10169931c0644f88195b72a9267a88e4836"
total_cost
0.0432025
cache_discount
(null)
upstream_inference_cost
0
provider_name
"OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
99.9190% | Matches word count | n/a | neededClean: false; words: 203
100.0000% | Dialogue to Total Word Ratio | Ratio: 9.85%, Deviation: 0.15% | neededClean: false; wordsTotal: 203; wordsDialogue: 20
99.9595%
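The dialogue figures above follow from simple counting: 20 dialogue words out of 203 total gives a ratio of about 9.85%, which is 0.15 percentage points away from the 10% target. The sketch below shows one way such a count could be made. It is not the benchmark's actual evaluator: the function names are hypothetical, and it assumes dialogue is whatever sits between straight double quotes and that words are whitespace-separated tokens. The final figure, 99.9595%, also equals the mean of the two evaluator scores, although the page does not say how it is derived.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def dialogue_ratio(text):
    """Return (dialogue words, total words, ratio) for straight-quoted speech."""
    spoken = re.findall(r'"([^"]*)"', text)
    words_dialogue = sum(count_words(segment) for segment in spoken)
    words_total = count_words(text)
    return words_dialogue, words_total, words_dialogue / words_total


# A short excerpt from the model response, used as a worked example.
sample = '"We slip in during piñata," Molly said. Bill eyed the towering pile of presents.'
sample_dialogue, sample_total, sample_ratio = dialogue_ratio(sample)

# Reproduce the reported figures from the counts in the evaluation details.
reported_ratio_pct = round(20 / 203 * 100, 2)
reported_deviation_pct = round(abs(10 - 20 / 203 * 100), 2)
mean_score = (99.9190 + 100.0000) / 2

print(sample_dialogue, sample_total)
print(reported_ratio_pct, reported_deviation_pct)
```

A real evaluator would also need to decide how to treat curly quotes, nested quotes, and hyphenated words, which this sketch ignores.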