NC Bench
Run: 2026-02-16-01KHK9Z4D5T2GP1J85ZEWXQARY
Run Details
Model: GPT-5.1
Test: Dialogue tags
Scenario: Write 200 words with 10% dialogue
Duration: 42.9s
Cost: $0.024852
Provider: OpenAI
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 10% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
The bouncy castle loomed like a technicolor fortress over the suburban backyard, its plastic turrets squeaking with every frantic jump. Streamers fluttered, balloons bobbed, and somewhere inside a magician was failing to impress a ruthless audience of six-year-olds. From behind the hedge, Molly adjusted her ridiculous unicorn headband, the price of blending in. Bill, in an overtaxed clown suit, checked his watch with professional gloom. Parents guarded the grill, not the gift table, which sagged under a mountain of shiny paper and capitalist dreams. Molly sketched the perimeter in her notebook, crayons stolen from the party favors bucket. “Bill, this is morally wrong but financially delightful,” she whispered. “Relax, Molly, kids forget toys; I remember rent and overdue pizza bills,” he replied, already calculating resale values of remote-control dinosaurs. A sugar-crazed child cannoned past, spraying neon frosting. Molly dodged, slipping closer to the pile. She tagged each present with coded stickers: stars for noisy, moons for batteries not included, lightning bolts for probable cash envelopes. The cake was wheeled out, candles blazing. As everyone swarmed the spectacle, Bill hissed the signal. Together they rolled the entire gift table backward, inch by inch, toward the waiting van across the stunned lawn.
Meta
created_at: "2026-02-16T13:25:03.081063+00:00" (2/16/2026, 1:25:03 PM)
model: "openai/gpt-5.1-20251113"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 36441
moderation_latency: (null)
generation_time: 42573
tokens_prompt: 77
tokens_completion: 580
native_tokens_prompt: 66
native_tokens_completion: 2477
native_tokens_completion_images: 0
native_tokens_reasoning: 2205
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.0248525
router: (null)
provider_responses[0]:
  id: "resp_0af531726cb199150169931a83941c8193a76889b2a9cf7851"
  status: 200
  is_byok: false
  latency: 178
  endpoint_id: "764eb97f-8bab-4326-b29b-7a8799b00a70"
  provider_name: "OpenAI"
  model_permaslug: "openai/gpt-5.1-20251113"
api_type: "completions"
id: "gen-1771248259-YSUOQT0RfbbTnhDYbC7t"
upstream_id: "resp_0af531726cb199150169931a83941c8193a76889b2a9cf7851"
total_cost: 0.0248525
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
99.7443% | Matches word count | n/a | neededClean: false; words: 204
100.0000% | Dialogue to Total Word Ratio | Ratio: 9.80%, Deviation: 0.20% | neededClean: false; wordsTotal: 204; wordsDialogue: 20
99.8722%
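The dialogue-ratio figures can be approximated with a short script. This is a sketch, not the benchmark's evaluator, whose word-splitting rule is not documented here. Counting a word as a run of letters, digits, or apostrophes (so "six-year-olds" counts as three words) reproduces the reported 204 total and 20 dialogue words for this response, whereas a plain whitespace split gives 200. Dialogue is assumed to be any text between double quotes, and the 0.10 target comes from the scenario. The closing 99.8722% equals the unweighted mean of the two evaluator results, (99.7443 + 100.0000) / 2 = 99.87215; treating it as that mean is an assumption. The names count_words, dialogue_stats, and mean_score are hypothetical.

```python
import re

# Assumption: a word is a run of letters, digits, or apostrophes, so hyphenated
# compounds such as "six-year-olds" split into three words.
WORD_RE = re.compile(r"[A-Za-z0-9'’]+")
# Assumption: dialogue is whatever sits between curly or straight double quotes.
DIALOGUE_RE = re.compile(r"[“\"](.*?)[”\"]", re.DOTALL)


def count_words(text: str) -> int:
    """Hypothetical word counter using WORD_RE."""
    return len(WORD_RE.findall(text))


def dialogue_stats(text: str, target: float = 0.10) -> dict:
    """Hypothetical helper: total words, dialogue words, ratio, and deviation."""
    total = count_words(text)
    dialogue = sum(count_words(quote) for quote in DIALOGUE_RE.findall(text))
    ratio = dialogue / total if total else 0.0
    return {
        "wordsTotal": total,
        "wordsDialogue": dialogue,
        "ratio": ratio,
        "deviation": abs(ratio - target),
    }


def mean_score(scores: list[float]) -> float:
    """Hypothetical aggregation: unweighted average of evaluator results."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    sample = 'Molly waited. “Go now,” Bill said.'
    stats = dialogue_stats(sample)
    print(f"{stats['wordsDialogue']}/{stats['wordsTotal']} words of dialogue")
    print(f"ratio {stats['ratio']:.2%}, deviation {stats['deviation']:.2%}")
    print(f"mean of evaluator results: {mean_score([99.7443, 100.0]):.5f}%")
```

Applied to this run's response, 20 / 204 ≈ 0.0980, which formats as the 9.80% ratio and 0.20% deviation shown in the table. The 99.7443% word-count score is not reconstructed because its scoring curve is not shown on this page.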