NC Bench
Run: 2026-04-24-01KQ0PWEXE9TYK9CDQK6HE23Y7
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 28.2s
Cost: $0.034860
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.04% | Dialogue to Total Word Ratio
0.0198% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly spread the crayon-drawn floor plan across the bonnet of the van. It had been sketched on the back of a pizza menu, and the ballroom of the community centre was represented by a wobbly rectangle labelled “CAKE HERE!!!” Bill leaned over it, squinting. “Why are there seventeen exclamation marks after cake?” “Because it’s a children’s party, Bill. Cake is basically a security system.” He nodded gravely. “Right. Sugar-based surveillance.” Beyond the hedge, bunting flapped in the afternoon breeze. Balloons bobbed at the entrance. Somewhere inside, thirty children shrieked with the pure joy of being given paper hats and unlimited orange squash. Molly tapped the map. “The presents are stacked beside the puppet theatre. We go in during Pass the Parcel, when everyone’s distracted.” Bill adjusted his false moustache. It was pink, curly, and clearly made of pipe cleaners. “Remind me why I’m dressed as Professor Pickle, professional balloon dentist?” “Because the invitation said there’d be entertainers.” “It said there’d be one entertainer. That clown with the tiny bicycle.” “Exactly,” Molly said. “Nobody questions extra clowns. They multiply naturally.” Bill looked down at his enormous yellow shoes. “I don’t feel naturally multiplied. I feel like a cautionary tale.” A burst of applause came from inside, followed by a honk and what sounded like a grown man falling into a drum kit. “That’s our window,” Molly said. “Our window is full of toddlers.” “Our opportunity, Bill.” He swallowed. “And after we take the presents?” “We load them into the van, drive to the warehouse, and sort the loot.” Bill scratched his chin. “What if the loot is mostly socks, plastic dinosaurs, and picture books about sharing?” “Then we fence the dinosaurs and learn a valuable lesson from the books.” They crept along the hedge, though Molly’s striped trousers squeaked and Bill’s shoes made small rubbery sighs with every step. At the side door, Molly paused, listening. 
Inside, a woman called, “Who wants another round of musical statues?” The reply was a roar so mighty that two pigeons abandoned the roof. Bill whispered, “I’ve robbed a bank vault with less noise.” “You tripped over a mop bucket and apologised to a ficus.” “It looked disappointed.” Molly pushed the door open and they slipped into a corridor smelling of icing, crisps, and panic. At the far end, the party blazed with colour. Children froze mid-dance as music cut out, then toppled in giggling heaps. “There,” Molly murmured. The presents glittered in a mountain of wrapping paper, bows, and suspiciously large teddy bears. Bill’s eyes widened. “That’s not a pile. That’s a festive avalanche.” Molly grinned. “We take the whole lot.” Just then, a little girl in a unicorn dress appeared beside them, holding a paper plate. She stared at Molly. Then at Bill. “Are you the magicians?” Bill opened his mouth. Molly elbowed him. “Yes.” The girl handed Bill a sausage roll. “Mummy says you’re late.” Bill held the sausage roll like evidence. “Madam, we are professionals.” “Do a trick.” Molly smiled tightly. “We’re saving our best trick for the presents.” The girl gasped. “You’re going to make them disappear?” Bill looked at Molly. Molly looked at the presents. The children began chanting, “Magic! Magic! Magic!” Molly’s shoulders sank. Bill whispered, “If we disappoint them, they’ll riot.” Molly snatched a wand from a nearby costume box. “Fine. One trick.” Five minutes later, Bill stood inside a cardboard spaceship while Molly dramatically covered it with a tablecloth. The children screamed with delight when he emerged wearing three party hats and holding the same sausage roll. By the time the parents applauded, the real clown had gone home, the birthday boy had hugged Molly’s knees, and Bill had been paid in cake. Outside, Molly climbed into the van empty-handed. Bill licked icing from his thumb. “So. 
Warehouse?” Molly glared at the community centre, where children waved through the window. “No,” she said. “Toy shop.” Bill blinked. “Why?” Molly started the engine. “Because now we owe them an encore.”
created_at: "2026-04-24T21:38:40.19Z" (4/24/2026, 9:38:40 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 7531
moderation_latency: (null)
generation_time: 28078
tokens_prompt: 77
tokens_completion: 1300
native_tokens_prompt: 66
native_tokens_completion: 1151
native_tokens_completion_images: 0
native_tokens_reasoning: 229
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03486
router: (null)
provider_responses:
  0:
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0578dd4516313ba30169ebe2e047508197a6a6343fffcdc770"
    is_byok: false
    latency: 278
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066720-XOtBjYn29r1qSm1ZsynA"
session_id: (null)
api_type: "completions"
id: "gen-1777066720-rThsk9svtdhnhqpMdTon"
upstream_id: "resp_0578dd4516313ba30169ebe2e047508197a6a6343fffcdc770"
total_cost: 0.03486
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
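The generation_time and native_tokens_completion fields in the metadata above can be combined into a rough output throughput figure. The sketch below assumes generation_time is in milliseconds (28078 ms is close to the 28.2s duration shown in the run details, but the page does not state the unit), and the function name is hypothetical rather than part of any NC Bench or provider API.

```python
def tokens_per_second(completion_tokens: int, generation_time_ms: int) -> float:
    """Output throughput in tokens per second.

    Assumption: generation_time_ms is a duration in milliseconds.
    """
    if generation_time_ms <= 0:
        raise ValueError("generation_time_ms must be positive")
    return completion_tokens / (generation_time_ms / 1000.0)


if __name__ == "__main__":
    # Values copied from the run metadata: native_tokens_completion, generation_time.
    print(f"{tokens_per_second(1151, 28078):.2f} tokens/s")
```

Under that unit assumption, this run works out to roughly 41 native completion tokens per second.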
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false, words: 663
0.04% | Dialogue to Total Word Ratio | Ratio: 37.48%, Deviation: 12.52% | neededClean: false, wordsTotal: 667, wordsDialogue: 250
0.0198% | Overall
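The reported ratio and deviation follow directly from the word counts: 250 dialogue words out of 667 total is 37.48%, which is 12.52 percentage points short of the 50% target. The Python sketch below reproduces that arithmetic. The counting rule in count_dialogue (text between curly double quotes is dialogue, words are whitespace-separated) is an assumption for illustration only; the page does not document how NC Bench tokenizes, and the two evaluators report slightly different totals (663 and 667). All function names are hypothetical.

```python
import re

TARGET_RATIO = 0.50  # the scenario asks for 50% dialogue


def count_dialogue(text: str) -> tuple[int, int]:
    """Return (dialogue_words, total_words).

    Assumption: dialogue is whatever sits between curly double quotes,
    and words are whitespace-separated tokens.
    """
    total = len(text.split())
    quoted = re.findall(r"“([^”]*)”", text)
    dialogue = sum(len(span.split()) for span in quoted)
    return dialogue, total


def ratio_and_deviation(dialogue: int, total: int) -> tuple[float, float]:
    """Return (ratio, absolute deviation from the target), both in percent."""
    ratio = 100.0 * dialogue / total
    return ratio, abs(100.0 * TARGET_RATIO - ratio)


if __name__ == "__main__":
    # Word counts copied from the evaluation details.
    ratio, deviation = ratio_and_deviation(250, 667)
    print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

How the deviation is mapped to the 0.04% score is not shown on the page, so the sketch stops at the ratio and deviation.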