NC Bench
Run: 2026-04-24-01KQ0Q1YGYQDV6NPT3AX8GD42A
Run Details
Model: GPT-5.5 (Reasoning)
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 28.1s
Cost: $0.034590
Provider: OpenAI
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 26.37%
Overall: 13.1851%
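The Overall score appears consistent with an unweighted mean of the two evaluator scores: (0.00 + 26.37) / 2 ≈ 13.185. This is inferred from the reported numbers, not from NC Bench documentation. The sketch below assumes that averaging rule; the unrounded value 26.3702 is back-computed from the Overall figure, since the page displays only 26.37%.

```python
from statistics import mean

# ASSUMPTION: "Overall" is the unweighted mean of the per-evaluator scores.
# Scores are in percent, as shown on the run page.
evaluator_scores = {
    "Matches word count": 0.0,
    # Back-computed as 2 * 13.1851; the page itself shows only 26.37%.
    "Dialogue to Total Word Ratio": 26.3702,
}

overall = mean(evaluator_scores.values())
print(f"Overall: {overall:.4f}%")
```

Under this assumption, a single evaluator scoring 0% caps the Overall score at half of the other evaluator's result, which is what happens in this run.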
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
Molly spread the crayon-drawn map across the bonnet of their van, pinning the corners with a half-eaten sausage roll and Bill’s left shoe. Across the street, the village hall pulsed with balloons, bunting, and the terrible sugar-scream of thirty children discovering jelly. “The target,” Molly said, tapping a glittery rectangle marked PRESENT MOUNTAIN, “is here.” Bill squinted. “Why is there a dragon?” “That’s not a dragon. That’s the hired magician.” “He’s got wings.” “It’s a cloak, Bill.” “And smoke coming out.” “That’s his vape.” Bill nodded, unconvinced. He wore a black turtleneck, black trousers, and a party hat he had stolen from a toddler earlier as “camouflage.” It sat on his head at a tragic angle. Molly pulled out binoculars. Through the window she could see the prize: a bright heap of parcels stacked beside the stage, wrapped in dinosaurs, unicorns, and aggressive amounts of tape. It was the sort of mound that made a thief’s heart sing and a parent’s wallet whimper. “We wait until pass-the-parcel,” she said. “Maximum confusion.” Bill raised a finger. “Question.” “No.” “What if pass-the-parcel is one of the presents?” Molly lowered the binoculars. “Then we steal it last.” “Clever. Cold, but clever.” A burst of applause erupted inside. The magician had produced a rabbit, or possibly failed to conceal one. Children surged toward him like tiny, sticky wolves. “That’s our distraction,” Molly said. Bill peered at the hall. “I don’t like this job.” “You don’t like any job involving eye contact.” “I don’t like jobs where the victims can’t tie their own shoes. Feels unsporting.” Molly sighed. “They’re children, Bill. They’ll get over it.” “I still remember when someone stole my fire engine at nursery.” “That was you. You traded it for a biscuit.” “It was a very persuasive biscuit.” Molly folded the map with military precision, then unfolded it again because the sausage roll had left grease over the emergency exit. 
Bill watched a small girl in a fairy dress chase a boy with a balloon sword. The boy went down dramatically beside the cake table. “The thing is,” Bill said, “these presents might contain socks.” “All presents might contain socks. That is the risk we accept.” “Or educational puzzles.” “Bill.” “Or glitter.” Molly paused. Glitter changed things. Even hardened criminals feared glitter. It got into seams, alibis, eyebrows. She imagined opening one parcel and releasing a pink blizzard that would cling to them until retirement. “We’ll use the sacks,” she said. “The Santa sacks?” “Yes.” Bill looked wounded. “That feels seasonally confusing. It’s April.” “Crime doesn’t follow a calendar.” Inside, the music began: a savage children’s version of “The Wheels on the Bus.” Molly slipped her gloves on. Bill adjusted his party hat and tried to look menacing, but the elastic snapped under his chin and made him yelp. Molly stared at him. “I’m fine,” he said. “The hat attacked first.” Then the hall doors burst open. A woman in a cardigan stepped out, carrying a clipboard and the exhausted authority of someone who had iced forty cupcakes at midnight. “You two!” she called. Molly froze. Bill waved. The woman marched over. “Are you the entertainers?” Molly blinked. “In a manner of speaking.” “Excellent. You’re late. The children are expecting pirates.” Bill leaned toward Molly. “We did say we wanted to steal all the presents.” Molly smiled slowly. “And pirates are famous for treasure.” Five minutes later, Molly and Bill stood inside the hall wearing eye patches made from napkins, surrounded by cheering children. The present mountain gleamed nearby, close enough to touch and guarded by a grandmother with the suspicious gaze of a retired prison warden. Molly drew herself up. “Right, Bill. New plan.” Bill whispered, “Less burglary?” “More improvisation.” A child handed Bill a foam sword. He accepted it solemnly. “Do we still get the presents?” he asked. 
Molly looked at the children, the cake, the grandmother, and the magician’s rabbit chewing her shoelace. “We’ll steal the show instead,” she said. Bill grinned. “Finally. A victimless crime.”
created_at: "2026-04-24T21:41:40.005Z" (4/24/2026, 9:41:40 PM)
model: "openai/gpt-5.5-20260423"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 5870
moderation_latency: (null)
generation_time: 27975
tokens_prompt: 77
tokens_completion: 1217
native_tokens_prompt: 66
native_tokens_completion: 1142
native_tokens_completion_images: 0
native_tokens_reasoning: 213
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
num_fetches: (null)
web_search_engine: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.03459
router: (null)
provider_responses:
  [0]
    endpoint_id: "58e5b336-423e-430b-a2ab-8bc353f0c51b"
    id: "resp_0bc350fa057a42fa0169ebe3941e448197bc0e0834bd4b17a7"
    is_byok: false
    latency: 270
    model_permaslug: "openai/gpt-5.5-20260423"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.1; linux; x64))"
http_referer: (null)
request_id: "req-1777066900-rxi3U5mXBAlDJzGnUQof"
session_id: (null)
api_type: "completions"
id: "gen-1777066900-XD0gQXuBzL3wpgr7QTzn"
upstream_id: "resp_0bc350fa057a42fa0169ebe3941e448197bc0e0834bd4b17a7"
total_cost: 0.03459
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
response_cache_source_id: (null)
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta data: neededClean: false; words: 667
Dialogue to Total Word Ratio
  Result: 26.37%
  Details: Ratio: 35.16%, Deviation: 5.16%
  Meta data: neededClean: false; wordsTotal: 674; wordsDialogue: 237
Overall
  Result: 13.1851%
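The reported ratio and deviation follow directly from the meta data: 237 / 674 ≈ 35.16% dialogue, which is 5.16 percentage points above the 30% target. The two evaluators also report different totals (667 vs 674 words), which suggests they tokenize the response differently. The sketch below reproduces the ratio arithmetic and adds a hypothetical dialogue counter (`count_dialogue_words` is an illustrative name, not NC Bench's implementation) that treats text inside curly double quotes as dialogue and splits on whitespace.

```python
import re

TARGET_RATIO = 0.30  # the scenario asks for 30% dialogue


def dialogue_ratio(words_dialogue: int, words_total: int) -> tuple[float, float]:
    """Return (dialogue ratio, absolute deviation from TARGET_RATIO)."""
    ratio = words_dialogue / words_total
    return ratio, abs(ratio - TARGET_RATIO)


# Counts reported by the "Dialogue to Total Word Ratio" evaluator for this run.
ratio, deviation = dialogue_ratio(237, 674)
print(f"Ratio: {ratio:.2%}, Deviation: {deviation:.2%}")

# HYPOTHETICAL counter: anything between “ and ” counts as dialogue.
# NC Bench's actual tokenizer and quote handling are not documented here.
QUOTE_SPAN = re.compile(r"“([^”]*)”")


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_dialogue_words(text: str) -> int:
    """Sum the word counts of every curly-quoted span in text."""
    return sum(count_words(span) for span in QUOTE_SPAN.findall(text))


sample = "“The target,” Molly said, “is here.” Bill squinted."
print(count_dialogue_words(sample), count_words(sample))
```

A purely quote-based counter like this one would also count non-dialogue quotations in this response, such as “camouflage.” and “The Wheels on the Bus.”, which may nudge the measured ratio upward.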