NC Bench
Run: 2026-03-06-01KK1J3X0SW9CXR7A8DPA4S3ET
Run Details
Model: GPT-5.4
Test: Dialogue tags
Scenario: Write 500 words with 50% dialogue
Duration: 25.1s
Cost: $0.014055
Provider: OpenAI
Evaluation summary
Score | Evaluator
0.00% | Matches word count
0.02% | Dialogue to Total Word Ratio
0.0107% | Overall
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 50% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
Molly unfolded the neighborhood map across the sticky café table and pinned one corner with a sugar jar. Rain pecked at the window. Bill leaned in, hat dripping, studying the route to Number Twelve, where balloons bobbed in the garden like sentries with cheerful faces. “Children’s parties are chaos with cake,” Molly said. “That’s our natural habitat.” Bill sniffed. “I still say robbing a bank has more dignity.” “Dignity doesn’t come with goody bags, Bill.” “I’m not interested in the goody bags.” “That’s because you lack imagination. Miniature bubbles. Temporary tattoos. Sometimes a whistle.” “I despise whistles. Whistles are how civilization ended.” Molly tapped the map. “Listen. Two o’clock, magician arrives. Two-fifteen, parents stop supervising because a man in a purple cape has produced a rabbit from somewhere unsanitary. Two-twenty, we slip through the side gate dressed as entertainers.” Bill looked at the scribbled notes. “You’ve written clown? with a question mark.” “That’s a brainstorm, not a commitment.” “I won’t wear oversized shoes again. You remember Plymouth.” “The shoes weren’t the problem. Your balloon giraffe insulted a mayor.” “It looked at me with contempt.” Outside, thunder muttered. Inside, Molly produced a guest list she’d acquired with suspicious ease. She ran a finger down the names: Ava, Noah, Poppy, Leo. Beside each was a present estimate, from “book-shaped” to “definitely expensive.” “Target room,” she said, “the dining room. Gifts stacked by the piano. We enter with a speaker, claim to be the musical portion, and load the loot into instrument cases.” Bill brightened. “I do own a trumpet case.” “You own half a trumpet in a pillowcase.” “It suggests culture.” “It suggests tetanus.” He sat back, folding his arms. “What about the dog?” “Spaniel,” Molly said. “Bribable.” “With sausage?” “With standards, Bill. Cumberland, not the cheap stuff.” He nodded solemnly. 
“A professional courtesy.” The waitress passed, giving them the look reserved for couples either in love or conspiracy. Molly smiled blandly until she moved on. “Escape route,” Bill said. “Front street will be clogged with tiny scooters.” “Exactly why we use the alley. Garden fence is low, hedge is dense, and Mrs. Waverley next door is hard of hearing.” “The one who reports suspicious behavior?” “She reported a cloud once.” Bill considered this. “Promising.” Molly folded the map smaller, her eyes bright with the particular excitement of bad ideas polished to a shine. “There is one complication.” “There’s always one complication.” “The birthday child is turning six.” Bill waited. “And?” “Six-year-olds are canny. They still believe in magic, but they also ask direct questions. If one corners you and asks who you are, don’t improvise.” “I improvise beautifully.” “You told a security guard you were the Duke of Plumbing.” “He saluted me.” “He pitied you.” Bill sighed and drummed his fingers. “Fine. Who am I?” “You’re Mr. Bongo.” “No.” “You need something plausible.” “Nothing about me says bongo.” “Everything about you says bongo.” He glanced at his reflection in the spoon and scowled. “Then who are you?” “Madame Melody.” “That sounds made up.” “All names are made up.” Rainwater slid down the window in nervous streaks. Molly tucked the guest list into her sleeve. Bill reached into his coat and set a paper crown on the table, slightly crushed, with dinosaurs around the rim. “What’s this?” she asked. “A contingency disguise. I found it in the gutter.” Molly stared at him, then laughed despite herself. “Bill, that’s either genius or hepatitis.” “Usually my best work lives in that region.” She set the crown on her head and tilted it rakishly. “Right. Mr. Bongo, fetch the sausage, polish the trumpet fragment, and practice looking nonthreatening.” “I have one expression.” “Yes,” Molly said, rising with the map. 
“That’s why I’ll do the talking while you carry the presents and try not to terrify the balloons.” Bill stood, adjusting his dripping hat. “And if the magician spots us?” Molly grinned. “Then we nick the rabbit too.”
created_at: "2026-03-06T12:31:48.257Z" (3/6/2026, 12:31:48 PM)
model: "openai/gpt-5.4-20260305"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 357
moderation_latency: (null)
generation_time: 24993
tokens_prompt: 77
tokens_completion: 1226
native_tokens_prompt: 66
native_tokens_completion: 926
native_tokens_completion_images: 0
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "completed"
usage: 0.014055
router: (null)
provider_responses:
  0:
    endpoint_id: "9ff5625c-403f-4d7f-b895-58ac7295062c"
    id: "resp_0b880d0063faf6650169aac93457448194bcb71b9050c75318"
    is_byok: false
    latency: 156
    model_permaslug: "openai/gpt-5.4-20260305"
    provider_name: "OpenAI"
    status: 200
user_agent: "langchainjs-openai/1.0.0 ((node/v24.14.0; linux; x64))"
http_referer: (null)
api_type: "completions"
id: "gen-1772800308-coFgmAo73DAn332GG3CF"
upstream_id: "resp_0b880d0063faf6650169aac93457448194bcb71b9050c75318"
total_cost: 0.014055
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "OpenAI"
Evaluation details
Result | Evaluator | Details | Meta Data
0.00% | Matches word count | n/a | neededClean: false; words: 652
0.02% | Dialogue to Total Word Ratio | Ratio: 63.00%, Deviation: 13.00% | neededClean: false; wordsTotal: 654; wordsDialogue: 412
0.0107% | Overall
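
The reported Ratio and Deviation for the Dialogue to Total Word Ratio evaluator can be reproduced from the wordsDialogue and wordsTotal values. The following is a minimal sketch, assuming the ratio is dialogue words divided by total words and the deviation is the absolute distance from the 50% target named in the scenario. The function names are hypothetical and are not taken from the NC Bench code. How the evaluator turns a 13.00% deviation into a 0.02% score is not shown on this page, so the sketch does not attempt it.

```python
def dialogue_ratio(words_dialogue: int, words_total: int) -> float:
    """Return dialogue words as a percentage of all words."""
    if words_total <= 0:
        raise ValueError("words_total must be positive")
    return 100.0 * words_dialogue / words_total


def deviation_from_target(ratio_pct: float, target_pct: float = 50.0) -> float:
    """Return the absolute gap, in percentage points, between ratio and target.

    The 50% default is an assumption based on the scenario text.
    """
    return abs(ratio_pct - target_pct)


# Values copied from the evaluator's Meta Data for this run.
ratio = dialogue_ratio(words_dialogue=412, words_total=654)
deviation = deviation_from_target(ratio)
print(f"Ratio: {ratio:.2f}%, Deviation: {deviation:.2f}%")
```

With the values from this run, the rounded results match the figures in the Details column (63.00% and 13.00%).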