NC Bench
Run: 2025-05-22-01JVWKZES47SEE3F7PDF144EKS
Run Details
Model: Claude Opus 4
Test: Dialogue tags
Scenario: Write 200 words with 90% dialogue
Duration: 18.9s
Cost: $0.025575
Evaluation summary
Score | Evaluator
6.09% | Matches word count
36.79% | Dialogue to Total Word Ratio
21.4393% | Overall
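The page does not say how the Overall figure is derived from the two evaluator scores. The displayed numbers are consistent with an unweighted mean, so the following is a minimal sketch under that assumption. The `scores` dictionary and the `overall` name are illustrative and not part of NC Bench.

```python
from statistics import mean

# Evaluator scores as displayed in the summary (already rounded to two decimals).
scores = {
    "Matches word count": 6.09,
    "Dialogue to Total Word Ratio": 36.79,
}

# Assumption: Overall is the unweighted mean of the evaluator scores.
overall = mean(list(scores.values()))

# The page reports 21.4393%; the small gap is expected because the
# inputs shown on the page are rounded.
print(f"{overall:.2f}%")
```

If the benchmark weights evaluators differently, this check would not hold for other runs, so it should be read as a plausibility check on this run only.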
Model input
User/Human
You are a author of comedic heist books. Please write a 200 word scene that has 90% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
AI
"Bill, you can't be serious." "Dead serious, Molly. Tommy Fitzgerald's eighth birthday party. Saturday. Two o'clock." "We're stealing from children now?" "Not from children. From their parents. Big difference." "How exactly?" "Listen, these Riverside Heights kids get iPads, gaming consoles, smartwatches—" "They're eight!" "Exactly my point. Little Timmy doesn't need a PlayStation 5. But we do. Well, we need the money from selling it." "This is insane." "Is it? Is it really? We dress as clowns—" "Absolutely not." "Fine, balloon artists. We go in, entertain for twenty minutes, grab the gift table, out." "What about the children, Bill? The trauma?" "What trauma? They'll cry for five minutes then go back to their bouncy castle. Meanwhile, we make rent." "I hate this plan." "You got a better one?" "Literally any other plan would be better." "The Murphy wedding?" "Too much security." "The country club?" "Cameras everywhere." "So we're agreed then? Operation Birthday Bandit?" "God help me. Fine. But I'm not wearing the clown wig." "Deal. You can be the birthday magician instead." "I quit." "See you Saturday, partner."
total_cost: 0.025575
Evaluation details
Result | Evaluator | Details | Meta Data
6.09% | Matches word count | n/a | neededClean: false, words: 177
36.79% | Dialogue to Total Word Ratio | Ratio: 100.00%, Deviation: 10.00% | neededClean: false, wordsTotal: 177, wordsDialogue: 177
21.4393% | Overall
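The Dialogue to Total Word Ratio row reports wordsDialogue, wordsTotal, a ratio, and a deviation from the 90% target. The evaluator's actual implementation is not shown on this page, so the following is only a sketch of how such figures could be computed. It assumes whitespace tokenization and straight double quotes as dialogue delimiters. The function names `dialogue_ratio` and `deviation` are hypothetical, and the mapping from deviation to the 36.79% score is not reproduced because the page does not state it.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumptions: words are whitespace-separated tokens, and dialogue is
    any text enclosed in straight double quotes.
    """
    total = len(text.split())
    spoken = sum(len(chunk.split()) for chunk in re.findall(r'"([^"]*)"', text))
    ratio = spoken / total if total else 0.0
    return spoken, total, ratio


def deviation(ratio: float, target: float = 0.9) -> float:
    """Absolute distance between the observed ratio and the target ratio."""
    return abs(ratio - target)


# Small illustrative scene, not taken from the run.
scene = '"We go in." Bill grinned. "Then out."'
spoken, total, ratio = dialogue_ratio(scene)
print(spoken, total, round(ratio, 4))

# With the counts reported for this run (177 dialogue words of 177 total),
# the ratio is 1.0 and the deviation from the 0.9 target rounds to 0.10.
print(round(deviation(177 / 177), 2))
```

Under these assumptions, a response made up entirely of quoted speech overshoots the 90% target by ten percentage points, which matches the "Ratio: 100.00%, Deviation: 10.00%" detail shown for this run.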