NC Bench
Run: 2026-02-25-01KJBJVF1ZH71W97J263X8QJC2
Run Details
Model: Qwen 3.5 Flash
Test: Dialogue tags
Scenario: Write 500 words with 30% dialogue
Duration: 1m 23s
Cost: $0.004561
Provider: Alibaba
Evaluation summary
Matches word count: 0.00%
Dialogue to Total Word Ratio: 0.00%
Overall: 0.0000%
Model input
User/Human
You are a author of comedic heist books. Please write a 500 word scene that has 30% words of dialogue between two thieves named Molly and Bill who are planning to break into a children's party to steal all the presents. Only include the prose of the scene without any markdown or commentary.
Model response
Messages
Meta
AI
The cold rain pounded against the roof of the parked van while the neighborhood remained dark around them as the heavy fog hung low over the street. Molly tightened the gloves on her hands for the last time while Bill the heavy plastic bags near the door. The front door remained closed, protected by the thick fog that hung low over the street outside the house. She checked the watch on her wrist again while he adjusted the gloves on both hands. "It is exactly eight o'clock right now," Bill whispered, checking the camera. Molly paused. "They're not leaving yet. They're opening appetizers. That buys us twenty minutes of chaos before nap time begins." Bill wiped sweat. "Twenty minutes. For twenty kids. I thought we were stealing diamonds. High value items." "Diamonds don't squeak when you sit on them. Children's toys are high value. Legos alone sell for a fortune on the black market," Molly replied. "We're thieves. We don't touch toys," Bill said. "You rob banks. I rob supermarkets. This is retail theft," Molly said. "Now hush. I need to hack the garage signal." "Because the toys are inside. Parents are too busy taking selfies to notice the vanishing act," Molly replied. "Stealth mode. We are ghosts," Bill nodded. "Not stealth. Panic mode. When the door opens, we flood in. Run. Simple," Molly corrected. "Do we cover the camera?" Bill asked. "I looped the feed. The system thinks you're watching. You aren't," Molly smiled. "Wait, do I have the keycard?" Bill asked. "No one has the key." "Then how do we open the door?" "I picked the lock already," Molly sighed. "Oh." "Focus," she said. "Right," Bill breathed. The silence stretched between them like a rubber band ready to snap. Suddenly, the front door slammed open. A party horn squeaked in the distance. "Now," Molly stood. "Wait, are we stealing everything?" Bill panicked. "Only gifts. Strictly gifts," Molly checked her clipboard. "Right. Gifts," Bill said. "Don't drop the paper. It makes noise," Molly warned. 
"I won't drop anything." "We leave no tracks. No evidence." "Okay," Bill whispered. "We go in, we take, we bounce," Molly pointed. "Okay," he replied. "Wait, who is Bill?" "That's you," she laughed. They sprinted toward the house as the window opened below them in the dark.
created_at: "2026-02-25T23:41:22.885Z" (2/25/2026, 11:41:22 PM)
model: "qwen/qwen3.5-flash-20260224"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 490
moderation_latency: (null)
generation_time: 82987
tokens_prompt: 77
tokens_completion: 11217
native_tokens_prompt: 76
native_tokens_completion: 11383
native_tokens_completion_images: (null)
native_tokens_reasoning: 10828
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0045608
router: (null)
provider_responses:
  0:
    endpoint_id: "5f6b71b5-8672-4b27-8b2d-f674429f9e63"
    id: "chatcmpl-c57b505d-27c8-94cf-a922-1a89062bb0b4"
    is_byok: false
    latency: 490
    model_permaslug: "qwen/qwen3.5-flash-20260224"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1772062882-ZrDfJH2pZChymxnY1UTs"
upstream_id: "chatcmpl-c57b505d-27c8-94cf-a922-1a89062bb0b4"
total_cost: 0.0045608
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
Evaluation details
Matches word count
  Result: 0.00%
  Details: n/a
  Meta Data: neededClean false, words 382
Dialogue to Total Word Ratio
  Result: 0.00%
  Details: Ratio: 54.19%, Deviation: 24.19%
  Meta Data: neededClean false, wordsTotal 382, wordsDialogue 207
Overall: 0.0000%
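The reported ratio and deviation can be reproduced from the word counts in the evaluation details. The sketch below is illustrative only: the function names are hypothetical, the 30% target is taken from the scenario, and counting words inside straight double quotes is an assumption about how dialogue might be measured, not NC Bench's actual evaluator.

```python
import re


def dialogue_ratio(text: str) -> tuple[int, int, float]:
    """Return (dialogue_words, total_words, ratio).

    Assumption: dialogue is any text between straight double quotes,
    and words are whitespace-separated tokens.
    """
    total_words = len(text.split())
    quoted_spans = re.findall(r'"([^"]*)"', text)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    ratio = dialogue_words / total_words if total_words else 0.0
    return dialogue_words, total_words, ratio


def deviation_from_target(ratio: float, target: float = 0.30) -> float:
    """Absolute distance from the target ratio (30% in this scenario)."""
    return abs(ratio - target)


# Word counts copied from the evaluation details: wordsDialogue / wordsTotal.
reported_ratio = 207 / 382
reported_deviation = deviation_from_target(reported_ratio)
print(f"Ratio: {reported_ratio:.2%}, Deviation: {reported_deviation:.2%}")
# prints "Ratio: 54.19%, Deviation: 24.19%"

# The word-count evaluator's input: 382 words against a 500-word request.
word_shortfall = 500 - 382  # 118 words short
```

On these numbers the response is 118 words short of the requested 500, and its dialogue share is roughly 24 percentage points above the 30% target, which is consistent with both evaluators scoring 0.00%.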