NC Bench
Run: 2026-02-21-01KJ15CZ6HZZNP8KZVJJYF6P86
Run Details
Model: Qwen 3.5 Plus (2026-02-15)
Test: Codex Red Herring (False Positive Detection)
Scenario: Short text (~524 words), small codex (11 entries)
Duration: 1m 10s
Cost: $0.012632
Provider: Alibaba
Evaluation summary
Score | Evaluator
50.00% | Correct "no violations" response
50.00% | No hallucinated violations
50.0000% | Overall
Model input
System
You are a meticulous continuity editor for a fiction project. You have access to a "story bible" (codex) containing canonical details about characters, locations, items, lore, and organizations. Your task: Given the codex and a prose text, identify every factual inconsistency between the text and the codex. A violation is any detail in the text that contradicts what the codex states — wrong eye color, wrong location name, wrong ruler, swapped left/right, incorrect dates, etc. Be thorough. Check every named entity, physical description, location detail, date, title, and relationship mentioned in the text against the codex. Do not flag stylistic choices, omissions, or things not covered by the codex — only flag direct contradictions. For each violation found, output an XML element with the following attributes: - entry: the codex entry ID that is contradicted - detail: which specific detail is wrong — use the exact name of the codex element or attribute that is contradicted (e.g. "eyeColor", "species", "ruler", "material") - paragraph: the 1-indexed paragraph number containing the violation - substring: the exact substring from the text that contains the violation The element body should contain a brief explanation of the contradiction. Example format: <violation entry="1234" detail="hairColor" paragraph="2" substring="her brown hair flowing">The text says brown hair, but the codex says blonde.</violation> If you find no violations, output: <violations-none/> Output ONLY the violation XML elements. Do not include any other text, preamble, or explanation outside the XML.
User/Human
<codex> <character id="100" name="Torben Reikhart" aliases="Stray"> <gender>Male</gender> <eyeColor>Green</eyeColor> <hairColor>Silver</hairColor> </character> <character id="101" name="Sable Dunmore"> <species>Human</species> <gender>Female</gender> <age>28</age> <eyeColor>Brown</eyeColor> <hairColor>Auburn</hairColor> </character> <character id="102" name="Old Rivka"> <gender>Female</gender> </character> <character id="103" name="Petal Dunmore"> <gender>Female</gender> </character> <character id="104" name="Jurren Dunmore"> <gender>Male</gender> </character> <location id="105" name="The Rusty Lantern"/> <location id="106" name="Dunmore"> <terrain>Coastal</terrain> </location> <item id="107" name="Kindling"> <material>Dark iron</material> <objectType>Weapon</objectType> <power>Fire</power> </item> <item id="108" name="Petal's Locket"> <material>Silver</material> <objectType>Jewelry</objectType> </item> <lore id="109" name="The Felling"> <timePeriod>Two centuries ago</timePeriod> <category>War</category> </lore> <lore id="110" name="The Kindling Rite"> <category>Tradition</category> </lore> </codex> <text> The Rusty Lantern sat crooked on its foundations, listing toward the harbor like a drunk leaning on a post. Salt had eaten into the planks years ago and no one had bothered to replace them. The ceiling beams were black with smoke. Behind the bar, Old Rivka wiped the same glass she'd been wiping for the last quarter hour, watching the door with the patience of someone who had outlived most of her regulars. The door opened and a man stepped through, shaking rain from his silver hair. Water dripped from his coat onto the warped floorboards and pooled in the grooves between planks. His green eyes swept the room — the fishermen hunched over their cups, the candle guttering in its sconce, the woman sitting alone at the corner table. He smelled of wet wool and the cold air off the harbor. Old Rivka raised her chin. "Stray," she called across the room. 
"Didn't think you'd show." He didn't answer. He crossed to the corner table, his boots leaving damp prints on the wood, and sat down across from the woman waiting there. Sable Dunmore was twenty-eight years old, human, and tired of waiting. Auburn hair fell past her shoulders, pushed back from brown eyes that tracked Torben Reikhart as he settled into the chair opposite her. On the table between them lay a sword — Kindling, she called it. A dark iron blade, the metal faintly warm even in the chill of the tavern. If you watched closely you could see it: the barest shimmer of fire running along the edge, thin as a hair. At her neck hung a silver locket. Petal's Locket, though she never called it that aloud. "It still has her hair inside," she said, touching it once. "That's all I have left of her." Torben waited. He'd learned when to be quiet. "My sister Petal," Sable said. "She was fourteen when The Felling reached Dunmore. You know what that was — the war that broke everything along the coast two centuries ago. Jurren — our father — he told me to take Petal and run. I did. She didn't make it." Torben didn't ask the obvious question. Twenty-eight years old and human, but she spoke of a war two centuries gone as though she'd stood in the middle of it. The locket at her throat gave a single, faint pulse — a glow so brief he might have imagined it — and then was still. Sable's hand moved to the locket again. "We couldn't give her a Kindling Rite. You know the tradition? You burn the dead with what they loved. Their tools. Their letters. You give it all to the fire so they carry it onward. But there was no body to burn. Nothing left of her but this locket and whatever I can remember." She looked at Torben. "Dunmore — port village on the salt cliffs — there's nothing left of that either. Just rock and old foundations." Torben said, "Then we'd better make it count." Sable picked up Kindling and sheathed it. They left the Rusty Lantern without another word. 
The rain had stopped. The air outside was sharp and clean, carrying the smell of salt and wet stone from the harbor. Somewhere down the waterfront a shutter banged in the wind. Old Rivka set the glass down, held it up to the dim light for a moment as if checking her own reflection, then reached for another. </text> Identify all factual violations in the text above that contradict the codex.
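The output format requested in the system prompt above can be consumed mechanically. The following is a minimal Python sketch, assuming the response contains only the XML elements described there. The names `Violation` and `parse_violations` and the synthetic `<root>` wrapper are illustrative assumptions, not part of NC Bench.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Violation(NamedTuple):
    entry: str        # codex entry ID that is contradicted
    detail: str       # codex element/attribute name, e.g. "eyeColor"
    paragraph: int    # 1-indexed paragraph number
    substring: str    # exact substring from the text
    explanation: str  # element body


def parse_violations(response: str) -> list[Violation]:
    """Parse <violation> elements; <violations-none/> yields an empty list."""
    # Wrap in a synthetic root (an assumption of this sketch) so that
    # several sibling elements form one well-formed document.
    root = ET.fromstring(f"<root>{response}</root>")
    return [
        Violation(
            entry=el.get("entry", ""),
            detail=el.get("detail", ""),
            paragraph=int(el.get("paragraph", "0")),
            substring=el.get("substring", ""),
            explanation=(el.text or "").strip(),
        )
        for el in root.iter("violation")
    ]


# The example element given in the system prompt.
example = (
    '<violation entry="1234" detail="hairColor" paragraph="2" '
    'substring="her brown hair flowing">The text says brown hair, '
    "but the codex says blonde.</violation>"
)
print(parse_violations(example))
print(parse_violations("<violations-none/>"))
```

A strict parser like this raises `xml.etree.ElementTree.ParseError` on malformed output, such as a `<violation>` element that is never closed, so a real harness would probably need a more tolerant fallback.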
Model response
Messages
Meta
AI
<violation entry="108" detail="material" paragraph="3" substring="At her neck hung a silver locket.">The text states the locket is silver, but the codex defines Petal's Locket (id="108") as having the material "Silver" while the text implies the locket itself is the object made of silver, which matches. Wait, re-reading. Codex: material="Silver". Text: "silver locket". This is consistent. Let me re-evaluate all items carefully. Re-evaluating Paragraph 3: "A dark iron blade..." matches Codex 107 material="Dark iron". "silver locket" matches Codex 108 material="Silver". "Sable Dunmore was twenty-eight years old, human... Auburn hair... brown eyes" matches Codex 101 exactly. "Torben Reikhart... silver hair... green eyes" matches Codex 100 exactly. Let's check the lore and locations. Paragraph 5: "The Felling... war that broke everything along the coast two centuries ago." Codex 109: timePeriod="Two centuries ago", category="War". Location Dunmore (106) has terrain="Coastal". This seems consistent. Paragraph 5: "Dunmore — port village on the salt cliffs". Codex 106 says terrain="Coastal". "Port village on salt cliffs" is a specific description not in the codex, but not a contradiction unless "Coastal" excludes cliffs or ports. Usually "Coastal" includes these. No direct contradiction found here. Let's look closer at relationships. Paragraph 5: "Jurren — our father". Codex 104 is Jurren Dunmore (Male). Codex 103 is Petal Dunmore (Female). Codex 101 is Sable Dunmore (Female). The text establishes Jurren as the father of Sable and Petal. The codex does not explicitly state the relationship, but the shared surname "Dunmore" and gender/roles do not contradict. Is there a subtle contradiction? "Sable Dunmore was twenty-eight years old... spoke of a war two centuries gone as though she'd stood in the middle of it." The text acknowledges the oddity ("Torben didn't ask the obvious question..."). This is a narrative point, not a factual error in the text's assertion of facts. 
The text *says* she is 28 and the war was 200 years ago. The codex agrees. The character's behavior is strange, but the facts stated (age, war date) are consistent with the codex. Let's re-read the prompt constraints: "A violation is any detail in the text that contradicts what the codex states". Maybe I missed a detail. Codex 107: Kindling. Material: Dark iron. Power: Fire. Text: "Kindling... dark iron blade... shimmer of fire". Consistent. Codex 108: Petal's Locket. Material: Silver. Text: "silver locket". Consistent. Codex 100: Torben. Eyes: Green. Hair: Silver. Text: "silver hair", "green eyes". Consistent. Codex 101: Sable. Species: Human. Age: 28. Eyes: Brown. Hair: Auburn. Text: "twenty-eight years old, human", "Auburn hair", "brown eyes". Consistent. Codex 105: The Rusty Lantern. Text: "The Rusty Lantern". Consistent. Codex 106: Dunmore. Terrain: Coastal. Text: "port village on the salt cliffs". Consistent (cliffs are coastal). Codex 109: The Felling. Time: Two centuries ago. Category: War. Text: "war that broke everything... two centuries ago". Consistent. Codex 110: The Kindling Rite. Category: Tradition. Text: "You know the tradition?". Consistent. Wait, did I miss something about the Locket? Text: "It still has her hair inside". Codex doesn't mention contents. Not a violation. Text: "Petal's Locket, though she never called it that aloud." Codex name is "Petal's Locket". Not a violation. Is it possible there are NO violations? Let's check the genders again. Torben (100): Male. Text: "man", "He". OK. Sable (101): Female. Text: "woman", "She". OK. Old Rivka (102): Female. Text: "her regulars", "she". OK. Petal (103): Female. Text: "sister", "her". OK. Jurren (104): Male. Text: "father", "he". OK. Let's check the item "Kindling" again. Codex 107: objectType="Weapon". Text: "lay a sword". Sword is a weapon. OK. Let's check the location "The Rusty Lantern". Text: "sat crooked on its foundations, listing toward the harbor". Codex only gives the name. 
No contradiction. Let's check the Lore "The Felling". Text: "reached Dunmore". Codex doesn't say where it happened, just "Two centuries ago", "War". No contradiction. Is there a contradiction in the timeline logic? Sable is 28. The war was 200 years ago. She says "She [Petal] was fourteen when The Felling reached Dunmore." If the war was 200 years ago, and Petal was 14 then, Petal would be 214 now (if alive). Sable says "She didn't make it." Sable is 28. How can Sable be 28 if her sister died 200 years ago? The text explicitly addresses this: "Torben didn't ask the obvious question. Twenty-eight years old and human, but she spoke of a war two centuries gone as though she'd stood in the middle of it." The text *acknowledges* the discrepancy as a mystery/plot point involving the locket pulsing. However, the prompt asks for "factual inconsistency between the text and the codex". Does the text *assert* a fact that contradicts the codex? The text asserts: Sable is 28. Sable is Human. The Felling was 200 years ago. Sable says Petal died in The Felling. The Codex asserts: Sable is 28. Sable is Human. The Felling was 200 years ago. The text does NOT contradict the codex regarding these specific data points. The text presents a logical impossibility for a normal human, but it presents it as a mystery to be solved (hinted by the locket pulse), not as an established fact that overrides the codex. The codex doesn't say "Sable was born 28 years ago and has no memory of the past". It just lists her stats. Actually, if Sable is human and 28, she could not have witnessed a war 200 years ago. The text says "she spoke of a war two centuries gone as though she'd stood in the middle of it." It describes her *manner* of speaking. It doesn't explicitly state "Sable fought in the war 200 years ago." It leaves it ambiguous/magical. However, look at the dialogue: "She was fourteen when The Felling reached Dunmore... I did [run]. She didn't make it." Sable claims to have been there. 
If Sable is human and 28, and the event was 200 years ago, her claim to have been there is a factual impossibility based on the codex definitions (Human, 28, War 200 yrs ago). But is this a "violation" of the codex by the text? Or is it the *character* lying or being magical? Usually, in these tasks, if the text states "X is Y" and the codex says "X is Z", it's a violation. Here, the text states Sable is 28 and Human. The text states the War was 200 years ago. The text states Sable claims to have been there. If the narrative implies she *was* actually there (despite being 28/human), then the text contradicts the logical implications of the codex. BUT, the text immediately flags this as strange: "Torben didn't ask the obvious question...". The narrator highlights the inconsistency. This suggests the text is *aware* of the codex facts and is building a mystery around them, rather than accidentally contradicting them. If the text said "Sable, who was 214 years old..." that would contradict the codex (Age 28). If the text said "The Felling happened 10 years ago..." that would contradict the codex. The text seems to adhere strictly to the codex attributes (28, Human, 200 years). The *situation* is paradoxical, but the *stated facts* align. Let's look really closely at the "Petal's Locket" description again. Codex 108: material="Silver". Text: "At her neck hung a silver locket." Text: "Petal's Locket". This matches. Let's look at "Kindling". Codex 107: material="Dark iron", power="Fire". Text: "dark iron blade", "shimmer of fire". Matches. Let's look at "Dunmore". Codex 106: terrain="Coastal". Text: "port village on the salt cliffs". Matches. Is it possible the user considers the "Human" + "28" + "Witnessed war 200 years ago" as a violation? If Sable is human and 28, she cannot have witnessed the war 200 years ago. The text has her saying "I did [run]". This implies she was there. 
If the text asserts she was there, and the codex asserts she is 28 and human and the war was 200 years ago, then the text contains an internal contradiction based on codex rules. However, often "violations" in these tasks refer to simple attribute mismatches (e.g. saying eyes are blue when they are green). The "mystery" aspect makes this tricky. But let's look for simpler errors first. Did I miss a name? "Jurren Dunmore" -> Codex 104. "Petal Dunmore" -> Codex 103. "Sable Dunmore" -> Codex 101. "Torben Reikhart" -> Codex 100. Alias "Stray". Text uses "Stray". Correct. "Old Rivka" -> Codex 102. Correct. "The Rusty Lantern" -> Codex 105. Correct. What about "The Kindling Rite"? Codex 110: Category "Tradition". Text: "You know the tradition?" Correct. Is there a detail about the "Kindling" sword? Codex: objectType="Weapon". Text: "sword". Correct. Maybe the violation is in the description of the locket's content? Text: "It still has her hair inside". Codex: No mention of contents. Not a violation. Let's reconsider the "Human/Age/War" issue. If the text presents Sable's statement "I did [run]" as a literal truth within the narrative reality (not a lie, not a metaphor), then the text asserts that a 28-year-old human was alive 200 years ago. This contradicts the definition of "Human" and "Age 28" relative to "War 200 years ago". However, the narrator says: "she spoke of a war two centuries gone **as though** she'd stood in the middle of it." "As though" implies she wasn't actually there, or that it seems that way despite the facts. Then the locket pulses. This suggests magic. If the text leaves it ambiguous, is it a violation? Usually, strict continuity checkers flag explicit contradictions. "Sable was 28". Codex: 28. OK. "Sable was human". Codex: Human. OK. "War was 200 years ago". Codex: 200 years ago. OK. The text does NOT explicitly say "Sable was alive 200 years ago." It says she *spoke* as if she was. And she *claims* in dialogue "I did [run]". 
If a character lies, is it a violation? No. If a character is mistaken, is it a violation? No. Unless the narrator confirms it as fact. The narrator calls it an "obvious question" regarding the impossibility. This implies the narrator knows it shouldn't be possible. So the text is consistent with the codex in establishing the paradox, not violating the facts. Is there ANY other detail? "listing toward the harbor". Codex 106 (Dunmore) is Coastal. Harbor fits. "Salt had eaten into the planks". Coastal fits. "Fishermen". Coastal fits. Wait, check the hair colors again. Torben: Codex "Silver". Text "silver hair". OK. Sable: Codex "Auburn". Text "Auburn hair". OK. Eye colors? Torben: Codex "Green". Text "green eyes". OK. Sable: Codex "Brown". Text "brown eyes". OK. Gender? Torben: Male. Text "man", "He". OK. Sable: Female. Text "woman", "She". OK. Rivka: Female. Text "her", "she". OK. Petal: Female. Text "sister", "her". OK. Jurren: Male. Text "father", "he". OK. Items? Kindling: Dark iron. Text "dark iron". OK. Locket: Silver. Text "silver". OK. Locations? Rusty Lantern. OK. Dunmore. OK. Lore? Felling: 2 centuries ago, War. OK. Kindling Rite: Tradition. OK. Is it possible there are NO violations? Let's check the substring "port village on the salt cliffs". Codex 106: terrain="Coastal". Does "Coastal" contradict "salt cliffs"? No. Does "Coastal" contradict "port village"? No. Let's check the substring "The Rusty Lantern sat crooked on its foundations". Codex 105: Just the name. No contradiction. Let's check "Old Rivka wiped the same glass". Codex 102: Just name and gender. No contradiction. Let's check "Petal's Locket, though she never called it that aloud." Codex 108 name is "Petal's Locket". The text says the *character* doesn't call it that. That's a character quirk, not a fact violation. Let's check "Kindling, she called it." Codex 107 name is "Kindling". She calls it that. OK. Is there a violation in the "Felling" description? 
Text: "the war that broke everything along the coast two centuries ago". Codex 109: "Two centuries ago", "War". The text adds "broke everything along the coast". Codex 106 (Dunmore) is "Coastal". This seems like elaboration, not contradiction. Let's look at the "Human" species again. Codex 101: species="Human". Text: "human". Is there any indication she is NOT human in the text other than the age anomaly? "Twenty-eight years old and human". The text explicitly re-states she is human. The paradox is the point of the scene. Could the violation be "Stray"? Codex 100: aliases="Stray". Text: "Stray". OK. What about "Jurren"? Codex 104: name="Jurren Dunmore". Text: "Jurren — our father". OK. Is it possible I am missing a very subtle detail? "Petal's Locket" material is Silver. Text: "silver locket". Is it possible the locket is MADE of something else and only looks silver? No, text says "silver locket". Let's reconsider the "Human" + "200 years" thing. If the prompt implies that the text *must* be logically consistent with the codex without magical explanations unless the codex provides them... But the codex doesn't say "Sable is purely normal human with no magical extensions". Usually, "Human" just means species. Vampires/Witches in fiction are often "Human" species with magical traits. The text introduces a magical element (pulsing locket) to explain the discrepancy. Therefore, the text is not contradicting the codex; it is expanding on it. Is there a typo in my reading? Codex 101: Age 28. Text: "twenty-eight". Codex 101: EyeColor Brown. Text: "brown eyes". Codex 101: HairColor Auburn. Text: "Auburn hair". Codex 100: HairColor Silver. Text: "silver hair". Codex 100: EyeColor Green. Text: "green eyes". Codex 107: Material Dark iron. Text: "dark iron". Codex 108: Material Silver. Text: "silver". Codex 109: TimePeriod Two centuries ago. Text: "two centuries ago". Codex 106: Terrain Coastal. Text: "coast", "salt cliffs", "harbor". Everything seems perfectly aligned. 
The "mystery" of Sable's age vs the war is framed as a mystery *within* the text, not an error *by* the text. The text explicitly notes the impossibility ("Torben didn't ask the obvious question"). If the text had said "Sable, who had lived for two hundred years..." THAT would be a violation of Age=28. But it says she is 28 and acts/claims otherwise, prompting a reaction. Wait! "The Felling reached Dunmore." Codex 109: Name "The Felling". Category "War". Codex 106: Name "Dunmore". Text says "The Felling reached Dunmore". This implies Dunmore existed then. Codex doesn't say when Dunmore was founded. No contradiction. Is it possible the violation is "Petal's Locket" name? Codex: "Petal's Locket". Text: "Petal's Locket". Text: "she never called it that aloud". This is fine. Let's check the item "Kindling" again. Codex: Power "Fire". Text: "shimmer of fire". OK. Is there a violation regarding "Old Rivka"? Codex: Female. Text: "her regulars", "she". OK. Maybe the violation is in the "Kindling Rite" description? Text: "You burn the dead with what they loved." Codex: Category "Tradition". No specific details in codex to contradict. What if "Stray" is not an alias but a nickname the text treats as a separate entity? Codex: aliases="Stray". Text: "Stray," she called... "He didn't answer." It refers to Torben. OK. Let's look at the "Rusty Lantern" location. Codex: ID 105, Name "The Rusty Lantern". Text: "The Rusty Lantern". OK. Is there any chance "Dunmore" terrain "Coastal" contradicts "salt cliffs"? Some definitions of coastal might imply beaches only? No, cliffs are standard coastal terrain. Is it possible the answer is <violations-none/>? I must be extremely careful. Let's check the paragraph numbers for any potential mismatches if I were to find one. Para 1: Rusty Lantern, Rivka. Para 2: Torben enters. Silver hair, green eyes. Rivka calls him Stray. Para 3: Sable description. 28, human, auburn, brown. Kindling (dark iron, fire). Locket (silver). Para 4: Torben waits. 
Para 5: Dialogue. Petal, 14, Felling, Dunmore, Jurren (father). "war that broke everything along the coast two centuries ago". Para 6: Narrator comment on age/war. Locket pulse. Para 7: Kindling Rite dialogue. Dunmore description ("port village on the salt cliffs"). Para 8: Dialogue. Para 9: Exit. All attributes match. Torben: Male, Green, Silver. (Text: man, green eyes, silver hair). Match. Sable: Female, Human, 28, Brown, Auburn. (Text: woman, human, 28, brown eyes, auburn hair). Match. Kindling: Dark iron, Weapon, Fire. (Text: sword, dark iron, shimmer of fire). Match. Locket: Silver, Jewelry. (Text: silver locket, jewelry implied by "neck", "locket"). Match. Felling: War, 2 centuries. (Text: war, 2 centuries). Match. Dunmore: Coastal. (Text: coast, salt cliffs, harbor). Match. Rite: Tradition. (Text: tradition). Match. The only "weird" thing is the timeline logic, but as analyzed, the text frames it as a mystery/paradox rather than stating a contradictory fact (e.g. it doesn't say "Sable was born 200 years ago"). It says she IS 28 and HUMAN, and she CLAIMS/SPEAKS AS IF she was there. The narrator flags this as weird. This is consistent storytelling, not a continuity error. Therefore, there are no factual violations. <violations-none/>
created_at: "2026-02-21T22:33:52.087Z" (2/21/2026, 10:33:52 PM)
model: "qwen/qwen3.5-plus-20260216"
app_id: 182717
external_user: (null)
streamed: true
cancelled: false
latency: 844
moderation_latency: (null)
generation_time: 70087
tokens_prompt: 1587
tokens_completion: 4532
native_tokens_prompt: 1563
native_tokens_completion: 5003
native_tokens_completion_images: (null)
native_tokens_reasoning: 0
native_tokens_cached: 0
num_media_prompt: (null)
num_input_audio_prompt: (null)
num_media_completion: 0
num_search_results: (null)
origin: "https://ncbench.com/"
is_byok: false
finish_reason: "stop"
native_finish_reason: "stop"
usage: 0.0126324
router: (null)
provider_responses:
  0:
    endpoint_id: "71e0f6c3-7815-4734-92d3-8a04f3b0ca91"
    id: "chatcmpl-3d1353dc-1c5c-9f1e-afa1-deab97989c1c"
    is_byok: false
    latency: 844
    model_permaslug: "qwen/qwen3.5-plus-20260216"
    provider_name: "Alibaba"
    status: 200
api_type: "completions"
id: "gen-1771713232-EJSjF0wR4tsBZAOAlHJN"
upstream_id: "chatcmpl-3d1353dc-1c5c-9f1e-afa1-deab97989c1c"
total_cost: 0.0126324
cache_discount: (null)
upstream_inference_cost: 0
provider_name: "Alibaba"
Evaluation details
Evaluator: Correct "no violations" response
Result: 50.00%
Details: Response should contain <violations-none/> or equivalent phrasing
Meta Data: hasViolationTags: true; hasViolationsNoneTag: true

Evaluator: No hallucinated violations
Result: 50.00%
Details: Response must not contain any <violation> tags
Meta Data: hasViolationTags: true; responseLength: 17139; violationCount: 1

Overall: 50.0000%
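The metadata flags reported by the two evaluators (`hasViolationTags`, `hasViolationsNoneTag`, `violationCount`, `responseLength`) can be derived from the raw response text. The sketch below shows one plausible way to compute them with regular expressions. The function name `response_flags` and the exact patterns are assumptions of this sketch. How NC Bench turns these flags into the 50.00% scores is not stated on this page, so that step is not reproduced here.

```python
import re

# Matches an opening <violation ...> tag. The \b keeps it from also
# matching the <violations-none/> sentinel.
VIOLATION_TAG = re.compile(r"<violation\b[^>]*>")
NONE_TAG = re.compile(r"<violations-none\s*/>")


def response_flags(response: str) -> dict:
    """Compute evaluator-style metadata flags for a model response."""
    violations = VIOLATION_TAG.findall(response)
    return {
        "hasViolationTags": bool(violations),
        "hasViolationsNoneTag": bool(NONE_TAG.search(response)),
        "violationCount": len(violations),
        "responseLength": len(response),
    }


# Shortened stand-in for the response in this run: it opens one
# <violation> element, reasons at length, and ends with <violations-none/>.
sample = (
    '<violation entry="108" detail="material" paragraph="3" '
    'substring="At her neck hung a silver locket.">... '
    "Therefore, there are no factual violations. <violations-none/>"
)
print(response_flags(sample))
```

On the shortened sample this reports both tag flags as true and a violation count of 1, matching the flags recorded for this run. `responseLength` differs because the sample is abbreviated.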