Tests

This page lists each test, its scenarios, and the best-scoring model for each scenario.

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario | Best Model | Score
Long: The Spire of Echoes (Dense) | Claude Opus 4.8 (Reasoning) | 99.21%
Medium: The Hollow (Inferred) | Claude Opus 4.8 (Reasoning) | 99.41%
Medium: Through the Thornveil (Scattered) | Gemini 3.5 Flash (Reasoning, Minimal) | 99.08%
Short: The Rusty Lantern (Explicit) | Z.AI GLM 5.1 | 99.49%
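
As a rough illustration of what "well-formed XML" means for the Codex Extraction test, the sketch below parses a model response and counts entries by type. The element and attribute names (`codex`, `entry`, `type`, `name`) and the sample entries are assumptions made for this example; the benchmark's actual schema is not shown on this page.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_codex(xml_text):
    """Return a Counter of entry types, or None if the XML is not well-formed.

    The <codex>/<entry type="..."> layout is a hypothetical schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed or mismatched tags end up here.
        return None
    return Counter(entry.get("type", "unknown") for entry in root.iter("entry"))


# Hypothetical model output for a short passage.
sample = """<codex>
  <entry type="character" name="Mara"/>
  <entry type="location" name="The Rusty Lantern"/>
</codex>"""

counts = parse_codex(sample)
broken = parse_codex("<codex><entry></codex>")  # mismatched tag
```

Here `counts` holds one `character` and one `location` entry, while `broken` is `None` because the second response cannot be parsed.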

Tool usage within Novelcrafter

Evaluates the output messages a model produces for tool usage within Novelcrafter.

Scenario | Best Model | Score
Create alternate prose sections | Gemini 3.5 Flash (Reasoning) | 100.00%

Relationship tree

Evaluates a model's ability to extract a deterministic XML family and relationship tree from cumulative literary prose.

Scenario | Best Model | Score
Core relationship tree | GPT-5.4 (Reasoning) | 98.71%
Family relationship tree | GPT-5.4 (Reasoning) | 92.77%
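
One way a relationship tree could be compared against a reference is to reduce both XML documents to sets of relation triples and measure how many expected triples appear in the output. The sketch below does that. The `relation` element, its `from`/`type`/`to` attributes, and the scoring formula are all assumptions for illustration and may differ from how this test is actually scored.

```python
import xml.etree.ElementTree as ET


def relation_set(xml_text):
    """Collect (from, type, to) triples from hypothetical <relation> elements."""
    root = ET.fromstring(xml_text)
    return {
        (rel.get("from"), rel.get("type"), rel.get("to"))
        for rel in root.iter("relation")
    }


def relation_score(expected_xml, actual_xml):
    """Percentage of expected triples that also appear in the actual output."""
    expected = relation_set(expected_xml)
    if not expected:
        raise ValueError("expected tree contains no relations")
    actual = relation_set(actual_xml)
    return 100.0 * len(expected & actual) / len(expected)


# Hypothetical reference tree and model output.
expected_xml = """<tree>
  <relation from="Elena" type="sibling" to="Gregor"/>
  <relation from="Elena" type="parent" to="Mara"/>
</tree>"""

actual_xml = """<tree>
  <relation type="sibling" to="Gregor" from="Elena"/>
</tree>"""

tree_score = relation_score(expected_xml, actual_xml)
```

Because the triples are read by attribute name, attribute order in the output does not matter. In this example one of the two expected relations is found, so `tree_score` is 50.0.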

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Each output is scored by checking every expected change independently.

Scenario | Best Model | Score
Generic Prompt
Avoid said/asked/replied/answered | Z.AI GLM 4.5 | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Claude Opus 4.6 (Reasoning) | 100.00%
Combined: 3rd person past → 1st person present | Z.AI GLM 5 Turbo | 99.83%
Expand all contractions | Z.AI GLM 5 Turbo | 100.00%
Location rename: market square, outer ring, bridge, northern mines | Qwen 3.6 35B | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | Claude Sonnet 4 | 100.00%
Passive voice → active voice | Claude Opus 4.8 (Reasoning) | 98.46%
POV shift: 3rd person to 1st person (Elena's perspective) | Z.AI GLM 5.1 | 100.00%
Tense rewriting: past to present | Claude Sonnet 4.5 | 99.91%
Specific Prompt
Avoid said/asked/replied/answered | Claude Opus 4.6 | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Gemini 3 Flash (Preview) | 100.00%
Combined: 3rd person past → 1st person present | GPT-4.1 | 100.00%
Expand all contractions | Z.AI GLM 5 Turbo | 100.00%
Location rename: market square, outer ring, bridge, northern mines | GPT-5.4 Nano (Reasoning, Low) | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | GPT-4o, May 13th (temp=0) | 100.00%
Passive voice → active voice | GPT-5.5 (Reasoning) | 99.23%
POV shift: 3rd person to 1st person (Elena's perspective) | ByteDance Seed 2.0 Lite | 100.00%
Tense rewriting: past to present | Writer: Palmyra X5 | 100.00%
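
To make "checking each expected change independently" concrete, here is a minimal sketch for the character rename scenario. Each required or forbidden word is one check, and the score is the share of checks that pass. The function name, the whole-word matching rule, and the sample sentence are assumptions for this example, not the benchmark's actual scorer.

```python
import re


def score_output(output, required, forbidden):
    """Score an output as the percentage of independent checks that pass.

    required:  words that must appear (e.g. the new names)
    forbidden: words that must not appear (e.g. the old names)
    """
    def has_word(word):
        # Whole-word match so "Elena" does not match inside a longer word.
        return re.search(rf"\b{re.escape(word)}\b", output) is not None

    checks = [has_word(w) for w in required]
    checks += [not has_word(w) for w in forbidden]
    if not checks:
        raise ValueError("at least one check is required")
    return 100.0 * sum(checks) / len(checks)


# Hypothetical model output where only one of the two renames was applied.
output = "Mirabel waved at Gregor across the market square."
score = score_output(
    output,
    required=["Mirabel", "Aldric"],
    forbidden=["Elena", "Gregor"],
)
```

In this example "Mirabel" is present and "Elena" is absent, but "Aldric" is missing and "Gregor" remains, so two of four checks pass and `score` is 50.0. Scoring per check rather than all-or-nothing would explain near-perfect results such as 99.83%, where a single missed change costs only a fraction of a point.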