Tests

This page lists each test, its scenarios, and the best-scoring model for each scenario.

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
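
As a rough illustration of what "well-formed XML" means for this test, the sketch below parses a model response and lists the entries it finds. The `<codex>`/`<entry>`/`<name>` element names, the `type` attribute, and the sample entry are hypothetical; the page does not document the actual schema the test uses.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text):
    """Return (type, name) pairs, or None if the XML is not well-formed.

    The element and attribute names are assumptions made for illustration.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed tags, stray characters, etc. make the response unusable.
        return None
    return [(e.get("type"), e.findtext("name")) for e in root.iter("entry")]


# Hypothetical model output with a single location entry.
sample = """<codex>
  <entry type="location"><name>Example Inn</name></entry>
</codex>"""

print(parse_codex(sample))
print(parse_codex("<codex><entry>"))  # unclosed tags, so not well-formed
```

A response that fails to parse cannot be compared against expected entries at all, which is presumably why well-formedness is part of the task.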

Scenario | Best Model | Score
Long: The Spire of Echoes (Dense) | Claude Opus 4.8 (Reasoning) | 99.21%
Medium: The Hollow (Inferred) | Claude Opus 4.8 (Reasoning) | 99.41%
Medium: Through the Thornveil (Scattered) | Gemini 3.5 Flash (Reasoning, Minimal) | 99.08%
Short: The Rusty Lantern (Explicit) | Z.AI GLM 5.1 | 99.49%

Tool usage within Novelcrafter

Evaluates the output messages a model produces when using tools within Novelcrafter.

Scenario | Best Model | Score
Create alternate prose sections | GPT-5 Nano | 100.00%

Relationship tree

Evaluates a model's ability to extract a deterministic XML family and relationship tree from cumulative literary prose.

Scenario | Best Model | Score
Core relationship tree | GPT-5.4 (Reasoning) | 98.71%
Family relationship tree | GPT-5.4 (Reasoning) | 92.77%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
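
One way to read "checking each expected change independently" is as a pass rate over a list of per-change checks. The sketch below assumes that reading; the `ExpectedChange` structure, the substring-based checks, and the sample text are hypothetical and are not taken from the actual test harness.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One expected change (hypothetical structure for illustration)."""
    must_contain: str = ""      # text that should appear after the transformation
    must_not_contain: str = ""  # text that should be gone after the transformation


def check(change, output):
    """Return True if this single change is satisfied by the output."""
    if change.must_contain and change.must_contain not in output:
        return False
    if change.must_not_contain and change.must_not_contain in output:
        return False
    return True


def score(output, changes):
    """Percentage of expected changes satisfied, each checked on its own."""
    if not changes:
        return 0.0
    passed = sum(check(c, output) for c in changes)
    return 100.0 * passed / len(changes)


# Character rename scenario: Elena->Mirabel, Gregor->Aldric.
changes = [
    ExpectedChange(must_contain="Mirabel", must_not_contain="Elena"),
    ExpectedChange(must_contain="Aldric", must_not_contain="Gregor"),
]
output = "Mirabel crossed the square while Gregor watched."
print(f"{score(output, changes):.2f}%")  # → 50.00%
```

Under this assumption a missed rename lowers the score without zeroing it, which would be consistent with near-perfect scores such as 99.83% appearing in the results.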

Scenario | Best Model | Score
Generic Prompt
Avoid said/asked/replied/answered | Mistral Medium 3.1 | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Mistral Large 3 | 100.00%
Combined: 3rd person past → 1st person present | Z.AI GLM 5 Turbo | 99.83%
Expand all contractions | Z.AI GLM 4.5 Air | 100.00%
Location rename: market square, outer ring, bridge, northern mines | Mistral Small 3.2 24B | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | Claude 3.7 Sonnet | 100.00%
Passive voice → active voice | Claude Opus 4.8 (Reasoning) | 98.46%
POV shift: 3rd person to 1st person (Elena's perspective) | MiniMax M3 | 100.00%
Tense rewriting: past to present | Claude Sonnet 4.5 | 99.91%
Specific Prompt
Avoid said/asked/replied/answered | GPT-5.4 (Reasoning) | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Gemma 4 26B | 100.00%
Combined: 3rd person past → 1st person present | Gemini 3.5 Flash (Reasoning) | 100.00%
Expand all contractions | Claude 3.7 Sonnet | 100.00%
Location rename: market square, outer ring, bridge, northern mines | GPT-5.5 | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | Claude Sonnet 4.6 (Reasoning) | 100.00%
Passive voice → active voice | Gemini 3.1 Pro (Preview) | 99.23%
POV shift: 3rd person to 1st person (Elena's perspective) | GPT-5.4 Mini (Reasoning) | 100.00%
Tense rewriting: past to present | Mistral Large 2 | 100.00%