Tests

This page lists the best-performing model and its score for each test scenario.

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
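
As a rough illustration of the "well-formed XML" requirement above, the sketch below parses a response and counts entries per category. The element and attribute names (`codex`, `entry`, `type`, `name`) and the sample content are assumptions made for this example; the actual schema used by the test is not shown on this page.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional

# Categories named in the test description; the XML schema itself is hypothetical.
ENTRY_TYPES = {"character", "location", "object", "lore"}


def summarize_codex(xml_text: str) -> Optional[Counter]:
    """Count entries per type, or return None if the XML is not well formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed tags, stray text outside the root, and similar errors land here.
        return None
    counts = Counter()
    for entry in root.iter("entry"):
        entry_type = entry.get("type")
        if entry_type in ENTRY_TYPES:
            counts[entry_type] += 1
    return counts


# Made-up sample response, not taken from any scenario.
sample = """<codex>
  <entry type="character"><name>Mirabel</name></entry>
  <entry type="location"><name>The Rusty Lantern</name></entry>
</codex>"""

print(summarize_codex(sample))            # one character, one location
print(summarize_codex("<codex><entry>"))  # None: tags are never closed
```

A real grader would presumably also compare the extracted entries against a reference set; this sketch only covers the well-formedness step.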

Scenario | Best Model | Score
Long: The Spire of Echoes (Dense) | Gemini 3 Pro (Preview) | 98.80%
Medium: The Hollow (Inferred) | Claude Opus 4.6 | 99.14%
Medium: Through the Thornveil (Scattered) | Claude Opus 4.6 (Reasoning) | 98.74%
Short: The Rusty Lantern (Explicit) | Qwen 3.5 Plus (2026-02-15) | 99.48%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario | Best Model | Score
Create alternate prose sections | ByteDance Seed 2.0 Lite | 100.00%

Text Replacement

Tests deterministic text transformations: renaming characters and locations, expanding contractions, tense rewriting, POV shifts, gender swaps, passive-to-active voice conversion, combined transformations, and word avoidance. Each response is scored by checking every expected change independently.
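
The idea of checking each expected change independently can be sketched as follows, using the character-rename scenario as the example. The `Check` structure, the helper names, and the sample output text are hypothetical; the actual scoring code is not shown on this page.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """One expected change, evaluated on its own (hypothetical structure)."""
    label: str
    passed: Callable[[str], bool]


def contains_word(word: str) -> Callable[[str], bool]:
    # Whole-word match, so "Elena" does not match inside "Elenas".
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return lambda text: pattern.search(text) is not None


def lacks_word(word: str) -> Callable[[str], bool]:
    found = contains_word(word)
    return lambda text: not found(text)


def score(output: str, checks: list[Check]) -> float:
    """Fraction of checks that pass; one failed check does not affect the others."""
    if not checks:
        return 0.0
    passed = sum(1 for check in checks if check.passed(output))
    return passed / len(checks)


# Checks for "Character rename: Elena->Mirabel, Gregor->Aldric".
rename_checks = [
    Check("Elena removed", lacks_word("Elena")),
    Check("Mirabel present", contains_word("Mirabel")),
    Check("Gregor removed", lacks_word("Gregor")),
    Check("Aldric present", contains_word("Aldric")),
]

# Made-up model output in which only the first rename was applied.
output = "Mirabel waved at Gregor across the square."
print(f"{score(output, rename_checks):.2%}")  # prints "50.00%"
```

Scoring per change rather than pass/fail per response would explain fractional results such as 99.83%: a model that misses one change among many still gets credit for the rest.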

Scenario | Best Model | Score
Generic Prompt
Avoid said/asked/replied/answered | Mistral Medium 3.1 | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Z.AI GLM 4.5 | 100.00%
Combined: 3rd person past → 1st person present | Z.AI GLM 5 Turbo | 99.83%
Expand all contractions | Gemini 3.1 Pro (Preview) | 100.00%
Location rename: market square, outer ring, bridge, northern mines | Claude Opus 4 | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | Z.AI GLM 5 | 100.00%
Passive voice → active voice | Claude Opus 4.6 | 97.80%
POV shift: 3rd person to 1st person (Elena's perspective) | Claude Sonnet 4.6 | 100.00%
Tense rewriting: past to present | Claude Sonnet 4.5 | 99.91%
Specific Prompt
Avoid said/asked/replied/answered | GPT-4o, May 13th (temp=0) | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Grok 4 | 100.00%
Combined: 3rd person past → 1st person present | Claude Opus 4 | 100.00%
Expand all contractions | Gemini 2.5 Flash (Reasoning) | 100.00%
Location rename: market square, outer ring, bridge, northern mines | GPT-4.1 Mini | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | Qwen 3.5 Flash | 100.00%
Passive voice → active voice | Gemini 3.1 Pro (Preview) | 99.23%
POV shift: 3rd person to 1st person (Elena's perspective) | GPT-4o, Aug. 6th (temp=0) | 100.00%
Tense rewriting: past to present | Llama 3.1 70B | 100.00%