Tests

This page lists each test, its scenarios, and the best-scoring model for each scenario.

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
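
As an illustration of what "well-formed XML" means in practice, here is a minimal sketch of a well-formedness check using Python's standard library. The element and attribute names (`codex`, `entry`, `type`, `name`) and the helper `parse_codex` are assumptions made for this example; the actual schema used by the test is not shown on this page.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text):
    """Return (type, name) pairs, or None if the XML is not well-formed.

    The <codex>/<entry>/<name> layout is hypothetical, not the test's real schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed or mismatched tags end up here.
        return None
    return [(e.get("type"), e.findtext("name")) for e in root.iter("entry")]


sample = """<codex>
  <entry type="character"><name>Elena</name></entry>
  <entry type="location"><name>The Rusty Lantern</name></entry>
</codex>"""

entries = parse_codex(sample)
broken = parse_codex("<codex><entry></codex>")  # mismatched tags
```

With this input, `entries` holds the two parsed pairs and `broken` is `None`, since the second document never closes its `entry` element.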

Scenario | Best Model | Score
Long: The Spire of Echoes (Dense) | Gemini 3 Pro (Preview) | 98.80%
Medium: The Hollow (Inferred) | Claude Opus 4.6 | 99.14%
Medium: Through the Thornveil (Scattered) | Claude Opus 4.6 (Reasoning) | 98.74%
Short: The Rusty Lantern (Explicit) | Qwen 3.5 Plus (2026-02-15) | 99.48%

Tool usage within Novelcrafter

Tests output messages related to tool usage within Novelcrafter.

Scenario | Best Model | Score
Create alternate prose sections | Claude 3.5 Sonnet | 100.00%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
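
Checking each expected change independently could be sketched as follows. This is a hedged illustration, not the benchmark's actual scorer: the function `score_replacement` and its two lists (strings that must appear, strings that must be gone) are assumptions introduced for the example.

```python
def score_replacement(output, must_contain, must_not_contain):
    """Score an output as the percentage of independent checks that pass.

    Each expected change is one check, so a single miss lowers the score
    without zeroing it. Hypothetical helper, not the benchmark's own code.
    """
    checks = [s in output for s in must_contain]
    checks += [s not in output for s in must_not_contain]
    return 100.0 * sum(checks) / len(checks)


# Character rename: Elena->Mirabel, Gregor->Aldric, with one leftover "Gregor".
output = "Mirabel met Aldric at the bridge. Gregor waved."
score = score_replacement(
    output,
    must_contain=["Mirabel", "Aldric"],
    must_not_contain=["Elena", "Gregor"],
)
```

Here three of the four checks pass, so `score` is 75.0. Plain substring matching is the simplest choice; a real scorer would likely need word boundaries or per-sentence comparison for cases such as tense or POV rewriting.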

Scenario | Best Model | Score
Generic Prompt
Avoid said/asked/replied/answered | Z.AI GLM 5 | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | DeepSeek-V2 Chat | 100.00%
Combined: 3rd person past → 1st person present | Claude Sonnet 4 | 99.42%
Expand all contractions | GPT-5.1 | 100.00%
Location rename: market square, outer ring, bridge, northern mines | Gemini 2.5 Flash (Reasoning) | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | GPT-4o, Aug. 6th (temp=0) | 100.00%
Passive voice → active voice | Claude Opus 4.6 | 97.80%
POV shift: 3rd person to 1st person (Elena's perspective) | Arcee AI: Trinity Large (Preview) | 100.00%
Tense rewriting: past to present | Claude Sonnet 4.5 | 99.91%
Specific Prompt
Avoid said/asked/replied/answered | Claude 3.5 Sonnet | 100.00%
Character rename: Elena->Mirabel, Gregor->Aldric | Gemma 3 4B | 100.00%
Combined: 3rd person past → 1st person present | Claude Opus 4 | 100.00%
Expand all contractions | Gemini 2.5 Flash | 100.00%
Location rename: market square, outer ring, bridge, northern mines | ByteDance Seed 2.0 Mini | 100.00%
Multi-character gender swap: Priya(F)->Rohan(M), Mara unchanged | GPT-5.4 (Reasoning) | 100.00%
Passive voice → active voice | Gemini 3.1 Pro (Preview) | 99.23%
POV shift: 3rd person to 1st person (Elena's perspective) | Z.AI GLM 5 | 100.00%
Tense rewriting: past to present | GPT-5 | 100.00%