x-ai/grok-4.20

Grok 4.20 (Reasoning)

Release Date: Mar 31st, 2026
Context Size: 2M tokens
Reasoning: Yes
Benchmark Cost: $9.06
Speed: 93.4 tok/s

Categories

Creative Writing: 86.2%
Tooling: 100.0%
Language: 96.6%
Utility: 92.6%
Reasoning: 79.1%
Text Editing: 98.8%
Rule Following: 82.0%
Hallucination: 95.7%

Subcategories

[Chart: per-subcategory scores for AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
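
As a rough illustration of what detecting two of these anti-patterns might involve, the sketch below counts filter words and simple passive-voice constructions. The word list, the regular expression, and the function name `find_habits` are assumptions made for this example; the benchmark's actual detectors are not described here and are likely more thorough.

```python
import re

# Hypothetical, deliberately small word list (an assumption for illustration).
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}

# Naive passive-voice heuristic: a form of "to be" followed by a word ending in "ed".
# It misses irregular participles ("was taken") and can produce false positives.
PASSIVE = re.compile(r"\b(?:was|were|is|are|been|being|be)\s+\w+ed\b", re.IGNORECASE)


def find_habits(text):
    """Count occurrences of two prose anti-patterns in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(1 for w in words if w in FILTER_WORDS),
        "passive_voice": len(PASSIVE.findall(text)),
    }


sample = "She felt the door was opened by the wind. He noticed it."
print(find_habits(sample))
```

A real detector would need part-of-speech information to separate passive voice from adjectival uses, so the counts here are only indicative.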

Scenario                        #1    #2    #3    #4    #5    Total
Detailed Writing Rules
                                88    86    84    84    80      84%
                                92    86    83    82    82      85%
                                90    88    88    88    87      88%
                                92    87    87    84    80      86%
                                91    89    89    85    82      87%
                                89    89    88    83    82      86%
Detailed Writing Rules                                       86.19%
genre
                                86    83    79    78    74      80%
                                82    81    80    79    77      80%
                                88    86    85    83    80      85%
                                88    86    81    79    76      82%
                                85    83    82    78    76      81%
                                84    83    78    69    63      75%
genre                                                        80.48%
Novelcrafter Default Prompt
                                89    89    88    85    85      87%
                                89    86    85    84    84      86%
                                96    96    95    94    90      94%
                                98    92    92    90    89      92%
                                90    87    87    85    83      86%
                                91    91    90    89    89      90%
Novelcrafter Default Prompt                                  89.32%
                                                             85.33%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
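
To make the idea of "well-formed XML" codex output concrete, here is a minimal sketch of how such a response might be validated. The element and attribute names (`codex`, `entry`, `type`, `name`) and the function `parse_codex` are hypothetical; the schema the benchmark actually expects is not given here.

```python
import xml.etree.ElementTree as ET

# Entry kinds taken from the description above; the attribute name "type" is an assumption.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text):
    """Return (entries, errors) for a model response that should be codex XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Malformed XML fails as a whole: nothing can be extracted reliably.
        return [], [f"not well-formed: {exc}"]

    entries, errors = [], []
    for node in root.iter("entry"):
        kind = node.get("type")
        name = (node.findtext("name") or "").strip()
        if kind not in ALLOWED_TYPES:
            errors.append(f"unknown type: {kind!r}")
        elif not name:
            errors.append("entry without a name")
        else:
            entries.append((kind, name))
    return entries, errors


response = (
    '<codex>'
    '<entry type="character"><name>Mira</name></entry>'
    '<entry type="place"><name>Northport</name></entry>'
    '</codex>'
)
print(parse_codex(response))
```

Separating parse errors from per-entry errors lets a scorer distinguish a response that is unusable from one that is mostly right.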

Scenario      #1    #2    #3    #4    #5    Total
              98    97    97    97    96      97%
              99    98    98    98    97      98%
              98    98    98    98    96      98%
             100   100    98    98    98      99%
                                           97.83%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
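
Scoring "each expected change independently" could look roughly like the sketch below: every expected replacement is checked on its own, and the score is the share of checks that pass. The `(old, new)` pair format and the function name `score_replacements` are assumptions for illustration, not the benchmark's actual harness.

```python
import re


def has_word(text, word):
    """True if `word` (which may be a phrase) occurs in `text` on word boundaries."""
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def score_replacements(output, expected):
    """Percentage of expected (old, new) changes that were applied in `output`.

    A change counts as applied only if the new text is present and the old
    text is gone; each pair is judged independently of the others.
    """
    if not expected:
        return 100.0
    passed = sum(
        1 for old, new in expected
        if has_word(output, new) and not has_word(output, old)
    )
    return 100.0 * passed / len(expected)


output = "Elena walked to Northport. She didn't stop."
expected = [("Anna", "Elena"), ("Southport", "Northport"), ("didn't", "did not")]
print(round(score_replacements(output, expected), 1))
```

Because the checks are independent, one missed contraction lowers the score without hiding the renames that were done correctly.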

Scenario                        #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                                99    99    99    99    99    99    99      99%
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                                98    98    98    98    97    96    95      97%
                               100   100   100   100   100   100   100     100%
                               100   100    99    96    96    92    92      97%
Generic Prompt                                                           99.18%
Specific Prompt
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                               100   100    99    99    99    99    99     100%
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100   100   100     100%
                                99    99    99    98    98    98    98      99%
                               100   100   100   100   100   100   100     100%
                               100   100   100   100   100    99    96      99%
Specific Prompt                                                          99.73%
                                                                         99.46%

Tool usage within Novelcrafter

Evaluates the output messages a model produces for tool usage within Novelcrafter.

Scenario      #1    #2    #3    #4    #5    #6    #7    #8    #9   #10    Total
             100   100   100   100   100   100   100   100   100   100     100%
                                                                        100.00%