x-ai/grok-4.20-beta

Grok 4.20 (Beta, Reasoning)

Release Date

Mar 12th, 2026

Context Size

2M tokens

Reasoning

Yes

Benchmark Cost

$17.59

Speed

244.9 tok/s

Categories

Creative Writing: 84.5%
Tooling: 100.0%
Language: 99.1%
Utility: 95.4%
Reasoning: 82.6%
Text Editing: 98.7%
Rule Following: 75.3%
Hallucination: 96.3%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words, adverbs, and names, and more.

Scenario                       #1    #2    #3    #4    #5    Total
Detailed Writing Rules
                               83    83    81    79    77    81%
                               89    86    79    77    76    81%
                               88    85    85    85    76    84%
                               88    87    87    85    83    86%
                               86    85    85    84    84    85%
                               85    84    83    80    79    82%
Detailed Writing Rules: 83.18%
genre
                               77    76    73    72    69    73%
                               82    78    77    73    72    76%
                               86    85    83    81    80    83%
                               86    86    83    82    76    83%
                               82    79    79    78    74    78%
                               86    83    77    76    74    79%
genre: 78.83%
Novelcrafter Default Prompt
                               89    88    87    87    84    87%
                               90    87    87    85    78    85%
                               94    92    92    92    86    91%
                               92    90    89    89    88    90%
                               90    90    89    87    81    87%
                               93    89    88    87    80    87%
Novelcrafter Default Prompt: 88.07%
Overall: 83.36%
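
The overall figure for this benchmark matches the unweighted mean of the three prompt-group averages. The page does not state how it aggregates scores, so the rule below is an inference from the numbers shown, and the helper name `unweighted_mean` is hypothetical. A minimal sketch of the check:

```python
# Group averages as reported for Bad Writing Habits.
# Assumption: the overall score is the plain mean of these three values.
group_averages = {
    "Detailed Writing Rules": 83.18,
    "genre": 78.83,
    "Novelcrafter Default Prompt": 88.07,
}


def unweighted_mean(values):
    """Return the arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


overall = round(unweighted_mean(group_averages.values()), 2)
print(f"{overall:.2f}%")
```

If the benchmark weighted groups by scenario count instead, the result would be the same here, since each group contains six scenarios.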

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
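
To illustrate what "well-formed XML" means in practice, here is a minimal sketch of a well-formedness check using Python's standard library. The element and attribute names (`codex`, `entry`, `type`, `name`) and the sample strings are hypothetical; the benchmark's actual schema and validation code are not shown on this page.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical model outputs, for illustration only.
good = '<codex><entry type="character"><name>Mira</name></entry></codex>'
bad = '<codex><entry type="character"><name>Mira</entry></codex>'  # <name> is never closed

print(is_well_formed(good), is_well_formed(bad))
```

A check like this only confirms that tags are balanced and properly nested. Whether the extracted entries are correct would need a separate comparison against expected data.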

Scenario                       #1    #2    #3    #4    #5    Total
                               98    97    97    96    94    97%
                               98    97    97    97    97    97%
                               98    97    97    96    96    97%
                               100   99    99    99    98    99%
Overall: 97.35%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
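
Scoring each expected change independently could look roughly like the sketch below. This is an assumption about the general approach, not the benchmark's actual scorer: the function `score_replacement`, its parameters, and the sample sentences are all hypothetical.

```python
def score_replacement(output: str, must_contain: list[str], must_not_contain: list[str]) -> float:
    """Check every expected change on its own and return the percentage that passed."""
    checks = [text in output for text in must_contain]
    checks += [text not in output for text in must_not_contain]
    if not checks:
        return 100.0
    return 100.0 * sum(checks) / len(checks)


# Hypothetical task: rename "Anna" to "Elena" and expand contractions.
full = score_replacement(
    "Elena did not see Marcus leave.",
    must_contain=["Elena", "did not"],
    must_not_contain=["Anna", "didn't"],
)
partial = score_replacement(
    "Elena didn't see Marcus leave.",
    must_contain=["Elena", "did not"],
    must_not_contain=["Anna", "didn't"],
)
print(full, partial)
```

Because each check counts separately, an output that renames the character but misses the contraction earns partial credit instead of failing outright.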

Scenario                       #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               99    99    99    99    99    99    99    99%
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               98    98    98    97    95    95    94    96%
                               100   100   100   100   100   100   100   100%
                               100   97    96    93    92    91    91    94%
Generic Prompt: 98.87%
Specific Prompt
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               99    99    99    99    99    99    99    99%
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   100   100%
                               99    99    99    99    98    98    98    99%
                               100   100   100   100   100   100   100   100%
                               100   100   100   100   100   100   99    100%
Specific Prompt: 99.77%
Overall: 99.32%

Tool usage within Novelcrafter

Checks the output messages a model produces for tool usage within Novelcrafter.

Scenario                       #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Total
                               100   100   100   100   100   100   100   100   100   100   100%
Overall: 100.00%