x-ai/grok-4.20

Grok 4.20

Release Date

Mar 31st, 2026

Context Size

2M tokens

Reasoning

No

Benchmark Cost

$2.88

Speed

69.4 tok/s

Categories

Creative Writing: 83.4%
Tooling: 94.0%
Language: 78.9%
Utility: 84.1%
Reasoning: 84.8%
Text Editing: 95.6%
Rule Following: 59.7%
Hallucination: 73.0%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words, adverbs, and names, and more.
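
As a rough illustration of what this kind of detection can look like, here is a minimal Python sketch. The word lists, the passive-voice heuristic, and the `find_habits` helper are all assumptions made up for this example; they are not the benchmark's actual rules.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists; the benchmark's real lists are not shown here.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}
CLICHES = ("heart pounded", "blood ran cold", "time stood still")

# Crude passive-voice heuristic: a form of "to be" followed by a word ending in "ed".
PASSIVE_RE = re.compile(r"\b(?:was|were|is|are|been|being)\s+\w+ed\b")


def find_habits(text: str) -> Counter:
    """Count occurrences of a few prose anti-patterns in the given text."""
    lowered = text.lower()
    counts = Counter()
    # Filter words: compare whole words against the list.
    for word in re.findall(r"[a-z']+", lowered):
        if word in FILTER_WORDS:
            counts["filter_word"] += 1
    # Clichés: plain substring matches on fixed phrases.
    for phrase in CLICHES:
        counts["cliche"] += lowered.count(phrase)
    counts["passive_voice"] += len(PASSIVE_RE.findall(lowered))
    return counts


sample = "Her heart pounded as the door was opened. She felt cold, and he noticed nothing."
print(find_habits(sample))
```

A real checker would need far larger lists and a proper grammatical parse; the heuristic above misses irregular participles and flags some non-passive constructions.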

Scenario #1 #2 #3 #4 #5 Total

Detailed Writing Rules
84 82 81 80 80 81%
86 85 85 83 80 84%
91 90 89 85 83 88%
91 89 84 84 83 86%
90 89 89 87 81 87%
91 87 86 84 81 86%
Detailed Writing Rules: 85.32%

genre
83 81 80 78 76 80%
84 83 80 76 70 79%
86 84 84 83 80 83%
85 84 81 81 78 82%
82 82 80 77 74 79%
87 81 80 79 77 81%
genre: 80.53%

Novelcrafter Default Prompt
85 81 80 76 68 78%
88 87 87 84 83 86%
88 87 80 79 78 82%
89 86 85 80 77 83%
89 88 86 86 79 86%
86 86 84 79 78 83%
Novelcrafter Default Prompt: 82.96%

Overall: 82.94%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
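
To make "well-formed XML" concrete, here is a minimal Python sketch that parses a model's output and lists the entries it contains. The `<codex>`/`<entry>`/`<name>` schema and the `parse_codex` helper are hypothetical, chosen only for illustration; the benchmark's actual schema and scoring are not described on this page.

```python
import xml.etree.ElementTree as ET

# Entry types taken from the description above.
ENTRY_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> list[tuple[str, str]]:
    """Return (type, name) pairs; raise ValueError if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    entries = []
    # Hypothetical schema: <codex><entry type="..."><name>...</name></entry></codex>
    for entry in root.iter("entry"):
        kind = entry.get("type", "")
        name = (entry.findtext("name") or "").strip()
        if kind in ENTRY_TYPES and name:
            entries.append((kind, name))
    return entries


good = (
    '<codex>'
    '<entry type="character"><name>Mira</name></entry>'
    '<entry type="location"><name>Old Harbor</name></entry>'
    '</codex>'
)
print(parse_codex(good))
```

An unclosed tag or a stray `&` makes `ET.fromstring` raise `ParseError`, which is the kind of failure a well-formedness check is meant to catch.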

Scenario #1 #2 #3 #4 #5 Total
97 97 96 96 96 96%
97 96 93 93 93 94%
97 95 94 93 92 94%
99 99 97 94 87 95%

Overall: 95.04%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
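
Scoring each expected change independently can be sketched in Python as follows. The `ExpectedChange` structure and the `score` function are assumptions for illustration, not the benchmark's actual implementation: each change passes if the new text is present and the old text is gone, and the score is the share of changes that pass.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    must_contain: str      # text that should appear after the transformation
    must_not_contain: str  # text that should be gone after the transformation


def score(output: str, changes: list[ExpectedChange]) -> float:
    """Percentage of expected changes satisfied, each checked on its own."""
    if not changes:
        return 100.0
    passed = sum(
        1
        for c in changes
        if c.must_contain in output and c.must_not_contain not in output
    )
    return 100.0 * passed / len(changes)


changes = [
    ExpectedChange(must_contain="Elena", must_not_contain="Anna"),    # character rename
    ExpectedChange(must_contain="do not", must_not_contain="don't"),  # contraction expansion
]
output = 'Elena said, "I don\'t know."'
print(score(output, changes))  # 50.0: the rename passes, the contraction check fails
```

Checking changes one by one gives partial credit: a response that renames every character but misses one contraction still scores above zero.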

Scenario #1 #2 #3 #4 #5 #6 #7 Total

Generic Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
92 92 92 91 91 91 88 91%
100 99 99 99 99 99 99 99%
100 100 100 100 100 100 100 100%
100 100 100 100 100 83 74 94%
94 94 93 92 92 91 91 93%
100 100 100 100 100 100 94 99%
96 96 96 96 95 95 95 96%
Generic Prompt: 96.84%

Specific Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 99 99 99 98 98 98 99%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
98 98 98 98 98 98 96 97%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 99 100%
Specific Prompt: 99.57%

Overall: 98.21%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
100 100 100 100 100 100 100 100 100 0 90%

Overall: 90.00%