x-ai/grok-4.20-beta

Grok 4.20 (Beta)

Release Date

Mar 12th, 2026

Context Size

2M tokens

Reasoning

No

Benchmark Cost

$3.80

Speed

179.0 tok/s

Categories

Creative Writing: 82.8%
Tooling: 100.0%
Language: 91.2%
Utility: 82.1%
Reasoning: 87.0%
Text Editing: 95.5%
Rule Following: 53.9%
Hallucination: 78.3%

Subcategories

Chart of per-subcategory scores (values not preserved): AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words/adverbs/names, and more.

Scenario #1 #2 #3 #4 #5 Total
Detailed Writing Rules
85 84 78 78 75 80%
87 86 85 85 85 86%
94 90 89 87 81 88%
90 89 85 84 83 86%
90 88 87 86 84 87%
87 84 83 81 81 83%
Detailed Writing Rules: 85.17%
genre
80 79 78 76 74 78%
83 82 80 78 78 80%
88 84 82 82 81 83%
88 78 78 75 72 78%
83 82 77 75 73 78%
80 79 78 77 73 77%
genre: 79.19%
Novelcrafter Default Prompt
76 75 73 72 70 73%
91 87 85 83 82 86%
89 88 87 85 75 85%
90 88 88 84 83 87%
86 84 83 79 79 82%
87 86 86 81 79 84%
Novelcrafter Default Prompt: 82.76%
Overall: 82.37%
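
The final 82.37% figure is consistent with an unweighted mean of the three prompt-group scores. That is an inference from the reported numbers, not something the page states; a minimal Python check under that assumption:

```python
# Group scores as reported for the Bad Writing Habits benchmark.
group_scores = {
    "Detailed Writing Rules": 85.17,
    "genre": 79.19,
    "Novelcrafter Default Prompt": 82.76,
}

# Assumption: the overall score is the unweighted mean of the group scores.
overall = sum(group_scores.values()) / len(group_scores)
print(f"{overall:.2f}%")  # prints "82.37%"
```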

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario #1 #2 #3 #4 #5 Total
96 96 96 95 95 96%
96 94 92 92 91 93%
97 96 95 95 93 95%
99 98 98 96 95 97%
Overall: 95.16%
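
To make the "well-formed XML" requirement concrete, here is a small sketch of how an extraction response could be checked. The `<codex>` and `<entry>` element names, the `type` and `name` attributes, and the sample entries are hypothetical; the benchmark's actual schema is not shown on this page.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text):
    """Return (type, name) pairs, or None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Unclosed or mismatched tags end up here.
        return None
    # Hypothetical schema: one <entry type="..." name="..."> per codex item.
    return [(e.get("type"), e.get("name")) for e in root.iter("entry")]


# Hypothetical model output, for illustration only.
sample = """<codex>
  <entry type="character" name="Mira">A smuggler from the port city.</entry>
  <entry type="location" name="Port Vale">A harbor town.</entry>
</codex>"""

print(parse_codex(sample))
```

A response with a missing closing tag, such as `<codex><entry></codex>`, makes `parse_codex` return `None` rather than a partial list.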

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario #1 #2 #3 #4 #5 #6 #7 Total
Generic Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 92 92 92 92 91 91 93%
100 100 100 99 99 99 66 95%
100 100 100 100 100 100 96 99%
100 100 100 80 80 74 74 87%
99 95 95 95 94 94 92 95%
100 100 100 100 100 94 94 98%
96 96 95 95 95 95 86 94%
Generic Prompt: 95.70%
Specific Prompt
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
99 99 99 99 98 96 81 96%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
98 98 98 98 98 98 97 98%
100 100 100 100 100 100 100 100%
100 100 100 100 100 100 100 100%
Specific Prompt: 99.31%
Overall: 97.50%
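
The description says each expected change is checked independently. One way that could work is sketched below, assuming simple substring checks; the function name, its parameters, and the example strings are illustrative and not taken from the benchmark.

```python
def score_replacement(output, must_contain, must_not_contain):
    """Percentage of expected changes satisfied, each checked on its own."""
    # One check per string that the transformed text should now contain.
    checks = [s in output for s in must_contain]
    # One check per string that should no longer appear.
    checks += [s not in output for s in must_not_contain]
    if not checks:
        return 100.0
    return 100.0 * sum(checks) / len(checks)


# Hypothetical task: rename "John" to "Marcus" and expand "didn't".
output = "Marcus did not see John leave."
score = score_replacement(
    output,
    must_contain=["Marcus", "did not"],
    must_not_contain=["John", "didn't"],
)
print(score)  # prints 75.0: one leftover "John" fails one of four checks
```

Scoring this way gives partial credit, which would explain rows where a single scenario drops well below 100 while the rest stay perfect.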

Tool usage within Novelcrafter

Evaluates the output messages a model produces for tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
100 100 100 100 100 100 100 100 100 100 100%
Overall: 100.00%