x-ai/grok-4

Grok 4

Release Date

Jul 9th, 2025

Context Size

256k

Reasoning

Yes

Benchmark Cost

$25.47

Speed

38.0 tok/s

Categories

Creative Writing: 77.3%
Tooling: 100.0%
Language: 90.6%
Utility: 89.7%
Reasoning: 96.0%
Text Editing: 98.8%
Rule Following: 63.1%
Hallucination: 89.5%

Subcategories

AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario   #1   #2   #3   #4   #5   Total

Detailed Writing Rules
           78   74   73   73   71   74%
           80   78   75   74   72   76%
           84   80   79   78   78   80%
           81   79   79   77   76   78%
           83   81   80   80   73   79%
           80   79   79   77   75   78%
Detailed Writing Rules: 77.46%

genre
           71   70   70   67   65   69%
           77   77   73   73   72   74%
           82   82   81   81   78   81%
           80   79   78   78   77   79%
           83   82   76   75   73   78%
           90   77   75   75   69   77%
genre: 76.23%

Novelcrafter Default Prompt
           78   76   73   73   71   74%
           83   76   76   76   73   77%
           81   81   79   77   73   78%
           84   82   81   76   72   79%
           81   80   78   78   75   78%
           81   76   76   74   71   76%
Novelcrafter Default Prompt: 77.01%

Overall: 76.90%
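
The description above names filter words among the anti-patterns this test detects. As a rough illustration of how one such check could work, here is a minimal Python sketch. The word list, the function names, and the per-100-token rate are all assumptions made for this example; the source does not describe the benchmark's actual detector or scoring.

```python
import re
from collections import Counter

# Hypothetical word list for illustration only; the benchmark's real list
# is not given in the source.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_filter_words(text: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of each filter word."""
    return Counter(token for token in tokenize(text) if token in FILTER_WORDS)


def filter_word_rate(text: str) -> float:
    """Return filter words per 100 tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 100 * sum(count_filter_words(text).values()) / len(tokens)
```

A rate like this would still need to be mapped onto a percentage score and combined with the other checks (passive voice, dialogue tags, clichés, and so on); how the benchmark does that is not stated.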

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario   #1   #2   #3   #4   #5   Total
           98   98   98   98   96   97%
           99   99   98   98   97   98%
           99   99   99   98   98   99%
          100  100  100   99   99   99%

Overall: 98.38%
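
To make "well-formed XML" concrete, the sketch below parses a model response with Python's standard `xml.etree.ElementTree` and rejects anything that fails to parse. The `<codex>`, `<entry>`, and `<name>` element names and the `type` attribute are hypothetical; the source does not specify the schema the benchmark expects.

```python
import xml.etree.ElementTree as ET


def parse_codex(xml_text: str) -> list[dict[str, str]]:
    """Return the entries found in xml_text, or raise ValueError if the
    text is not well-formed XML.

    The element and attribute names used here are assumptions for
    illustration, not the benchmark's real schema.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    return [
        {
            "type": entry.get("type", ""),
            "name": (entry.findtext("name") or "").strip(),
        }
        for entry in root.iter("entry")
    ]
```

Well-formedness is only the first gate. A real grader would presumably also compare the extracted entries against the expected characters, locations, objects, and lore for each passage.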

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario   #1   #2   #3   #4   #5   #6   #7   Total

Generic Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           99   99   99   99   99   99   97    99%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           98   98   98   97   97   94   92    96%
          100  100  100  100  100  100  100   100%
          100  100  100   99   99   99   99   100%
Generic Prompt: 99.41%

Specific Prompt
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100   99   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
          100  100  100  100  100  100  100   100%
           99   99   99   99   98   98   98    99%
          100  100  100  100  100  100  100   100%
          100   99   99   99   99   99   99    99%
Specific Prompt: 99.77%

Overall: 99.59%
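
The phrase "checking each expected change independently" can be read as a per-change pass/fail tally. The following Python sketch shows one way such a score could be computed. The function name, its parameters, and the plain substring matching are assumptions for illustration and are not taken from the benchmark itself.

```python
def score_replacement(
    output: str,
    must_contain: list[str],
    must_not_contain: list[str],
) -> float:
    """Return the fraction of expected changes that the output satisfies.

    Each expected change is checked on its own, so one missed rename does
    not zero out the whole response. Substring matching is a deliberate
    simplification; a real grader might match on word boundaries.
    """
    checks = [text in output for text in must_contain]
    checks += [text not in output for text in must_not_contain]
    if not checks:
        return 1.0  # nothing was expected to change
    return sum(checks) / len(checks)
```

For example, renaming "Anna" to "Elena" while expanding "didn't" would pass "Elena" and "did not" in `must_contain`, and "Anna" and "didn't" in `must_not_contain`. An output that renames the character but leaves the contraction in place would then earn partial credit instead of zero.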

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario   #1   #2   #3   #4   #5   #6   #7   #8   #9  #10   Total
          100  100  100  100  100  100  100  100  100  100   100%

Overall: 100.00%