x-ai/grok-4

Grok 4

Release Date: Jul 9th, 2025
Context Size: 256k
Benchmark Cost: $15.86
Speed: 35.0 tok/s

Creative writing: 69.01%
Rule following: 79.98%
Utility: 89.92%
Mathematics: 100.00%
Tooling: 95.83%
Language: 91.23%
Logic: 92.34%

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
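
As a rough illustration of one such check, the sketch below counts filter words in a passage. The word list and the function name `count_filter_words` are assumptions made for this example; the benchmark's actual detectors and word lists are not described on this page.

```python
import re
from collections import Counter

# Hypothetical filter-word list for illustration only; not the benchmark's list.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed", "wondered"}


def count_filter_words(text: str) -> Counter:
    """Count each filter word in the text (case-insensitive, whole words)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token in FILTER_WORDS)


sample = "She felt the cold and noticed the door. She felt afraid."
print(count_filter_words(sample))
```

A real detector would need more than a word list (passive voice and purple prose are not token-level patterns), so this only shows the simplest case.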

Scenario #1 #2 #3 #4 #5 Total
Detailed Writing Rules
0-shot (Creative writing, Rule following): 78 74 73 73 71 74%
0-shot (Creative writing, Rule following): 80 78 75 74 72 76%
0-shot (Creative writing, Rule following): 84 80 79 78 78 80%
0-shot (Creative writing, Rule following): 81 79 79 77 76 78%
0-shot (Creative writing, Rule following): 83 81 80 80 73 79%
0-shot (Creative writing, Rule following): 80 79 79 77 75 78%
Detailed Writing Rules: 77.47%
genre
0-shot (Creative writing, Rule following): 71 70 70 67 65 69%
0-shot (Creative writing, Rule following): 77 77 73 73 72 74%
0-shot (Creative writing, Rule following): 82 82 81 81 78 81%
0-shot (Creative writing, Rule following): 80 79 78 78 77 79%
0-shot (Creative writing, Rule following): 83 82 76 75 73 78%
0-shot (Creative writing, Rule following): 90 77 75 75 69 77%
genre: 76.24%
Novelcrafter Default Prompt
0-shot (Creative writing, Rule following): 78 76 73 73 71 74%
0-shot (Creative writing, Rule following): 83 76 76 76 73 77%
0-shot (Creative writing, Rule following): 81 81 79 77 73 78%
0-shot (Creative writing, Rule following): 84 82 81 76 72 79%
0-shot (Creative writing, Rule following): 81 80 78 78 75 78%
0-shot (Creative writing, Rule following): 81 77 76 74 71 76%
Novelcrafter Default Prompt: 77.02%
76.91%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.
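
To make the output format concrete, here is a minimal sketch of how such a response might be parsed for scoring. The element names (`violations`, `violation`, `paragraph`, `substring`) and the sample content are hypothetical; the page only says the output is structured XML carrying a paragraph number and a substring.

```python
import xml.etree.ElementTree as ET

# Hypothetical model output; the real schema is not given on this page.
MODEL_OUTPUT = """
<violations>
  <violation>
    <paragraph>2</paragraph>
    <substring>her green eyes</substring>
  </violation>
  <violation>
    <paragraph>5</paragraph>
    <substring>the northern port of Vell</substring>
  </violation>
</violations>
"""


def parse_violations(xml_text: str) -> list[tuple[int, str]]:
    """Return (paragraph number, offending substring) pairs from the XML."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for node in root.findall("violation"):
        paragraph = int(node.findtext("paragraph", default="0"))
        substring = node.findtext("substring", default="").strip()
        results.append((paragraph, substring))
    return results


print(parse_violations(MODEL_OUTPUT))
```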

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
matrix
0-shot (Tooling, Utility, Logic, Rule following): 100 99 99 99 99 98 97 97 96 95 98%
0-shot (Tooling, Utility, Logic, Rule following): 100 100 100 100 98 98 98 98 98 98 99%
0-shot (Tooling, Utility, Logic, Rule following): 100 97 97 97 95 92 92 79 79 75 90%
0-shot (Tooling, Utility, Logic, Rule following): 100 100 97 97 97 97 97 97 97 97 97%
matrix: 96.10%
tiers
0-shot (Tooling, Utility, Logic, Rule following): 100 100 100 100 100 100 100 91 91 91 97%
0-shot (Tooling, Utility, Logic, Rule following): 100 100 100 100 100 100 100 100 100 92 99%
0-shot (Tooling, Utility, Logic, Rule following): 100 100 97 97 97 97 97 97 94 89 96%
0-shot (Tooling, Utility, Logic, Rule following): 100 100 100 100 100 100 100 100 100 100 100%
tiers: 98.27%
97.18%

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
dialogue-200
0-shot (Creative writing, Rule following): 50 50 50 50 50 50 49 49 43 0 44%
0-shot (Creative writing, Rule following): 50 50 50 50 49 30 26 18 3 0 33%
0-shot (Creative writing, Rule following): 100 99 92 90 84 74 67 54 51 48 76%
dialogue-200: 50.87%
dialogue-500
0-shot (Creative writing, Rule following): 48 41 41 38 34 30 30 22 2 0 28%
0-shot (Creative writing, Rule following): 50 47 43 30 22 22 2 1 0 0 22%
0-shot (Creative writing, Rule following): 72 56 54 47 46 29 28 23 21 5 38%
dialogue-500: 29.43%
Ungrouped
0-shot (Creative writing, Rule following): 100 100 100 100 100 100 100 100 100 100 100%
48.70%

Language Comprehension

Does the model understand more than just English?

Scenario #1 #2 #3 #4 #5 Total
0-shot (Language): 100 100 100 0 0 60%
0-shot (Language): 100 100 100 100 0 80%
0-shot (Language): 100 100 100 100 100 100%
0-shot (Language): 100 100 100 100 100 100%
85.00%
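
The 85.00% figure is consistent with a plain average of the four row totals in this table (60%, 80%, 100%, 100%). A minimal check, assuming the benchmark score is an unweighted mean of the row totals (the page does not state the aggregation method):

```python
from statistics import mean

# Row totals read from the Language Comprehension table above.
row_totals = [60, 80, 100, 100]

# Assumption: the benchmark score is the unweighted mean of the row totals.
benchmark_total = mean(row_totals)
print(f"{benchmark_total:.2f}%")  # prints "85.00%"
```

The displayed per-scenario scores are rounded, so the same check on other tables may differ from the published figure by a fraction of a point.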

Language Writing

Can the model generate text in different languages?

Scenario #1 #2 #3 #4 #5 Total
0-shot (Language): 100 100 100 100 100 100%
0-shot (Language): 100 100 100 100 100 100%
0-shot (Language): 100 100 100 56 50 81%
0-shot (Language): 100 100 100 100 100 100%
0-shot (Language): 100 100 100 100 100 100%
96.22%

N-Length Sentences

Write sentences with exactly N words
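
A constraint like this can be checked mechanically. The sketch below is one possible checker, assuming a "word" is any whitespace-separated token; the benchmark's own counting rules (for example, how hyphenated words or numbers are treated) are not documented here, and `has_exactly_n_words` is a name made up for this example.

```python
def has_exactly_n_words(sentence: str, n: int) -> bool:
    """Return True if the sentence has exactly n whitespace-separated tokens."""
    # Assumption: punctuation attached to a token does not count separately.
    return len(sentence.split()) == n


print(has_exactly_n_words("The quick brown fox jumps.", 5))
print(has_exactly_n_words("Hello world", 3))
```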

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 98 100%
0-shot (Rule following): 100 95 84 84 84 84 82 77 74 56 82%
0-shot (Rule following): 80 78 76 72 69 68 64 64 55 39 66%
82.78%

Novel outline

Handle questions about the outline of a novel in various formats

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
outline-count
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
outline-count: 100.00%
pov-count
0-shot (Tooling, Utility): 100 100 100 100 50 50 50 50 0 0 60%
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
pov-count: 86.67%
93.33%

Tool usage within Novelcrafter

Output messages that are related to tool usage within Novelcrafter

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
0-shot (Tooling, Utility): 100 100 100 100 100 100 100 100 100 100 100%
100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
0-shot (Utility): 100 100 100 100 100 100 0 0 0 0 60%
0-shot (Utility): 100 0 0 0 0 0 0 0 0 0 10%
1-shot (Utility): 100 100 100 100 100 100 100 100 0 0 80%
Few-shot (Utility): 100 100 100 100 100 100 100 100 100 0 90%
0-shot (Utility): 100 100 100 100 100 100 100 100 100 100 100%
68.00%

Write N of X

Write exactly N words/sentences/paragraphs...
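
For reference, a response to this kind of prompt could be measured along the following lines. This is a sketch under simple assumptions (words are whitespace-separated, sentences end in `.`, `!`, or `?`, paragraphs are separated by blank lines); the benchmark's real counting rules are not given on this page, and `count_units` is a hypothetical helper.

```python
import re


def count_units(text: str) -> dict[str, int]:
    """Count words, sentences, and paragraphs under simple heuristics."""
    # Paragraphs: blocks separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: split on whitespace that follows ., !, or ?.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words: whitespace-separated tokens.
    words = text.split()
    return {
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
    }


response = "One two three. Four five!\n\nSix seven eight nine."
print(count_units(response))
```

Abbreviations and quoted dialogue will trip up the sentence heuristic, which is one reason an exact-count task is harder to grade than it looks.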

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
paragraphs
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
paragraphs: 100.00%
sentences
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 0 0 80%
0-shot (Rule following): 100 100 100 100 100 100 0 0 0 0 60%
sentences: 88.00%
words
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 100 100%
0-shot (Rule following): 100 100 100 100 100 100 100 100 100 92 99%
0-shot (Rule following): 100 100 100 100 100 100 100 98 92 92 98%
0-shot (Rule following): 100 98 92 92 92 92 54 27 27 9 68%
0-shot (Rule following): 100 100 98 92 92 54 54 27 2 0 62%
words: 85.54%
89.83%