aion-labs/aion-2.0

Aion 2.0

Release Date: Feb 23rd, 2026
Context Size: 131k
Reasoning: Yes
Benchmark Cost: $3.78
Speed: 38.6 tok/s

Categories

Creative Writing: 80.2%
Tooling: 99.5%
Language: 96.2%
Utility: 90.9%
Reasoning: 94.1%
Text Editing: 95.3%
Rule Following: 63.8%
Hallucination: 93.6%

Subcategories

[Chart: per-subcategory scores on a 20% to 100% axis] AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words/adverbs/names, and more.
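
As a rough illustration of what detecting two of these anti-patterns could look like, here is a minimal sketch. The word list, the regular expression, and the `count_habits` helper are assumptions made for this example, not the benchmark's actual detectors, and the patterns are deliberately crude.

```python
import re
from collections import Counter

# Hypothetical filter-word list, for illustration only.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed"}

# Crude past-progressive pattern: "was"/"were" followed by a word ending in "ing".
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def count_habits(text: str) -> Counter:
    """Count occurrences of two prose anti-patterns in `text`."""
    counts = Counter()
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts["filter_words"] = sum(1 for w in words if w in FILTER_WORDS)
    counts["past_progressive"] = len(PAST_PROGRESSIVE.findall(text))
    return counts


sample = "She was walking home when she noticed the door. He felt cold."
habits = count_habits(sample)
print(habits["filter_words"], habits["past_progressive"])
```

A real detector would need part-of-speech information to avoid false matches (for example "was king"), so treat the counts as a heuristic only.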

Scenario (categories) | #1 | #2 | #3 | #4 | #5 | Total
Detailed Writing Rules
Creative Writing, Hallucination | 80 | 80 | 77 | 76 | 75 | 78%
Creative Writing, Hallucination | 86 | 86 | 83 | 83 | 75 | 83%
Creative Writing, Hallucination | 86 | 86 | 81 | 78 | 76 | 82%
Creative Writing, Hallucination | 84 | 84 | 82 | 81 | 81 | 82%
Creative Writing, Hallucination | 85 | 83 | 82 | 78 | 76 | 81%
Creative Writing, Hallucination | 88 | 85 | 84 | 82 | 78 | 84%
Detailed Writing Rules: 81.43%
genre
Creative Writing, Hallucination | 73 | 72 | 72 | 71 | 70 | 72%
Creative Writing, Hallucination | 81 | 77 | 74 | 74 | 71 | 75%
Creative Writing, Hallucination | 81 | 78 | 78 | 75 | 74 | 77%
Creative Writing, Hallucination | 81 | 79 | 74 | 72 | 69 | 75%
Creative Writing, Hallucination | 84 | 81 | 80 | 79 | 76 | 80%
Creative Writing, Hallucination | 83 | 81 | 80 | 79 | 75 | 80%
genre: 76.49%
Novelcrafter Default Prompt
Creative Writing, Hallucination | 75 | 73 | 73 | 73 | 67 | 72%
Creative Writing, Hallucination | 86 | 85 | 85 | 84 | 76 | 83%
Creative Writing, Hallucination | 80 | 80 | 79 | 75 | 75 | 78%
Creative Writing, Hallucination | 84 | 81 | 80 | 80 | 74 | 80%
Creative Writing, Hallucination | 80 | 80 | 80 | 79 | 76 | 79%
Creative Writing, Hallucination | 88 | 87 | 84 | 82 | 79 | 84%
Novelcrafter Default Prompt: 79.29%
Overall: 79.07%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
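
To make the "well-formed XML" requirement concrete, the sketch below parses a codex document and rejects malformed output. The `<codex>`/`<entry>` element names and the `type`/`name` attributes are a hypothetical schema chosen for this example; the benchmark's real format is not shown on this page.

```python
import xml.etree.ElementTree as ET

# Entry kinds named in the description above.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> list:
    """Parse a hypothetical <codex><entry type=... name=...> document.

    Raises ValueError if the XML is not well-formed or an entry type is unknown.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    entries = []
    for node in root.iter("entry"):
        kind = node.get("type")
        if kind not in ALLOWED_TYPES:
            raise ValueError(f"unknown entry type: {kind!r}")
        entries.append({
            "type": kind,
            "name": node.get("name"),
            "description": (node.text or "").strip(),
        })
    return entries


codex_xml = (
    '<codex>'
    '<entry type="character" name="Mara">A sellsword.</entry>'
    '<entry type="location" name="Oldport"/>'
    '</codex>'
)
codex_entries = parse_codex(codex_xml)
print(codex_entries)
```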

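Scenario (categories) | #1 | #2 | #3 | #4 | #5 | Total
Tooling, Reasoning | 98 | 98 | 98 | 98 | 96 | 97%
Tooling, Reasoning | 99 | 98 | 98 | 97 | 94 | 97%
Tooling, Reasoning | 98 | 98 | 98 | 98 | 94 | 97%
Tooling, Reasoning | 100 | 100 | 98 | 98 | 96 | 98%
Overall: 97.46%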

Codex Red Herring (False Positive Detection)

Tests whether models correctly report "no violations" when a codex is fully consistent with the prose passage. Models that hallucinate false violations (false positives) fail. Uses a 2×2 matrix of text length × codex size, with bare and detailed-entry variants.
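
The test layout described above can be sketched as follows. The level labels ("short", "long", "small", "large") and the binary `passes` check are assumptions for illustration; the results on this page include partial scores such as 25, so the actual scoring is evidently more graded than this.

```python
from itertools import product

# Hypothetical level labels; the page only states the two dimensions.
TEXT_LENGTHS = ("short", "long")
CODEX_SIZES = ("small", "large")
ENTRY_VARIANTS = ("basic", "detailed")


def build_cases() -> list:
    """Enumerate the 2x2 matrix once per entry variant (8 cases in total)."""
    return [
        {"entries": variant, "text_length": length, "codex_size": size}
        for variant, length, size in product(ENTRY_VARIANTS, TEXT_LENGTHS, CODEX_SIZES)
    ]


def passes(reported_violations: list) -> bool:
    """The codex is fully consistent, so any reported violation is a false positive."""
    return len(reported_violations) == 0


cases = build_cases()
print(len(cases), passes([]), passes(["eye colour mismatch"]))
```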

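Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
basic entries
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 93%
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 93%
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 25 | 25 | 78%
basic entries: 90.63%
detailed entries
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Hallucination | 100 | 100 | 100 | 100 | 25 | 25 | 25 | 25 | 25 | 25 | 55%
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 93%
Hallucination | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 25 | 93%
detailed entries: 85.00%
Overall: 87.81%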

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.
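
One way to sanity-check such output is to confirm that each reported substring really occurs in the paragraph it cites. The `<violations>`/`<violation paragraph="N">` format and the `check_violations` helper below are hypothetical, used only to sketch the idea.

```python
import xml.etree.ElementTree as ET


def check_violations(xml_text: str, paragraphs: list) -> list:
    """Return (paragraph_number, substring, found) for each reported violation.

    `found` is True only if the 1-based paragraph number is in range and the
    quoted substring appears verbatim in that paragraph.
    """
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("violation"):
        number = int(node.get("paragraph", "0"))
        substring = (node.text or "").strip()
        in_range = 1 <= number <= len(paragraphs)
        found = in_range and substring != "" and substring in paragraphs[number - 1]
        results.append((number, substring, found))
    return results


story = ["Mara drew her sword.", "Her green eyes narrowed."]
report = (
    '<violations>'
    '<violation paragraph="2">green eyes</violation>'
    '<violation paragraph="1">blue cloak</violation>'
    '<violation paragraph="5">sword</violation>'
    '</violations>'
)
checked = check_violations(report, story)
print(checked)
```

A check like this only confirms that the citation is grounded in the text; deciding whether it is a genuine contradiction of the story bible still requires the expected answers.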

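Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
matrix
Tooling, Reasoning | 100 | 99 | 98 | 97 | 96 | 96 | 96 | 95 | 94 | 3 | 87%
Tooling, Reasoning | 98 | 98 | 96 | 96 | 96 | 96 | 95 | 94 | 93 | 90 | 95%
Tooling, Reasoning | 100 | 100 | 100 | 97 | 97 | 97 | 97 | 97 | 92 | 88 | 96%
Tooling, Reasoning | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97 | 0 | 90%
matrix: 92.22%
tiers
Tooling, Reasoning | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Tooling, Reasoning | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99%
Tooling, Reasoning | 100 | 100 | 97 | 97 | 96 | 94 | 92 | 89 | 88 | 88 | 94%
Tooling, Reasoning | 100 | 100 | 100 | 100 | 100 | 97 | 97 | 97 | 97 | 97 | 98%
tiers: 97.94%
Overall: 95.08%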

Dialogue tags

Various tasks related to dialogue tags in text.

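Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
dialogue-200
Rule Following, Utility | 100 | 91 | 72 | 49 | 48 | 44 | 2 | 0 | 0 | 0 | 41%
Rule Following, Utility | 73 | 47 | 43 | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 17%
Rule Following, Utility | 59 | 55 | 51 | 50 | 47 | 43 | 42 | 39 | 32 | 15 | 43%
dialogue-200: 33.69%
dialogue-500
Rule Following, Utility | 38 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 4%
Rule Following, Utility | 50 | 50 | 48 | 30 | 10 | 5 | 0 | 0 | 0 | 0 | 19%
Rule Following, Utility | 61 | 50 | 50 | 48 | 34 | 34 | 5 | 3 | 0 | 0 | 28%
dialogue-500: 17.29%
Ungrouped
Rule Following | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Overall: 36.13%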

Novel outline

Handles questions about the outline of a novel in various formats.

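Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
outline-count
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
outline-count: 100.00%
pov-count
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Utility | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
pov-count: 100.00%
Overall: 100.00%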

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
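
Scoring "each expected change independently" can be sketched as a list of small checks, each passing or failing on its own. The `ExpectedChange` class and the percentage formula below are assumptions for illustration, not the benchmark's published scorer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedChange:
    """One independent check: text that must appear, or text that must be gone."""
    must_contain: Optional[str] = None
    must_not_contain: Optional[str] = None

    def satisfied(self, output: str) -> bool:
        if self.must_contain is not None and self.must_contain not in output:
            return False
        if self.must_not_contain is not None and self.must_not_contain in output:
            return False
        return True


def score(output: str, changes: list) -> float:
    """Percentage of expected changes that the output satisfies."""
    if not changes:
        return 100.0
    hits = sum(1 for change in changes if change.satisfied(output))
    return 100.0 * hits / len(changes)


# Task: rename "Anna" to "Elena" and expand the contraction "didn't".
checks = [
    ExpectedChange(must_contain="Elena"),
    ExpectedChange(must_not_contain="Anna"),
    ExpectedChange(must_contain="did not"),
    ExpectedChange(must_not_contain="didn't"),
]
partial = score("Elena did not wait for Anna.", checks)
print(partial)
```

Here one leftover "Anna" costs a single check rather than the whole scenario, which is consistent with the near-but-not-perfect scores in the results.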

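Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | Total
Generic Prompt
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99%
Text Editing | 100 | 100 | 100 | 100 | 100 | 99 | 99 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing, Hallucination | 98 | 98 | 96 | 96 | 96 | 95 | 94 | 96%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 96 | 99%
Text Editing | 100 | 97 | 97 | 93 | 92 | 89 | 81 | 93%
Generic Prompt: 98.57%
Specific Prompt
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 72 | 96%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing, Hallucination | 99 | 98 | 98 | 98 | 97 | 96 | 0 | 84%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Text Editing | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Specific Prompt: 97.77%
Overall: 98.17%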

Tool usage within Novelcrafter

Tests output messages related to tool usage within Novelcrafter.

Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
Tooling | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100%
Overall: 100.00%

Voice/dialogue sheets

Extracts dialogue from a given text as voice sheets.

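Scenario (categories) | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Total
Rule Following | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70%
Rule Following | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60%
1-shot (Rule Following) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50%
Few-shot (Rule Following) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60%
Rule Following | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90%
Overall: 66.00%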