google/gemini-2.5-flash-lite

Gemini 2.5 Flash Lite (Reasoning)

Release Date: Jul 22nd, 2025
Context Size: 1M tokens
Reasoning: Yes
Benchmark Cost: $1.28
Speed: 243.6 tok/s

Categories

Creative Writing: 71.6%
Tooling: 99.5%
Language: 74.4%
Utility: 89.6%
Reasoning: 93.9%
Text Editing: 94.5%
Rule Following: 66.8%
Hallucination: 95.6%

Subcategories

[Chart: subcategory scores on a 20% to 100% axis. Subcategories: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption]

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.
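As a rough illustration of what such a detector could look like, the sketch below counts three of the named anti-patterns with simple word lists and a regular expression. The word lists, the `count_habits` name, and the patterns are hypothetical; the benchmark's actual rules are not given here.

```python
import re

# Hypothetical word lists; the benchmark's real lists are not published here.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}
WEAK_TAGS = {"exclaimed", "retorted", "opined", "interjected"}

# Past progressive: "was" or "were" followed by a word ending in -ing.
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def count_habits(text: str) -> dict[str, int]:
    """Count occurrences of a few prose anti-patterns in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(w in FILTER_WORDS for w in words),
        "weak_dialogue_tags": sum(w in WEAK_TAGS for w in words),
        "past_progressive": len(PAST_PROGRESSIVE.findall(text)),
    }
```

A real detector would need part-of-speech information to avoid false hits (for example "was nothing" matches the naive pattern), so treat the counts as approximate.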

Scenario #1 #2 #3 #4 #5 Total
Detailed Writing Rules
Creative Writing, Hallucination 72 70 69 66 64 68%
Creative Writing, Hallucination 77 76 72 71 69 73%
Creative Writing, Hallucination 76 76 75 73 68 74%
Creative Writing, Hallucination 72 71 68 66 65 68%
Creative Writing, Hallucination 78 71 69 64 62 69%
Creative Writing, Hallucination 79 78 76 74 70 76%
Detailed Writing Rules: 71.32%
genre
Creative Writing, Hallucination 69 67 66 62 60 65%
Creative Writing, Hallucination 78 72 71 70 68 72%
Creative Writing, Hallucination 81 79 71 71 64 73%
Creative Writing, Hallucination 79 78 75 74 74 76%
Creative Writing, Hallucination 78 77 75 70 67 73%
Creative Writing, Hallucination 83 82 81 80 77 81%
genre: 73.22%
Novelcrafter Default Prompt
Creative Writing, Hallucination 68 68 67 64 63 66%
Creative Writing, Hallucination 76 74 70 62 61 69%
Creative Writing, Hallucination 79 73 72 71 71 73%
Creative Writing, Hallucination 79 78 77 77 75 77%
Creative Writing, Hallucination 76 74 73 70 66 72%
Creative Writing, Hallucination 88 79 79 71 71 77%
Novelcrafter Default Prompt: 72.41%
Overall: 72.32%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
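A minimal sketch of how a response might be checked for well-formedness, using Python's standard `xml.etree.ElementTree`. The `<entry type="...">` and `<name>` element names are assumptions made for illustration; the benchmark's real schema is not shown in this document.

```python
import xml.etree.ElementTree as ET

# Entry types taken from the categories named in the description above.
ENTRY_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> list[dict[str, str]]:
    """Return codex entries, or raise ValueError if the XML is unusable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    entries = []
    # Hypothetical schema: <entry type="character"><name>...</name></entry>
    for entry in root.iter("entry"):
        kind = entry.get("type", "")
        if kind not in ENTRY_TYPES:
            raise ValueError(f"unknown entry type: {kind!r}")
        name = entry.findtext("name", default="").strip()
        entries.append({"type": kind, "name": name})
    return entries
```

Raising on malformed input keeps the two failure modes (broken XML versus wrong content) easy to tell apart when scoring.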

Scenario #1 #2 #3 #4 #5 Total
Tooling, Reasoning 98 97 95 94 92 95%
Tooling, Reasoning 97 96 95 95 94 95%
Tooling, Reasoning 93 93 92 88 88 91%
Tooling, Reasoning 99 97 94 92 91 95%
Overall: 93.98%

Codex Red Herring (False Positive Detection)

Tests whether models correctly report "no violations" when a codex is fully consistent with the prose passage. Models that hallucinate false violations (false positives) fail. Uses a 2×2 matrix of text length × codex size, with bare and detailed-entry variants.
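The test grid described above can be sketched as a Cartesian product. Two lengths times two codex sizes times two variants gives eight combinations, which matches the eight result rows (four per variant). The axis labels and function names below are hypothetical; the source names only the axes and the variants.

```python
from itertools import product

# Hypothetical labels for the two axes of the 2x2 matrix.
TEXT_LENGTHS = ("short", "long")
CODEX_SIZES = ("small", "large")
VARIANTS = ("basic entries", "detailed entries")


def build_scenarios() -> list[dict[str, str]]:
    """Enumerate every variant x text length x codex size combination."""
    return [
        {"variant": v, "text_length": t, "codex_size": c}
        for v, t, c in product(VARIANTS, TEXT_LENGTHS, CODEX_SIZES)
    ]


def is_false_positive(reported_violations: list[str]) -> bool:
    """On a fully consistent codex, any reported violation is a false positive."""
    return len(reported_violations) > 0
```

How the benchmark turns a false positive into a partial score such as 25 is not stated, so the sketch stops at a pass/fail check.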

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
basic entries
Hallucination 100 100 100 100 100 100 100 100 100 25 93%
Hallucination 100 100 100 100 100 100 100 100 100 25 93%
Hallucination 100 100 100 100 100 100 100 100 100 100 100%
Hallucination 100 100 100 100 100 100 100 100 100 25 93%
basic entries: 94.38%
detailed entries
Hallucination 100 100 100 100 100 100 100 100 100 25 93%
Hallucination 100 100 100 100 100 100 100 25 25 25 78%
Hallucination 100 100 100 100 100 100 100 100 25 25 85%
Hallucination 100 100 100 100 100 100 100 100 100 100 100%
detailed entries: 88.75%
Overall: 91.56%
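
The group and section figures above appear consistent with plain averaging: each row total is the mean of its ten scenario scores, each group figure is the mean of its row totals, and the section figure is the mean of the two groups. That rule is inferred from the numbers, not stated by the source. A short sketch that reproduces it:

```python
from statistics import mean

# Per-scenario scores as read from the results table for this section.
basic = [
    [100] * 9 + [25],
    [100] * 9 + [25],
    [100] * 10,
    [100] * 9 + [25],
]
detailed = [
    [100] * 9 + [25],
    [100] * 7 + [25] * 3,
    [100] * 8 + [25] * 2,
    [100] * 10,
]


def group_score(rows: list[list[int]]) -> float:
    """Mean of the per-row means (assumed aggregation rule)."""
    return mean(mean(row) for row in rows)


basic_score = group_score(basic)
detailed_score = group_score(detailed)
overall = mean([basic_score, detailed_score])
print(f"{basic_score:.2f} {detailed_score:.2f} {overall:.2f}")
```

Other sections display rounded per-scenario scores, so the same calculation on their visible digits may differ from the published totals by a few hundredths.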

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.
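One way a grader could validate such output is to parse the XML and confirm that each quoted substring really occurs in the paragraph it cites. The sketch below assumes a `<violation paragraph="N"><substring>...</substring></violation>` layout and 1-based paragraph numbers; both are illustrative guesses, not the benchmark's documented format.

```python
import xml.etree.ElementTree as ET


def check_violations(
    xml_text: str, paragraphs: list[str]
) -> list[tuple[int, str, bool]]:
    """Parse reported violations and check each substring against its paragraph.

    Returns (paragraph_number, substring, found) triples. Paragraph numbers
    are assumed to be 1-based.
    """
    results = []
    # Hypothetical schema: <violation paragraph="2"><substring>...</substring></violation>
    for node in ET.fromstring(xml_text).iter("violation"):
        number = int(node.get("paragraph", "0"))
        substring = (node.findtext("substring") or "").strip()
        in_range = 1 <= number <= len(paragraphs)
        found = in_range and bool(substring) and substring in paragraphs[number - 1]
        results.append((number, substring, found))
    return results
```

Checking the substring against the cited paragraph catches both invented quotes and correct quotes attributed to the wrong paragraph.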

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
matrix
Tooling, Reasoning 93 92 91 91 91 89 88 86 86 85 89%
Tooling, Reasoning 95 95 92 92 90 90 89 89 87 83 90%
Tooling, Reasoning 90 90 88 88 88 87 84 84 80 80 86%
Tooling, Reasoning 97 97 97 97 97 97 97 97 93 84 95%
matrix: 90.11%
tiers
Tooling, Reasoning 100 100 100 100 100 100 100 93 93 93 98%
Tooling, Reasoning 100 100 94 94 94 94 94 92 92 92 95%
Tooling, Reasoning 94 94 94 94 92 91 91 88 86 86 91%
Tooling, Reasoning 100 100 100 97 97 97 97 97 93 90 97%
tiers: 95.14%
Overall: 92.63%

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
dialogue-200
Rule Following, Utility 96 95 60 53 52 50 47 0 0 0 45%
Rule Following, Utility 99 50 50 50 50 30 28 24 4 0 39%
Rule Following, Utility 99 58 50 50 49 46 42 40 32 18 48%
dialogue-200: 44.09%
dialogue-500
Rule Following, Utility 50 48 39 17 7 5 0 0 0 0 17%
Rule Following, Utility 69 50 14 2 0 0 0 0 0 0 14%
Rule Following, Utility 49 48 47 47 44 43 1 0 0 0 28%
dialogue-500: 19.37%
Ungrouped
Rule Following 100 100 100 100 100 100 100 100 61 61 92%
Overall: 40.36%

Language Comprehension

Does the model understand more than just English?

Scenario #1 #2 #3 #4 #5 Total
Language 100 100 100 100 0 80%
Language 100 100 100 100 0 80%
Language 0 0 0 0 0 0%
Language 100 100 100 100 100 100%
Overall: 65.00%

Novel outline

Handles questions about the outline of a novel in various formats.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
outline-count
Utility 100 100 100 100 100 100 100 100 100 100 100%
Utility 100 100 100 100 100 100 100 100 100 100 100%
Utility 100 100 100 100 100 100 100 100 100 0 90%
outline-count: 96.67%
pov-count
Utility 100 100 100 100 100 100 100 100 100 50 95%
Utility 100 100 100 100 100 100 100 100 100 100 100%
Utility 100 100 100 100 100 100 100 100 100 100 100%
pov-count: 98.33%
Overall: 97.50%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
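Scoring each expected change independently could look like the sketch below: every check passes or fails on its own, and the score is the share that pass. The `ExpectedChange` shape and `score_output` name are hypothetical, and simple substring checks stand in for whatever matching the benchmark actually performs.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independent check (hypothetical shape)."""

    must_contain: str = ""      # text that should appear after the edit
    must_not_contain: str = ""  # text that should be gone after the edit


def score_output(output: str, changes: list[ExpectedChange]) -> float:
    """Return the percentage of expected changes satisfied by output."""
    if not changes:
        return 100.0
    passed = 0
    for change in changes:
        ok_present = not change.must_contain or change.must_contain in output
        ok_absent = (
            not change.must_not_contain or change.must_not_contain not in output
        )
        passed += ok_present and ok_absent
    return 100.0 * passed / len(changes)
```

For example, a rename from "Anna" to "Elena" combined with a contraction expansion becomes two separate checks, so a model that gets the rename right but misses a contraction still earns partial credit.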

Scenario #1 #2 #3 #4 #5 #6 #7 Total
Generic Prompt
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 99 99 99 90 90 84 95%
Text Editing 100 100 99 99 99 99 99 99%
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 74 74 74 74 73 73 77%
Text Editing, Hallucination 97 96 95 95 94 94 93 95%
Text Editing 100 100 100 100 100 100 99 100%
Text Editing 97 93 89 85 82 81 81 87%
Generic Prompt: 94.74%
Specific Prompt
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 99 99 96 93 93 90 96%
Text Editing 100 99 99 99 99 99 99 99%
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 100 99 99 93 86 0 83%
Text Editing, Hallucination 99 98 98 98 96 95 81 95%
Text Editing 100 100 100 100 100 100 100 100%
Text Editing 100 100 100 99 96 67 65 90%
Specific Prompt: 95.79%
Overall: 95.27%

Tool usage within Novelcrafter

Evaluates output messages related to tool usage within Novelcrafter.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
Tooling 100 100 100 100 100 100 100 100 100 100 100%
Overall: 100.00%

Voice/dialogue sheets

Extracts dialogue from a given text as voice sheets.

Scenario #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 Total
Rule Following 100 100 100 100 100 100 100 100 100 0 90%
Rule Following 100 100 100 100 0 0 0 0 0 0 40%
1-shot Rule Following 100 100 100 0 0 0 0 0 0 0 30%
Few-shot Rule Following 100 100 100 100 100 100 100 100 100 0 90%
Rule Following 100 100 100 100 100 100 100 100 100 100 100%
Overall: 70.00%