google/gemini-3-flash-preview

Gemini 3 Flash (Preview, Reasoning)

Release Date: Dec 17th, 2025
Context Size: 1M tokens
Reasoning: Yes
Benchmark Cost: $9.60
Speed: 162.3 tok/s

Categories

Creative Writing: 75.9%
Tooling: 100.0%
Language: 94.9%
Utility: 97.2%
Reasoning: 98.1%
Text Editing: 98.1%
Rule Following: 74.5%
Hallucination: 85.3%

Subcategories

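[Chart: per-subcategory scores on a 20% to 100% axis; the individual values are not recoverable from the text. Subcategories shown: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption.]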

Bad Writing Habits

Detects common prose quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, cliches, AI-ism words/adverbs/names, and more.

Scenario                             #1   #2   #3   #4   #5  Total
Detailed Writing Rules
[Creative Writing, Hallucination]    83   82   77   76   69    78%
[Creative Writing, Hallucination]    78   77   77   75   67    75%
[Creative Writing, Hallucination]    80   79   77   76   76    78%
[Creative Writing, Hallucination]    80   78   75   75   72    76%
[Creative Writing, Hallucination]    85   82   79   79   76    80%
[Creative Writing, Hallucination]    83   78   76   73   69    76%
Detailed Writing Rules: 76.99%
genre
[Creative Writing, Hallucination]    71   71   70   69   68    70%
[Creative Writing, Hallucination]    74   73   70   69   68    71%
[Creative Writing, Hallucination]    76   75   74   71   71    73%
[Creative Writing, Hallucination]    78   73   72   68   66    72%
[Creative Writing, Hallucination]    74   73   72   71   65    71%
[Creative Writing, Hallucination]    73   72   71   71   70    71%
genre: 71.35%
Novelcrafter Default Prompt
[Creative Writing, Hallucination]    80   79   77   76   73    77%
[Creative Writing, Hallucination]    81   80   76   76   76    78%
[Creative Writing, Hallucination]    82   82   82   77   75    79%
[Creative Writing, Hallucination]    80   75   71   70   69    73%
[Creative Writing, Hallucination]    82   80   79   76   75    78%
[Creative Writing, Hallucination]    78   77   76   75   67    75%
Novelcrafter Default Prompt: 76.68%
Overall: 75.01%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.

Scenario                 #1   #2   #3   #4   #5  Total
[Tooling, Reasoning]     99   99   99   98   98    99%
[Tooling, Reasoning]     99   98   98   98   97    98%
[Tooling, Reasoning]     99   98   98   96   95    97%
[Tooling, Reasoning]     99   99   99   96   95    98%
Overall: 97.89%

Codex Red Herring (False Positive Detection)

Tests whether models correctly report "no violations" when a codex is fully consistent with the prose passage. Models that hallucinate false violations (false positives) fail. Uses a 2×2 matrix of text length × codex size, with bare and detailed-entry variants.

Scenario            #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
basic entries
[Hallucination]     38   25   17   17   17   13   13   13   13   10    17%
[Hallucination]    100   25   25   25   25   25   25   25   25   17    32%
[Hallucination]    100  100  100  100   25   25   25   25   25   17    54%
[Hallucination]    100  100  100  100  100   25   25   25   25   17    62%
basic entries: 41.19%
detailed entries
[Hallucination]    100  100  100  100  100  100  100  100  100   10    91%
[Hallucination]    100  100  100   25   25   25   25   17   17   13    45%
[Hallucination]    100  100  100  100  100  100  100  100   25   25    85%
[Hallucination]    100  100  100  100  100   25   25   25   25   25    63%
detailed entries: 70.77%
Overall: 55.98%
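
The test layout described for this benchmark can be sketched as a small enumeration. This is a minimal illustration, not the benchmark's code: the axis labels (`short`/`long`, `small`/`large`) are placeholder assumptions, since the page names only the axes. The scores above include partial values, so the real scoring is evidently graded; the `passes` helper models only the pass/fail rule stated in the description.

```python
from itertools import product

# Hypothetical axis labels: the page only says "text length x codex size"
# with "bare and detailed-entry variants"; these names are placeholders.
TEXT_LENGTHS = ("short", "long")
CODEX_SIZES = ("small", "large")
VARIANTS = ("basic", "detailed")


def build_cells():
    """Enumerate every (variant, text_length, codex_size) test cell."""
    return [
        {"variant": v, "text_length": t, "codex_size": c}
        for v, t, c in product(VARIANTS, TEXT_LENGTHS, CODEX_SIZES)
    ]


def passes(reported_violations):
    """A run passes only if the model reports no violations at all."""
    return len(reported_violations) == 0


cells = build_cells()
print(len(cells))  # 2 variants x 2 text lengths x 2 codex sizes
```

Each variant yields four cells, which is consistent with the four rows listed per group in the table.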

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.

Scenario                 #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
matrix
[Tooling, Reasoning]     95   94   93   93   92   92   92   91   90   88    92%
[Tooling, Reasoning]    100   98   98   97   96   96   94   94   93   91    96%
[Tooling, Reasoning]    100  100  100  100  100   97   97   95   92   92    97%
[Tooling, Reasoning]    100  100  100  100  100  100  100  100  100  100   100%
matrix: 96.22%
tiers
[Tooling, Reasoning]    100  100  100  100  100  100  100  100  100  100   100%
[Tooling, Reasoning]    100  100  100  100  100  100  100  100  100  100   100%
[Tooling, Reasoning]    100  100  100   96   96   96   92   92   88   88    95%
[Tooling, Reasoning]    100  100  100  100  100  100  100   92   92   92    98%
tiers: 98.08%
Overall: 97.15%
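
To make the expected output shape concrete, here is a minimal sketch of parsing such a violation report and checking that each quoted substring really occurs in the cited paragraph. The tag and attribute names (`violations`, `violation`, `paragraph`, `substring`) and the sample prose are assumptions made up for illustration; the page does not show the benchmark's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical output format: the tag and attribute names below are
# illustrative assumptions, not the benchmark's real schema.
SAMPLE_OUTPUT = """
<violations>
  <violation paragraph="2">
    <substring>her green eyes</substring>
  </violation>
</violations>
"""

# Made-up sample prose, one string per paragraph.
PARAGRAPHS = [
    "Mara crossed the bridge at dawn.",
    "She blinked, her green eyes adjusting to the light.",
]


def parse_violations(xml_text):
    """Return (paragraph_number, substring) pairs from the model's XML."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for node in root.findall("violation"):
        number = int(node.get("paragraph"))
        substring = (node.findtext("substring") or "").strip()
        results.append((number, substring))
    return results


def is_locatable(violation, paragraphs):
    """Check that the quoted substring occurs in the cited paragraph."""
    number, substring = violation
    if not 1 <= number <= len(paragraphs):
        return False
    return bool(substring) and substring in paragraphs[number - 1]


found = parse_violations(SAMPLE_OUTPUT)
print(found)
print([is_locatable(v, PARAGRAPHS) for v in found])
```

Malformed XML raises `xml.etree.ElementTree.ParseError` in `ET.fromstring`, so a harness built along these lines could treat that exception as a formatting failure.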

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario                      #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
dialogue-200
[Rule Following, Utility]    100  100  100  100  100  100  100  100   99   98   100%
[Rule Following, Utility]    100  100  100  100  100  100  100   64   54   50    87%
[Rule Following, Utility]    100   98   96   89   86   79   50   49   49   43    74%
dialogue-200: 86.78%
dialogue-500
[Rule Following, Utility]    100   82   71   47   40    7    0    0    0    0    35%
[Rule Following, Utility]     94   66   41   16    0    0    0    0    0    0    22%
[Rule Following, Utility]     50   49   18    0    0    0    0    0    0    0    12%
dialogue-500: 22.68%
Ungrouped
[Rule Following]             100  100  100  100  100  100  100  100  100  100   100%
Overall: 61.20%
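
The 61.20% section total is not the plain mean of the three group averages. It matches a mean weighted by the number of scenario rows in each group. This is an inference from the reported numbers, not something the page states; the sketch below simply checks the arithmetic.

```python
# Group averages and row counts as reported in the table above.
GROUPS = {
    "dialogue-200": (86.78, 3),
    "dialogue-500": (22.68, 3),
    "Ungrouped": (100.00, 1),
}


def row_weighted_mean(groups):
    """Average the group scores, weighting each by its number of rows."""
    total_rows = sum(rows for _, rows in groups.values())
    weighted = sum(score * rows for score, rows in groups.values())
    return weighted / total_rows


def mean_of_groups(groups):
    """Average the group scores with equal weight per group."""
    return sum(score for score, _ in groups.values()) / len(groups)


print(round(row_weighted_mean(GROUPS), 2))  # agrees with the reported total
print(round(mean_of_groups(GROUPS), 2))     # does not
```

In sections where every group has the same number of rows, the two averages coincide, which is why the distinction only shows up here.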

Novel outline

Tests how well the model answers questions about a novel's outline presented in various formats.

Scenario      #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
outline-count
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
outline-count: 100.00%
pov-count
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
[Utility]    100  100  100  100  100  100  100  100  100  100   100%
pov-count: 100.00%
Overall: 100.00%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.

Scenario                          #1   #2   #3   #4   #5   #6   #7  Total
Generic Prompt
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                    99   99   99   99   99   99   98    99%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing, Hallucination]     96   96   96   95   94   94   94    95%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                    99   99   99   99   99   96   92    98%
Generic Prompt: 99.07%
Specific Prompt
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100   99   99   99   99   99   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100   98   100%
[Text Editing, Hallucination]     98   97   97   97   97   97   96    97%
[Text Editing]                   100  100  100  100  100  100  100   100%
[Text Editing]                   100  100  100  100  100  100  100   100%
Specific Prompt: 99.57%
Overall: 99.32%
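
Scoring "each expected change independently" could look roughly like the sketch below: every expected change becomes one yes/no check, and the score is the share of checks that pass. The function name, the present/absent split, and the sample sentence are illustrative assumptions; the benchmark's real checks are not shown on the page.

```python
def score_replacements(output, expected_present, expected_absent):
    """Score an edited text as the percentage of independent checks passed.

    expected_present: strings that must appear after the transformation.
    expected_absent:  strings that must no longer appear.
    """
    checks = [s in output for s in expected_present]
    checks += [s not in output for s in expected_absent]
    if not checks:
        raise ValueError("at least one expected change is required")
    return 100.0 * sum(checks) / len(checks)


# Made-up task: rename "Anna" to "Elena" and expand "didn't" to "did not".
# The model missed one occurrence of the old name.
model_output = "Elena did not see Anna leave."
score = score_replacements(
    model_output,
    expected_present=["Elena", "did not"],
    expected_absent=["Anna", "didn't"],
)
print(score)  # 3 of 4 checks pass
```

Because each check is counted separately, a single missed rename lowers the score without zeroing it, which fits the near-perfect but not uniformly 100% rows in the table.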

Tool usage within Novelcrafter

Tests whether the model produces correct output messages for tool usage within Novelcrafter.

Scenario      #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
[Tooling]    100  100  100  100  100  100  100  100  100  100   100%
Overall: 100.00%

Voice/dialogue sheets

Tests extracting dialogue from a given text as voice sheets.

Scenario                      #1   #2   #3   #4   #5   #6   #7   #8   #9  #10  Total
[Rule Following]             100  100  100  100  100  100  100  100    0    0    80%
[Rule Following]             100  100  100    0    0    0    0    0    0    0    30%
1-shot [Rule Following]      100  100  100  100  100  100  100  100  100    0    90%
Few-shot [Rule Following]    100  100  100  100  100  100    0    0    0    0    70%
[Rule Following]             100  100  100  100  100  100  100  100  100    0    90%
Overall: 72.00%