Google

Comparing 21 models from Google.

Model	Total	Released	Context	CoT	Tooling	Creative Writing	Language	Utility	Reasoning	Text Editing	Rule Following	Hallucination
Gemini 3.1 Pro (Preview)	94.08%	Feb 19, 26	1m	✓	99.92%	85.44%	94.90%	99.91%	88.20%	98.51%	91.21%	94.53%
Gemini 3.5 Flash (Reasoning)	93.35%	May 19, 26	1m	✓	100.00%	79.87%	94.41%	98.86%	88.22%	97.78%	92.04%	95.61%
Gemini 3 Flash (Preview, Reasoning)	89.93%	Dec 17, 25	1m	✓	100.00%	75.87%	94.93%	97.20%	86.16%	98.12%	74.48%	92.65%
Gemma 4 31B (Reasoning)	89.64%	Apr 3, 26	256k	✓	98.50%	78.13%	83.82%	96.32%	80.14%	98.83%	85.00%	96.37%
Gemma 4 26B (Reasoning)	89.02%	Apr 3, 26	256k	✓	99.04%	76.38%	95.20%	95.69%	74.05%	98.26%	74.75%	98.79%
Gemini 3 Pro (Preview)	88.79%	Nov 18, 25	1m	✓	99.98%	77.77%	89.64%	96.14%	95.24%	98.86%	64.47%	88.23%
Gemini 2.5 Pro	88.44%	Jun 17, 25	1m	✓	100.00%	81.03%	92.57%	92.18%	89.21%	98.58%	60.89%	93.05%
Gemini 3.1 Flash Lite (Reasoning)	85.91%	May 7, 26	1m	✓	99.81%	76.31%	96.70%	92.32%	75.32%	96.90%	62.26%	87.63%
Gemini 3.5 Flash (Reasoning, Minimal)	85.88%	May 19, 26	1m	–	100.00%	78.55%	97.50%	83.90%	78.93%	98.26%	60.38%	89.55%
Gemini 3 Flash (Preview)	85.47%	Dec 17, 25	1m	–	98.04%	75.04%	95.00%	86.39%	80.99%	97.54%	65.14%	85.61%
Gemini 3.1 Flash Lite (Preview)	85.41%	Feb 19, 26	1m	–	99.98%	75.78%	94.98%	94.00%	75.75%	96.46%	59.04%	87.29%
Gemma 4 31B	85.23%	Apr 3, 26	256k	–	100.00%	75.59%	75.00%	86.69%	78.18%	98.56%	72.72%	95.09%
Gemini 3.1 Flash Lite	85.09%	May 7, 26	1m	–	99.97%	76.01%	90.90%	92.77%	74.62%	97.35%	61.21%	87.86%
Gemma 4 26B	84.89%	Apr 3, 26	256k	–	91.94%	75.17%	92.02%	83.17%	73.83%	97.04%	71.75%	94.22%
Gemini 2.5 Flash (Reasoning)	84.14%	Jun 17, 25	1m	✓	100.00%	76.30%	86.06%	82.25%	72.63%	98.12%	59.97%	97.79%
Gemini 2.5 Flash Lite (Reasoning)	83.10%	Jul 22, 25	1m	✓	98.28%	71.64%	74.36%	89.63%	72.51%	94.54%	66.81%	96.99%
Gemini 2.5 Flash	80.61%	Jun 17, 25	1m	–	99.97%	77.57%	86.23%	61.45%	79.09%	97.83%	57.47%	85.27%
Gemini 2.5 Flash Lite	79.91%	Jul 22, 25	1m	–	96.00%	75.05%	82.75%	80.14%	65.19%	92.13%	59.96%	88.08%
Gemma 3 12B	76.07%	Mar 12, 25	128k	–	94.16%	75.38%	80.10%	79.28%	56.28%	85.18%	61.05%	77.13%
Gemma 3 27B	75.70%	Mar 12, 25	128k	–	96.15%	78.79%	77.21%	76.82%	61.51%	86.63%	47.98%	80.53%
Gemma 3 4B	66.33%	Mar 12, 25	128k	–	92.40%	72.10%	72.28%	60.30%	52.86%	78.38%	26.37%	75.94%
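
The page does not say how the Total column is derived. A quick check suggests it is consistent with an unweighted mean of the eight category scores (Tooling through Hallucination). The sketch below tests that assumption on three rows copied from the table; the name `category_mean` is illustrative, and the equal-weight formula is an inference from the numbers rather than a documented method.

```python
# Listed Total and category scores (percent) copied from three rows of the
# table, in column order: Tooling, Creative Writing, Language, Utility,
# Reasoning, Text Editing, Rule Following, Hallucination.
ROWS = {
    "Gemini 3.1 Pro (Preview)": (
        94.08, [99.92, 85.44, 94.90, 99.91, 88.20, 98.51, 91.21, 94.53]),
    "Gemini 3.5 Flash (Reasoning)": (
        93.35, [100.00, 79.87, 94.41, 98.86, 88.22, 97.78, 92.04, 95.61]),
    "Gemma 3 4B": (
        66.33, [92.40, 72.10, 72.28, 60.30, 52.86, 78.38, 26.37, 75.94]),
}


def category_mean(scores):
    """Unweighted mean of the category scores (assumed formula, not documented)."""
    return sum(scores) / len(scores)


for model, (total, scores) in ROWS.items():
    mean = category_mean(scores)
    print(f"{model}: listed {total:.2f}, mean of categories {mean:.2f}")
```

For these three rows the computed mean matches the listed Total to two decimal places. That agreement supports the equal-weight reading but does not prove it holds for every row.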

Model Performance

Cost vs Performance

Compares total benchmark cost against overall score for Google models. Quadrant lines are drawn at the median values.

One low-scoring outlier is hidden from the chart: Gemma 3 4B (66.3%).
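
As a rough illustration of where the score-axis quadrant line would fall, the sketch below takes the median of the Total column from the table. Per-model costs are not listed in this section, so the cost-axis median is left as an input. Whether the hidden outlier is included in the median is not stated, so both variants are computed. The `quadrant` helper and its tie-breaking rule are hypothetical.

```python
from statistics import median

# Total scores (percent) for all 21 models, in table order (highest first).
TOTALS = [
    94.08, 93.35, 89.93, 89.64, 89.02, 88.79, 88.44,
    85.91, 85.88, 85.47, 85.41, 85.23, 85.09, 84.89,
    84.14, 83.10, 80.61, 79.91, 76.07, 75.70, 66.33,
]

# Median over every model in the table.
median_all = median(TOTALS)

# Median over the visible points only, assuming the hidden outlier
# (Gemma 3 4B, the last entry) is excluded before the line is drawn.
median_visible = median(TOTALS[:-1])


def quadrant(cost, score, cost_median, score_median):
    """Label a point relative to the two median lines.

    Hypothetical helper: points exactly on a line are assigned to the
    "high" side, which is an arbitrary choice for this sketch.
    """
    cost_side = "low cost" if cost < cost_median else "high cost"
    score_side = "high score" if score >= score_median else "low score"
    return f"{cost_side}, {score_side}"


print(f"median score, all 21 models: {median_all:.2f}")
print(f"median score, outlier excluded: {median_visible:.2f}")
```

With an odd number of models the median is the middle row of the sorted table (Gemini 3.1 Flash Lite (Preview)). With the outlier excluded, it is the average of the two middle rows.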

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens. Costs can be viewed in USD or in tokens.