Google

Comparing 21 models from Google.

| Model | Total | Released | Context (tokens) | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 94.08% | Feb 19, 2026 | 1M | 99.92% | 85.44% | 94.90% | 99.91% | 88.20% | 98.51% | 91.21% | 94.53% |
| Gemini 3.5 Flash (Reasoning) | 93.35% | May 19, 2026 | 1M | 100.00% | 79.87% | 94.41% | 98.86% | 88.22% | 97.78% | 92.04% | 95.61% |
| Gemini 3 Flash (Preview, Reasoning) | 89.93% | Dec 17, 2025 | 1M | 100.00% | 75.87% | 94.93% | 97.20% | 86.16% | 98.12% | 74.48% | 92.65% |
| Gemma 4 31B (Reasoning) | 89.64% | Apr 3, 2026 | 256K | 98.50% | 78.13% | 83.82% | 96.32% | 80.14% | 98.83% | 85.00% | 96.37% |
| Gemma 4 26B (Reasoning) | 89.02% | Apr 3, 2026 | 256K | 99.04% | 76.38% | 95.20% | 95.69% | 74.05% | 98.26% | 74.75% | 98.79% |
| Gemini 3 Pro (Preview) | 88.79% | Nov 18, 2025 | 1M | 99.98% | 77.77% | 89.64% | 96.14% | 95.24% | 98.86% | 64.47% | 88.23% |
| Gemini 2.5 Pro | 88.44% | Jun 17, 2025 | 1M | 100.00% | 81.03% | 92.57% | 92.18% | 89.21% | 98.58% | 60.89% | 93.05% |
| Gemini 3.1 Flash Lite (Reasoning) | 85.91% | May 7, 2026 | 1M | 99.81% | 76.31% | 96.70% | 92.32% | 75.32% | 96.90% | 62.26% | 87.63% |
| Gemini 3.5 Flash (Reasoning, Minimal) | 85.88% | May 19, 2026 | 1M | 100.00% | 78.55% | 97.50% | 83.90% | 78.93% | 98.26% | 60.38% | 89.55% |
| Gemini 3 Flash (Preview) | 85.47% | Dec 17, 2025 | 1M | 98.04% | 75.04% | 95.00% | 86.39% | 80.99% | 97.54% | 65.14% | 85.61% |
| Gemini 3.1 Flash Lite (Preview) | 85.41% | Feb 19, 2026 | 1M | 99.98% | 75.78% | 94.98% | 94.00% | 75.75% | 96.46% | 59.04% | 87.29% |
| Gemma 4 31B | 85.23% | Apr 3, 2026 | 256K | 100.00% | 75.59% | 75.00% | 86.69% | 78.18% | 98.56% | 72.72% | 95.09% |
| Gemini 3.1 Flash Lite | 85.09% | May 7, 2026 | 1M | 99.97% | 76.01% | 90.90% | 92.77% | 74.62% | 97.35% | 61.21% | 87.86% |
| Gemma 4 26B | 84.89% | Apr 3, 2026 | 256K | 91.94% | 75.17% | 92.02% | 83.17% | 73.83% | 97.04% | 71.75% | 94.22% |
| Gemini 2.5 Flash (Reasoning) | 84.14% | Jun 17, 2025 | 1M | 100.00% | 76.30% | 86.06% | 82.25% | 72.63% | 98.12% | 59.97% | 97.79% |
| Gemini 2.5 Flash Lite (Reasoning) | 83.10% | Jul 22, 2025 | 1M | 98.28% | 71.64% | 74.36% | 89.63% | 72.51% | 94.54% | 66.81% | 96.99% |
| Gemini 2.5 Flash | 80.61% | Jun 17, 2025 | 1M | 99.97% | 77.57% | 86.23% | 61.45% | 79.09% | 97.83% | 57.47% | 85.27% |
| Gemini 2.5 Flash Lite | 79.91% | Jul 22, 2025 | 1M | 96.00% | 75.05% | 82.75% | 80.14% | 65.19% | 92.13% | 59.96% | 88.08% |
| Gemma 3 12B | 76.07% | Mar 12, 2025 | 128K | 94.16% | 75.38% | 80.10% | 79.28% | 56.28% | 85.18% | 61.05% | 77.13% |
| Gemma 3 27B | 75.70% | Mar 12, 2025 | 128K | 96.15% | 78.79% | 77.21% | 76.82% | 61.51% | 86.63% | 47.98% | 80.53% |
| Gemma 3 4B | 66.33% | Mar 12, 2025 | 128K | 92.40% | 72.10% | 72.28% | 60.30% | 52.86% | 78.38% | 26.37% | 75.94% |

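
The page does not explain the Total column, but it appears to be the unweighted mean of the eight category scores. For Gemini 3.1 Pro (Preview), the categories sum to 752.62, and 752.62 / 8 ≈ 94.08%. The minimal Python check below assumes equal weights (inferred from the numbers, not documented by the site) and uses three rows copied from the table:

```python
import math

# Equal-weight assumption: Total = mean of the eight category scores.
# Category order: Tooling, Creative Writing, Language, Utility,
#                 Reasoning, Text Editing, Rule Following, Hallucination
ROWS = {
    "Gemini 3.1 Pro (Preview)": (94.08, [99.92, 85.44, 94.90, 99.91, 88.20, 98.51, 91.21, 94.53]),
    "Gemma 4 31B": (85.23, [100.00, 75.59, 75.00, 86.69, 78.18, 98.56, 72.72, 95.09]),
    "Gemma 3 4B": (66.33, [92.40, 72.10, 72.28, 60.30, 52.86, 78.38, 26.37, 75.94]),
}

def mean_score(scores: list[float]) -> float:
    """Unweighted arithmetic mean of the category scores."""
    return sum(scores) / len(scores)

def matches_total(reported: float, scores: list[float]) -> bool:
    """True if the mean agrees with the reported Total to within its two-decimal display precision."""
    return math.isclose(mean_score(scores), reported, abs_tol=0.005)

for model, (total, scores) in ROWS.items():
    print(f"{model}: mean={mean_score(scores):.4f}, matches={matches_total(total, scores)}")
```

The tolerance check avoids relying on how a tie such as 85.22875 rounds in binary floating point. If some other row failed it, that would suggest the site weights categories unequally.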
Cost vs Performance

Total benchmark cost plotted against overall score for each Google model. Quadrant lines mark the median cost and the median score.
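
Per-model costs are not reproduced on this page, so the sketch below uses placeholder model names and made-up costs. It shows one way to assign each point to a quadrant split at the median cost and median score. It assumes that values equal to a median count as "high", and it does not settle whether the hidden outlier is included when the medians are computed.

```python
from statistics import median

def quadrants(points: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Place each model in a quadrant split at the median cost and median score.

    points maps a model name to (cost_usd, score_pct). Values equal to a median
    count as "high" here; the chart's own tie handling is not documented.
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        score_side = "high score" if score >= score_mid else "low score"
        cost_side = "high cost" if cost >= cost_mid else "low cost"
        labels[name] = f"{score_side}, {cost_side}"
    return labels

# Hypothetical inputs: placeholder names and made-up costs, not data from this page.
EXAMPLE = {
    "Model A": (10.0, 92.0),
    "Model B": (1.0, 91.0),
    "Model C": (8.0, 80.0),
    "Model D": (0.5, 70.0),
}

for name, label in quadrants(EXAMPLE).items():
    print(name, "->", label)
```

With four points, each median is the average of the two middle values (4.5 USD and 85.5%), so no point sits exactly on a quadrant line.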

One low-scoring outlier is hidden from the chart: Gemma 3 4B (66.3%).
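
The page does not say how outliers are chosen. One common rule, used here only as an assumption, is Tukey's fences: flag any score below Q1 − 1.5 × IQR. Applied to the 21 Total scores in the table, with quartiles from Python's `statistics.quantiles` using the inclusive method, this rule flags only Gemma 3 4B. The chart may nevertheless use a different rule.

```python
from statistics import quantiles

# Total scores copied from the table above.
TOTALS = {
    "Gemini 3.1 Pro (Preview)": 94.08,
    "Gemini 3.5 Flash (Reasoning)": 93.35,
    "Gemini 3 Flash (Preview, Reasoning)": 89.93,
    "Gemma 4 31B (Reasoning)": 89.64,
    "Gemma 4 26B (Reasoning)": 89.02,
    "Gemini 3 Pro (Preview)": 88.79,
    "Gemini 2.5 Pro": 88.44,
    "Gemini 3.1 Flash Lite (Reasoning)": 85.91,
    "Gemini 3.5 Flash (Reasoning, Minimal)": 85.88,
    "Gemini 3 Flash (Preview)": 85.47,
    "Gemini 3.1 Flash Lite (Preview)": 85.41,
    "Gemma 4 31B": 85.23,
    "Gemini 3.1 Flash Lite": 85.09,
    "Gemma 4 26B": 84.89,
    "Gemini 2.5 Flash (Reasoning)": 84.14,
    "Gemini 2.5 Flash Lite (Reasoning)": 83.10,
    "Gemini 2.5 Flash": 80.61,
    "Gemini 2.5 Flash Lite": 79.91,
    "Gemma 3 12B": 76.07,
    "Gemma 3 27B": 75.70,
    "Gemma 3 4B": 66.33,
}

def low_outliers(scores: dict[str, float], k: float = 1.5) -> list[str]:
    """Return models whose score falls below the lower Tukey fence Q1 - k * IQR."""
    q1, _, q3 = quantiles(scores.values(), n=4, method="inclusive")
    fence = q1 - k * (q3 - q1)
    return [name for name, score in scores.items() if score < fence]

print(low_outliers(TOTALS))
```

Here Q1 = 83.10 and Q3 = 88.79, so the lower fence is about 74.57. Only 66.33 falls below it, while the next-lowest score, 75.70, stays above it.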

Cost Breakdown

Total benchmark cost per model, broken down into input, reasoning, and output tokens and shown either in USD or as token counts.
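
The USD view is presumably token counts multiplied by per-token prices. The minimal sketch below assumes prices quoted in USD per million tokens, with a separate rate for reasoning tokens that a caller can set equal to the output rate if a provider bills them that way. The class names and all numbers are hypothetical, not taken from this page.

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    input_tokens: int
    reasoning_tokens: int
    output_tokens: int

@dataclass
class Prices:
    # Hypothetical USD prices per 1M tokens, supplied by the caller.
    input_per_m: float
    reasoning_per_m: float
    output_per_m: float

def cost_breakdown(usage: TokenUsage, prices: Prices) -> dict[str, float]:
    """Convert token counts to USD per category and add a total."""
    parts = {
        "input": usage.input_tokens * prices.input_per_m / 1_000_000,
        "reasoning": usage.reasoning_tokens * prices.reasoning_per_m / 1_000_000,
        "output": usage.output_tokens * prices.output_per_m / 1_000_000,
    }
    parts["total"] = sum(parts.values())  # computed before the "total" key is added
    return parts

# Made-up usage and prices for illustration only.
print(cost_breakdown(TokenUsage(2_000_000, 500_000, 250_000), Prices(1.0, 4.0, 4.0)))
```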