Google

Comparing 13 models from Google.

| Model | Total | Released | Context | Tooling | Creative Writing | Language | Utility | Reasoning | Text Editing | Rule Following | Hallucination |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro (Preview) | 94.37% | Feb 19, 26 | 1m | 99.90% | 85.44% | 94.90% | 99.91% | 96.01% | 98.51% | 91.21% | 89.06% |
| Gemini 3 Flash (Preview, Reasoning) | 90.50% | Dec 17, 25 | 1m | 100.00% | 75.87% | 94.93% | 97.20% | 98.05% | 98.12% | 74.48% | 85.32% |
| Gemini 3 Pro (Preview) | 88.79% | Nov 18, 25 | 1m | 99.98% | 77.77% | 89.64% | 96.14% | 95.24% | 98.86% | 64.47% | 88.23% |
| Gemini 2.5 Pro | 88.53% | Jun 17, 25 | 1m | 100.00% | 81.03% | 92.57% | 92.18% | 96.91% | 98.58% | 60.89% | 86.11% |
| Gemini 2.5 Flash (Reasoning) | 86.51% | Jun 17, 25 | 1m | 100.00% | 76.30% | 86.06% | 82.25% | 93.81% | 98.12% | 59.97% | 95.60% |
| Gemini 3.1 Flash Lite (Preview) | 85.87% | Feb 19, 26 | 1m | 99.98% | 75.78% | 94.98% | 94.00% | 92.15% | 96.46% | 59.04% | 74.58% |
| Gemini 2.5 Flash Lite (Reasoning) | 85.75% | Jul 22, 25 | 1m | 99.54% | 71.64% | 74.36% | 89.63% | 93.86% | 94.54% | 66.81% | 95.59% |
| Gemini 3 Flash (Preview) | 85.35% | Dec 17, 25 | 1m | 97.64% | 75.04% | 95.00% | 86.39% | 94.79% | 97.54% | 65.14% | 71.24% |
| Gemini 2.5 Flash Lite | 81.08% | Jul 22, 25 | 1m | 96.60% | 75.05% | 82.75% | 80.14% | 85.80% | 92.13% | 59.96% | 76.17% |
| Gemini 2.5 Flash | 80.60% | Jun 17, 25 | 1m | 99.96% | 77.57% | 86.23% | 61.45% | 92.60% | 97.83% | 57.47% | 71.70% |
| Gemma 3 12B | 78.41% | Mar 12, 25 | 128k | 97.69% | 75.38% | 80.10% | 79.28% | 79.42% | 85.18% | 61.05% | 69.15% |
| Gemma 3 27B | 77.85% | Mar 12, 25 | 128k | 99.88% | 78.79% | 77.21% | 76.82% | 86.74% | 86.63% | 47.98% | 68.74% |
| Gemma 3 4B | 68.57% | Mar 12, 25 | 128k | 97.88% | 72.10% | 72.28% | 60.30% | 73.64% | 78.38% | 26.37% | 67.60% |

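
The page does not say how Total is derived, but the listed values are consistent with an unweighted mean of the eight category scores in each row. The sketch below checks that assumption for three rows copied from the table; `mean_score` is a hypothetical helper name, not something the site defines.

```python
# Assumption: Total is the plain (unweighted) mean of the eight
# category scores. The source does not state the formula.

# Eight category scores per model, copied from the table rows.
category_scores = {
    "Gemini 3.1 Pro (Preview)": [99.90, 85.44, 94.90, 99.91, 96.01, 98.51, 91.21, 89.06],
    "Gemini 2.5 Flash": [99.96, 77.57, 86.23, 61.45, 92.60, 97.83, 57.47, 71.70],
    "Gemma 3 4B": [97.88, 72.10, 72.28, 60.30, 73.64, 78.38, 26.37, 67.60],
}

# Total column values as listed on the page.
listed_totals = {
    "Gemini 3.1 Pro (Preview)": 94.37,
    "Gemini 2.5 Flash": 80.60,
    "Gemma 3 4B": 68.57,
}


def mean_score(values):
    """Return the unweighted mean of a list of percentage scores."""
    return sum(values) / len(values)


for model, values in category_scores.items():
    computed = mean_score(values)
    print(f"{model}: computed {computed:.2f}%, listed {listed_totals[model]:.2f}%")
```

For these three rows the computed mean agrees with the listed Total to two decimal places.
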
Model Performance
Cost vs Performance

Compares total benchmark cost against overall score for Google models. Quadrant lines are drawn at the median values.
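
As a rough illustration of how median-based quadrant lines split such a chart, the sketch below assigns each model to a quadrant. The scores come from the Total column, but the cost figures are hypothetical placeholders, since the page text does not include them. The function name `quadrants` and the tie-handling rule (a value equal to the median counts as low cost or high score) are also assumptions.

```python
from statistics import median


def quadrants(points):
    """Label each model by its side of the median cost and median score.

    points maps a model name to a (cost, score) pair.
    """
    cost_mid = median(cost for cost, _ in points.values())
    score_mid = median(score for _, score in points.values())
    labels = {}
    for name, (cost, score) in points.items():
        # Assumption: ties with the median fall on the favourable side.
        cost_side = "low cost" if cost <= cost_mid else "high cost"
        score_side = "high score" if score >= score_mid else "low score"
        labels[name] = f"{cost_side}, {score_side}"
    return labels


# Scores are from the Total column; costs are HYPOTHETICAL values
# chosen only to make the example runnable.
example = {
    "Gemini 3.1 Pro (Preview)": (4.00, 94.37),
    "Gemini 2.5 Pro": (3.00, 88.53),
    "Gemini 2.5 Flash": (0.50, 80.60),
    "Gemma 3 12B": (0.10, 78.41),
}

for name, label in quadrants(example).items():
    print(f"{name}: {label}")
```

Because the lines sit at the medians, roughly half of the plotted models fall on each side of each line, whatever the absolute cost or score values are.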

1 low-scoring outlier hidden: Gemma 3 4B (68.6%).

Cost Breakdown

Total benchmark cost per model, broken down by input, reasoning, and output tokens.