Recall

Test: Codex Extraction

Avg. Score: 92.2%
Scenarios: 4

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3 Flash (Preview) | 99.7% | $0.0027 | 3.9s | 97%
2 | Mistral Small Creative | 98.6% | $0.0006 | 3.9s | 95%
3 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0030 | 10.6s | 100%
4 | Mistral Medium 3.1 | 98.5% | $0.0026 | 5.8s | 96%
5 | Ministral 3 8B | 97.3% | $0.0006 | 3.3s | 94%
6 | Grok 4 Fast | 97.3% | $0.0012 | 8.7s | 92%
7 | Ministral 3 14B | 96.7% | $0.0009 | 6.0s | 90%
8 | Z.AI GLM 4.5 | 98.4% | $0.0028 | 16.8s | 96%
9 | Gemini 2.5 Flash Lite | 95.1% | $0.0005 | 1.9s | 88%
10 | Ministral 8B | 95.1% | $0.0004 | 3.8s | 88%
11 | Claude Haiku 4.5 | 97.5% | $0.0073 | 4.5s | 92%
12 | Mistral Large 3 | 96.5% | $0.0027 | 8.2s | 91%
13 | Grok 4.1 Fast | 99.0% | $0.0017 | 22.1s | 96%
14 | GPT-4.1 | 97.0% | $0.0073 | 4.7s | 91%
15 | Gemma 3 27B | 96.8% | $0.0005 | 14.5s | 91%
16 | Claude Sonnet 4.5 | 100.0% | $0.022 | 6.6s | 100%
17 | Mistral Small 3.2 24B | 94.3% | $0.0005 | 4.4s | 86%
18 | Claude 3.7 Sonnet | 100.0% | $0.021 | 7.8s | 100%
19 | Claude Sonnet 4 | 100.0% | $0.022 | 7.9s | 100%
20 | Gemini 2.5 Flash Lite (Reasoning) | 97.0% | $0.0021 | 15.6s | 91%
21 | Claude Sonnet 4.6 | 99.8% | $0.022 | 7.4s | 99%
22 | Gemini 3 Flash (Preview, Reasoning) | 99.1% | $0.0096 | 22.2s | 96%
23 | GPT-4o, Aug. 6th (temp=0) | 95.6% | $0.0100 | 4.0s | 87%
24 | GPT-4o, May 13th (temp=0) | 98.5% | $0.025 | 4.3s | 97%
25 | Mistral Large 2 | 96.3% | $0.011 | 8.3s | 89%
26 | GPT-4o, Aug. 6th (temp=1) | 94.4% | $0.0092 | 3.8s | 85%
27 | Gemini 2.5 Flash | 95.8% | $0.0023 | 2.5s | 74%
28 | DeepSeek-V2 Chat | 94.3% | $0.0019 | 14.8s | 83%
29 | DeepSeek V3 (2025-03-24) | 96.4% | $0.0014 | 27.0s | 87%
30 | Claude 3 Haiku | 91.3% | $0.0017 | 5.5s | 80%
31 | Ministral 3B | 90.7% | $0.0002 | 1.8s | 76%
32 | DeepSeek V3.2 | 97.6% | $0.0011 | 36.7s | 91%
33 | Qwen 2.5 72B | 91.5% | $0.0008 | 11.0s | 81%
34 | o4 Mini | 97.7% | $0.014 | 21.5s | 92%
35 | GPT-5.2 | 97.6% | $0.018 | 16.8s | 92%
36 | DeepSeek V3 (2024-12-26) | 92.8% | $0.0017 | 13.5s | 80%
37 | GPT-4.1 Mini | 90.0% | $0.0015 | 6.6s | 78%
38 | Z.AI GLM 4.6 | 98.1% | $0.0057 | 38.8s | 93%
39 | Minimax M2.5 | 95.3% | $0.0023 | 30.1s | 87%
40 | Hermes 3 405B | 93.0% | $0.0040 | 17.8s | 83%
41 | GPT-4o, May 13th (temp=1) | 95.4% | $0.025 | 4.0s | 89%
42 | Claude Opus 4.5 | 99.3% | $0.037 | 7.7s | 97%
43 | Claude Opus 4.6 | 99.0% | $0.037 | 8.1s | 97%
44 | WizardLM 2 8x22b | 94.7% | $0.0026 | 31.3s | 86%
45 | Writer: Palmyra X5 | 88.9% | $0.0052 | 7.5s | 78%
46 | Z.AI GLM 5 | 99.2% | $0.0091 | 51.4s | 97%
47 | GPT-5 Mini | 97.3% | $0.0070 | 45.5s | 92%
48 | Ministral 3 3B | 85.2% | $0.0004 | 1.8s | 69%
49 | Claude 3.5 Sonnet | 99.2% | $0.043 | 13.9s | 96%
50 | Gemini 2.5 Pro | 98.4% | $0.034 | 22.8s | 95%
51 | Grok 4 | 99.3% | $0.031 | 37.2s | 97%
52 | o4 Mini High | 98.4% | $0.025 | 40.5s | 95%
53 | Arcee AI: Trinity Mini | 79.7% | $0.0003 | 5.8s | 69%
54 | Gemini 2.5 Flash (Reasoning) | 90.4% | $0.0082 | 11.8s | 65%
55 | Gemma 3 4B | 79.2% | $0.0002 | 6.7s | 66%
56 | Aion 2.0 | 98.5% | $0.0080 | 1.2m | 93%
57 | Claude Opus 4.6 (Reasoning) | 99.5% | $0.055 | 21.4s | 98%
58 | GPT-4o Mini (temp=0) | 76.6% | $0.0005 | 7.9s | 68%
59 | Hermes 3 70B | 81.2% | $0.0012 | 15.8s | 67%
60 | GPT-5.1 | 97.9% | $0.035 | 43.6s | 94%
61 | GPT-5 Nano | 97.2% | $0.0043 | 1.3m | 91%
62 | ByteDance Seed 1.6 | 97.2% | $0.0073 | 1.3m | 91%
63 | Mistral Large | 90.8% | $0.011 | 7.6s | 55%
64 | ByteDance Seed 1.6 Flash | 86.4% | $0.0011 | 39.3s | 72%
65 | DeepSeek V3.1 | 91.3% | $0.0012 | 26.1s | 55%
66 | GPT-4o Mini (temp=1) | 75.3% | $0.0006 | 8.0s | 62%
67 | Claude Sonnet 4.6 (Reasoning) | 99.0% | $0.052 | 36.0s | 96%
68 | Z.AI GLM 4.7 Flash | 93.2% | $0.0019 | 1.2m | 83%
69 | Gemini 3 Pro (Preview) | 99.1% | $0.055 | 35.6s | 94%
70 | Cohere Command R+ (Aug. 2024) | 81.8% | $0.014 | 17.9s | 64%
71 | Llama 3.1 70B | 81.1% | $0.0016 | 16.0s | 52%
72 | Llama 3.1 Nemotron 70B | 84.5% | $0.0050 | 17.3s | 50%
73 | Z.AI GLM 4.7 | 98.3% | $0.0098 | 1.7m | 93%
74 | Gemma 3 12B | 78.3% | $0.0003 | 14.1s | 47%
75 | Arcee AI: Trinity Large (Preview) | 84.8% | $0.0000 | 20.5s | 41%
76 | GPT-4.1 Nano | 68.6% | $0.0003 | 2.8s | 50%
77 | Qwen 3.5 397B A17B | 97.6% | $0.012 | 1.8m | 93%
78 | GPT-5 | 99.4% | $0.049 | 1.3m | 98%
79 | Claude Opus 4 | 100.0% | $0.110 | 13.5s | 100%
80 | Llama 3.1 8B | 66.4% | $0.0001 | 10.0s | 39%
81 | MoonshotAI: Kimi K2.5 | 98.1% | $0.013 | 2.4m | 93%
82 | Gemini 3.1 Pro (Preview) | 95.2% | $0.065 | 1.0m | 67%
83 | Rocinante 12B | 40.3% | $0.0013 | 25.6s | 13%
84 | Mistral NeMO | 23.9% | $0.0006 | 1.4s | 0%
Average score: 92.22%

Individual Scenarios

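Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 98 | 99.6%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 98 | 99.6%
Grok 4 Fast | 100 | 100 | 100 | 100 | 98 | 99.6%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 98 | 99.6%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 98 | 99.5%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 98 | 99.5%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 98 | 98 | 99.2%
Gemini 2.5 Pro | 100 | 100 | 100 | 98 | 98 | 99.2%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 98 | 98 | 99.1%
GPT-5.2 | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.7 | 100 | 100 | 100 | 98 | 96 | 98.8%
GPT-5 Nano | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-5.1 | 100 | 100 | 98 | 98 | 96 | 98.5%
Aion 2.0 | 100 | 100 | 100 | 98 | 94 | 98.4%
Z.AI GLM 4.5 | 100 | 100 | 100 | 96 | 95 | 98.3%
o4 Mini High | 100 | 100 | 100 | 95 | 95 | 98.1%
Ministral 3 8B | 100 | 100 | 100 | 95 | 95 | 98.1%
Gemini 2.5 Flash (Reasoning) | 100 | 98 | 98 | 98 | 96 | 98.1%
GPT-5 | 100 | 98 | 98 | 98 | 96 | 98.1%
Claude 3 Haiku | 100 | 100 | 98 | 95 | 95 | 97.7%
Ministral 3 14B | 100 | 100 | 98 | 96 | 93 | 97.5%
Gemma 3 27B | 98 | 98 | 98 | 98 | 95 | 97.5%
o4 Mini | 100 | 100 | 95 | 95 | 95 | 97.2%
Ministral 3 3B | 100 | 98 | 98 | 95 | 93 | 96.8%
GPT-4o, May 13th (temp=0) | 98 | 98 | 98 | 95 | 95 | 96.7%
GPT-4.1 | 98 | 98 | 96 | 96 | 94 | 96.5%
Gemini 2.5 Flash Lite | 100 | 98 | 95 | 95 | 93 | 96.4%
Ministral 3B | 100 | 100 | 100 | 93 | 88 | 96.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 95 | 94 | 91 | 96.1%
Minimax M2.5 | 100 | 95 | 95 | 95 | 93 | 95.9%
GPT-4o, May 13th (temp=1) | 98 | 98 | 98 | 98 | 87 | 95.5%
Llama 3.1 Nemotron 70B | 98 | 96 | 96 | 95 | 91 | 95.2%
DeepSeek-V2 Chat | 100 | 98 | 93 | 93 | 91 | 94.9%
Ministral 8B | 98 | 95 | 95 | 95 | 91 | 94.9%
GPT-4o, Aug. 6th (temp=0) | 100 | 93 | 93 | 93 | 93 | 94.4%
WizardLM 2 8x22b | 100 | 100 | 95 | 95 | 79 | 94.0%
GPT-4.1 Mini | 100 | 95 | 93 | 91 | 88 | 93.6%
Mistral Small 3.2 24B | 95 | 95 | 95 | 91 | 91 | 93.5%
DeepSeek V3 (2024-12-26) | 100 | 95 | 93 | 91 | 88 | 93.5%
GPT-4o, Aug. 6th (temp=1) | 95 | 93 | 93 | 91 | 91 | 92.6%
Qwen 2.5 72B | 98 | 95 | 93 | 88 | 88 | 92.6%
Gemma 3 12B | 95 | 93 | 91 | 91 | 89 | 91.9%
ByteDance Seed 1.6 Flash | 96 | 94 | 90 | 88 | 86 | 90.9%
Hermes 3 405B | 95 | 95 | 91 | 84 | 84 | 89.8%
Cohere Command R+ (Aug. 2024) | 96 | 91 | 91 | 86 | 84 | 89.6%
Writer: Palmyra X5 | 95 | 95 | 93 | 91 | 70 | 88.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 40 | 87.9%
Gemma 3 4B | 88 | 86 | 86 | 85 | 82 | 85.5%
Llama 3.1 70B | 98 | 96 | 81 | 79 | 70 | 84.9%
Gemini 3.1 Pro (Preview) | 100 | 100 | 98 | 98 | 23 | 83.9%
GPT-4o Mini (temp=0) | 86 | 84 | 79 | 79 | 74 | 80.5%
Arcee AI: Trinity Mini | 86 | 86 | 79 | 77 | 72 | 80.1%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0%
Hermes 3 70B | 88 | 81 | 79 | 77 | 72 | 79.5%
GPT-4o Mini (temp=1) | 81 | 79 | 79 | 67 | 65 | 74.5%
Arcee AI: Trinity Large (Preview) | 95 | 95 | 88 | 88 | 0 | 73.5%
GPT-4.1 Nano | 85 | 77 | 75 | 68 | 58 | 72.8%
Llama 3.1 8B | 84 | 79 | 77 | 75 | 5 | 64.0%
Rocinante 12B | 87 | 86 | 74 | 0 | 0 | 49.5%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%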
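
Model | #1 | #2 | #3 | #4 | #5 | Avg
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 97 | 99.3%
o4 Mini | 100 | 100 | 100 | 100 | 97 | 99.3%
Gemma 3 27B | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 96 | 99.3%
GPT-4.1 | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 97 | 97 | 98.6%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 97 | 97 | 98.6%
Gemini 2.5 Flash | 100 | 100 | 100 | 97 | 97 | 98.6%
Mistral Small Creative | 100 | 100 | 100 | 97 | 97 | 98.6%
Claude Opus 4.6 (Reasoning) | 100 | 100 | 97 | 97 | 97 | 97.9%
Claude Opus 4.6 | 100 | 100 | 97 | 97 | 97 | 97.9%
Grok 4 | 100 | 100 | 97 | 97 | 97 | 97.9%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 90 | 97.9%
Z.AI GLM 4.5 | 100 | 100 | 97 | 97 | 97 | 97.9%
GPT-5 Mini | 100 | 100 | 97 | 96 | 96 | 97.9%
Mistral Large 3 | 100 | 97 | 97 | 97 | 97 | 97.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 86 | 97.2%
Claude Opus 4.5 | 100 | 97 | 97 | 97 | 97 | 97.2%
Aion 2.0 | 100 | 100 | 100 | 100 | 86 | 97.2%
Claude 3.5 Sonnet | 100 | 100 | 97 | 97 | 93 | 97.2%
Ministral 3 14B | 100 | 100 | 97 | 96 | 93 | 97.2%
Grok 4.1 Fast | 100 | 97 | 97 | 97 | 93 | 96.6%
Z.AI GLM 4.6 | 100 | 100 | 100 | 97 | 86 | 96.6%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 97 | 86 | 96.6%
Claude Haiku 4.5 | 97 | 97 | 97 | 97 | 97 | 96.6%
GPT-5 Nano | 97 | 97 | 97 | 97 | 97 | 96.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 90 | 90 | 95.9%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 90 | 90 | 95.9%
Mistral Large 2 | 100 | 100 | 100 | 90 | 90 | 95.9%
Mistral Medium 3.1 | 97 | 97 | 97 | 97 | 93 | 95.9%
Ministral 3 8B | 100 | 100 | 93 | 93 | 93 | 95.9%
Z.AI GLM 4.7 Flash | 100 | 100 | 93 | 93 | 93 | 95.8%
Gemini 2.5 Flash Lite | 100 | 100 | 97 | 96 | 86 | 95.8%
Grok 4 Fast | 97 | 97 | 97 | 93 | 93 | 95.2%
ByteDance Seed 1.6 | 100 | 97 | 97 | 96 | 86 | 95.1%
Ministral 8B | 97 | 97 | 96 | 93 | 93 | 95.1%
DeepSeek V3.1 | 100 | 100 | 97 | 90 | 86 | 94.5%
WizardLM 2 8x22b | 100 | 97 | 97 | 90 | 90 | 94.5%
DeepSeek V3.2 | 100 | 100 | 93 | 90 | 90 | 94.5%
Mistral Large | 100 | 97 | 96 | 90 | 90 | 94.5%
Gemini 2.5 Flash (Reasoning) | 96 | 96 | 96 | 93 | 89 | 94.3%
GPT-4o, Aug. 6th (temp=0) | 100 | 97 | 97 | 90 | 86 | 93.8%
Minimax M2.5 | 97 | 97 | 93 | 86 | 86 | 91.7%
Mistral Small 3.2 24B | 93 | 90 | 90 | 90 | 90 | 90.3%
ByteDance Seed 1.6 Flash | 100 | 90 | 90 | 86 | 83 | 89.6%
Claude 3 Haiku | 93 | 93 | 86 | 86 | 86 | 89.0%
Hermes 3 405B | 93 | 93 | 90 | 83 | 83 | 88.3%
DeepSeek V3 (2025-03-24) | 100 | 90 | 86 | 83 | 79 | 87.6%
DeepSeek-V2 Chat | 90 | 90 | 90 | 90 | 72 | 86.2%
Qwen 2.5 72B | 93 | 86 | 86 | 83 | 79 | 85.5%
Llama 3.1 70B | 86 | 86 | 86 | 86 | 79 | 84.8%
GPT-4.1 Mini | 93 | 83 | 83 | 83 | 79 | 84.1%
Writer: Palmyra X5 | 93 | 83 | 83 | 79 | 79 | 83.4%
DeepSeek V3 (2024-12-26) | 90 | 90 | 83 | 76 | 72 | 82.1%
Cohere Command R+ (Aug. 2024) | 93 | 86 | 76 | 69 | 69 | 78.5%
Hermes 3 70B | 83 | 83 | 79 | 72 | 72 | 77.9%
Ministral 3B | 86 | 79 | 76 | 76 | 72 | 77.9%
Llama 3.1 8B | 83 | 82 | 76 | 76 | 72 | 77.8%
Gemma 3 12B | 96 | 96 | 93 | 93 | 0 | 75.8%
Gemma 3 4B | 76 | 76 | 76 | 76 | 72 | 75.0%
Arcee AI: Trinity Mini | 79 | 76 | 72 | 72 | 72 | 74.5%
Llama 3.1 Nemotron 70B | 86 | 86 | 83 | 79 | 34 | 73.8%
GPT-4o Mini (temp=0) | 76 | 76 | 76 | 69 | 69 | 73.1%
Ministral 3 3B | 79 | 72 | 72 | 72 | 69 | 73.1%
GPT-4.1 Nano | 86 | 86 | 69 | 66 | 59 | 73.1%
Arcee AI: Trinity Large (Preview) | 97 | 90 | 86 | 83 | 0 | 71.0%
GPT-4o Mini (temp=1) | 72 | 72 | 69 | 69 | 66 | 69.7%
Rocinante 12B | 72 | 72 | 48 | 34 | 0 | 45.4%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%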
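
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-5 | 100 | 100 | 100 | 100 | 97 | 99.3%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 97 | 99.3%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 97 | 99.3%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 97 | 99.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 97 | 99.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 97 | 99.3%
Grok 4 | 100 | 100 | 100 | 100 | 97 | 99.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 97 | 99.3%
Ministral 3 14B | 100 | 100 | 100 | 100 | 97 | 99.3%
GPT-5.2 | 100 | 100 | 100 | 97 | 97 | 98.7%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 93 | 98.7%
DeepSeek-V2 Chat | 100 | 100 | 100 | 97 | 97 | 98.7%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 93 | 98.7%
Gemma 3 27B | 100 | 100 | 100 | 100 | 93 | 98.7%
Mistral Small Creative | 100 | 100 | 100 | 97 | 97 | 98.7%
Claude Opus 4.6 | 100 | 100 | 97 | 97 | 97 | 98.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 90 | 98.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 97 | 97 | 93 | 97.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 93 | 93 | 97.3%
o4 Mini High | 100 | 97 | 97 | 97 | 97 | 97.3%
Z.AI GLM 4.6 | 100 | 100 | 97 | 97 | 93 | 97.3%
Minimax M2.5 | 100 | 100 | 100 | 97 | 90 | 97.3%
Z.AI GLM 4.5 | 100 | 97 | 97 | 97 | 97 | 97.3%
Grok 4 Fast | 100 | 100 | 100 | 97 | 90 | 97.3%
GPT-4o, May 13th (temp=0) | 100 | 97 | 97 | 97 | 97 | 97.3%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 97 | 87 | 96.7%
Gemini 2.5 Flash | 100 | 100 | 97 | 93 | 93 | 96.7%
GPT-4.1 | 100 | 100 | 97 | 93 | 90 | 96.0%
GPT-5.1 | 97 | 97 | 97 | 97 | 93 | 96.0%
o4 Mini | 97 | 97 | 97 | 97 | 93 | 96.0%
Ministral 3 8B | 100 | 97 | 97 | 93 | 90 | 95.3%
GPT-5 Mini | 100 | 97 | 93 | 93 | 93 | 95.3%
ByteDance Seed 1.6 | 100 | 97 | 97 | 93 | 90 | 95.3%
GPT-5 Nano | 97 | 97 | 97 | 93 | 93 | 95.3%
DeepSeek V3.1 | 100 | 97 | 97 | 93 | 90 | 95.3%
GPT-4o, May 13th (temp=1) | 97 | 97 | 93 | 93 | 93 | 94.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 97 | 97 | 90 | 87 | 94.0%
Hermes 3 405B | 97 | 97 | 97 | 93 | 87 | 94.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 97 | 93 | 90 | 90 | 94.0%
GPT-4o, Aug. 6th (temp=0) | 97 | 93 | 93 | 93 | 93 | 94.0%
Mistral Large 2 | 97 | 93 | 93 | 93 | 93 | 94.0%
Gemini 2.5 Flash Lite | 97 | 97 | 97 | 93 | 87 | 94.0%
WizardLM 2 8x22b | 97 | 97 | 97 | 93 | 87 | 94.0%
Mistral Large 3 | 93 | 93 | 93 | 93 | 93 | 93.3%
Mistral Large | 93 | 93 | 93 | 93 | 93 | 93.3%
Qwen 3.5 397B A17B | 97 | 97 | 90 | 90 | 90 | 92.7%
Writer: Palmyra X5 | 97 | 97 | 93 | 90 | 87 | 92.7%
GPT-4.1 Mini | 97 | 93 | 93 | 90 | 87 | 92.0%
Z.AI GLM 4.7 Flash | 97 | 93 | 93 | 90 | 83 | 91.3%
Ministral 8B | 97 | 93 | 90 | 90 | 87 | 91.3%
Qwen 2.5 72B | 97 | 93 | 90 | 87 | 87 | 90.7%
Ministral 3B | 97 | 93 | 93 | 87 | 83 | 90.7%
Gemma 3 12B | 93 | 90 | 87 | 87 | 87 | 88.7%
Ministral 3 3B | 93 | 90 | 87 | 87 | 83 | 88.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 87 | 43 | 86.0%
GPT-4o Mini (temp=1) | 90 | 87 | 87 | 80 | 80 | 84.7%
Claude 3 Haiku | 93 | 90 | 87 | 77 | 73 | 84.0%
Arcee AI: Trinity Mini | 93 | 87 | 83 | 83 | 73 | 84.0%
GPT-4o Mini (temp=0) | 87 | 80 | 80 | 80 | 80 | 81.3%
ByteDance Seed 1.6 Flash | 93 | 80 | 77 | 73 | 57 | 76.0%
Hermes 3 70B | 87 | 83 | 77 | 67 | 60 | 74.7%
Gemma 3 4B | 80 | 77 | 73 | 73 | 63 | 73.3%
Cohere Command R+ (Aug. 2024) | 93 | 77 | 77 | 67 | 43 | 71.3%
Llama 3.1 Nemotron 70B | 90 | 83 | 83 | 83 | 10 | 70.0%
Llama 3.1 70B | 87 | 83 | 80 | 70 | 0 | 64.0%
GPT-4.1 Nano | 77 | 77 | 73 | 67 | 20 | 62.7%
Llama 3.1 8B | 73 | 60 | 57 | 33 | 0 | 44.7%
Mistral NeMO | 73 | 73 | 70 | 0 | 0 | 43.3%
Rocinante 12B | 87 | 80 | 30 | 20 | 0 | 43.3%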
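
Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 95 | 99.1%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 95 | 99.1%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 95 | 99.1%
Aion 2.0 | 100 | 100 | 100 | 100 | 95 | 99.1%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 95 | 99.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 95 | 99.1%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 95 | 99.1%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 95 | 99.1%
Ministral 8B | 100 | 100 | 100 | 100 | 95 | 99.1%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 95 | 95 | 98.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 95 | 95 | 98.2%
o4 Mini | 100 | 100 | 100 | 95 | 95 | 98.2%
Mistral Medium 3.1 | 100 | 100 | 100 | 95 | 95 | 98.2%
o4 Mini High | 100 | 100 | 100 | 100 | 90 | 98.1%
GPT-5 Nano | 100 | 100 | 100 | 100 | 90 | 98.1%
Ministral 3B | 100 | 100 | 100 | 100 | 90 | 98.1%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 95 | 95 | 95 | 97.3%
GPT-5.1 | 100 | 100 | 95 | 95 | 95 | 97.3%
Z.AI GLM 5 | 100 | 100 | 95 | 95 | 95 | 97.3%
Gemini 2.5 Pro | 100 | 100 | 95 | 95 | 95 | 97.3%
Grok 4 Fast | 100 | 100 | 95 | 95 | 95 | 97.3%
DeepSeek-V2 Chat | 100 | 100 | 95 | 95 | 95 | 97.3%
Qwen 2.5 72B | 100 | 100 | 100 | 95 | 91 | 97.3%
Mistral Small Creative | 100 | 100 | 100 | 95 | 91 | 97.3%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 86 | 97.2%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 86 | 97.2%
GPT-5 Mini | 100 | 95 | 95 | 95 | 95 | 96.4%
GPT-4.1 | 100 | 95 | 95 | 95 | 95 | 96.4%
Minimax M2.5 | 100 | 100 | 95 | 95 | 91 | 96.3%
WizardLM 2 8x22b | 100 | 100 | 95 | 95 | 91 | 96.3%
Mistral Large 3 | 95 | 95 | 95 | 95 | 95 | 95.5%
GPT-4o, May 13th (temp=1) | 95 | 95 | 95 | 95 | 95 | 95.5%
Mistral Large 2 | 95 | 95 | 95 | 95 | 95 | 95.5%
DeepSeek V3.1 | 95 | 95 | 95 | 95 | 95 | 95.5%
Mistral Large | 95 | 95 | 95 | 95 | 95 | 95.5%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 95 | 90 | 90 | 95.3%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 90 | 86 | 95.2%
Claude 3 Haiku | 100 | 100 | 95 | 91 | 86 | 94.5%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 95 | 91 | 86 | 94.5%
Gemini 2.5 Flash Lite | 100 | 95 | 95 | 90 | 90 | 94.3%
GPT-5.2 | 100 | 100 | 91 | 91 | 86 | 93.6%
Claude Haiku 4.5 | 95 | 95 | 95 | 95 | 86 | 93.6%
Mistral Small 3.2 24B | 100 | 95 | 91 | 90 | 90 | 93.5%
Ministral 3 14B | 95 | 95 | 91 | 91 | 91 | 92.7%
Hermes 3 70B | 100 | 91 | 91 | 91 | 90 | 92.5%
Gemma 3 27B | 95 | 95 | 91 | 91 | 86 | 91.8%
Writer: Palmyra X5 | 91 | 91 | 91 | 91 | 91 | 90.7%
Llama 3.1 70B | 95 | 91 | 90 | 90 | 86 | 90.6%
GPT-4.1 Mini | 100 | 100 | 90 | 86 | 76 | 90.5%
Z.AI GLM 4.7 Flash | 100 | 95 | 86 | 86 | 81 | 89.7%
ByteDance Seed 1.6 Flash | 95 | 95 | 91 | 86 | 77 | 89.0%
Cohere Command R+ (Aug. 2024) | 95 | 91 | 90 | 86 | 76 | 87.8%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 91 | 82 | 43 | 83.1%
Gemma 3 4B | 90 | 86 | 86 | 76 | 76 | 83.0%
Ministral 3 3B | 90 | 81 | 81 | 81 | 81 | 82.9%
Arcee AI: Trinity Mini | 90 | 81 | 81 | 76 | 72 | 80.3%
Llama 3.1 8B | 86 | 81 | 81 | 77 | 72 | 79.3%
GPT-4o Mini (temp=1) | 76 | 72 | 72 | 72 | 72 | 72.6%
GPT-4o Mini (temp=0) | 72 | 72 | 72 | 72 | 72 | 71.6%
GPT-4.1 Nano | 81 | 76 | 67 | 57 | 48 | 65.8%
Gemma 3 12B | 68 | 59 | 54 | 54 | 49 | 56.7%
Mistral NeMO | 90 | 90 | 81 | 0 | 0 | 52.4%
Rocinante 12B | 62 | 52 | 0 | 0 | 0 | 22.9%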