Precision

Test: Codex Extraction

Avg. Score: 95.7%
Scenarios: 4

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Ministral 3 8B | 99.5% | $0.0006 | 3.3s | 97%
2 | Gemini 3 Flash (Preview) | 99.7% | $0.0027 | 3.9s | 97%
3 | GPT-4.1 Mini | 99.5% | $0.0015 | 6.6s | 97%
4 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0030 | 10.6s | 100%
5 | Gemini 2.5 Flash Lite | 98.0% | $0.0005 | 1.9s | 94%
6 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0092 | 3.8s | 100%
7 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0100 | 4.0s | 100%
8 | Ministral 8B | 98.2% | $0.0004 | 3.8s | 94%
9 | Mistral Medium 3.1 | 99.2% | $0.0026 | 5.8s | 95%
10 | Ministral 3B | 98.1% | $0.0002 | 1.8s | 91%
11 | Gemini 2.5 Flash | 98.5% | $0.0023 | 2.5s | 93%
12 | Mistral Small 3.2 24B | 98.3% | $0.0005 | 4.4s | 92%
13 | Mistral Small Creative | 97.7% | $0.0006 | 3.9s | 92%
14 | Grok 4 Fast | 98.2% | $0.0012 | 8.7s | 94%
15 | Ministral 3 3B | 96.4% | $0.0004 | 1.8s | 90%
16 | Grok 4.1 Fast | 100.0% | $0.0017 | 22.1s | 100%
17 | Z.AI GLM 4.5 | 99.6% | $0.0028 | 16.8s | 97%
18 | DeepSeek V3 (2024-12-26) | 98.8% | $0.0017 | 13.5s | 94%
19 | Qwen 2.5 72B | 98.1% | $0.0008 | 11.0s | 92%
20 | GPT-4o Mini (temp=1) | 97.1% | $0.0006 | 8.0s | 91%
21 | GPT-4o Mini (temp=0) | 97.2% | $0.0005 | 7.9s | 90%
22 | Hermes 3 405B | 99.6% | $0.0040 | 17.8s | 97%
23 | Claude 3 Haiku | 97.6% | $0.0017 | 5.5s | 89%
24 | DeepSeek-V2 Chat | 98.4% | $0.0019 | 14.8s | 94%
25 | Writer: Palmyra X5 | 97.0% | $0.0052 | 7.5s | 92%
26 | Claude Haiku 4.5 | 97.5% | $0.0073 | 4.5s | 91%
27 | DeepSeek V3 (2025-03-24) | 99.6% | $0.0014 | 27.0s | 96%
28 | Claude Sonnet 4.5 | 100.0% | $0.022 | 6.6s | 100%
29 | Claude 3.7 Sonnet | 100.0% | $0.021 | 7.8s | 100%
30 | Hermes 3 70B | 96.9% | $0.0012 | 15.8s | 91%
31 | Arcee AI: Trinity Mini | 94.7% | $0.0003 | 5.8s | 87%
32 | Claude Sonnet 4 | 100.0% | $0.022 | 7.9s | 100%
33 | GPT-4o, May 13th (temp=0) | 100.0% | $0.025 | 4.3s | 100%
34 | Llama 3.1 Nemotron 70B | 97.3% | $0.0050 | 17.3s | 93%
35 | Gemini 2.5 Flash Lite (Reasoning) | 96.9% | $0.0021 | 15.6s | 89%
36 | Claude Sonnet 4.6 | 99.7% | $0.022 | 7.4s | 97%
37 | Gemma 3 27B | 96.0% | $0.0005 | 14.5s | 87%
38 | Mistral Large 3 | 95.0% | $0.0027 | 8.2s | 86%
39 | GPT-4.1 | 95.4% | $0.0073 | 4.7s | 87%
40 | Ministral 3 14B | 94.2% | $0.0009 | 6.0s | 82%
41 | Gemini 3 Flash (Preview, Reasoning) | 98.9% | $0.0096 | 22.2s | 95%
42 | Minimax M2.5 | 98.3% | $0.0023 | 30.1s | 94%
43 | o4 Mini | 99.2% | $0.014 | 21.5s | 95%
44 | WizardLM 2 8x22b | 97.4% | $0.0026 | 31.3s | 92%
45 | DeepSeek V3.2 | 98.8% | $0.0011 | 36.7s | 92%
46 | Mistral Large 2 | 95.0% | $0.011 | 8.3s | 86%
47 | GPT-4o, May 13th (temp=1) | 97.3% | $0.025 | 4.0s | 92%
48 | Z.AI GLM 4.6 | 99.3% | $0.0057 | 38.8s | 95%
49 | Claude Opus 4.5 | 100.0% | $0.037 | 7.7s | 100%
50 | Gemma 3 4B | 88.1% | $0.0002 | 6.7s | 76%
51 | Claude Opus 4.6 | 99.1% | $0.037 | 8.1s | 96%
52 | GPT-5.2 | 95.9% | $0.018 | 16.8s | 87%
53 | Claude 3.5 Sonnet | 100.0% | $0.043 | 13.9s | 100%
54 | o4 Mini High | 100.0% | $0.025 | 40.5s | 100%
55 | Z.AI GLM 5 | 98.8% | $0.0091 | 51.4s | 94%
56 | Grok 4 | 100.0% | $0.031 | 37.2s | 100%
57 | GPT-5 Mini | 95.9% | $0.0070 | 45.5s | 88%
58 | Gemini 2.5 Flash (Reasoning) | 90.8% | $0.0082 | 11.8s | 71%
59 | GPT-4.1 Nano | 86.8% | $0.0003 | 2.8s | 63%
60 | Gemini 2.5 Pro | 97.4% | $0.034 | 22.8s | 92%
61 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.055 | 21.4s | 100%
62 | Aion 2.0 | 98.8% | $0.0080 | 1.2m | 94%
63 | Cohere Command R+ (Aug. 2024) | 91.3% | $0.014 | 17.9s | 71%
64 | Llama 3.1 70B | 91.7% | $0.0016 | 16.0s | 56%
65 | ByteDance Seed 1.6 | 97.9% | $0.0073 | 1.3m | 93%
66 | Z.AI GLM 4.7 Flash | 95.5% | $0.0019 | 1.2m | 87%
67 | DeepSeek V3.1 | 92.6% | $0.0012 | 26.1s | 57%
68 | Gemini 3 Pro (Preview) | 100.0% | $0.055 | 35.6s | 100%
69 | Mistral Large | 89.6% | $0.011 | 7.6s | 53%
70 | GPT-5.1 | 96.4% | $0.035 | 43.6s | 90%
71 | Llama 3.1 8B | 84.5% | $0.0001 | 10.0s | 51%
72 | GPT-5 Nano | 96.4% | $0.0043 | 1.3m | 86%
73 | Claude Sonnet 4.6 (Reasoning) | 98.8% | $0.052 | 36.0s | 94%
74 | ByteDance Seed 1.6 Flash | 86.3% | $0.0011 | 39.3s | 66%
75 | Z.AI GLM 4.7 | 99.1% | $0.0098 | 1.7m | 95%
76 | Arcee AI: Trinity Large (Preview) | 89.3% | $0.0000 | 20.5s | 40%
77 | Gemma 3 12B | 79.7% | $0.0003 | 14.1s | 47%
78 | Qwen 3.5 397B A17B | 97.6% | $0.012 | 1.8m | 94%
79 | GPT-5 | 98.8% | $0.049 | 1.3m | 96%
80 | Gemini 3.1 Pro (Preview) | 98.9% | $0.065 | 1.0m | 95%
81 | Claude Opus 4 | 100.0% | $0.110 | 13.5s | 100%
82 | MoonshotAI: Kimi K2.5 | 98.3% | $0.013 | 2.4m | 94%
83 | Rocinante 12B | 60.4% | $0.0013 | 25.6s | 9%
84 | Mistral NeMO | 28.2% | $0.0006 | 1.4s | 0%
Average score: 95.65%

Individual Scenarios

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 96 | 99.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 96 | 99.3%
Grok 4 Fast | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 96 | 99.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 96 | 99.3%
Mistral Small Creative | 100 | 100 | 100 | 100 | 96 | 99.3%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 96 | 99.3%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 96 | 99.2%
Ministral 3 8B | 100 | 100 | 100 | 100 | 96 | 99.2%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 95 | 99.1%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 95 | 99.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 93 | 98.6%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 96 | 96 | 98.5%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 96 | 96 | 98.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 96 | 96 | 98.5%
Z.AI GLM 4.7 | 100 | 100 | 100 | 96 | 96 | 98.5%
Ministral 3 3B | 100 | 100 | 100 | 96 | 96 | 98.5%
Minimax M2.5 | 100 | 100 | 100 | 96 | 96 | 98.4%
GPT-5.2 | 100 | 100 | 100 | 100 | 90 | 97.9%
Aion 2.0 | 100 | 100 | 100 | 100 | 90 | 97.9%
Ministral 3B | 100 | 100 | 100 | 96 | 93 | 97.8%
Gemma 3 27B | 100 | 100 | 96 | 96 | 96 | 97.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 88 | 97.7%
Llama 3.1 70B | 100 | 100 | 96 | 95 | 94 | 97.1%
GPT-5.1 | 100 | 100 | 96 | 96 | 93 | 97.1%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 96 | 89 | 97.1%
Llama 3.1 8B | 100 | 100 | 95 | 95 | 95 | 97.0%
Llama 3.1 Nemotron 70B | 100 | 96 | 96 | 96 | 96 | 96.9%
Gemma 3 12B | 100 | 100 | 96 | 96 | 92 | 96.9%
GPT-5 | 100 | 96 | 96 | 96 | 93 | 96.3%
Gemini 2.5 Flash (Reasoning) | 100 | 96 | 96 | 96 | 93 | 96.3%
Ministral 3 14B | 100 | 100 | 96 | 93 | 90 | 95.7%
Gemma 3 4B | 100 | 95 | 95 | 95 | 91 | 95.5%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 96 | 92 | 90 | 95.5%
GPT-4.1 | 96 | 96 | 93 | 93 | 90 | 93.6%
GPT-5 Nano | 100 | 100 | 100 | 96 | 69 | 93.0%
GPT-4.1 Nano | 100 | 95 | 90 | 90 | 83 | 91.7%
ByteDance Seed 1.6 Flash | 93 | 93 | 84 | 83 | 81 | 86.7%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Large | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Rocinante 12B | 100 | 100 | 92 | 0 | 0 | 58.3%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-5.2 | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-4.1 | 100 | 100 | 100 | 100 | 93 | 98.7%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 93 | 98.7%
Mistral Large | 100 | 100 | 100 | 100 | 93 | 98.7%
Mistral Small Creative | 100 | 100 | 100 | 100 | 93 | 98.7%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 93 | 98.6%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 92 | 98.5%
Ministral 3 3B | 100 | 100 | 100 | 100 | 90 | 98.0%
GPT-5 Mini | 100 | 100 | 100 | 93 | 93 | 97.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 93 | 93 | 97.2%
Ministral 3 14B | 100 | 100 | 100 | 93 | 93 | 97.2%
Ministral 8B | 100 | 100 | 100 | 93 | 93 | 97.2%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 81 | 96.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 91 | 90 | 96.2%
Cohere Command R+ (Aug. 2024) | 93 | 92 | 92 | 91 | 85 | 90.5%
Gemini 2.5 Flash (Reasoning) | 93 | 93 | 93 | 88 | 82 | 90.0%
Llama 3.1 8B | 100 | 91 | 86 | 79 | 76 | 86.3%
Gemma 3 4B | 91 | 91 | 85 | 83 | 79 | 85.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 12B | 93 | 93 | 93 | 93 | 0 | 74.5%
Rocinante 12B | 100 | 85 | 83 | 78 | 0 | 69.1%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 94 | 98.8%
Aion 2.0 | 100 | 100 | 100 | 100 | 94 | 98.8%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 94 | 98.8%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 93 | 98.7%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 93 | 98.7%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 93 | 98.7%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 93 | 98.7%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 93 | 98.7%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 93 | 98.7%
Ministral 3 8B | 100 | 100 | 100 | 100 | 93 | 98.7%
Grok 4 Fast | 100 | 100 | 100 | 100 | 93 | 98.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 93 | 98.6%
Hermes 3 405B | 100 | 100 | 100 | 100 | 93 | 98.6%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-5.2 | 100 | 100 | 100 | 94 | 94 | 97.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 88 | 97.5%
Mistral Small Creative | 100 | 100 | 100 | 94 | 94 | 97.5%
Ministral 3 14B | 100 | 100 | 100 | 94 | 94 | 97.5%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 94 | 93 | 97.4%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 93 | 93 | 97.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 94 | 93 | 97.3%
Writer: Palmyra X5 | 100 | 100 | 100 | 93 | 93 | 97.2%
Qwen 2.5 72B | 100 | 100 | 100 | 93 | 92 | 97.0%
Ministral 8B | 100 | 100 | 100 | 93 | 92 | 97.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 93 | 91 | 96.8%
ByteDance Seed 1.6 | 100 | 100 | 100 | 94 | 88 | 96.4%
Claude Opus 4.6 | 100 | 100 | 94 | 94 | 94 | 96.3%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 94 | 88 | 96.3%
Ministral 3B | 100 | 100 | 100 | 100 | 81 | 96.3%
GPT-4.1 | 100 | 100 | 100 | 93 | 88 | 96.2%
WizardLM 2 8x22b | 100 | 100 | 93 | 93 | 88 | 94.9%
Gemini 2.5 Flash | 100 | 100 | 94 | 88 | 88 | 94.0%
Llama 3.1 Nemotron 70B | 100 | 93 | 92 | 92 | 92 | 94.0%
GPT-5 Mini | 100 | 94 | 93 | 93 | 88 | 93.7%
GPT-5.1 | 94 | 94 | 94 | 94 | 93 | 93.7%
Qwen 3.5 397B A17B | 94 | 94 | 93 | 93 | 93 | 93.5%
Hermes 3 70B | 100 | 92 | 92 | 91 | 90 | 93.0%
GPT-5 Nano | 94 | 94 | 94 | 93 | 88 | 92.6%
Z.AI GLM 4.7 Flash | 94 | 93 | 93 | 93 | 88 | 92.2%
Ministral 3 3B | 93 | 93 | 87 | 87 | 87 | 89.2%
Gemma 3 12B | 93 | 88 | 88 | 88 | 88 | 88.8%
Mistral Large 3 | 88 | 88 | 88 | 88 | 88 | 88.2%
Mistral Large 2 | 88 | 88 | 88 | 88 | 88 | 88.2%
Mistral Large | 88 | 88 | 88 | 88 | 88 | 88.2%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 86 | 47 | 86.5%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 79 | 47 | 85.2%
Gemma 3 4B | 86 | 80 | 80 | 79 | 73 | 79.5%
Llama 3.1 70B | 100 | 93 | 92 | 91 | 0 | 75.1%
Rocinante 12B | 100 | 100 | 87 | 85 | 0 | 74.3%
ByteDance Seed 1.6 Flash | 100 | 74 | 68 | 68 | 52 | 72.5%
Llama 3.1 8B | 100 | 100 | 80 | 69 | 0 | 69.8%
GPT-4.1 Nano | 83 | 72 | 70 | 67 | 41 | 66.6%
Mistral NeMO | 100 | 90 | 83 | 0 | 0 | 54.7%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 92 | 98.3%
Aion 2.0 | 100 | 100 | 100 | 100 | 92 | 98.3%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 92 | 98.3%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 92 | 98.3%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 92 | 98.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 92 | 98.3%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 92 | 98.3%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 92 | 98.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 92 | 98.3%
Ministral 8B | 100 | 100 | 100 | 100 | 92 | 98.3%
Ministral 3B | 100 | 100 | 100 | 100 | 92 | 98.3%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 90 | 98.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 83 | 96.7%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 92 | 92 | 96.7%
ByteDance Seed 1.6 | 100 | 100 | 100 | 92 | 92 | 96.7%
o4 Mini | 100 | 100 | 100 | 92 | 92 | 96.7%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 92 | 92 | 96.7%
Mistral Medium 3.1 | 100 | 100 | 100 | 92 | 92 | 96.7%
Qwen 2.5 72B | 100 | 100 | 100 | 92 | 85 | 95.3%
Mistral Small Creative | 100 | 100 | 100 | 92 | 85 | 95.3%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 92 | 92 | 92 | 95.0%
GPT-5.1 | 100 | 100 | 92 | 92 | 92 | 95.0%
Z.AI GLM 5 | 100 | 100 | 92 | 92 | 92 | 95.0%
Gemini 2.5 Pro | 100 | 100 | 92 | 92 | 92 | 95.0%
Grok 4 Fast | 100 | 100 | 92 | 92 | 92 | 95.0%
DeepSeek-V2 Chat | 100 | 100 | 92 | 92 | 92 | 95.0%
Minimax M2.5 | 100 | 100 | 92 | 92 | 91 | 94.8%
WizardLM 2 8x22b | 100 | 100 | 92 | 92 | 91 | 94.8%
Hermes 3 70B | 100 | 100 | 91 | 91 | 91 | 94.5%
Llama 3.1 70B | 100 | 100 | 92 | 91 | 90 | 94.5%
Z.AI GLM 4.7 Flash | 100 | 100 | 92 | 90 | 90 | 94.3%
GPT-5 Mini | 100 | 92 | 92 | 92 | 92 | 93.3%
GPT-4.1 | 100 | 92 | 92 | 92 | 92 | 93.3%
Mistral Small 3.2 24B | 100 | 100 | 92 | 90 | 85 | 93.3%
GPT-4.1 Nano | 100 | 100 | 100 | 89 | 75 | 92.8%
Cohere Command R+ (Aug. 2024) | 100 | 92 | 91 | 91 | 90 | 92.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 92 | 85 | 83 | 91.9%
Mistral Large 3 | 92 | 92 | 92 | 92 | 92 | 91.7%
GPT-4o, May 13th (temp=1) | 92 | 92 | 92 | 92 | 92 | 91.7%
Mistral Large 2 | 92 | 92 | 92 | 92 | 92 | 91.7%
DeepSeek V3.1 | 92 | 92 | 92 | 92 | 92 | 91.7%
Mistral Large | 92 | 92 | 92 | 92 | 92 | 91.7%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 85 | 73 | 91.6%
Gemma 3 4B | 100 | 90 | 90 | 89 | 89 | 91.6%
GPT-4o Mini (temp=1) | 100 | 89 | 89 | 89 | 89 | 91.1%
Claude 3 Haiku | 100 | 100 | 92 | 85 | 79 | 91.0%
Writer: Palmyra X5 | 91 | 91 | 91 | 91 | 91 | 90.9%
Claude Haiku 4.5 | 92 | 92 | 92 | 92 | 83 | 90.0%
GPT-5.2 | 100 | 100 | 85 | 85 | 79 | 89.6%
GPT-4o Mini (temp=0) | 89 | 89 | 89 | 89 | 89 | 88.9%
ByteDance Seed 1.6 Flash | 100 | 100 | 91 | 79 | 69 | 87.6%
Arcee AI: Trinity Mini | 100 | 89 | 83 | 82 | 80 | 86.8%
Gemma 3 27B | 92 | 92 | 85 | 85 | 79 | 86.2%
Ministral 3 14B | 92 | 92 | 85 | 85 | 79 | 86.2%
Llama 3.1 8B | 90 | 90 | 82 | 82 | 80 | 84.7%
Gemma 3 12B | 67 | 60 | 56 | 56 | 56 | 58.8%
Mistral NeMO | 100 | 100 | 90 | 0 | 0 | 58.0%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 40.0%