Accuracy

Test: Codex Extraction

Avg. Score: 80.6%
Scenarios: 4

Overall Performance

| Rank | Model | Score | Avg. Cost | Avg. Time | Stability |
|---|---|---|---|---|---|
| 1 | Gemini 3 Flash (Preview) | 90.9% | $0.0027 | 3.9s | 85% |
| 2 | Grok 4 Fast | 89.1% | $0.0012 | 8.7s | 80% |
| 3 | Qwen 3.5 Plus (2026-02-15) | 90.4% | $0.0030 | 10.6s | 81% |
| 4 | Mistral Small Creative | 87.4% | $0.0006 | 3.9s | 77% |
| 5 | Gemini 3 Flash (Preview, Reasoning) | 93.5% | $0.0096 | 22.2s | 86% |
| 6 | Mistral Medium 3.1 | 87.8% | $0.0026 | 5.8s | 76% |
| 7 | Claude Haiku 4.5 | 86.8% | $0.0073 | 4.5s | 80% |
| 8 | Mistral Large 3 | 87.8% | $0.0027 | 8.2s | 76% |
| 9 | Claude Sonnet 4.6 | 91.4% | $0.022 | 7.4s | 86% |
| 10 | Grok 4.1 Fast | 89.5% | $0.0017 | 22.1s | 79% |
| 11 | Gemini 2.5 Flash | 82.8% | $0.0023 | 2.5s | 75% |
| 12 | DeepSeek-V2 Chat | 86.5% | $0.0019 | 14.8s | 76% |
| 13 | Mistral Large 2 | 88.0% | $0.011 | 8.3s | 78% |
| 14 | Claude Opus 4.6 | 95.1% | $0.037 | 8.1s | 90% |
| 15 | Gemini 2.5 Flash (Reasoning) | 86.6% | $0.0082 | 11.8s | 79% |
| 16 | Claude Opus 4.5 | 94.9% | $0.037 | 7.7s | 89% |
| 17 | Z.AI GLM 4.5 | 86.9% | $0.0028 | 16.8s | 77% |
| 18 | DeepSeek V3 (2024-12-26) | 84.0% | $0.0017 | 13.5s | 76% |
| 19 | Writer: Palmyra X5 | 80.9% | $0.0052 | 7.5s | 74% |
| 20 | Mistral Small 3.2 24B | 78.9% | $0.0005 | 4.4s | 70% |
| 21 | Gemini 2.5 Flash Lite (Reasoning) | 84.3% | $0.0021 | 15.6s | 72% |
| 22 | Ministral 3 14B | 80.7% | $0.0009 | 6.0s | 68% |
| 23 | Claude Sonnet 4.5 | 87.9% | $0.022 | 6.6s | 75% |
| 24 | Ministral 3 8B | 79.7% | $0.0006 | 3.3s | 65% |
| 25 | Claude 3.7 Sonnet | 86.3% | $0.021 | 7.8s | 76% |
| 26 | Ministral 8B | 77.5% | $0.0004 | 3.8s | 67% |
| 27 | DeepSeek V3.2 | 86.4% | $0.0011 | 36.7s | 76% |
| 28 | GPT-4o, Aug. 6th (temp=0) | 77.9% | $0.0100 | 4.0s | 74% |
| 29 | Gemini 2.5 Flash Lite | 77.6% | $0.0005 | 1.9s | 65% |
| 30 | Z.AI GLM 4.6 | 88.3% | $0.0057 | 38.8s | 78% |
| 31 | Claude Sonnet 4 | 86.7% | $0.022 | 7.9s | 74% |
| 32 | Hermes 3 405B | 83.4% | $0.0040 | 17.8s | 69% |
| 33 | Minimax M2.5 | 84.9% | $0.0023 | 30.1s | 73% |
| 34 | Gemini 2.5 Pro | 92.4% | $0.034 | 22.8s | 85% |
| 35 | DeepSeek V3 (2025-03-24) | 84.7% | $0.0014 | 27.0s | 70% |
| 36 | Grok 4 | 94.3% | $0.031 | 37.2s | 88% |
| 37 | Z.AI GLM 5 | 91.1% | $0.0091 | 51.4s | 83% |
| 38 | Qwen 2.5 72B | 77.2% | $0.0008 | 11.0s | 67% |
| 39 | GPT-5 Mini | 86.9% | $0.0070 | 45.5s | 80% |
| 40 | GPT-5.2 | 83.0% | $0.018 | 16.8s | 74% |
| 41 | WizardLM 2 8x22b | 82.5% | $0.0026 | 31.3s | 70% |
| 42 | GPT-4.1 | 75.6% | $0.0073 | 4.7s | 66% |
| 43 | o4 Mini | 83.8% | $0.014 | 21.5s | 72% |
| 44 | GPT-4o, Aug. 6th (temp=1) | 74.4% | $0.0092 | 3.8s | 66% |
| 45 | Aion 2.0 | 92.6% | $0.0080 | 1.2m | 86% |
| 46 | GPT-4o, May 13th (temp=0) | 79.8% | $0.025 | 4.3s | 72% |
| 47 | Claude Opus 4.6 (Reasoning) | 94.5% | $0.055 | 21.4s | 89% |
| 48 | GPT-4.1 Mini | 73.5% | $0.0015 | 6.6s | 60% |
| 49 | o4 Mini High | 89.0% | $0.025 | 40.5s | 81% |
| 50 | Gemma 3 4B | 70.1% | $0.0002 | 6.7s | 63% |
| 51 | Claude 3 Haiku | 70.8% | $0.0017 | 5.5s | 62% |
| 52 | GPT-5.1 | 92.1% | $0.035 | 43.6s | 86% |
| 53 | Arcee AI: Trinity Mini | 69.4% | $0.0003 | 5.8s | 61% |
| 54 | Hermes 3 70B | 76.6% | $0.0012 | 15.8s | 59% |
| 55 | DeepSeek V3.1 | 85.4% | $0.0012 | 26.1s | 54% |
| 56 | Mistral Large | 83.3% | $0.011 | 7.6s | 52% |
| 57 | Ministral 3B | 65.4% | $0.0002 | 1.8s | 59% |
| 58 | GPT-4o, May 13th (temp=1) | 76.2% | $0.025 | 4.0s | 67% |
| 59 | Claude 3.5 Sonnet | 83.1% | $0.043 | 13.9s | 76% |
| 60 | Claude Sonnet 4.6 (Reasoning) | 91.8% | $0.052 | 36.0s | 86% |
| 61 | Ministral 3 3B | 63.6% | $0.0004 | 1.8s | 57% |
| 62 | GPT-4o Mini (temp=0) | 66.1% | $0.0005 | 7.9s | 57% |
| 63 | GPT-4o Mini (temp=1) | 66.7% | $0.0006 | 8.0s | 56% |
| 64 | Gemma 3 27B | 67.8% | $0.0005 | 14.5s | 58% |
| 65 | Llama 3.1 70B | 71.9% | $0.0016 | 16.0s | 51% |
| 66 | ByteDance Seed 1.6 Flash | 72.1% | $0.0011 | 39.3s | 61% |
| 67 | Gemini 3 Pro (Preview) | 90.1% | $0.055 | 35.6s | 80% |
| 68 | Z.AI GLM 4.7 | 90.0% | $0.0098 | 1.7m | 81% |
| 69 | Llama 3.1 Nemotron 70B | 69.9% | $0.0050 | 17.3s | 49% |
| 70 | GPT-5 | 94.0% | $0.049 | 1.3m | 89% |
| 71 | ByteDance Seed 1.6 | 81.1% | $0.0073 | 1.3m | 70% |
| 72 | Qwen 3.5 397B A17B | 91.3% | $0.012 | 1.8m | 82% |
| 73 | Arcee AI: Trinity Large (Preview) | 72.5% | $0.0000 | 20.5s | 39% |
| 74 | Gemma 3 12B | 63.0% | $0.0003 | 14.1s | 46% |
| 75 | Cohere Command R+ (Aug. 2024) | 66.1% | $0.014 | 17.9s | 51% |
| 76 | GPT-4.1 Nano | 51.8% | $0.0003 | 2.8s | 42% |
| 77 | Llama 3.1 8B | 56.2% | $0.0001 | 10.0s | 37% |
| 78 | Claude Opus 4 | 91.4% | $0.110 | 13.5s | 84% |
| 79 | GPT-5 Nano | 71.8% | $0.0043 | 1.3m | 59% |
| 80 | Gemini 3.1 Pro (Preview) | 88.3% | $0.065 | 1.0m | 78% |
| 81 | Z.AI GLM 4.7 Flash | 69.1% | $0.0019 | 1.2m | 52% |
| 82 | MoonshotAI: Kimi K2.5 | 88.9% | $0.013 | 2.4m | 81% |
| 83 | Rocinante 12B | 39.8% | $0.0013 | 25.6s | 20% |
| 84 | Mistral NeMO | 22.4% | $0.0006 | 1.4s | 0% |

Average score: 80.55%

Individual Scenarios

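| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 3 Pro (Preview) | 96 | 96 | 95 | 95 | 95 | 95.2% |
| Gemini 3 Flash (Preview, Reasoning) | 97 | 96 | 95 | 93 | 92 | 94.3% |
| Aion 2.0 | 98 | 94 | 93 | 92 | 90 | 93.6% |
| Claude Opus 4.5 | 95 | 93 | 93 | 93 | 93 | 93.1% |
| Claude Opus 4.6 | 93 | 93 | 92 | 92 | 92 | 92.2% |
| Claude Opus 4 | 92 | 92 | 92 | 92 | 91 | 91.7% |
| Claude Opus 4.6 (Reasoning) | 96 | 91 | 91 | 91 | 89 | 91.5% |
| Gemini 3 Flash (Preview) | 94 | 93 | 91 | 90 | 89 | 91.3% |
| o4 Mini High | 94 | 94 | 90 | 89 | 88 | 91.2% |
| GPT-5 | 93 | 93 | 91 | 90 | 88 | 91.1% |
| Z.AI GLM 5 | 93 | 93 | 91 | 88 | 88 | 90.5% |
| Grok 4 | 92 | 91 | 90 | 90 | 86 | 89.9% |
| Claude Sonnet 4.6 (Reasoning) | 92 | 90 | 89 | 89 | 88 | 89.7% |
| Gemini 2.5 Pro | 92 | 91 | 89 | 88 | 86 | 89.4% |
| Gemini 3.1 Pro (Preview) | 92 | 91 | 90 | 89 | 82 | 88.9% |
| Qwen 3.5 Plus (2026-02-15) | 89 | 88 | 88 | 88 | 86 | 87.4% |
| GPT-5.1 | 89 | 88 | 87 | 87 | 85 | 87.3% |
| Claude Sonnet 4.6 | 90 | 89 | 87 | 85 | 85 | 87.2% |
| DeepSeek-V2 Chat | 91 | 90 | 87 | 86 | 81 | 86.9% |
| Mistral Medium 3.1 | 87 | 87 | 87 | 86 | 85 | 86.7% |
| Gemini 2.5 Flash (Reasoning) | 90 | 89 | 87 | 85 | 83 | 86.6% |
| MoonshotAI: Kimi K2.5 | 91 | 88 | 87 | 86 | 78 | 86.0% |
| Z.AI GLM 4.7 | 91 | 91 | 85 | 83 | 80 | 85.8% |
| Mistral Small Creative | 89 | 87 | 85 | 84 | 83 | 85.6% |
| Qwen 3.5 397B A17B | 89 | 86 | 85 | 84 | 83 | 85.4% |
| Z.AI GLM 4.5 | 88 | 86 | 85 | 85 | 82 | 85.4% |
| Grok 4 Fast | 88 | 86 | 86 | 85 | 81 | 85.3% |
| GPT-5.2 | 88 | 87 | 85 | 84 | 81 | 85.1% |
| Gemini 2.5 Flash | 88 | 88 | 85 | 83 | 82 | 85.1% |
| Claude Haiku 4.5 | 88 | 85 | 84 | 84 | 83 | 84.8% |
| Claude 3.7 Sonnet | 89 | 84 | 84 | 83 | 83 | 84.6% |
| Gemini 2.5 Flash Lite (Reasoning) | 92 | 88 | 81 | 80 | 79 | 84.2% |
| Mistral Large 2 | 86 | 86 | 85 | 84 | 79 | 84.1% |
| o4 Mini | 93 | 89 | 87 | 82 | 68 | 84.1% |
| Z.AI GLM 4.6 | 87 | 87 | 86 | 81 | 80 | 84.0% |
| GPT-5 Mini | 89 | 83 | 83 | 83 | 80 | 83.7% |
| Claude Sonnet 4 | 91 | 88 | 85 | 78 | 76 | 83.6% |
| DeepSeek V3.2 | 89 | 84 | 83 | 81 | 79 | 83.2% |
| Mistral Large 3 | 87 | 85 | 80 | 80 | 80 | 82.7% |
| Hermes 3 405B | 88 | 85 | 83 | 80 | 77 | 82.6% |
| WizardLM 2 8x22b | 88 | 85 | 84 | 80 | 71 | 81.6% |
| Mistral Small 3.2 24B | 85 | 84 | 83 | 81 | 74 | 81.3% |
| Grok 4.1 Fast | 88 | 88 | 80 | 76 | 73 | 80.9% |
| Claude 3.5 Sonnet | 85 | 84 | 82 | 77 | 75 | 80.7% |
| Claude Sonnet 4.5 | 82 | 81 | 81 | 80 | 78 | 80.5% |
| DeepSeek V3 (2024-12-26) | 84 | 82 | 78 | 77 | 77 | 79.8% |
| GPT-4o, Aug. 6th (temp=0) | 81 | 81 | 81 | 81 | 76 | 79.6% |
| Hermes 3 70B | 85 | 83 | 82 | 79 | 68 | 79.2% |
| Writer: Palmyra X5 | 84 | 77 | 76 | 75 | 75 | 77.3% |
| GPT-4o, May 13th (temp=0) | 80 | 78 | 78 | 77 | 73 | 77.2% |
| Minimax M2.5 | 84 | 82 | 80 | 75 | 65 | 77.1% |
| Ministral 3 14B | 79 | 77 | 77 | 76 | 74 | 76.7% |
| Llama 3.1 Nemotron 70B | 81 | 80 | 78 | 76 | 66 | 76.1% |
| ByteDance Seed 1.6 | 83 | 77 | 74 | 74 | 72 | 76.0% |
| DeepSeek V3 (2025-03-24) | 83 | 83 | 74 | 70 | 67 | 75.3% |
| Llama 3.1 70B | 79 | 78 | 74 | 73 | 70 | 74.6% |
| GPT-4.1 Mini | 79 | 77 | 77 | 70 | 68 | 74.1% |
| GPT-4.1 | 76 | 74 | 74 | 74 | 69 | 73.3% |
| GPT-4o, Aug. 6th (temp=1) | 77 | 76 | 74 | 71 | 69 | 73.3% |
| Gemini 2.5 Flash Lite | 78 | 76 | 70 | 69 | 65 | 71.7% |
| DeepSeek V3.1 | 93 | 93 | 86 | 84 | 0 | 71.3% |
| GPT-4o Mini (temp=1) | 75 | 74 | 74 | 70 | 63 | 71.2% |
| Ministral 8B | 73 | 72 | 72 | 69 | 68 | 71.0% |
| GPT-4o, May 13th (temp=1) | 76 | 74 | 71 | 69 | 65 | 70.9% |
| Ministral 3 8B | 73 | 73 | 71 | 69 | 68 | 70.8% |
| GPT-5 Nano | 79 | 78 | 77 | 72 | 47 | 70.3% |
| GPT-4o Mini (temp=0) | 75 | 72 | 70 | 69 | 64 | 70.1% |
| Qwen 2.5 72B | 76 | 72 | 72 | 66 | 65 | 70.0% |
| Cohere Command R+ (Aug. 2024) | 85 | 72 | 67 | 64 | 62 | 69.9% |
| Gemma 3 4B | 73 | 72 | 71 | 67 | 65 | 69.6% |
| Arcee AI: Trinity Mini | 75 | 71 | 69 | 67 | 65 | 69.3% |
| Claude 3 Haiku | 71 | 69 | 66 | 65 | 64 | 67.0% |
| ByteDance Seed 1.6 Flash | 78 | 77 | 70 | 63 | 45 | 67.0% |
| Mistral Large | 87 | 83 | 82 | 80 | 0 | 66.4% |
| Ministral 3B | 69 | 65 | 63 | 62 | 59 | 63.7% |
| Gemma 3 12B | 68 | 67 | 67 | 64 | 53 | 63.7% |
| Ministral 3 3B | 66 | 63 | 61 | 61 | 61 | 62.2% |
| Z.AI GLM 4.7 Flash | 80 | 61 | 60 | 59 | 42 | 60.5% |
| Gemma 3 27B | 68 | 67 | 58 | 58 | 47 | 59.4% |
| Arcee AI: Trinity Large (Preview) | 74 | 72 | 68 | 62 | 0 | 55.4% |
| Llama 3.1 8B | 76 | 72 | 66 | 56 | 0 | 54.1% |
| GPT-4.1 Nano | 58 | 48 | 46 | 39 | 32 | 44.6% |
| Rocinante 12B | 77 | 71 | 68 | 0 | 0 | 43.1% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0% |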
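
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 99 | 99 | 99 | 99 | 99 | 98.6% |
| Claude Opus 4.6 (Reasoning) | 99 | 99 | 99 | 97 | 93 | 97.3% |
| Claude Opus 4.5 | 99 | 99 | 96 | 96 | 96 | 97.1% |
| GPT-5 | 96 | 95 | 95 | 95 | 95 | 94.9% |
| Grok 4 | 97 | 94 | 94 | 94 | 91 | 94.2% |
| GPT-5.1 | 96 | 95 | 95 | 91 | 91 | 93.5% |
| Gemini 3.1 Pro (Preview) | 96 | 96 | 95 | 91 | 89 | 93.3% |
| Gemini 3 Pro (Preview) | 95 | 94 | 93 | 91 | 88 | 92.3% |
| Claude Sonnet 4.6 | 92 | 92 | 92 | 92 | 91 | 92.1% |
| Gemini 3 Flash (Preview, Reasoning) | 96 | 93 | 92 | 91 | 86 | 91.7% |
| Aion 2.0 | 95 | 92 | 92 | 90 | 89 | 91.5% |
| Z.AI GLM 4.7 | 98 | 94 | 90 | 88 | 86 | 91.3% |
| Claude Sonnet 4.6 (Reasoning) | 92 | 92 | 91 | 91 | 90 | 91.2% |
| Mistral Large 2 | 94 | 91 | 91 | 90 | 88 | 90.9% |
| Mistral Large 3 | 92 | 92 | 92 | 92 | 86 | 90.8% |
| Gemini 2.5 Pro | 93 | 92 | 90 | 90 | 85 | 90.1% |
| Z.AI GLM 4.6 | 96 | 94 | 92 | 84 | 83 | 89.8% |
| Qwen 3.5 397B A17B | 96 | 92 | 91 | 87 | 82 | 89.5% |
| DeepSeek V3.1 | 90 | 90 | 89 | 89 | 89 | 89.4% |
| Mistral Large | 92 | 90 | 90 | 88 | 86 | 89.1% |
| Claude Opus 4 | 95 | 91 | 90 | 86 | 84 | 89.0% |
| DeepSeek-V2 Chat | 91 | 91 | 91 | 88 | 81 | 88.4% |
| DeepSeek V3.2 | 91 | 91 | 89 | 87 | 84 | 88.3% |
| GPT-5 Mini | 92 | 91 | 88 | 85 | 84 | 87.9% |
| Qwen 3.5 Plus (2026-02-15) | 91 | 90 | 88 | 85 | 84 | 87.5% |
| Grok 4.1 Fast | 90 | 90 | 88 | 86 | 83 | 87.4% |
| Grok 4 Fast | 93 | 91 | 89 | 85 | 80 | 87.4% |
| Gemini 3 Flash (Preview) | 91 | 89 | 86 | 86 | 85 | 87.3% |
| Claude 3.5 Sonnet | 90 | 89 | 88 | 86 | 83 | 87.2% |
| DeepSeek V3 (2025-03-24) | 88 | 87 | 87 | 86 | 85 | 86.6% |
| MoonshotAI: Kimi K2.5 | 89 | 89 | 89 | 83 | 79 | 85.8% |
| Gemini 2.5 Flash (Reasoning) | 89 | 87 | 85 | 85 | 82 | 85.6% |
| DeepSeek V3 (2024-12-26) | 91 | 89 | 88 | 82 | 78 | 85.6% |
| Minimax M2.5 | 89 | 87 | 86 | 84 | 80 | 85.3% |
| o4 Mini High | 90 | 88 | 87 | 82 | 79 | 85.3% |
| Claude Haiku 4.5 | 86 | 86 | 84 | 84 | 84 | 85.2% |
| Claude Sonnet 4.5 | 86 | 86 | 85 | 85 | 83 | 85.1% |
| Z.AI GLM 4.5 | 93 | 85 | 83 | 82 | 79 | 84.5% |
| Gemini 2.5 Flash Lite (Reasoning) | 90 | 88 | 84 | 82 | 76 | 84.0% |
| Mistral Small Creative | 85 | 85 | 84 | 83 | 83 | 84.0% |
| Z.AI GLM 5 | 87 | 85 | 84 | 82 | 81 | 83.7% |
| GPT-5.2 | 89 | 85 | 82 | 81 | 80 | 83.6% |
| WizardLM 2 8x22b | 88 | 88 | 83 | 81 | 76 | 83.2% |
| GPT-5 Nano | 88 | 86 | 82 | 81 | 78 | 82.9% |
| Mistral Small 3.2 24B | 84 | 84 | 84 | 81 | 77 | 82.2% |
| Claude 3.7 Sonnet | 85 | 82 | 82 | 81 | 81 | 82.1% |
| ByteDance Seed 1.6 | 86 | 83 | 81 | 80 | 79 | 82.0% |
| Mistral Medium 3.1 | 84 | 84 | 82 | 80 | 80 | 81.9% |
| Writer: Palmyra X5 | 85 | 84 | 82 | 79 | 78 | 81.7% |
| Ministral 3 8B | 82 | 82 | 82 | 82 | 79 | 81.6% |
| Claude Sonnet 4 | 86 | 81 | 81 | 80 | 78 | 81.2% |
| GPT-4o, May 13th (temp=0) | 83 | 83 | 82 | 80 | 78 | 81.1% |
| Ministral 3 14B | 84 | 82 | 82 | 81 | 77 | 81.1% |
| Gemini 2.5 Flash | 84 | 84 | 82 | 79 | 76 | 80.8% |
| Qwen 2.5 72B | 82 | 82 | 81 | 80 | 76 | 80.3% |
| Ministral 8B | 82 | 81 | 81 | 79 | 77 | 80.1% |
| GPT-4o, Aug. 6th (temp=0) | 83 | 80 | 80 | 80 | 77 | 80.0% |
| o4 Mini | 87 | 85 | 83 | 75 | 70 | 79.9% |
| Gemini 2.5 Flash Lite | 87 | 82 | 81 | 80 | 67 | 79.3% |
| GPT-4.1 Mini | 86 | 79 | 79 | 77 | 76 | 79.2% |
| GPT-4o, May 13th (temp=1) | 84 | 82 | 81 | 80 | 68 | 78.7% |
| GPT-4o, Aug. 6th (temp=1) | 84 | 82 | 80 | 79 | 65 | 78.0% |
| Hermes 3 405B | 82 | 79 | 77 | 77 | 74 | 78.0% |
| GPT-4.1 | 81 | 80 | 80 | 76 | 72 | 77.9% |
| Llama 3.1 70B | 80 | 80 | 80 | 79 | 66 | 77.0% |
| Hermes 3 70B | 81 | 77 | 74 | 74 | 72 | 75.4% |
| Claude 3 Haiku | 81 | 79 | 74 | 70 | 68 | 74.5% |
| Z.AI GLM 4.7 Flash | 82 | 81 | 70 | 68 | 66 | 73.4% |
| GPT-4o Mini (temp=1) | 83 | 76 | 69 | 67 | 65 | 71.8% |
| ByteDance Seed 1.6 Flash | 82 | 78 | 67 | 67 | 62 | 71.1% |
| Gemma 3 4B | 73 | 72 | 72 | 70 | 68 | 71.1% |
| Arcee AI: Trinity Mini | 78 | 74 | 69 | 68 | 61 | 69.8% |
| Arcee AI: Trinity Large (Preview) | 86 | 86 | 82 | 78 | 0 | 66.5% |
| GPT-4o Mini (temp=0) | 73 | 70 | 67 | 62 | 60 | 66.3% |
| Cohere Command R+ (Aug. 2024) | 72 | 66 | 65 | 63 | 58 | 64.8% |
| Llama 3.1 8B | 70 | 68 | 61 | 60 | 56 | 62.8% |
| Ministral 3B | 66 | 63 | 63 | 62 | 60 | 62.8% |
| Gemma 3 27B | 72 | 70 | 62 | 57 | 53 | 62.7% |
| Ministral 3 3B | 63 | 63 | 62 | 61 | 61 | 61.6% |
| GPT-4.1 Nano | 66 | 57 | 56 | 54 | 53 | 57.3% |
| Llama 3.1 Nemotron 70B | 73 | 71 | 69 | 66 | 0 | 55.9% |
| Rocinante 12B | 76 | 71 | 70 | 60 | 0 | 55.4% |
| Gemma 3 12B | 71 | 71 | 67 | 65 | 0 | 54.9% |
| Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0.0% |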
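
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Grok 4 | 99 | 96 | 95 | 93 | 92 | 95.2% |
| Claude Opus 4.6 (Reasoning) | 96 | 95 | 95 | 95 | 95 | 94.9% |
| Claude Opus 4.6 | 95 | 95 | 95 | 95 | 93 | 94.3% |
| Gemini 3.1 Pro (Preview) | 96 | 95 | 94 | 94 | 92 | 94.3% |
| Gemini 3 Flash (Preview, Reasoning) | 98 | 95 | 94 | 93 | 92 | 94.3% |
| Z.AI GLM 5 | 96 | 95 | 95 | 92 | 91 | 93.9% |
| Grok 4.1 Fast | 96 | 94 | 94 | 93 | 92 | 93.7% |
| Gemini 3 Flash (Preview) | 95 | 95 | 93 | 93 | 91 | 93.6% |
| Qwen 3.5 397B A17B | 95 | 95 | 93 | 92 | 89 | 92.9% |
| Claude Opus 4.5 | 94 | 94 | 94 | 94 | 88 | 92.9% |
| GPT-5 | 94 | 93 | 93 | 92 | 92 | 92.7% |
| Gemini 2.5 Pro | 94 | 93 | 93 | 91 | 90 | 92.2% |
| GPT-5.1 | 95 | 92 | 92 | 92 | 88 | 92.0% |
| Gemini 3 Pro (Preview) | 94 | 94 | 92 | 91 | 88 | 91.6% |
| Claude Sonnet 4.6 | 93 | 91 | 91 | 91 | 91 | 91.4% |
| MoonshotAI: Kimi K2.5 | 95 | 93 | 90 | 89 | 89 | 91.2% |
| Claude Sonnet 4.6 (Reasoning) | 92 | 91 | 91 | 90 | 90 | 90.7% |
| GPT-5 Mini | 93 | 90 | 90 | 90 | 89 | 90.6% |
| Aion 2.0 | 92 | 91 | 91 | 91 | 86 | 90.3% |
| Grok 4 Fast | 94 | 92 | 90 | 88 | 87 | 90.1% |
| Claude Haiku 4.5 | 92 | 90 | 90 | 90 | 88 | 89.7% |
| Z.AI GLM 4.6 | 94 | 92 | 90 | 87 | 83 | 89.3% |
| o4 Mini High | 92 | 90 | 88 | 88 | 87 | 89.0% |
| Qwen 3.5 Plus (2026-02-15) | 91 | 89 | 89 | 88 | 87 | 88.9% |
| Z.AI GLM 4.7 | 95 | 91 | 89 | 85 | 84 | 88.8% |
| Claude Sonnet 4.5 | 93 | 90 | 87 | 87 | 85 | 88.4% |
| Claude Opus 4 | 92 | 90 | 88 | 86 | 86 | 88.4% |
| GPT-5.2 | 92 | 92 | 88 | 87 | 80 | 87.8% |
| Gemini 2.5 Flash | 90 | 89 | 89 | 88 | 83 | 87.7% |
| DeepSeek V3.1 | 89 | 88 | 88 | 85 | 84 | 87.0% |
| Minimax M2.5 | 92 | 87 | 86 | 86 | 85 | 87.0% |
| Claude 3.5 Sonnet | 88 | 88 | 87 | 86 | 85 | 87.0% |
| o4 Mini | 90 | 88 | 88 | 83 | 82 | 86.2% |
| Claude Sonnet 4 | 91 | 86 | 86 | 85 | 83 | 86.2% |
| Gemini 2.5 Flash (Reasoning) | 89 | 88 | 84 | 84 | 83 | 85.7% |
| Writer: Palmyra X5 | 89 | 89 | 87 | 81 | 80 | 85.4% |
| Claude 3.7 Sonnet | 90 | 88 | 85 | 84 | 79 | 85.2% |
| DeepSeek V3.2 | 87 | 87 | 85 | 84 | 81 | 85.1% |
| Mistral Medium 3.1 | 87 | 86 | 84 | 84 | 84 | 85.0% |
| Mistral Small Creative | 88 | 88 | 84 | 83 | 81 | 84.7% |
| Z.AI GLM 4.5 | 87 | 85 | 84 | 83 | 79 | 83.4% |
| Mistral Large 2 | 87 | 82 | 81 | 81 | 81 | 82.4% |
| Mistral Large | 83 | 83 | 82 | 81 | 81 | 82.0% |
| DeepSeek V3 (2024-12-26) | 89 | 88 | 78 | 77 | 76 | 81.7% |
| Mistral Large 3 | 83 | 81 | 81 | 81 | 81 | 81.6% |
| GPT-4.1 | 86 | 84 | 83 | 81 | 71 | 81.1% |
| Arcee AI: Trinity Large (Preview) | 88 | 83 | 79 | 77 | 76 | 80.6% |
| DeepSeek V3 (2025-03-24) | 84 | 82 | 79 | 79 | 78 | 80.5% |
| DeepSeek-V2 Chat | 89 | 88 | 83 | 71 | 68 | 79.9% |
| GPT-4o, May 13th (temp=0) | 81 | 80 | 79 | 79 | 79 | 79.9% |
| ByteDance Seed 1.6 Flash | 83 | 83 | 81 | 76 | 74 | 79.3% |
| GPT-4o, May 13th (temp=1) | 83 | 82 | 80 | 76 | 74 | 79.1% |
| ByteDance Seed 1.6 | 88 | 80 | 77 | 76 | 72 | 78.6% |
| Hermes 3 405B | 82 | 79 | 79 | 77 | 77 | 78.5% |
| GPT-4o, Aug. 6th (temp=0) | 78 | 78 | 78 | 78 | 78 | 78.1% |
| GPT-4.1 Mini | 86 | 81 | 78 | 72 | 71 | 77.5% |
| GPT-4o, Aug. 6th (temp=1) | 81 | 78 | 77 | 76 | 75 | 77.3% |
| Gemini 2.5 Flash Lite (Reasoning) | 80 | 79 | 79 | 76 | 69 | 76.9% |
| WizardLM 2 8x22b | 86 | 79 | 75 | 73 | 66 | 75.8% |
| Mistral Small 3.2 24B | 77 | 77 | 77 | 75 | 70 | 75.3% |
| GPT-4o Mini (temp=0) | 76 | 76 | 76 | 76 | 71 | 75.0% |
| Ministral 3 8B | 79 | 76 | 75 | 72 | 70 | 74.3% |
| Qwen 2.5 72B | 77 | 75 | 74 | 74 | 71 | 74.2% |
| Gemma 3 27B | 79 | 79 | 71 | 71 | 70 | 74.1% |
| Ministral 3 14B | 77 | 75 | 75 | 72 | 72 | 74.1% |
| Ministral 8B | 77 | 73 | 72 | 71 | 70 | 72.8% |
| Claude 3 Haiku | 82 | 75 | 75 | 66 | 66 | 72.6% |
| GPT-5 Nano | 81 | 79 | 73 | 67 | 63 | 72.4% |
| Gemini 2.5 Flash Lite | 75 | 74 | 71 | 70 | 69 | 71.8% |
| Hermes 3 70B | 81 | 73 | 73 | 68 | 62 | 71.6% |
| GPT-4o Mini (temp=1) | 76 | 71 | 69 | 67 | 64 | 69.4% |
| Z.AI GLM 4.7 Flash | 84 | 75 | 73 | 69 | 42 | 68.5% |
| Llama 3.1 Nemotron 70B | 76 | 68 | 67 | 65 | 65 | 68.3% |
| Ministral 3 3B | 74 | 73 | 71 | 64 | 59 | 68.1% |
| Ministral 3B | 74 | 70 | 70 | 65 | 56 | 67.3% |
| Arcee AI: Trinity Mini | 77 | 71 | 69 | 61 | 58 | 67.0% |
| Gemma 3 12B | 67 | 67 | 62 | 61 | 60 | 63.6% |
| Gemma 3 4B | 75 | 63 | 61 | 59 | 55 | 62.4% |
| Cohere Command R+ (Aug. 2024) | 69 | 65 | 60 | 58 | 53 | 61.0% |
| Llama 3.1 70B | 80 | 79 | 69 | 63 | 0 | 58.1% |
| GPT-4.1 Nano | 53 | 53 | 51 | 48 | 45 | 50.1% |
| Llama 3.1 8B | 66 | 66 | 56 | 55 | 0 | 48.3% |
| Mistral NeMO | 82 | 72 | 72 | 0 | 0 | 45.3% |
| Rocinante 12B | 72 | 60 | 56 | 16 | 0 | 40.9% |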
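
| Model | #1 | #2 | #3 | #4 | #5 | Avg |
|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 98 | 98 | 98 | 98 | 96 | 97.9% |
| Qwen 3.5 Plus (2026-02-15) | 98 | 98 | 98 | 98 | 96 | 97.9% |
| Grok 4 | 98 | 98 | 98 | 97 | 96 | 97.7% |
| Claude Sonnet 4.5 | 98 | 98 | 98 | 96 | 96 | 97.5% |
| Mistral Medium 3.1 | 98 | 98 | 97 | 97 | 96 | 97.5% |
| GPT-5 | 98 | 97 | 97 | 97 | 97 | 97.3% |
| Qwen 3.5 397B A17B | 98 | 98 | 98 | 97 | 94 | 97.3% |
| Claude Opus 4.5 | 98 | 98 | 98 | 94 | 94 | 96.7% |
| DeepSeek V3 (2025-03-24) | 99 | 97 | 97 | 95 | 94 | 96.5% |
| Claude Opus 4 | 98 | 98 | 98 | 94 | 93 | 96.5% |
| Mistral Large 3 | 98 | 98 | 95 | 95 | 95 | 96.3% |
| Z.AI GLM 5 | 97 | 97 | 97 | 95 | 94 | 96.2% |
| Grok 4.1 Fast | 98 | 96 | 96 | 96 | 94 | 96.0% |
| Claude Sonnet 4 | 96 | 96 | 96 | 96 | 96 | 96.0% |
| Claude Sonnet 4.6 (Reasoning) | 98 | 96 | 96 | 94 | 94 | 95.7% |
| GPT-5.1 | 98 | 96 | 96 | 95 | 92 | 95.6% |
| Mistral Large | 98 | 98 | 95 | 95 | 91 | 95.6% |
| Claude Opus 4.6 | 95 | 95 | 95 | 95 | 95 | 95.3% |
| Mistral Small Creative | 95 | 95 | 95 | 95 | 95 | 95.2% |
| Aion 2.0 | 98 | 98 | 97 | 91 | 90 | 94.9% |
| Claude Sonnet 4.6 | 96 | 96 | 96 | 93 | 93 | 94.8% |
| Hermes 3 405B | 95 | 95 | 95 | 94 | 93 | 94.6% |
| Mistral Large 2 | 98 | 95 | 95 | 92 | 92 | 94.6% |
| Z.AI GLM 4.5 | 95 | 95 | 95 | 95 | 92 | 94.3% |
| Claude Opus 4.6 (Reasoning) | 94 | 94 | 94 | 94 | 94 | 94.1% |
| DeepSeek V3.1 | 97 | 97 | 96 | 91 | 89 | 93.9% |
| Gemini 3 Flash (Preview, Reasoning) | 98 | 97 | 97 | 97 | 79 | 93.9% |
| Z.AI GLM 4.7 | 98 | 94 | 94 | 93 | 90 | 93.9% |
| Grok 4 Fast | 97 | 95 | 94 | 94 | 88 | 93.7% |
| Claude 3.7 Sonnet | 94 | 94 | 94 | 92 | 92 | 93.2% |
| MoonshotAI: Kimi K2.5 | 96 | 94 | 92 | 90 | 90 | 92.4% |
| Ministral 3 8B | 94 | 94 | 94 | 91 | 90 | 92.3% |
| Gemini 2.5 Flash Lite (Reasoning) | 96 | 94 | 93 | 92 | 87 | 92.3% |
| Gemini 3 Flash (Preview) | 94 | 94 | 92 | 92 | 85 | 91.5% |
| Ministral 3 14B | 95 | 92 | 92 | 88 | 88 | 91.0% |
| DeepSeek-V2 Chat | 96 | 92 | 92 | 90 | 84 | 90.9% |
| o4 Mini High | 97 | 95 | 94 | 90 | 78 | 90.7% |
| Minimax M2.5 | 96 | 96 | 95 | 89 | 75 | 90.2% |
| Z.AI GLM 4.6 | 96 | 96 | 89 | 85 | 84 | 90.1% |
| WizardLM 2 8x22b | 97 | 95 | 93 | 87 | 75 | 89.4% |
| DeepSeek V3.2 | 98 | 95 | 92 | 89 | 70 | 88.9% |
| DeepSeek V3 (2024-12-26) | 92 | 90 | 90 | 87 | 86 | 88.9% |
| Gemini 2.5 Flash (Reasoning) | 98 | 91 | 90 | 88 | 75 | 88.4% |
| ByteDance Seed 1.6 | 98 | 92 | 86 | 81 | 81 | 87.7% |
| Claude Haiku 4.5 | 94 | 90 | 90 | 90 | 74 | 87.7% |
| Gemini 2.5 Flash Lite | 92 | 88 | 87 | 86 | 85 | 87.5% |
| Arcee AI: Trinity Large (Preview) | 95 | 92 | 87 | 85 | 78 | 87.5% |
| Ministral 8B | 95 | 87 | 83 | 83 | 82 | 86.1% |
| GPT-5 Mini | 93 | 92 | 92 | 79 | 73 | 85.6% |
| o4 Mini | 96 | 94 | 93 | 72 | 69 | 84.8% |
| Qwen 2.5 72B | 90 | 87 | 83 | 81 | 81 | 84.4% |
| Gemini 3 Pro (Preview) | 99 | 78 | 78 | 77 | 75 | 81.4% |
| GPT-4o, May 13th (temp=0) | 96 | 79 | 77 | 77 | 76 | 81.1% |
| Hermes 3 70B | 100 | 91 | 88 | 73 | 48 | 80.0% |
| Llama 3.1 Nemotron 70B | 82 | 82 | 79 | 79 | 75 | 79.3% |
| Writer: Palmyra X5 | 81 | 81 | 79 | 79 | 75 | 79.3% |
| Llama 3.1 70B | 82 | 81 | 78 | 77 | 70 | 77.9% |
| Gemini 2.5 Flash | 78 | 78 | 78 | 77 | 77 | 77.7% |
| Claude 3.5 Sonnet | 79 | 77 | 77 | 77 | 77 | 77.5% |
| Gemma 3 4B | 81 | 78 | 76 | 76 | 75 | 77.3% |
| Gemini 3.1 Pro (Preview) | 78 | 78 | 78 | 75 | 75 | 76.9% |
| Mistral Small 3.2 24B | 89 | 81 | 74 | 72 | 68 | 76.8% |
| GPT-4o, May 13th (temp=1) | 89 | 78 | 77 | 75 | 62 | 76.3% |
| GPT-5.2 | 77 | 76 | 75 | 75 | 73 | 75.3% |
| Gemma 3 27B | 79 | 77 | 76 | 72 | 72 | 75.0% |
| GPT-4o, Aug. 6th (temp=0) | 74 | 74 | 74 | 74 | 72 | 73.8% |
| Z.AI GLM 4.7 Flash | 92 | 78 | 70 | 68 | 61 | 73.7% |
| Arcee AI: Trinity Mini | 79 | 74 | 71 | 70 | 63 | 71.5% |
| ByteDance Seed 1.6 Flash | 81 | 76 | 70 | 67 | 62 | 71.2% |
| GPT-4.1 | 77 | 74 | 72 | 65 | 62 | 70.0% |
| Gemma 3 12B | 79 | 75 | 71 | 71 | 54 | 69.9% |
| GPT-4o, Aug. 6th (temp=1) | 81 | 70 | 65 | 65 | 64 | 68.9% |
| Claude 3 Haiku | 75 | 71 | 69 | 68 | 62 | 68.9% |
| Cohere Command R+ (Aug. 2024) | 88 | 81 | 71 | 67 | 36 | 68.6% |
| Ministral 3B | 72 | 70 | 68 | 67 | 63 | 68.0% |
| GPT-4.1 Mini | 77 | 73 | 70 | 63 | 34 | 63.1% |
| Ministral 3 3B | 65 | 64 | 63 | 62 | 58 | 62.6% |
| GPT-5 Nano | 78 | 70 | 58 | 56 | 47 | 61.7% |
| Llama 3.1 8B | 70 | 63 | 63 | 60 | 42 | 59.6% |
| GPT-4.1 Nano | 79 | 66 | 53 | 41 | 37 | 55.1% |
| GPT-4o Mini (temp=1) | 64 | 54 | 54 | 50 | 50 | 54.5% |
| GPT-4o Mini (temp=0) | 54 | 54 | 54 | 54 | 50 | 52.9% |
| Mistral NeMO | 81 | 80 | 61 | 0 | 0 | 44.3% |
| Rocinante 12B | 69 | 29 | 0 | 0 | 0 | 19.7% |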