Matches word count

Test: Write N of X

Avg. Score: 64.8%
Scenarios: 5
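
The page does not say exactly how "Matches word count" is graded. Below is a minimal Python sketch of a pass/fail version of the check. It assumes X is words, as in this test, and that words are counted by splitting on whitespace. The function names are hypothetical, and the exact-match rule is a simplification: the per-run scores in the scenario tables (100, 98, 92, 77 and so on) suggest the real grader gives partial credit for near misses, and this sketch does not try to model that.

```python
def count_words(text: str) -> int:
    """Count words by splitting on runs of whitespace (an assumed rule, not the benchmark's documented one)."""
    return len(text.split())


def matches_word_count(response: str, target: int) -> bool:
    """Hypothetical pass/fail check: True only if the response contains exactly `target` words."""
    return count_words(response) == target


if __name__ == "__main__":
    reply = "The quick brown fox jumps over the lazy dog"
    print(count_words(reply))             # 9
    print(matches_word_count(reply, 9))   # True
    print(matches_word_count(reply, 10))  # False
```

Splitting on whitespace counts hyphenated compounds and numbers as single words and leaves punctuation attached to its word. A real grader may tokenize differently.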

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Stealth: Aurora Alpha | 99.5% | n/a | 2.4s | 94%
2 | Gemini 3 Flash (Preview) | 97.8% | $0.0011 | 1.5s | 89%
3 | GPT-5 Nano | 97.9% | $0.0008 | 19.2s | 89%
4 | GPT-5.1 | 96.4% | $0.0073 | 9.2s | 78%
5 | GPT-5.2 | 96.9% | $0.0087 | 9.6s | 74%
6 | GPT-5 Mini | 94.2% | $0.0024 | 11.6s | 62%
7 | GPT-4o, Aug. 6th (temp=0) | 91.5% | $0.0048 | 2.2s | 61%
8 | o4 Mini | 93.1% | $0.0069 | 14.8s | 63%
9 | GPT-4o Mini (temp=1) | 94.1% | $0.0003 | 42.7s | 64%
10 | GPT-4o, Aug. 6th (temp=1) | 88.6% | $0.0048 | 2.0s | 52%
11 | o4 Mini High | 95.3% | $0.010 | 36.6s | 69%
12 | Z.AI GLM 5 | 99.0% | $0.016 | 1.6m | 96%
13 | GPT-4.1 | 85.9% | $0.0020 | 2.5s | 44%
14 | Gemini 2.5 Pro | 91.5% | $0.013 | 10.9s | 55%
15 | GPT-5 | 91.0% | $0.013 | 18.1s | 57%
16 | GPT-4o Mini (temp=0) | 85.6% | $0.0003 | 10.7s | 38%
17 | Gemini 3 Pro (Preview) | 98.8% | $0.046 | 28.1s | 93%
18 | Grok 4 | 85.5% | $0.010 | 13.3s | 43%
19 | Z.AI GLM 4.7 Flash | 94.9% | $0.0026 | 1.8m | 67%
20 | GPT-4o, May 13th (temp=1) | 84.6% | $0.0091 | 17.6s | 35%
21 | Z.AI GLM 4.7 | 99.6% | $0.0098 | 3.0m | 98%
22 | Grok 4 Fast | 77.4% | $0.0004 | 3.4s | 23%
23 | GPT-4.1 Mini | 76.6% | $0.0004 | 2.3s | 21%
24 | MoonshotAI: Kimi K2.5 | 96.1% | $0.016 | 2.0m | 75%
25 | Mistral Large 3 | 74.9% | $0.0010 | 2.7s | 20%
26 | Claude Sonnet 4.5 | 74.8% | $0.0070 | 4.1s | 27%
27 | GPT-4.1 Nano | 71.6% | $0.0001 | 2.2s | 19%
28 | Gemini 2.5 Flash Lite | 71.1% | $0.0002 | 711ms | 17%
29 | Mistral Medium 3.1 | 70.6% | $0.0009 | 2.9s | 19%
30 | Grok 4.1 Fast | 73.5% | $0.0007 | 6.3s | 18%
31 | Claude Opus 4.5 | 79.5% | $0.012 | 4.4s | 24%
32 | GPT-4o, May 13th (temp=0) | 79.6% | $0.0091 | 18.7s | 26%
33 | Claude Opus 4.6 | 76.6% | $0.012 | 6.0s | 24%
34 | Claude 3.7 Sonnet | 72.8% | $0.0069 | 4.8s | 19%
35 | DeepSeek V3 (2025-03-24) | 66.8% | $0.0005 | 6.5s | 16%
36 | Minimax M2.5 | 70.1% | $0.0012 | 17.6s | 15%
37 | Gemini 3.1 Pro (Preview) | 100.0% | $0.075 | 1.0m | 100%
38 | Claude Sonnet 4.6 | 67.4% | $0.0071 | 3.5s | 16%
39 | Llama 3.1 Nemotron 70B | 62.1% | $0.0006 | 3.3s | 12%
40 | DeepSeek V3 (2024-12-26) | 62.9% | $0.0006 | 3.8s | 11%
41 | Z.AI GLM 4.5 | 61.5% | $0.0006 | 3.7s | 11%
42 | Claude 3.5 Sonnet | 67.2% | $0.0067 | 10.3s | 16%
43 | Claude Sonnet 4 | 65.5% | $0.0070 | 3.8s | 14%
44 | Z.AI GLM 4.6 | 76.0% | $0.0036 | 48.5s | 20%
45 | Ministral 3 14B | 60.0% | $0.0003 | 1.3s | 8%
46 | Llama 3.1 8B | 58.6% | $0.0003 | 694ms | 8%
47 | Ministral 3 8B | 57.4% | $0.0003 | 1.0s | 9%
48 | Gemma 3 12B | 57.4% | $0.0001 | 3.1s | 9%
49 | DeepSeek V3.1 | 56.8% | $0.0004 | 8.5s | 11%
50 | Hermes 3 405B | 60.9% | $0.0000 | 11.0s | 7%
51 | Qwen 3.5 397B A17B | 99.9% | $0.030 | 3.5m | 99%
52 | Llama 3.1 70B | 52.6% | $0.0015 | 1.3s | 9%
53 | Qwen 3.5 Plus (2026-02-15) | 57.3% | $0.0009 | 6.6s | 4%
54 | Ministral 3 3B | 52.4% | $0.0002 | 716ms | 5%
55 | Qwen 2.5 72B | 51.3% | $0.0007 | 3.1s | 5%
56 | Gemma 3 27B | 49.0% | $0.0002 | 3.8s | 7%
57 | Claude Haiku 4.5 | 50.4% | $0.0023 | 2.3s | 6%
58 | Mistral Small 3.2 24B | 50.3% | $0.0002 | 2.3s | 2%
59 | DeepSeek V3.2 | 47.7% | $0.0005 | 6.1s | 3%
60 | DeepSeek-V2 Chat | 48.9% | $0.0003 | 8.1s | 2%
61 | ByteDance Seed 1.6 | 67.3% | $0.0061 | 1.0m | 17%
62 | Arcee AI: Trinity Large (Preview) | 45.8% | $0.0000 | 3.1s | 2%
63 | Claude Opus 4 | 75.4% | $0.035 | 21.6s | 22%
64 | Mistral Small Creative | 39.9% | $0.0002 | 927ms | 0%
65 | Claude 3.5 Haiku | 42.1% | $0.0018 | 3.3s | 0%
66 | ByteDance Seed 1.6 Flash | 46.0% | $0.0010 | 17.0s | 0%
67 | Cohere Command R+ (Aug. 2024) | 43.1% | $0.0051 | 2.7s | 0%
68 | Gemini 2.5 Flash | 35.7% | $0.0005 | 1.1s | 0%
69 | Arcee AI: Trinity Mini | 31.5% | $0.0001 | 1.7s | 0%
70 | Claude 3 Haiku | 32.9% | $0.0005 | 17.9s | 0%
71 | Writer: Palmyra X5 | 25.8% | $0.0018 | 8.3s | 0%
72 | Ministral 8B | 18.3% | $0.0002 | 1.4s | 0%
73 | Mistral NeMO | 18.7% | $0.0003 | 2.4s | 0%
74 | Mistral Large 2 | 22.8% | $0.0042 | 3.0s | 0%
75 | Ministral 3B | 13.7% | $0.0001 | 710ms | 0%
76 | Hermes 3 70B | 15.8% | $0.0007 | 4.8s | 0%
77 | WizardLM 2 8x22b | 17.7% | $0.0015 | 19.8s | 0%
78 | Gemma 3 4B | 2.7% | $0.0001 | 1.9s | 0%
79 | Rocinante 12B | 5.2% | $0.0006 | 17.7s | 0%
80 | Mistral Large | 18.2% | $0.017 | 29.8s | 0%

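Individual Scenarios (words)

Each table below covers one of the five scenarios. Columns #1 to #10 hold a model's ten per-run scores, listed from highest to lowest, and Avg is the scenario average. The per-run scores are displayed rounded, so Avg can differ slightly from the mean of the ten values shown (for example, a row of ten 98s with an Avg of 98.4%).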

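Scenario 1

Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
Gemma 3 27B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 99.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.8%
Ministral 3 14B | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.7%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7%
Ministral 3 8B | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.1%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 97.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 77 | 96.9%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 95.5%
Claude 3 Haiku | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 95.5%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 95.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 27 | 92.5%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 27 | 91.7%
Arcee AI: Trinity Mini | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 77 | 90.7%
Qwen 2.5 72B | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 27 | 88.1%
Llama 3.1 70B | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.2%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 27 | 0 | 82.4%
WizardLM 2 8x22b | 100 | 100 | 100 | 98 | 98 | 77 | 27 | 0 | 0 | 0 | 60.1%
Hermes 3 70B | 100 | 100 | 98 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 48.3%
Mistral NeMO | 100 | 92 | 92 | 77 | 77 | 27 | 9 | 0 | 0 | 0 | 47.6%
Mistral Large | 100 | 100 | 100 | 77 | 77 | 9 | 2 | 0 | 0 | 0 | 46.5%
Ministral 8B | 100 | 98 | 77 | 54 | 9 | 0 | 0 | 0 | 0 | 0 | 33.8%
Ministral 3B | 98 | 98 | 92 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 32.7%
Mistral Large 2 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.1%
Rocinante 12B | 92 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.0%
Gemma 3 4B | 54 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.3%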
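
The Avg column of each scenario table appears to be the arithmetic mean of the ten run scores. A minimal Python sketch (the helper name `scenario_average` is hypothetical) reproduces it for a row whose displayed values are exact, such as Claude Haiku 4.5 in the first scenario table: nine runs at 100 and one at 98 give the listed 99.8%. It also shows a row that does not reproduce. Ministral 3 14B in the same table has two 100s and eight 98s, which average 98.4, but it is listed at 98.7%. That gap suggests, though the page does not say so, that Avg is computed from unrounded run scores.

```python
from statistics import fmean


def scenario_average(run_scores: list[float]) -> float:
    """Mean of one model's per-run scores for a scenario, rounded to one decimal as displayed."""
    return round(fmean(run_scores), 1)


# Claude Haiku 4.5, first scenario table: nine runs at 100 and one at 98.
print(scenario_average([100] * 9 + [98]))  # 99.8, as listed

# Ministral 3 14B, first scenario table: the displayed cells average 98.4,
# while the table lists 98.7%, so the displayed cells are likely rounded.
print(scenario_average([100, 100] + [98] * 8))  # 98.4
```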
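
Scenario 2

Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.4%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 99.2%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 99.2%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 99.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.8%
GPT-4o, Aug. 6th (temp=0) | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.4%
Claude Sonnet 4 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.2%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 97.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 97.2%
Ministral 3 14B | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 97.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 96.9%
Mistral Small 3.2 24B | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.7%
Gemma 3 12B | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 77 | 95.7%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 92 | 54 | 94.5%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 54 | 93.9%
Ministral 3 8B | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 92 | 92 | 77 | 93.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 91.8%
Claude Opus 4.6 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 92 | 54 | 90.8%
Llama 3.1 8B | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 77 | 77 | 89.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 0 | 88.9%
Claude Sonnet 4.5 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 87.4%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 54 | 27 | 87.1%
Mistral Small Creative | 98 | 98 | 92 | 92 | 92 | 92 | 92 | 77 | 77 | 54 | 86.6%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 2 | 0 | 76.6%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 9 | 0 | 69.2%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 98 | 54 | 54 | 54 | 27 | 2 | 68.8%
Claude Haiku 4.5 | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 27 | 9 | 9 | 62.8%
Gemma 3 27B | 98 | 98 | 92 | 77 | 77 | 54 | 54 | 27 | 9 | 2 | 58.9%
Gemini 2.5 Flash | 98 | 92 | 92 | 54 | 54 | 54 | 54 | 9 | 9 | 2 | 51.7%
DeepSeek V3.2 | 100 | 98 | 98 | 92 | 54 | 27 | 2 | 2 | 0 | 0 | 47.3%
WizardLM 2 8x22b | 98 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 28.3%
Ministral 3B | 92 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22.3%
Writer: Palmyra X5 | 54 | 54 | 27 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 17.3%
Arcee AI: Trinity Mini | 98 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.2%
Hermes 3 70B | 54 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.7%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral NeMO | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Rocinante 12B | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Llama 3.1 70B | 77 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.1%
Mistral Large | 27 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4.6%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%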
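
Scenario 3

Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.9%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99.8%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99.5%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 99.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 98.7%
GPT-4o, May 13th (temp=0) | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98.4%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 98.2%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 98.2%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 98.1%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 97.9%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 77 | 97.1%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 96.7%
Claude Opus 4.6 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 77 | 96.6%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 95.8%
Grok 4 Fast | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 92 | 92 | 95.8%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 95.1%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 54 | 94.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 54 | 94.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 27 | 91.6%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 92 | 77 | 77 | 77 | 54 | 87.8%
Mistral Medium 3.1 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 77 | 54 | 86.8%
GPT-4.1 Nano | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 54 | 27 | 86.7%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 2 | 85.2%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 77 | 77 | 27 | 85.0%
GPT-4.1 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 54 | 27 | 83.9%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 27 | 9 | 81.7%
Claude Sonnet 4.6 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 77 | 2 | 81.6%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 98 | 77 | 77 | 27 | 0 | 78.0%
Claude 3.7 Sonnet | 100 | 98 | 98 | 98 | 98 | 92 | 77 | 54 | 27 | 27 | 77.1%
Gemma 3 27B | 100 | 100 | 98 | 98 | 92 | 77 | 77 | 54 | 54 | 9 | 76.0%
Minimax M2.5 | 100 | 100 | 100 | 98 | 98 | 92 | 54 | 54 | 54 | 0 | 74.9%
Mistral Large 3 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 54 | 2 | 2 | 70.2%
Claude 3.5 Sonnet | 100 | 100 | 100 | 98 | 77 | 54 | 54 | 54 | 54 | 9 | 69.9%
DeepSeek V3.1 | 98 | 77 | 77 | 77 | 77 | 77 | 54 | 54 | 27 | 9 | 62.9%
Hermes 3 405B | 100 | 100 | 100 | 98 | 92 | 92 | 27 | 9 | 2 | 0 | 62.1%
Claude Haiku 4.5 | 100 | 100 | 100 | 92 | 77 | 54 | 54 | 27 | 9 | 0 | 61.3%
Z.AI GLM 4.5 | 100 | 100 | 98 | 92 | 92 | 27 | 27 | 27 | 9 | 2 | 57.5%
Llama 3.1 8B | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 0 | 0 | 0 | 57.3%
DeepSeek V3 (2025-03-24) | 100 | 92 | 92 | 77 | 77 | 54 | 27 | 9 | 0 | 0 | 52.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 98 | 92 | 9 | 9 | 2 | 2 | 0 | 51.2%
Ministral 3 14B | 100 | 100 | 100 | 100 | 54 | 27 | 27 | 0 | 0 | 0 | 50.8%
Claude Sonnet 4 | 98 | 92 | 92 | 77 | 54 | 54 | 27 | 9 | 0 | 0 | 50.4%
DeepSeek V3.2 | 100 | 100 | 92 | 77 | 27 | 27 | 27 | 9 | 9 | 0 | 47.0%
DeepSeek V3 (2024-12-26) | 100 | 98 | 92 | 77 | 54 | 9 | 9 | 0 | 0 | 0 | 44.0%
Qwen 2.5 72B | 100 | 98 | 77 | 77 | 54 | 27 | 2 | 0 | 0 | 0 | 43.6%
Mistral Large 2 | 98 | 92 | 77 | 54 | 54 | 27 | 0 | 0 | 0 | 0 | 40.3%
Ministral 8B | 100 | 92 | 92 | 77 | 9 | 9 | 0 | 0 | 0 | 0 | 38.0%
Mistral NeMO | 98 | 92 | 92 | 54 | 27 | 2 | 0 | 0 | 0 | 0 | 36.5%
ByteDance Seed 1.6 Flash | 100 | 100 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29.8%
Arcee AI: Trinity Mini | 92 | 77 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 27.4%
Arcee AI: Trinity Large (Preview) | 100 | 77 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23.1%
Ministral 3 3B | 92 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19.7%
Ministral 3 8B | 98 | 77 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 19.6%
DeepSeek-V2 Chat | 54 | 27 | 27 | 9 | 9 | 9 | 0 | 0 | 0 | 0 | 13.5%
Gemma 3 12B | 54 | 27 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 8.5%
Ministral 3B | 27 | 2 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 3.2%
Claude 3.5 Haiku | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.9%
Mistral Large | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%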
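
Scenario 4

Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.6%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 98.9%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 98.6%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 98.6%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 98.4%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 98.4%
GPT-5.2 | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.2%
o4 Mini High | 100 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 97.6%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 97.3%
Gemini 2.5 Pro | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.9%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 77 | 96.8%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 92 | 96.4%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 95.8%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 92 | 77 | 95.2%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 77 | 95.1%
GPT-4o Mini (temp=0) | 100 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 92 | 93.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 77 | 54 | 90.3%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 27 | 89.3%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 77 | 54 | 54 | 88.1%
GPT-4.1 Mini | 100 | 100 | 98 | 98 | 98 | 92 | 92 | 92 | 54 | 54 | 87.9%
Z.AI GLM 4.6 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 54 | 27 | 85.4%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 54 | 27 | 85.3%
GPT-5 | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 54 | 54 | 54 | 85.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 54 | 54 | 54 | 84.3%
Llama 3.1 70B | 100 | 100 | 98 | 98 | 98 | 92 | 77 | 77 | 54 | 2 | 79.7%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 27 | 9 | 79.5%
Claude Sonnet 4 | 98 | 98 | 98 | 98 | 77 | 77 | 77 | 77 | 9 | 0 | 71.2%
Claude 3.5 Sonnet | 100 | 98 | 98 | 92 | 92 | 77 | 77 | 54 | 2 | 0 | 69.1%
Gemini 2.5 Flash Lite | 100 | 100 | 98 | 92 | 92 | 92 | 54 | 54 | 9 | 0 | 69.1%
Grok 4 | 100 | 98 | 92 | 92 | 92 | 92 | 54 | 27 | 27 | 9 | 68.5%
Claude Sonnet 4.5 | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 27 | 9 | 0 | 65.0%
Claude Opus 4 | 100 | 98 | 98 | 92 | 77 | 54 | 54 | 27 | 27 | 0 | 62.8%
Hermes 3 405B | 100 | 100 | 98 | 98 | 92 | 92 | 27 | 2 | 0 | 0 | 61.0%
Gemma 3 12B | 100 | 100 | 98 | 98 | 92 | 77 | 27 | 9 | 2 | 0 | 60.4%
DeepSeek V3 (2025-03-24) | 100 | 92 | 92 | 77 | 77 | 77 | 54 | 27 | 2 | 0 | 59.9%
Claude Sonnet 4.6 | 100 | 92 | 77 | 77 | 77 | 54 | 54 | 27 | 9 | 9 | 57.7%
Minimax M2.5 | 100 | 100 | 98 | 98 | 92 | 77 | 2 | 0 | 0 | 0 | 56.8%
Mistral Large 2 | 100 | 100 | 100 | 92 | 77 | 77 | 0 | 0 | 0 | 0 | 54.7%
Z.AI GLM 4.5 | 100 | 100 | 98 | 77 | 54 | 27 | 27 | 27 | 0 | 0 | 51.2%
Llama 3.1 Nemotron 70B | 100 | 98 | 92 | 92 | 54 | 54 | 9 | 9 | 2 | 0 | 51.0%
ByteDance Seed 1.6 | 100 | 98 | 92 | 54 | 54 | 54 | 27 | 9 | 9 | 2 | 49.8%
Llama 3.1 8B | 100 | 98 | 92 | 92 | 77 | 9 | 0 | 0 | 0 | 0 | 46.9%
Grok 4.1 Fast | 98 | 92 | 92 | 77 | 77 | 9 | 9 | 9 | 0 | 0 | 46.5%
GPT-4.1 Nano | 100 | 98 | 77 | 77 | 54 | 54 | 2 | 2 | 0 | 0 | 46.4%
DeepSeek V3 (2024-12-26) | 100 | 92 | 92 | 54 | 54 | 27 | 9 | 9 | 2 | 0 | 43.9%
Mistral Large | 100 | 100 | 100 | 54 | 27 | 9 | 0 | 0 | 0 | 0 | 39.0%
Mistral Large 3 | 100 | 100 | 98 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 38.5%
Mistral Medium 3.1 | 98 | 98 | 77 | 77 | 27 | 2 | 0 | 0 | 0 | 0 | 38.1%
Ministral 3 8B | 100 | 98 | 77 | 54 | 27 | 9 | 9 | 0 | 0 | 0 | 37.5%
Ministral 3 3B | 98 | 92 | 92 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 36.4%
Gemini 2.5 Flash | 100 | 100 | 77 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 30.5%
Qwen 2.5 72B | 100 | 100 | 92 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 30.3%
Ministral 3 14B | 100 | 98 | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29.7%
DeepSeek V3.2 | 98 | 92 | 77 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 27.0%
DeepSeek-V2 Chat | 98 | 92 | 27 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 22.1%
DeepSeek V3.1 | 92 | 92 | 9 | 9 | 9 | 9 | 0 | 0 | 0 | 0 | 22.1%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.9%
Hermes 3 70B | 98 | 92 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 20.1%
Claude Haiku 4.5 | 100 | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.4%
Arcee AI: Trinity Mini | 54 | 54 | 27 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 13.8%
Gemma 3 27B | 54 | 54 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 11.0%
Ministral 3B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 98 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.9%
Cohere Command R+ (Aug. 2024) | 54 | 27 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Claude 3.5 Haiku | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8.6%
Qwen 3.5 Plus (2026-02-15) | 54 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.5%
Mistral Small Creative | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4%
Mistral Small 3.2 24B | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.7%
Writer: Palmyra X5 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.8%
ByteDance Seed 1.6 Flash | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%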
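
Scenario 5

Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 99.7%
Z.AI GLM 4.7 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 92 | 98.1%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 77 | 97.7%
Z.AI GLM 5 | 100 | 100 | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 92 | 96.9%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 98 | 98 | 92 | 92 | 92 | 77 | 95.1%
GPT-5 Nano | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 77 | 93.3%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 98 | 92 | 92 | 77 | 77 | 77 | 91.5%
GPT-5.2 | 100 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 9 | 86.4%
GPT-5.1 | 100 | 98 | 98 | 98 | 98 | 92 | 92 | 77 | 77 | 27 | 86.0%
MoonshotAI: Kimi K2.5 | 98 | 98 | 98 | 98 | 98 | 92 | 77 | 77 | 54 | 27 | 82.0%
o4 Mini High | 100 | 100 | 100 | 98 | 98 | 98 | 77 | 54 | 54 | 9 | 78.9%
Z.AI GLM 4.7 Flash | 98 | 98 | 98 | 98 | 98 | 92 | 92 | 54 | 54 | 0 | 78.4%
o4 Mini | 100 | 100 | 100 | 100 | 98 | 92 | 54 | 54 | 27 | 9 | 73.4%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 92 | 77 | 77 | 54 | 27 | 2 | 73.0%
GPT-5 Mini | 100 | 100 | 98 | 98 | 92 | 92 | 77 | 27 | 27 | 9 | 72.2%
GPT-5 | 100 | 98 | 98 | 92 | 77 | 77 | 77 | 77 | 9 | 2 | 70.9%
Mistral Large 3 | 100 | 100 | 100 | 98 | 98 | 92 | 77 | 0 | 0 | 0 | 66.7%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 98 | 92 | 92 | 77 | 54 | 27 | 9 | 2 | 65.2%
Grok 4 | 100 | 100 | 98 | 92 | 92 | 54 | 54 | 27 | 2 | 0 | 61.9%
Gemini 2.5 Pro | 100 | 100 | 98 | 77 | 77 | 77 | 54 | 27 | 0 | 0 | 61.2%
GPT-4o, Aug. 6th (temp=0) | 98 | 98 | 92 | 54 | 54 | 54 | 54 | 54 | 27 | 27 | 61.1%
GPT-4.1 | 100 | 100 | 92 | 77 | 54 | 54 | 27 | 2 | 2 | 0 | 50.8%
GPT-4o, May 13th (temp=1) | 100 | 98 | 92 | 92 | 77 | 9 | 0 | 0 | 0 | 0 | 46.9%
Claude Sonnet 4.5 | 100 | 98 | 77 | 77 | 54 | 27 | 0 | 0 | 0 | 0 | 43.4%
Ministral 3 8B | 98 | 98 | 92 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 38.1%
GPT-4o Mini (temp=0) | 98 | 98 | 92 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 35.2%
Grok 4.1 Fast | 98 | 98 | 92 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 29.8%
Mistral Medium 3.1 | 92 | 92 | 54 | 27 | 27 | 2 | 0 | 0 | 0 | 0 | 29.4%
DeepSeek V3 (2024-12-26) | 100 | 98 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 29.1%
Arcee AI: Trinity Large (Preview) | 100 | 98 | 27 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 26.2%
GPT-4.1 Nano | 100 | 92 | 27 | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 25.6%
Ministral 3 14B | 100 | 92 | 27 | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 23.9%
Gemma 3 12B | 92 | 54 | 54 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 22.7%
DeepSeek V3 (2025-03-24) | 100 | 98 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22.6%
Minimax M2.5 | 100 | 92 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.1%
DeepSeek V3.2 | 100 | 77 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18.7%
Writer: Palmyra X5 | 100 | 77 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 18.1%
Claude Opus 4 | 92 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 17.0%
Cohere Command R+ (Aug. 2024) | 77 | 77 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.6%
DeepSeek V3.1 | 100 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.9%
Claude Haiku 4.5 | 100 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.7%
DeepSeek-V2 Chat | 98 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.6%
GPT-4o, May 13th (temp=0) | 54 | 27 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.9%
Arcee AI: Trinity Mini | 100 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.2%
Claude Opus 4.5 | 98 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3 3B | 92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 9.2%
Claude Opus 4.6 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.8%
Mistral Small Creative | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.8%
Claude Sonnet 4 | 77 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.7%
Gemma 3 4B | 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.4%
Gemini 2.5 Flash Lite | 27 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.6%
ByteDance Seed 1.6 | 27 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.0%
Grok 4 Fast | 9 | 9 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2.0%
Claude 3.7 Sonnet | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.8%
Gemini 2.5 Flash | 9 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1.2%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Mini | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
ByteDance Seed 1.6 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 3.5 Plus (2026-02-15) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude Sonnet 4.6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3 Haiku | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Z.AI GLM 4.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 70B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Llama 3.1 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral NeMO | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 8B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
WizardLM 2 8x22b | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%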
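
The overall Score in the Overall Performance table appears to be the mean of a model's five scenario averages. Gemini 3 Flash (Preview) is one example. Its scenario averages, in table order, are 100.0%, 99.8%, 99.5%, 98.4% and 91.5%, and their mean of 97.84% rounds to the listed 97.8%. The Python sketch below reproduces that check (the helper name `overall_score` is hypothetical). Because the scenario averages are themselves rounded, a recomputed score can be a tenth of a point off. Z.AI GLM 5, for instance, recomputes to 99.1% against a listed 99.0%.

```python
from statistics import fmean


def overall_score(scenario_averages: list[float]) -> float:
    """Mean of a model's per-scenario averages (in percent), rounded to one decimal."""
    return round(fmean(scenario_averages), 1)


# Gemini 3 Flash (Preview): averages for the five scenarios, in table order.
print(overall_score([100.0, 99.8, 99.5, 98.4, 91.5]))  # 97.8, as listed

# Z.AI GLM 5: recomputing from rounded inputs gives 99.1, one tenth above the listed 99.0%.
print(overall_score([100.0, 100.0, 99.8, 98.6, 96.9]))  # 99.1
```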