Parse dialogue

Test: Language Writing

Avg. Score: 86.1%
Scenarios: 5

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Stealth: Aurora Alpha | 100.0% |  | 2.0s | 100%
2 | GPT-4o Mini (temp=0) | 100.0% | $0.0003 | 4.8s | 100%
3 | GPT-4o Mini (temp=1) | 100.0% | $0.0003 | 5.6s | 100%
4 | Gemini 3 Flash (Preview) | 100.0% | $0.0020 | 5.6s | 100%
5 | GPT-4.1 Mini | 99.3% | $0.0004 | 3.4s | 93%
6 | DeepSeek-V2 Chat | 100.0% | $0.0001 | 16.1s | 100%
7 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.0052 | 6.1s | 100%
8 | GPT-4o, Aug. 6th (temp=1) | 100.0% | $0.0056 | 6.5s | 100%
9 | Z.AI GLM 4.5 | 99.7% | $0.0013 | 14.5s | 97%
10 | Hermes 3 405B | 99.6% | $0.0000 | 21.0s | 96%
11 | DeepSeek V3.1 | 99.6% | $0.0007 | 22.8s | 96%
12 | GPT-4o, May 13th (temp=0) | 99.7% | $0.0087 | 8.0s | 97%
13 | o4 Mini | 100.0% | $0.0071 | 16.7s | 100%
14 | Gemini 2.5 Flash | 97.5% | $0.0026 | 6.0s | 83%
15 | Claude 3.5 Haiku | 95.2% | $0.0015 | 5.7s | 83%
16 | Claude Sonnet 4.6 | 100.0% | $0.010 | 13.7s | 100%
17 | GPT-4.1 | 97.8% | $0.0041 | 6.8s | 79%
18 | GPT-4.1 Nano | 92.9% | $0.0001 | 4.0s | 69%
19 | Minimax M2.5 | 95.8% | $0.0013 | 21.5s | 78%
20 | Gemma 3 12B | 95.1% | $0.0002 | 33.9s | 80%
21 | Claude 3.5 Sonnet | 97.0% | $0.0095 | 15.7s | 80%
22 | Claude Opus 4.5 | 99.3% | $0.019 | 15.7s | 96%
23 | Claude Haiku 4.5 | 88.9% | $0.0037 | 7.8s | 70%
24 | Claude Sonnet 4 | 95.6% | $0.0092 | 11.1s | 76%
25 | Gemma 3 27B | 91.4% | $0.0003 | 27.9s | 76%
26 | Grok 4 Fast | 89.5% | $0.0005 | 7.6s | 60%
27 | Claude 3.7 Sonnet | 97.6% | $0.012 | 14.8s | 80%
28 | Grok 4.1 Fast | 90.6% | $0.0007 | 15.1s | 62%
29 | GPT-5 Mini | 98.0% | $0.0066 | 37.2s | 82%
30 | o4 Mini High | 99.5% | $0.015 | 36.6s | 97%
31 | Gemini 2.5 Flash Lite | 87.0% | $0.0005 | 5.1s | 54%
32 | GPT-5 Nano | 99.4% | $0.0028 | 1.2m | 96%
33 | Mistral Large | 87.3% | $0.0032 | 9.6s | 57%
34 | Mistral Large 3 | 86.1% | $0.0014 | 17.3s | 61%
35 | Hermes 3 70B | 88.7% | $0.0003 | 16.6s | 54%
36 | Qwen 3.5 Plus (2026-02-15) | 90.2% | $0.0019 | 27.2s | 65%
37 | Llama 3.1 70B | 85.4% | $0.0007 | 20.5s | 62%
38 | GPT-5.2 | 97.4% | $0.016 | 26.0s | 82%
39 | Claude 3 Haiku | 81.0% | $0.0007 | 3.8s | 49%
40 | ByteDance Seed 1.6 Flash | 84.9% | $0.0006 | 14.6s | 51%
41 | Claude Sonnet 4.5 | 89.8% | $0.011 | 13.2s | 62%
42 | Grok 4 | 96.2% | $0.016 | 31.8s | 74%
43 | GPT-4o, May 13th (temp=1) | 90.9% | $0.0092 | 8.2s | 46%
44 | Arcee AI: Trinity Large (Preview) | 79.1% | $0.0000 | 11.2s | 42%
45 | Cohere Command R+ (Aug. 2024) | 83.4% | $0.0062 | 13.4s | 50%
46 | ByteDance Seed 1.6 | 91.3% | $0.0040 | 50.4s | 65%
47 | Arcee AI: Trinity Mini | 81.2% | $0.0002 | 15.9s | 32%
48 | Ministral 3 3B | 76.6% | $0.0001 | 2.6s | 21%
49 | Llama 3.1 8B | 73.1% | $0.0001 | 4.6s | 26%
50 | DeepSeek V3 (2024-12-26) | 83.8% | $0.0007 | 19.4s | 27%
51 | Gemma 3 4B | 74.8% | $0.0001 | 13.4s | 30%
52 | Gemini 2.5 Pro | 95.1% | $0.024 | 22.5s | 61%
53 | MoonshotAI: Kimi K2.5 | 94.2% | $0.0082 | 1.2m | 68%
54 | Z.AI GLM 4.6 | 93.2% | $0.0044 | 1.2m | 60%
55 | DeepSeek V3 (2025-03-24) | 76.4% | $0.0005 | 14.6s | 21%
56 | Ministral 3 8B | 74.0% | $0.0002 | 6.1s | 16%
57 | Ministral 3B | 59.7% | $0.0000 | 2.9s | 30%
58 | GPT-5.1 | 97.3% | $0.027 | 52.6s | 81%
59 | Mistral Small 3.2 24B | 72.3% | $0.0003 | 11.0s | 20%
60 | Z.AI GLM 5 | 89.6% | $0.0059 | 1.3m | 65%
61 | DeepSeek V3.2 | 90.0% | $0.0004 | 1.1m | 45%
62 | Z.AI GLM 4.7 Flash | 80.2% | $0.0012 | 52.0s | 45%
63 | Qwen 2.5 72B | 69.6% | $0.0004 | 19.8s | 21%
64 | Mistral NeMO | 66.6% | $0.0001 | 4.3s | 10%
65 | Mistral Large 2 | 74.7% | $0.0042 | 18.6s | 20%
66 | Gemini 3.1 Pro (Preview) | 94.8% | $0.037 | 40.3s | 71%
67 | Claude Opus 4.6 | 92.5% | $0.036 | 34.3s | 68%
68 | Ministral 8B | 54.2% | $0.0001 | 6.6s | 13%
69 | Mistral Medium 3.1 | 61.7% | $0.0020 | 16.7s | 16%
70 | WizardLM 2 8x22b | 61.1% | $0.0008 | 15.6s | 12%
71 | Z.AI GLM 4.7 | 92.6% | $0.0054 | 1.9m | 66%
72 | Mistral Small Creative | 52.0% | $0.0004 | 6.3s | 12%
73 | Gemini 3 Pro (Preview) | 84.5% | $0.028 | 21.6s | 41%
74 | Rocinante 12B | 56.0% | $0.0003 | 22.8s | 19%
75 | Claude Opus 4 | 91.1% | $0.048 | 24.7s | 65%
76 | GPT-5 | 98.0% | $0.044 | 1.3m | 81%
77 | Qwen 3.5 397B A17B | 90.0% | $0.015 | 2.0m | 60%
78 | Writer: Palmyra X5 | 47.2% | $0.0051 | 13.6s | 0%
79 | Llama 3.1 Nemotron 70B | 32.1% | $0.0003 | 22.3s | 0%
80 | Ministral 3 14B | 18.0% | $0.0003 | 9.0s | 0%
Average score: 86.07%

Individual Scenarios

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 99 | 99.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 95 | 99.1%
GPT-5 Mini | 100 | 100 | 100 | 100 | 94 | 98.9%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 94 | 98.9%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5 Nano | 100 | 100 | 100 | 100 | 93 | 98.6%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 92 | 98.5%
Gemma 3 12B | 100 | 100 | 100 | 100 | 92 | 98.3%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 89 | 97.8%
Mistral Large | 100 | 100 | 100 | 100 | 86 | 97.1%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 94 | 90 | 96.8%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 82 | 96.4%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 91 | 90 | 96.2%
Gemma 3 27B | 100 | 100 | 100 | 91 | 89 | 96.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 94 | 94 | 93 | 93 | 94.8%
Llama 3.1 70B | 100 | 100 | 92 | 91 | 85 | 93.6%
Gemini 2.5 Flash Lite | 100 | 100 | 93 | 91 | 83 | 93.5%
GPT-4.1 Nano | 100 | 100 | 100 | 90 | 73 | 92.5%
Gemma 3 4B | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 5 | 100 | 94 | 90 | 88 | 86 | 91.7%
Claude Haiku 4.5 | 100 | 100 | 100 | 77 | 77 | 90.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 93 | 92 | 63 | 89.5%
Claude 3.5 Haiku | 100 | 100 | 88 | 86 | 71 | 88.9%
Mistral Large 3 | 100 | 92 | 82 | 79 | 76 | 85.9%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 91 | 78 | 55 | 84.6%
Mistral Large 2 | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 93 | 0 | 78.7%
Llama 3.1 8B | 100 | 100 | 100 | 90 | 0 | 78.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 86 | 0 | 77.1%
Rocinante 12B | 100 | 94 | 67 | 63 | 50 | 74.6%
Mistral Medium 3.1 | 100 | 91 | 67 | 50 | 50 | 71.5%
Ministral 8B | 100 | 100 | 73 | 67 | 0 | 67.9%
Ministral 3B | 100 | 100 | 71 | 50 | 0 | 64.3%
Mistral NeMO | 100 | 100 | 100 | 0 | 0 | 60.0%
Qwen 2.5 72B | 100 | 100 | 96 | 0 | 0 | 59.1%
Claude 3 Haiku | 100 | 50 | 50 | 50 | 43 | 58.6%
WizardLM 2 8x22b | 100 | 100 | 71 | 0 | 0 | 54.3%
Mistral Small Creative | 100 | 50 | 50 | 50 | 0 | 50.0%
Mistral Small 3.2 24B | 100 | 94 | 0 | 0 | 0 | 38.8%
Ministral 3 14B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 0 | 0 | 0 | 0 | 0 | 0.0%
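
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 94 | 98.9%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 94 | 98.8%
Mistral Large 3 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 93 | 98.6%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 92 | 98.3%
Hermes 3 405B | 100 | 100 | 100 | 100 | 91 | 98.2%
GPT-5.2 | 100 | 99 | 98 | 97 | 96 | 98.1%
GPT-5.1 | 100 | 100 | 100 | 100 | 90 | 97.9%
Claude 3.5 Sonnet | 100 | 100 | 100 | 95 | 94 | 97.8%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 89 | 97.8%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 89 | 97.8%
Grok 4 Fast | 100 | 100 | 100 | 100 | 86 | 97.1%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 93 | 91 | 96.8%
Claude Sonnet 4.5 | 100 | 100 | 95 | 94 | 93 | 96.5%
Gemma 3 4B | 100 | 100 | 100 | 92 | 86 | 95.6%
Z.AI GLM 5 | 100 | 100 | 95 | 95 | 88 | 95.6%
Gemma 3 27B | 100 | 100 | 100 | 92 | 83 | 95.1%
Gemma 3 12B | 100 | 100 | 91 | 91 | 90 | 94.4%
Llama 3.1 70B | 100 | 100 | 100 | 83 | 80 | 92.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 88 | 75 | 92.5%
Claude Haiku 4.5 | 100 | 100 | 94 | 88 | 79 | 92.2%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 60 | 92.0%
Minimax M2.5 | 100 | 100 | 100 | 93 | 67 | 91.9%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 80 | 78 | 91.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 88 | 62 | 89.8%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.6 | 100 | 100 | 95 | 94 | 0 | 77.8%
Llama 3.1 8B | 100 | 100 | 91 | 90 | 0 | 76.2%
Rocinante 12B | 100 | 100 | 100 | 67 | 0 | 73.3%
Hermes 3 70B | 100 | 100 | 100 | 60 | 0 | 72.0%
Cohere Command R+ (Aug. 2024) | 86 | 75 | 70 | 62 | 57 | 69.9%
Claude 3 Haiku | 100 | 100 | 50 | 50 | 43 | 68.6%
Mistral Medium 3.1 | 100 | 100 | 50 | 50 | 0 | 60.0%
Mistral Large 2 | 100 | 100 | 100 | 0 | 0 | 60.0%
Ministral 3 3B | 100 | 100 | 90 | 0 | 0 | 58.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 86 | 0 | 0 | 57.1%
Ministral 8B | 100 | 83 | 82 | 0 | 0 | 53.0%
Ministral 3 8B | 100 | 100 | 63 | 0 | 0 | 52.6%
WizardLM 2 8x22b | 100 | 88 | 63 | 0 | 0 | 50.0%
Ministral 3B | 67 | 64 | 58 | 58 | 0 | 49.3%
Mistral NeMO | 100 | 100 | 33 | 0 | 0 | 46.7%
Writer: Palmyra X5 | 100 | 100 | 0 | 0 | 0 | 40.0%
Qwen 2.5 72B | 100 | 100 | 0 | 0 | 0 | 40.0%
Mistral Small Creative | 75 | 50 | 50 | 0 | 0 | 35.0%
Ministral 3 14B | 0 | 0 | 0 | 0 | 0 | 0.0%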
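
Model | #1 | #2 | #3 | #4 | #5 | Avg
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 95 | 99.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 92 | 98.5%
Z.AI GLM 4.6 | 100 | 100 | 100 | 95 | 67 | 92.3%
GPT-5.2 | 100 | 100 | 100 | 100 | 56 | 91.2%
GPT-5 Mini | 100 | 100 | 100 | 100 | 55 | 91.0%
GPT-5 | 100 | 100 | 100 | 100 | 53 | 90.5%
GPT-5.1 | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 50 | 90.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 50 | 90.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 92 | 58 | 90.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 50 | 90.0%
Minimax M2.5 | 100 | 100 | 100 | 93 | 53 | 89.2%
GPT-4.1 | 100 | 100 | 100 | 100 | 45 | 89.1%
Gemma 3 27B | 100 | 100 | 87 | 75 | 70 | 86.3%
Hermes 3 70B | 100 | 100 | 100 | 71 | 60 | 86.3%
Gemma 3 12B | 100 | 100 | 89 | 81 | 55 | 84.9%
Claude Haiku 4.5 | 100 | 100 | 100 | 60 | 53 | 82.5%
Claude Sonnet 4 | 100 | 100 | 100 | 57 | 53 | 82.1%
Grok 4 | 100 | 100 | 100 | 56 | 50 | 81.1%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemini 2.5 Flash Lite | 93 | 89 | 87 | 64 | 60 | 78.5%
Mistral Small Creative | 100 | 100 | 92 | 50 | 50 | 78.5%
Gemini 3.1 Pro (Preview) | 100 | 100 | 67 | 53 | 50 | 74.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 57 | 50 | 50 | 71.4%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 56 | 0 | 71.1%
MoonshotAI: Kimi K2.5 | 100 | 100 | 54 | 54 | 47 | 71.0%
Mistral Medium 3.1 | 100 | 100 | 86 | 67 | 0 | 70.5%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 54 | 50 | 46 | 70.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 88 | 55 | 0 | 68.4%
Ministral 3 3B | 100 | 100 | 67 | 67 | 0 | 66.7%
Llama 3.1 70B | 100 | 60 | 57 | 56 | 54 | 65.3%
Claude Opus 4.6 | 100 | 59 | 55 | 55 | 54 | 64.5%
Mistral Large 2 | 100 | 100 | 62 | 56 | 0 | 63.6%
Z.AI GLM 4.7 | 100 | 55 | 54 | 54 | 53 | 63.0%
Mistral Large 3 | 100 | 60 | 50 | 50 | 50 | 62.0%
Arcee AI: Trinity Mini | 100 | 100 | 55 | 50 | 0 | 60.9%
Z.AI GLM 5 | 100 | 57 | 50 | 50 | 47 | 60.8%
Llama 3.1 8B | 71 | 70 | 56 | 54 | 50 | 60.2%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 0 | 0 | 60.0%
Claude Opus 4 | 93 | 58 | 50 | 50 | 47 | 59.6%
Qwen 2.5 72B | 100 | 53 | 50 | 48 | 46 | 59.3%
ByteDance Seed 1.6 Flash | 100 | 71 | 60 | 54 | 0 | 57.1%
WizardLM 2 8x22b | 100 | 64 | 63 | 57 | 0 | 56.8%
ByteDance Seed 1.6 | 63 | 60 | 55 | 55 | 50 | 56.3%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 80 | 0 | 0 | 56.0%
DeepSeek V3.2 | 100 | 100 | 67 | 0 | 0 | 53.3%
Grok 4.1 Fast | 56 | 56 | 55 | 50 | 50 | 53.1%
Ministral 3B | 60 | 53 | 50 | 50 | 50 | 52.7%
Claude Sonnet 4.5 | 62 | 53 | 50 | 50 | 47 | 52.4%
Grok 4 Fast | 57 | 56 | 50 | 45 | 44 | 50.5%
Qwen 3.5 397B A17B | 53 | 50 | 50 | 50 | 47 | 50.1%
Mistral Large | 50 | 50 | 50 | 50 | 50 | 50.0%
Gemma 3 4B | 67 | 60 | 58 | 45 | 0 | 46.1%
Gemini 3 Pro (Preview) | 60 | 56 | 54 | 54 | 0 | 44.8%
Mistral Small 3.2 24B | 59 | 55 | 50 | 50 | 0 | 42.7%
Ministral 3 8B | 100 | 100 | 0 | 0 | 0 | 40.0%
Ministral 8B | 64 | 56 | 50 | 0 | 0 | 34.0%
Llama 3.1 Nemotron 70B | 67 | 50 | 0 | 0 | 0 | 23.3%
Rocinante 12B | 67 | 50 | 0 | 0 | 0 | 23.3%
Ministral 3 14B | 100 | 0 | 0 | 0 | 0 | 20.0%
Writer: Palmyra X5 | 0 | 0 | 0 | 0 | 0 | 0.0%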

Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 98 | 99.5%
GPT-5.2 | 100 | 100 | 99 | 99 | 98 | 99.1%
GPT-5.1 | 100 | 100 | 100 | 99 | 97 | 99.1%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 95 | 99.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 95 | 98.9%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 93 | 98.6%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 93 | 98.6%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 92 | 98.3%
Gemma 3 12B | 100 | 100 | 100 | 100 | 89 | 97.8%
Claude Opus 4 | 100 | 100 | 100 | 94 | 93 | 97.5%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 86 | 97.1%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 93 | 92 | 96.9%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 83 | 96.7%
Mistral Large 3 | 100 | 100 | 100 | 93 | 84 | 95.4%
Gemini 2.5 Flash Lite | 100 | 100 | 92 | 90 | 86 | 93.5%
Hermes 3 70B | 100 | 100 | 100 | 100 | 58 | 91.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 88 | 70 | 91.5%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 53 | 90.6%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 50 | 90.0%
Mistral Large | 100 | 100 | 100 | 100 | 46 | 89.2%
Llama 3.1 70B | 100 | 100 | 93 | 83 | 67 | 88.6%
Gemma 3 27B | 90 | 86 | 83 | 83 | 75 | 83.5%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 80.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 73 | 71 | 50 | 78.8%
Ministral 3 3B | 100 | 100 | 100 | 93 | 0 | 78.6%
Claude 3 Haiku | 100 | 100 | 100 | 50 | 40 | 78.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 95 | 50 | 43 | 77.6%
Mistral Small Creative | 100 | 100 | 92 | 90 | 0 | 76.5%
Llama 3.1 8B | 100 | 97 | 88 | 83 | 0 | 73.5%
Ministral 3B | 100 | 86 | 69 | 57 | 56 | 73.4%
Gemma 3 4B | 100 | 100 | 85 | 73 | 0 | 71.5%
WizardLM 2 8x22b | 100 | 100 | 100 | 56 | 0 | 71.1%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 50 | 0 | 70.0%
Ministral 8B | 100 | 91 | 85 | 64 | 0 | 68.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 0 | 0 | 60.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 0 | 0 | 60.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 89 | 0 | 0 | 57.8%
Mistral Medium 3.1 | 91 | 50 | 50 | 0 | 0 | 38.2%
Rocinante 12B | 71 | 68 | 50 | 0 | 0 | 38.0%
Ministral 3 14B | 100 | 50 | 0 | 0 | 0 | 30.0%
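
Model | #1 | #2 | #3 | #4 | #5 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 98 | 97 | 99.1%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 95 | 98.9%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 94 | 98.9%
o4 Mini High | 100 | 100 | 100 | 100 | 94 | 98.8%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 94 | 98.8%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 94 | 98.8%
Ministral 3 8B | 100 | 100 | 100 | 100 | 94 | 98.8%
GPT-5.2 | 100 | 99 | 99 | 99 | 97 | 98.7%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 93 | 98.6%
Claude Opus 4 | 100 | 100 | 100 | 100 | 92 | 98.5%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 92 | 98.5%
Claude Opus 4.6 | 100 | 100 | 100 | 96 | 95 | 98.2%
GPT-5 Nano | 100 | 100 | 100 | 100 | 91 | 98.2%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 89 | 97.9%
Minimax M2.5 | 100 | 100 | 100 | 96 | 93 | 97.9%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 89 | 97.8%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 95 | 93 | 97.7%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 88 | 97.5%
Gemma 3 27B | 100 | 100 | 94 | 94 | 93 | 96.2%
Z.AI GLM 4.6 | 100 | 96 | 96 | 95 | 93 | 95.9%
Hermes 3 70B | 100 | 100 | 100 | 88 | 80 | 93.5%
Mistral Large 3 | 100 | 100 | 90 | 86 | 67 | 88.5%
Llama 3.1 70B | 100 | 100 | 86 | 75 | 73 | 86.7%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 75 | 56 | 86.3%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 75 | 50 | 85.0%
ByteDance Seed 1.6 Flash | 94 | 93 | 89 | 89 | 50 | 83.0%
Claude Haiku 4.5 | 100 | 100 | 82 | 65 | 63 | 82.0%
Llama 3.1 8B | 100 | 100 | 100 | 89 | 0 | 77.8%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 62 | 60 | 59 | 76.1%
WizardLM 2 8x22b | 100 | 100 | 100 | 67 | 0 | 73.3%
Gemini 2.5 Flash Lite | 100 | 91 | 87 | 80 | 0 | 71.5%
Rocinante 12B | 92 | 83 | 64 | 64 | 50 | 70.6%
Mistral Large 2 | 100 | 100 | 100 | 50 | 0 | 70.0%
Gemma 3 4B | 100 | 100 | 83 | 62 | 0 | 69.0%
Mistral Medium 3.1 | 100 | 91 | 50 | 50 | 50 | 68.2%
Mistral NeMO | 100 | 100 | 92 | 40 | 0 | 66.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 64 | 63 | 0 | 65.2%
Ministral 3B | 86 | 57 | 55 | 50 | 46 | 58.7%
Writer: Palmyra X5 | 100 | 100 | 93 | 0 | 0 | 58.7%
DeepSeek V3 (2025-03-24) | 100 | 100 | 88 | 0 | 0 | 57.5%
Ministral 8B | 100 | 82 | 59 | 0 | 0 | 48.2%
Mistral Small Creative | 50 | 50 | 0 | 0 | 0 | 20.0%
Ministral 3 14B | 100 | 0 | 0 | 0 | 0 | 20.0%
Llama 3.1 Nemotron 70B | 50 | 0 | 0 | 0 | 0 | 10.0%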
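
The page does not say how the Score column under Overall Performance is derived. Judging from the figures alone, it appears to be the unweighted mean of a model's Avg values across the five scenario tables, rounded to one decimal. The sketch below checks that assumption for three models. The function name `overall_score` and the dictionary layout are illustrative, not part of the benchmark. The input values are the Avg figures copied from the five tables in the order they appear.

```python
from statistics import mean

# Per-scenario "Avg" values copied from the five Individual Scenarios tables,
# in the order the tables appear on the page.
scenario_avgs = {
    "Ministral 3 14B": [20.0, 0.0, 20.0, 30.0, 20.0],
    "Llama 3.1 Nemotron 70B": [0.0, 57.1, 23.3, 70.0, 10.0],
    "Mistral Large 2": [80.0, 60.0, 63.6, 100.0, 70.0],
}


def overall_score(avgs):
    """Assumed aggregation: unweighted mean of scenario averages, one decimal."""
    return round(mean(avgs), 1)


for model, avgs in scenario_avgs.items():
    print(f"{model}: {overall_score(avgs):.1f}%")
```

For these three models the results are 18.0%, 32.1% and 74.7%, matching the Score values listed at ranks 80, 79 and 65. Other rows may differ in the last digit, because the displayed averages are themselves rounded.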