XML structure

Test: Relationship tree

Avg. Score: 90.8%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0033 | 3.2s | 100%
2 | Gemini 3.1 Flash Lite | 100.0% | $0.0031 | 3.3s | 100%
3 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0046 | 3.2s | 100%
4 | GPT-5.4 Nano | 100.0% | $0.0026 | 11.2s | 100%
5 | Gemini 2.5 Flash | 100.0% | $0.0075 | 6.1s | 100%
6 | Gemini 3 Flash (Preview) | 100.0% | $0.0087 | 7.1s | 100%
7 | GPT-5.4 Nano (Reasoning, Low) | 100.0% | $0.0038 | 13.8s | 100%
8 | Inception Mercury 2 | 100.0% | $0.0096 | 12.8s | 100%
9 | DeepSeek V4 Pro | 100.0% | $0.0083 | 17.3s | 100%
10 | Grok 4.20 (Beta) | 100.0% | $0.019 | 5.5s | 100%
11 | GPT-5.4 Mini (Reasoning, Low) | 100.0% | $0.014 | 14.4s | 100%
12 | Grok 4.20 | 100.0% | $0.020 | 7.5s | 100%
13 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.024 | 6.3s | 100%
14 | DeepSeek V3.2 | 100.0% | $0.0046 | 33.9s | 100%
15 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.032 | 8.4s | 100%
16 | GPT-5.1 | 100.0% | $0.025 | 17.7s | 100%
17 | MiniMax M2.5 | 100.0% | $0.0036 | 45.0s | 100%
18 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0088 | 41.7s | 100%
19 | GPT-5.4 | 100.0% | $0.033 | 17.4s | 100%
20 | Gemini 2.5 Flash (Reasoning) | 100.0% | $0.023 | 37.3s | 100%
21 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.032 | 30.1s | 100%
22 | Qwen 3.6 Flash | 100.0% | $0.015 | 51.7s | 100%
23 | Grok 4.20 (Reasoning) | 100.0% | $0.027 | 40.0s | 100%
24 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.030 | 40.3s | 100%
25 | ByteDance Seed 1.6 | 100.0% | $0.014 | 1.2m | 100%
26 | o4 Mini | 100.0% | $0.037 | 45.7s | 100%
27 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.053 | 35.5s | 100%
28 | Claude Sonnet 4 | 100.0% | $0.072 | 17.5s | 100%
29 | Gemma 4 31B | 100.0% | $0.0028 | 1.9m | 100%
30 | Claude Sonnet 4.6 | 100.0% | $0.081 | 23.3s | 100%
31 | Xiaomi MIMO v2.5 Pro | 100.0% | $0.0076 | 1.9m | 100%
32 | GPT-5.2 | 100.0% | $0.062 | 50.7s | 100%
33 | GPT-OSS 120B | 100.0% | $0.0040 | 2.2m | 100%
34 | GPT-5 Mini | 100.0% | $0.019 | 2.0m | 100%
35 | GPT-5 Nano | 100.0% | $0.0080 | 2.2m | 100%
36 | o4 Mini High | 100.0% | $0.059 | 1.3m | 100%
37 | Xiaomi MIMO v2.5 | 97.0% | $0.0027 | 1.1m | 88%
38 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.020 | 2.3m | 100%
39 | GPT-5.5 | 100.0% | $0.105 | 37.1s | 100%
40 | DeepSeek-V2 Chat | 94.0% | $0.0049 | 34.0s | 85%
41 | Gemini 2.5 Pro | 100.0% | $0.099 | 58.2s | 100%
42 | Claude Opus 4.5 | 100.0% | $0.131 | 19.8s | 100%
43 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.113 | 49.4s | 100%
44 | Gemma 4 26B (Reasoning) | 100.0% | $0.0041 | 3.1m | 100%
45 | Qwen 3.6 27B | 100.0% | $0.034 | 2.7m | 100%
46 | MiniMax M2.7 | 96.0% | $0.0063 | 21.4s | 76%
47 | Gemma 4 26B | 96.0% | $0.0018 | 27.2s | 76%
48 | Gemini 2.5 Flash Lite | 93.0% | $0.0023 | 4.8s | 75%
49 | Qwen 3.5 35B | 98.5% | $0.015 | 2.0m | 91%
50 | Qwen 3.5 397B A17B | 100.0% | $0.045 | 2.6m | 100%
51 | Mistral Large 3 | 85.0% | $0.0091 | 17.5s | 85%
52 | Claude Opus 4.6 | 100.0% | $0.154 | 31.4s | 100%
53 | Claude Haiku 4.5 | 85.0% | $0.024 | 8.2s | 85%
54 | GPT-4o, Aug. 6th (temp=1) | 94.5% | $0.029 | 9.1s | 75%
55 | ByteDance Seed 2.0 Lite | 100.0% | $0.014 | 3.6m | 100%
56 | DeepSeek V4 Flash | 85.0% | $0.0020 | 51.4s | 85%
57 | Mistral Small 4 | 88.0% | $0.0030 | 7.4s | 75%
58 | Claude Opus 4.7 | 100.0% | $0.182 | 24.3s | 100%
59 | Mistral Large 2 | 85.0% | $0.037 | 20.5s | 85%
60 | DeepSeek V3 (2024-12-26) | 91.5% | $0.0045 | 40.2s | 75%
61 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.184 | 25.0s | 100%
62 | Qwen 3.5 27B | 100.0% | $0.029 | 3.7m | 100%
63 | Qwen 3.6 35B | 96.0% | $0.014 | 1.1m | 76%
64 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.021 | 3.9m | 100%
65 | GPT-5.4 Mini | 92.0% | $0.0087 | 7.9s | 68%
66 | Qwen3.7 Max | 100.0% | $0.045 | 3.5m | 100%
67 | Mistral Large | 85.0% | $0.036 | 41.1s | 85%
68 | GPT-4.1 Mini | 92.0% | $0.0046 | 26.0s | 68%
69 | GPT-4.1 | 92.0% | $0.024 | 8.5s | 68%
70 | ByteDance Seed 1.6 Flash | 92.0% | $0.0019 | 38.2s | 68%
71 | Gemini 3.1 Pro (Preview) | 100.0% | $0.172 | 1.5m | 100%
72 | Z.AI GLM 4.5 | 92.0% | $0.0075 | 48.1s | 68%
73 | DeepSeek V4 Flash (Reasoning) | 97.0% | $0.0041 | 3.5m | 88%
74 | GPT-5.4 Mini (Reasoning) | 100.0% | $0.117 | 2.8m | 100%
75 | Gemini 2.5 Flash Lite (Reasoning) | 92.0% | $0.0073 | 1.0m | 68%
76 | WizardLM 2 8x22b | 96.0% | $0.0098 | 2.3m | 76%
77 | Qwen 3.5 Flash | 97.0% | $0.0040 | 3.8m | 88%
78 | Qwen3.6 Max Preview | 100.0% | $0.084 | 3.7m | 100%
79 | Claude Sonnet 4.5 | 86.5% | $0.077 | 18.6s | 77%
80 | Ministral 8B | 80.0% | $0.0016 | 15.6s | 68%
81 | Mistral Small 3.2 24B | 81.5% | $0.0017 | 13.8s | 65%
82 | Grok 4.3 | 84.0% | $0.0094 | 7.8s | 61%
83 | MoonshotAI: Kimi K2.5 | 100.0% | $0.031 | 5.4m | 100%
84 | GPT-5.4 (Reasoning) | 100.0% | $0.175 | 2.6m | 100%
85 | Ministral 3 8B | 77.5% | $0.0022 | 17.0s | 66%
86 | Llama 3.1 70B | 84.0% | $0.0064 | 34.0s | 61%
87 | Mistral Medium 3.1 | 77.5% | $0.010 | 21.2s | 66%
88 | Z.AI GLM 4.5 Air | 94.5% | $0.0071 | 3.2m | 75%
89 | MoonshotAI: Kimi K2.6 | 100.0% | $0.082 | 5.0m | 100%
90 | GPT-4o, May 13th (temp=1) | 85.5% | $0.086 | 5.4s | 67%
91 | GPT-4o, May 13th (temp=0) | 87.0% | $0.088 | 5.5s | 66%
92 | Grok 4.3 (Reasoning) | 91.0% | $0.013 | 13.9s | 46%
93 | Qwen 2.5 72B | 84.0% | $0.0056 | 1.2m | 61%
94 | GPT-5.5 (Reasoning) | 100.0% | $0.251 | 2.0m | 100%
95 | Z.AI GLM 5 Turbo | 96.0% | $0.044 | 3.6m | 76%
96 | DeepSeek V3.1 | 91.0% | $0.0055 | 52.0s | 46%
97 | Qwen 3.5 122B | 96.0% | $0.035 | 4.0m | 76%
98 | Qwen 3.5 Plus (2026-04-20) | 92.0% | $0.023 | 3.1m | 68%
99 | Mistral Small 4 (Reasoning) | 85.0% | $0.0051 | 24.7s | 44%
100 | Z.AI GLM 4.7 Flash | 86.5% | $0.0050 | 2.8m | 64%
101 | Writer: Palmyra X5 | 80.0% | $0.016 | 18.4s | 48%
102 | Gemma 3 27B | 77.5% | $0.0016 | 27.5s | 47%
103 | Z.AI GLM 4.7 | 92.0% | $0.025 | 3.7m | 68%
104 | GPT-5 | 92.0% | $0.092 | 2.3m | 68%
105 | Z.AI GLM 4.6 | 88.0% | $0.018 | 3.0m | 63%
106 | Mistral NeMO | 60.0% | $0.0024 | 15.4s | 60%
107 | Qwen 3 32B | 84.0% | $0.0042 | 2.9m | 61%
108 | Gemma 3 12B | 76.5% | $0.0014 | 30.4s | 44%
109 | Arcee AI: Trinity Mini | 88.0% | $0.0037 | 1.9m | 47%
110 | ByteDance Seed 2.0 Mini | 100.0% | $0.0062 | 8.7m | 100%
111 | Ministral 3B | 70.0% | $0.0007 | 9.8s | 45%
112 | Z.AI GLM 5.1 | 100.0% | $0.081 | 7.3m | 100%
113 | Ministral 3 3B | 67.5% | $0.0007 | 9.9s | 46%
114 | Hermes 3 405B | 83.0% | $0.015 | 1.0m | 42%
115 | GPT-5.5 (Reasoning, Low) | 91.0% | $0.097 | 42.6s | 46%
116 | Claude 3 Haiku | 73.0% | $0.0056 | 11.6s | 40%
117 | Hermes 3 70B | 77.5% | $0.0045 | 42.7s | 40%
118 | Ministral 3 14B | 67.5% | $0.0031 | 25.4s | 46%
119 | Gemma 3 4B | 65.0% | $0.0007 | 27.5s | 48%
120 | GPT-4.1 Nano | 64.0% | $0.0009 | 6.9s | 46%
121 | Claude Opus 4.8 (Reasoning) | 100.0% | $0.381 | 1.9m | 100%
122 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.356 | 2.6m | 100%
123 | Claude Opus 4 | 100.0% | $0.368 | 2.4m | 100%
124 | MiniMax M3 | 100.0% | $0.036 | 9.3m | 100%
125 | Llama 3.1 8B | 64.0% | $0.0005 | 51.5s | 46%
126 | Qwen 3.5 9B | 96.0% | $0.0040 | 7.8m | 76%
127 | Claude Opus 4.8 (Reasoning, Low) | 98.5% | $0.383 | 1.9m | 91%
128 | Cydonia 24B V4.1 | 69.5% | $0.0040 | 1.0m | 28%
129 | Qwen3 235B A22B Instruct 2507 | 67.0% | $0.0017 | 1.0m | 29%
130 | Z.AI GLM 5 | 91.0% | $0.037 | 4.6m | 46%
131 | Nemotron 3 Nano | 82.0% | $0.0039 | 2.8m | 28%
132 | Gemma 4 31B (Reasoning) | 91.0% | $0.0043 | 5.8m | 46%
133 | Skyfall 36B V2 | 54.0% | $0.0054 | 26.5s | 30%
134 | Rocinante 12B | 47.5% | $0.0033 | 32.1s | 29%
135 | Cohere Command R+ (Aug. 2024) | 66.0% | $0.057 | 56.5s | 20%
136 | DeepSeek V3 (2025-03-24) | 50.5% | $0.0037 | 50.3s | 17%
137 | Aion 2.0 | 73.0% | $0.024 | 3.0m | 18%
138 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.448 | 5.9m | 100%
139 | Nemotron 3 Super | 10.0% | $0.0000 | 3.2m |
140 | LFM2 24B | 10.0% | $0.0004 | 7.8s | 10%
Avg. Score: 90.84%

Individual Scenarios

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 85 | 97.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 85 | 97.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 85 | 97.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 85 | 85 | 94.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 85 | 85 | 94.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 85 | 85 | 94.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 60 | 92.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 60 | 92.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 60 | 92.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 60 | 92.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 60 | 92.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 60 | 92.0%
Claude Sonnet 4.5 | 100 | 85 | 85 | 85 | 85 | 88.0%
GPT-4o, May 13th (temp=1) | 100 | 85 | 85 | 85 | 85 | 88.0%
Mistral Large 3 | 85 | 85 | 85 | 85 | 85 | 85.0%
Claude Haiku 4.5 | 85 | 85 | 85 | 85 | 85 | 85.0%
DeepSeek V4 Flash | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Large 2 | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Large | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Small 4 | 85 | 85 | 85 | 85 | 85 | 85.0%
GPT-5 | 100 | 100 | 100 | 60 | 60 | 84.0%
Qwen 3 32B | 100 | 100 | 100 | 60 | 60 | 84.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 60 | 60 | 84.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 60 | 60 | 84.0%
GPT-4o, May 13th (temp=0) | 100 | 85 | 85 | 85 | 60 | 83.0%
Mistral Small 3.2 24B | 100 | 85 | 85 | 85 | 60 | 83.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 10 | 82.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 10 | 82.0%
Cydonia 24B V4.1 | 100 | 100 | 85 | 60 | 60 | 81.0%
Ministral 8B | 85 | 85 | 85 | 85 | 60 | 80.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 85 | 85 | 10 | 76.0%
Writer: Palmyra X5 | 100 | 100 | 60 | 60 | 60 | 76.0%
Llama 3.1 70B | 100 | 100 | 60 | 60 | 60 | 76.0%
Claude 3 Haiku | 100 | 100 | 60 | 60 | 60 | 76.0%
Mistral Medium 3.1 | 85 | 85 | 85 | 60 | 60 | 75.0%
Ministral 3 14B | 85 | 85 | 85 | 60 | 60 | 75.0%
Ministral 3 8B | 85 | 85 | 85 | 60 | 60 | 75.0%
Ministral 3 3B | 85 | 85 | 85 | 60 | 60 | 75.0%
Ministral 3B | 85 | 85 | 85 | 60 | 60 | 75.0%
Gemma 3 27B | 85 | 85 | 85 | 85 | 10 | 70.0%
Gemma 3 4B | 85 | 85 | 60 | 60 | 60 | 70.0%
Grok 4.3 | 100 | 60 | 60 | 60 | 60 | 68.0%
GPT-4.1 Nano | 100 | 60 | 60 | 60 | 60 | 68.0%
Llama 3.1 8B | 100 | 60 | 60 | 60 | 60 | 68.0%
Gemma 3 12B | 85 | 85 | 85 | 60 | 10 | 65.0%
Mistral NeMO | 60 | 60 | 60 | 60 | 60 | 60.0%
Skyfall 36B V2 | 100 | 60 | 60 | 60 | 10 | 58.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 60 | 10 | 10 | 56.0%
Rocinante 12B | 85 | 60 | 60 | 60 | 10 | 55.0%
Aion 2.0 | 100 | 100 | 10 | 10 | 10 | 46.0%
LFM2 24B | 10 | 10 | 10 | 10 | 10 | 10.0%

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 85 | 97.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 85 | 97.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 85 | 85 | 94.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 85 | 85 | 94.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 85 | 85 | 94.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 60 | 92.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 60 | 92.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 60 | 92.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 60 | 92.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 60 | 92.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 60 | 92.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 60 | 92.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 60 | 92.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 60 | 92.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 60 | 92.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 85 | 85 | 85 | 91.0%
Mistral Small 4 | 100 | 100 | 85 | 85 | 85 | 91.0%
Gemma 3 12B | 100 | 85 | 85 | 85 | 85 | 88.0%
Gemini 2.5 Flash Lite | 100 | 100 | 85 | 85 | 60 | 86.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 85 | 85 | 60 | 86.0%
Claude Sonnet 4.5 | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Large 3 | 85 | 85 | 85 | 85 | 85 | 85.0%
Claude Haiku 4.5 | 85 | 85 | 85 | 85 | 85 | 85.0%
DeepSeek V4 Flash | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Large 2 | 85 | 85 | 85 | 85 | 85 | 85.0%
Mistral Large | 85 | 85 | 85 | 85 | 85 | 85.0%
Gemma 3 27B | 85 | 85 | 85 | 85 | 85 | 85.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 60 | 60 | 84.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 60 | 60 | 84.0%
GPT-4.1 | 100 | 100 | 100 | 60 | 60 | 84.0%
GPT-5.4 Mini | 100 | 100 | 100 | 60 | 60 | 84.0%
Qwen 3 32B | 100 | 100 | 100 | 60 | 60 | 84.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 60 | 60 | 84.0%
GPT-4o, May 13th (temp=1) | 100 | 85 | 85 | 85 | 60 | 83.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 10 | 82.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 10 | 82.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 10 | 82.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 85 | 60 | 60 | 81.0%
Mistral Small 3.2 24B | 85 | 85 | 85 | 85 | 60 | 80.0%
Mistral Medium 3.1 | 85 | 85 | 85 | 85 | 60 | 80.0%
Ministral 3 8B | 85 | 85 | 85 | 85 | 60 | 80.0%
Ministral 8B | 85 | 85 | 85 | 85 | 60 | 80.0%
Arcee AI: Trinity Mini | 100 | 100 | 85 | 85 | 10 | 76.0%
Claude 3 Haiku | 85 | 85 | 60 | 60 | 60 | 70.0%
Qwen 2.5 72B | 100 | 60 | 60 | 60 | 60 | 68.0%
Hermes 3 405B | 100 | 100 | 60 | 60 | 10 | 66.0%
Ministral 3B | 85 | 60 | 60 | 60 | 60 | 65.0%
Nemotron 3 Nano | 100 | 100 | 100 | 10 | 10 | 64.0%
Ministral 3 14B | 60 | 60 | 60 | 60 | 60 | 60.0%
GPT-4.1 Nano | 60 | 60 | 60 | 60 | 60 | 60.0%
Gemma 3 4B | 60 | 60 | 60 | 60 | 60 | 60.0%
Ministral 3 3B | 60 | 60 | 60 | 60 | 60 | 60.0%
Mistral NeMO | 60 | 60 | 60 | 60 | 60 | 60.0%
Llama 3.1 8B | 60 | 60 | 60 | 60 | 60 | 60.0%
Cydonia 24B V4.1 | 100 | 60 | 60 | 60 | 10 | 58.0%
Hermes 3 70B | 85 | 60 | 60 | 60 | 10 | 55.0%
Qwen3 235B A22B Instruct 2507 | 60 | 60 | 60 | 60 | 10 | 50.0%
Skyfall 36B V2 | 60 | 60 | 60 | 60 | 10 | 50.0%
Cohere Command R+ (Aug. 2024) | 100 | 60 | 60 | 10 | 10 | 48.0%
DeepSeek V3 (2025-03-24) | 85 | 60 | 60 | 10 | 10 | 45.0%
Rocinante 12B | 60 | 60 | 60 | 10 | 10 | 40.0%
Nemotron 3 Super | 10 | | | | | 10.0%