Character precision

Test: Relationship tree

Avg. Score: 95.3%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0033 | 3.2s | 100%
2 | Gemini 3.1 Flash Lite | 100.0% | $0.0031 | 3.3s | 100%
3 | Gemini 2.5 Flash Lite | 100.0% | $0.0023 | 4.8s | 100%
4 | LFM2 24B | 100.0% | $0.0004 | 7.8s | 100%
5 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0046 | 3.2s | 100%
6 | Gemini 3 Flash (Preview) | 100.0% | $0.0087 | 7.1s | 100%
7 | Grok 4.3 | 100.0% | $0.0094 | 7.8s | 100%
8 | Inception Mercury 2 | 100.0% | $0.0096 | 12.8s | 100%
9 | DeepSeek V4 Pro | 100.0% | $0.0083 | 17.3s | 100%
10 | Mistral Large 3 | 100.0% | $0.0091 | 17.5s | 100%
11 | Ministral 3 14B | 100.0% | $0.0031 | 25.4s | 100%
12 | MiniMax M2.7 | 100.0% | $0.0063 | 21.4s | 100%
13 | Gemma 4 26B | 100.0% | $0.0018 | 27.2s | 100%
14 | Grok 4.3 (Reasoning) | 100.0% | $0.013 | 13.9s | 100%
15 | Gemma 3 27B | 100.0% | $0.0016 | 27.5s | 100%
16 | Grok 4.20 (Beta) | 100.0% | $0.019 | 5.5s | 100%
17 | Grok 4.20 | 100.0% | $0.020 | 7.5s | 100%
18 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.024 | 6.3s | 100%
19 | GPT-4.1 | 100.0% | $0.024 | 8.5s | 100%
20 | Claude Haiku 4.5 | 100.0% | $0.024 | 8.2s | 100%
21 | DeepSeek V3.2 | 100.0% | $0.0046 | 33.9s | 100%
22 | DeepSeek-V2 Chat | 100.0% | $0.0049 | 34.0s | 100%
23 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0045 | 40.2s | 100%
24 | GPT-4o, Aug. 6th (temp=0) | 100.0% | $0.032 | 8.4s | 100%
25 | MiniMax M2.5 | 100.0% | $0.0036 | 45.0s | 100%
26 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0088 | 41.7s | 100%
27 | DeepSeek V4 Flash | 100.0% | $0.0020 | 51.4s | 100%
28 | Z.AI GLM 4.5 | 100.0% | $0.0075 | 48.1s | 100%
29 | GPT-5.4 | 100.0% | $0.033 | 17.4s | 100%
30 | DeepSeek V3.1 | 100.0% | $0.0055 | 52.0s | 100%
31 | Mistral Large 2 | 100.0% | $0.037 | 20.5s | 100%
32 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.032 | 30.1s | 100%
33 | Qwen 3.6 Flash | 100.0% | $0.015 | 51.7s | 100%
34 | Gemini 2.5 Flash | 98.6% | $0.0075 | 6.1s | 92%
35 | Xiaomi MIMO v2.5 | 100.0% | $0.0027 | 1.1m | 100%
36 | Claude 3 Haiku | 98.6% | $0.0056 | 11.6s | 92%
37 | GPT-5.4 Mini | 98.6% | $0.0087 | 7.9s | 92%
38 | Grok 4.20 (Reasoning) | 100.0% | $0.027 | 40.0s | 100%
39 | Ministral 3 8B | 98.5% | $0.0022 | 17.0s | 91%
40 | Qwen 3.6 35B | 100.0% | $0.014 | 1.1m | 100%
41 | Mistral Large | 100.0% | $0.036 | 41.1s | 100%
42 | ByteDance Seed 1.6 | 100.0% | $0.014 | 1.2m | 100%
43 | o4 Mini | 100.0% | $0.037 | 45.7s | 100%
44 | Ministral 8B | 97.3% | $0.0016 | 15.6s | 89%
45 | Writer: Palmyra X5 | 98.5% | $0.016 | 18.4s | 91%
46 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.053 | 35.5s | 100%
47 | GPT-5.1 | 98.6% | $0.025 | 17.7s | 92%
48 | GPT-4o, Aug. 6th (temp=1) | 98.4% | $0.029 | 9.1s | 91%
49 | GPT-4.1 Mini | 97.2% | $0.0046 | 26.0s | 89%
50 | GPT-4o, May 13th (temp=1) | 100.0% | $0.086 | 5.4s | 100%
51 | Claude Sonnet 4.5 | 100.0% | $0.077 | 18.6s | 100%
52 | GPT-4o, May 13th (temp=0) | 100.0% | $0.088 | 5.5s | 100%
53 | Qwen3 235B A22B Instruct 2507 | 98.6% | $0.0017 | 1.0m | 92%
54 | Gemma 4 31B | 100.0% | $0.0028 | 1.9m | 100%
55 | Claude Sonnet 4.6 | 100.0% | $0.081 | 23.3s | 100%
56 | GPT-5.2 | 100.0% | $0.062 | 50.7s | 100%
57 | Gemini 3 Flash (Preview, Reasoning) | 98.6% | $0.030 | 40.3s | 92%
58 | DeepSeek V3 (2025-03-24) | 97.1% | $0.0037 | 50.3s | 88%
59 | GPT-OSS 120B | 100.0% | $0.0040 | 2.2m | 100%
60 | Qwen 3.5 35B | 100.0% | $0.015 | 2.0m | 100%
61 | Mistral Medium 3.1 | 97.3% | $0.010 | 21.2s | 84%
62 | Mistral Small 3.2 24B | 94.7% | $0.0017 | 13.8s | 83%
63 | o4 Mini High | 100.0% | $0.059 | 1.3m | 100%
64 | GPT-5.5 (Reasoning, Low) | 100.0% | $0.097 | 42.6s | 100%
65 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.020 | 2.3m | 100%
66 | Qwen 2.5 72B | 96.9% | $0.0056 | 1.2m | 87%
67 | GPT-5.5 | 100.0% | $0.105 | 37.1s | 100%
68 | Mistral NeMO | 96.1% | $0.0024 | 15.4s | 77%
69 | Gemma 3 4B | 94.7% | $0.0007 | 27.5s | 79%
70 | Gemini 2.5 Pro | 100.0% | $0.099 | 58.2s | 100%
71 | Claude Opus 4.5 | 100.0% | $0.131 | 19.8s | 100%
72 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.113 | 49.4s | 100%
73 | GPT-5 Mini | 98.6% | $0.019 | 2.0m | 92%
74 | Qwen 3.6 27B | 100.0% | $0.034 | 2.7m | 100%
75 | Z.AI GLM 4.6 | 100.0% | $0.018 | 3.0m | 100%
76 | Aion 2.0 | 100.0% | $0.024 | 3.0m | 100%
77 | Qwen 3.5 397B A17B | 100.0% | $0.045 | 2.6m | 100%
78 | GPT-5.4 Nano (Reasoning, Low) | 95.1% | $0.0038 | 13.8s | 71%
79 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.023 | 3.1m | 100%
80 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0041 | 3.5m | 100%
81 | Arcee AI: Trinity Mini | 96.5% | $0.0037 | 1.9m | 86%
82 | Claude Opus 4.6 | 100.0% | $0.154 | 31.4s | 100%
83 | Xiaomi MIMO v2.5 Pro | 97.4% | $0.0076 | 1.9m | 84%
84 | ByteDance Seed 2.0 Lite | 100.0% | $0.014 | 3.6m | 100%
85 | Claude Opus 4.7 | 100.0% | $0.182 | 24.3s | 100%
86 | GPT-4.1 Nano | 88.3% | $0.0009 | 6.9s | 69%
87 | WizardLM 2 8x22b | 97.4% | $0.0098 | 2.3m | 85%
88 | Z.AI GLM 4.7 | 100.0% | $0.025 | 3.7m | 100%
89 | GPT-5 | 100.0% | $0.092 | 2.3m | 100%
90 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.184 | 25.0s | 100%
91 | GPT-5.4 Mini (Reasoning, Low) | 92.5% | $0.014 | 14.4s | 68%
92 | Nemotron 3 Super | 100.0% | $0.0000 | 3.2m |
93 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.021 | 3.9m | 100%
94 | Z.AI GLM 5 Turbo | 100.0% | $0.044 | 3.6m | 100%
95 | Qwen 3.5 122B | 100.0% | $0.035 | 4.0m | 100%
96 | Ministral 3B | 86.5% | $0.0007 | 9.8s | 65%
97 | Qwen 3.5 Flash | 98.6% | $0.0040 | 3.8m | 92%
98 | GPT-5 Nano | 95.2% | $0.0080 | 2.2m | 79%
99 | Z.AI GLM 4.7 Flash | 95.9% | $0.0050 | 2.8m | 82%
100 | Claude Sonnet 4 | 95.1% | $0.072 | 17.5s | 71%
101 | Gemini 3.1 Pro (Preview) | 100.0% | $0.172 | 1.5m | 100%
102 | Z.AI GLM 5 | 100.0% | $0.037 | 4.6m | 100%
103 | Qwen3.6 Max Preview | 100.0% | $0.084 | 3.7m | 100%
104 | Llama 3.1 70B | 91.6% | $0.0064 | 34.0s | 58%
105 | GPT-5.4 Nano | 83.1% | $0.0026 | 11.2s | 57%
106 | GPT-5.4 (Reasoning) | 100.0% | $0.175 | 2.6m | 100%
107 | Gemini 2.5 Flash Lite (Reasoning) | 84.9% | $0.0073 | 1.0m | 60%
108 | Hermes 3 405B | 92.3% | $0.015 | 1.0m | 54%
109 | Ministral 3 3B | 81.7% | $0.0007 | 9.9s | 50%
110 | MoonshotAI: Kimi K2.6 | 100.0% | $0.082 | 5.0m | 100%
111 | Mistral Small 4 (Reasoning) | 87.9% | $0.0051 | 24.7s | 45%
112 | MoonshotAI: Kimi K2.5 | 98.6% | $0.031 | 5.4m | 92%
113 | GPT-5.4 Mini (Reasoning) | 97.3% | $0.117 | 2.8m | 84%
114 | GPT-5.5 (Reasoning) | 100.0% | $0.251 | 2.0m | 100%
115 | Qwen 3 32B | 88.7% | $0.0042 | 2.9m | 64%
116 | Mistral Small 4 | 80.3% | $0.0030 | 7.4s | 39%
117 | ByteDance Seed 2.0 Mini | 100.0% | $0.0062 | 8.7m | 100%
118 | Z.AI GLM 5.1 | 100.0% | $0.081 | 7.3m | 100%
119 | Cydonia 24B V4.1 | 79.4% | $0.0040 | 1.0m | 38%
120 | Z.AI GLM 4.5 Air | 91.4% | $0.0071 | 3.2m | 49%
121 | Gemma 4 31B (Reasoning) | 95.1% | $0.0043 | 5.8m | 71%
122 | Claude Opus 4.8 (Reasoning, Low) | 100.0% | $0.383 | 1.9m | 100%
123 | Claude Opus 4.8 (Reasoning) | 100.0% | $0.381 | 1.9m | 100%
124 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.356 | 2.6m | 100%
125 | MiniMax M3 | 100.0% | $0.036 | 9.3m | 100%
126 | Skyfall 36B V2 | 68.9% | $0.0054 | 26.5s | 29%
127 | Rocinante 12B | 75.2% | $0.0033 | 32.1s | 22%
128 | ByteDance Seed 1.6 Flash | 69.5% | $0.0019 | 38.2s | 26%
129 | Qwen 3.5 27B | 90.0% | $0.029 | 3.7m | 40%
130 | Hermes 3 70B | 68.6% | $0.0045 | 42.7s | 25%
131 | Gemini 2.5 Flash (Reasoning) | 62.8% | $0.023 | 37.3s | 29%
132 | Gemma 4 26B (Reasoning) | 82.2% | $0.0041 | 3.1m | 29%
133 | Llama 3.1 8B | 54.5% | $0.0005 | 51.5s | 29%
134 | Qwen 3.5 9B | 90.3% | $0.0040 | 7.8m | 65%
135 | Cohere Command R+ (Aug. 2024) | 69.1% | $0.057 | 56.5s | 16%
136 | Claude Opus 4 | 95.1% | $0.368 | 2.4m | 71%
137 | Qwen3.7 Max | 81.2% | $0.045 | 3.5m | 25%
138 | Nemotron 3 Nano | 66.5% | $0.0039 | 2.8m | 19%
139 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.448 | 5.9m | 100%
140 | Gemma 3 12B | 10.1% | $0.0014 | 30.4s | 0%

Avg. Score: 95.29%

Individual Scenarios

Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
LFM2 24B | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 74 | 94.9%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 74 | 94.9%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 74 | 94.9%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 74 | 94.9%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 73 | 94.5%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 73 | 94.5%
Gemma 3 4B | 100 | 100 | 100 | 100 | 73 | 94.5%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 51 | 90.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 51 | 90.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 51 | 90.3%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 44 | 88.7%
Ministral 3B | 100 | 100 | 74 | 73 | 73 | 83.9%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 14 | 82.9%
GPT-5.4 Nano | 100 | 100 | 73 | 73 | 56 | 80.3%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 0 | 80.0%
Skyfall 36B V2 | 100 | 100 | 100 | 54 | 28 | 76.4%
Ministral 3 3B | 100 | 74 | 74 | 74 | 54 | 75.4%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 73 | 51 | 51 | 75.0%
Mistral Small 4 | 100 | 100 | 100 | 73 | 1 | 74.7%
Cydonia 24B V4.1 | 100 | 100 | 100 | 71 | 0 | 74.1%
Hermes 3 70B | 100 | 100 | 73 | 48 | 35 | 71.2%
ByteDance Seed 1.6 Flash | 100 | 100 | 51 | 51 | 45 | 69.5%
Nemotron 3 Nano | 100 | 100 | 73 | 51 | 0 | 64.8%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 20 | 0 | 64.0%
Gemini 2.5 Flash (Reasoning) | 100 | 51 | 51 | 51 | 51 | 61.0%
Llama 3.1 8B | 100 | 51 | 51 | 41 | 38 | 56.3%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 20.0%

Model | # 1 | # 2 | # 3 | # 4 | # 5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Nemotron 3 Super | 100 | | | | | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 86 | 97.3%
GPT-5.1 | 100 | 100 | 100 | 100 | 86 | 97.3%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 86 | 97.3%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 86 | 97.3%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 86 | 97.3%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 86 | 97.3%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 86 | 97.3%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 86 | 97.2%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 86 | 97.2%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 86 | 97.2%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 85 | 97.1%
Ministral 3 8B | 100 | 100 | 100 | 100 | 85 | 97.1%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 84 | 96.9%
Gemma 3 4B | 100 | 100 | 100 | 100 | 74 | 94.9%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 74 | 94.7%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 74 | 94.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 74 | 94.7%
Ministral 8B | 100 | 100 | 100 | 86 | 86 | 94.6%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 73 | 94.5%
GPT-4.1 Mini | 100 | 100 | 100 | 86 | 86 | 94.5%
Mistral Small 3.2 24B | 100 | 100 | 100 | 86 | 86 | 94.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 86 | 85 | 94.2%
Qwen 2.5 72B | 100 | 100 | 100 | 85 | 84 | 93.7%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 83 | 82 | 93.1%
Mistral NeMO | 100 | 100 | 100 | 100 | 61 | 92.2%
Qwen 3.5 9B | 100 | 100 | 100 | 85 | 74 | 92.0%
GPT-5 Nano | 100 | 100 | 100 | 86 | 66 | 90.3%
Ministral 3B | 100 | 100 | 86 | 85 | 74 | 89.1%
Ministral 3 3B | 100 | 100 | 100 | 74 | 66 | 88.0%
Mistral Small 4 | 100 | 100 | 100 | 74 | 55 | 85.9%
GPT-5.4 Nano | 100 | 100 | 86 | 86 | 56 | 85.9%
Cydonia 24B V4.1 | 100 | 100 | 85 | 73 | 66 | 84.7%
Hermes 3 405B | 100 | 100 | 100 | 100 | 23 | 84.6%
Llama 3.1 70B | 100 | 100 | 100 | 86 | 30 | 83.2%
Qwen 3 32B | 100 | 100 | 100 | 59 | 54 | 82.5%
GPT-4.1 Nano | 100 | 84 | 82 | 72 | 71 | 81.8%
Mistral Small 4 (Reasoning) | 100 | 100 | 86 | 85 | 7 | 75.7%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 85 | 84 | 1 | 74.2%
Nemotron 3 Nano | 100 | 100 | 100 | 38 | 4 | 68.2%
Hermes 3 70B | 100 | 100 | 100 | 30 | 0 | 66.0%
Gemini 2.5 Flash (Reasoning) | 100 | 74 | 74 | 74 | 3 | 64.7%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 12 | 10 | 64.4%
Qwen3.7 Max | 100 | 100 | 100 | 12 | 0 | 62.5%
Skyfall 36B V2 | 100 | 66 | 65 | 45 | 32 | 61.3%
Llama 3.1 8B | 85 | 61 | 53 | 35 | 30 | 52.8%
Rocinante 12B | 100 | 100 | 43 | 9 | 0 | 50.4%
Gemma 3 12B | 1 | 0 | 0 | 0 | 0 | 0.2%