Character recall

Test: Relationship tree

Avg. Score: 89.1%
Scenarios: 2

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 3.1 Flash Lite (Reasoning) | 100.0% | $0.0033 | 3.2s | 100%
2 | Gemini 3.1 Flash Lite | 100.0% | $0.0031 | 3.3s | 100%
3 | Gemini 3.1 Flash Lite (Preview) | 100.0% | $0.0046 | 3.2s | 100%
4 | Gemini 3 Flash (Preview) | 100.0% | $0.0087 | 7.1s | 100%
5 | Grok 4.3 | 100.0% | $0.0094 | 7.8s | 100%
6 | DeepSeek V4 Pro | 100.0% | $0.0083 | 17.3s | 100%
7 | Mistral Large 3 | 100.0% | $0.0091 | 17.5s | 100%
8 | Gemma 4 26B | 100.0% | $0.0018 | 27.2s | 100%
9 | Grok 4.20 (Beta) | 100.0% | $0.019 | 5.5s | 100%
10 | Grok 4.20 | 100.0% | $0.020 | 7.5s | 100%
11 | Gemini 3.5 Flash (Reasoning, Minimal) | 100.0% | $0.024 | 6.3s | 100%
12 | Claude Haiku 4.5 | 100.0% | $0.024 | 8.2s | 100%
13 | DeepSeek V3.2 | 100.0% | $0.0046 | 33.9s | 100%
14 | DeepSeek V3 (2024-12-26) | 100.0% | $0.0045 | 40.2s | 100%
15 | GPT-5.1 | 100.0% | $0.025 | 17.7s | 100%
16 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0088 | 41.7s | 100%
17 | DeepSeek V4 Flash | 100.0% | $0.0020 | 51.4s | 100%
18 | GPT-5.4 | 100.0% | $0.033 | 17.4s | 100%
19 | Mistral Large 2 | 100.0% | $0.037 | 20.5s | 100%
20 | Gemini 2.5 Flash | 98.6% | $0.0075 | 6.1s | 92%
21 | Qwen 3.6 Flash | 100.0% | $0.015 | 51.7s | 100%
22 | Ministral 8B | 98.6% | $0.0016 | 15.6s | 92%
23 | Grok 4.20 (Reasoning) | 100.0% | $0.027 | 40.0s | 100%
24 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.030 | 40.3s | 100%
25 | Ministral 3 14B | 98.6% | $0.0031 | 25.4s | 92%
26 | GPT-4.1 Mini | 98.6% | $0.0046 | 26.0s | 92%
27 | Mistral Large | 100.0% | $0.036 | 41.1s | 100%
28 | ByteDance Seed 1.6 | 100.0% | $0.014 | 1.2m | 100%
29 | Claude 3 Haiku | 97.2% | $0.0056 | 11.6s | 89%
30 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.053 | 35.5s | 100%
31 | Gemini 2.5 Flash Lite | 95.8% | $0.0023 | 4.8s | 87%
32 | GPT-5.4 Mini | 95.8% | $0.0087 | 7.9s | 87%
33 | Claude Sonnet 4.5 | 100.0% | $0.077 | 18.6s | 100%
34 | GPT-4.1 | 97.2% | $0.024 | 8.5s | 89%
35 | Gemma 4 31B | 100.0% | $0.0028 | 1.9m | 100%
36 | Claude Sonnet 4.6 | 100.0% | $0.081 | 23.3s | 100%
37 | Grok 4.20 (Beta, Reasoning) | 98.6% | $0.032 | 30.1s | 92%
38 | Xiaomi MIMO v2.5 | 98.6% | $0.0027 | 1.1m | 92%
39 | GPT-5.2 | 100.0% | $0.062 | 50.7s | 100%
40 | Ministral 3 8B | 95.9% | $0.0022 | 17.0s | 83%
41 | Qwen 3.5 35B | 100.0% | $0.015 | 2.0m | 100%
42 | Mistral Medium 3.1 | 97.3% | $0.010 | 21.2s | 84%
43 | Writer: Palmyra X5 | 97.4% | $0.016 | 18.4s | 84%
44 | GPT-5 Mini | 100.0% | $0.019 | 2.0m | 100%
45 | Inception Mercury 2 | 95.9% | $0.0096 | 12.8s | 83%
46 | Mistral Small 3.2 24B | 93.1% | $0.0017 | 13.8s | 81%
47 | DeepSeek V4 Pro (Reasoning) | 100.0% | $0.020 | 2.3m | 100%
48 | GPT-5.5 | 100.0% | $0.105 | 37.1s | 100%
49 | Gemini 2.5 Pro | 100.0% | $0.099 | 58.2s | 100%
50 | Claude Opus 4.5 | 100.0% | $0.131 | 19.8s | 100%
51 | GPT-5.4 Nano | 91.7% | $0.0026 | 11.2s | 78%
52 | DeepSeek-V2 Chat | 96.2% | $0.0049 | 34.0s | 77%
53 | Gemini 3.5 Flash (Reasoning) | 100.0% | $0.113 | 49.4s | 100%
54 | GPT-OSS 120B | 98.6% | $0.0040 | 2.2m | 92%
55 | GPT-5 Nano | 98.6% | $0.0080 | 2.2m | 92%
56 | GPT-5.4 Nano (Reasoning, Low) | 95.1% | $0.0038 | 13.8s | 71%
57 | Qwen 3.5 397B A17B | 100.0% | $0.045 | 2.6m | 100%
58 | Qwen 3.5 Plus (2026-04-20) | 100.0% | $0.023 | 3.1m | 100%
59 | DeepSeek V4 Flash (Reasoning) | 100.0% | $0.0041 | 3.5m | 100%
60 | Claude Opus 4.6 | 100.0% | $0.154 | 31.4s | 100%
61 | Xiaomi MIMO v2.5 Pro | 97.4% | $0.0076 | 1.9m | 84%
62 | ByteDance Seed 2.0 Lite | 100.0% | $0.014 | 3.6m | 100%
63 | Ministral 3 3B | 89.7% | $0.0007 | 9.9s | 69%
64 | GPT-5.4 Mini (Reasoning, Low) | 92.5% | $0.014 | 14.4s | 68%
65 | Claude Opus 4.7 | 100.0% | $0.182 | 24.3s | 100%
66 | GPT-5 | 100.0% | $0.092 | 2.3m | 100%
67 | Claude Opus 4.7 (Reasoning) | 100.0% | $0.184 | 25.0s | 100%
68 | Z.AI GLM 4.6 | 98.6% | $0.018 | 3.0m | 92%
69 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.021 | 3.9m | 100%
70 | Z.AI GLM 5 Turbo | 100.0% | $0.044 | 3.6m | 100%
71 | o4 Mini High | 94.5% | $0.059 | 1.3m | 82%
72 | Qwen 3.5 122B | 100.0% | $0.035 | 4.0m | 100%
73 | Qwen 3.5 Flash | 98.6% | $0.0040 | 3.8m | 92%
74 | Z.AI GLM 4.5 | 89.7% | $0.0075 | 48.1s | 69%
75 | Claude Sonnet 4 | 95.1% | $0.072 | 17.5s | 71%
76 | Gemini 3.1 Pro (Preview) | 100.0% | $0.172 | 1.5m | 100%
77 | o4 Mini | 92.5% | $0.037 | 45.7s | 68%
78 | MiniMax M2.5 | 90.4% | $0.0036 | 45.0s | 62%
79 | Llama 3.1 70B | 91.6% | $0.0064 | 34.0s | 58%
80 | Qwen3.6 Max Preview | 100.0% | $0.084 | 3.7m | 100%
81 | Qwen 3.6 27B | 96.2% | $0.034 | 2.7m | 77%
82 | WizardLM 2 8x22b | 93.7% | $0.0098 | 2.3m | 70%
83 | Ministral 3B | 83.1% | $0.0007 | 9.8s | 55%
84 | Z.AI GLM 4.7 | 97.3% | $0.025 | 3.7m | 84%
85 | GPT-5.4 (Reasoning) | 100.0% | $0.175 | 2.6m | 100%
86 | Gemini 2.5 Flash Lite (Reasoning) | 84.9% | $0.0073 | 1.0m | 60%
87 | MiniMax M2.7 | 81.0% | $0.0063 | 21.4s | 53%
88 | MoonshotAI: Kimi K2.6 | 100.0% | $0.082 | 5.0m | 100%
89 | GPT-4o, Aug. 6th (temp=1) | 81.1% | $0.029 | 9.1s | 54%
90 | MoonshotAI: Kimi K2.5 | 98.6% | $0.031 | 5.4m | 92%
91 | Grok 4.3 (Reasoning) | 90.0% | $0.013 | 13.9s | 40%
92 | GPT-4o, Aug. 6th (temp=0) | 81.2% | $0.032 | 8.4s | 51%
93 | GPT-5.5 (Reasoning) | 100.0% | $0.251 | 2.0m | 100%
94 | DeepSeek V3.1 | 90.0% | $0.0055 | 52.0s | 40%
95 | Qwen3 235B A22B Instruct 2507 | 90.0% | $0.0017 | 1.0m | 40%
96 | GPT-5.4 Mini (Reasoning) | 94.8% | $0.117 | 2.8m | 77%
97 | Qwen 2.5 72B | 79.2% | $0.0056 | 1.2m | 50%
98 | Qwen 3.6 35B | 90.0% | $0.014 | 1.1m | 40%
99 | Mistral Small 4 | 79.2% | $0.0030 | 7.4s | 37%
100 | Z.AI GLM 4.7 Flash | 85.4% | $0.0050 | 2.8m | 60%
101 | GPT-4o, May 13th (temp=1) | 81.6% | $0.086 | 5.4s | 52%
102 | GPT-4.1 Nano | 73.2% | $0.0009 | 6.9s | 34%
103 | GPT-4o, May 13th (temp=0) | 79.3% | $0.088 | 5.5s | 47%
104 | Z.AI GLM 5.1 | 100.0% | $0.081 | 7.3m | 100%
105 | GPT-5.5 (Reasoning, Low) | 90.0% | $0.097 | 42.6s | 40%
106 | Z.AI GLM 4.5 Air | 88.8% | $0.0071 | 3.2m | 48%
107 | ByteDance Seed 2.0 Mini | 98.6% | $0.0062 | 8.7m | 92%
108 | Claude Opus 4.8 (Reasoning, Low) | 100.0% | $0.383 | 1.9m | 100%
109 | Claude Opus 4.8 (Reasoning) | 100.0% | $0.381 | 1.9m | 100%
110 | Hermes 3 405B | 79.2% | $0.015 | 1.0m | 29%
111 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.356 | 2.6m | 100%
112 | MiniMax M3 | 100.0% | $0.036 | 9.3m | 100%
113 | Mistral Small 4 (Reasoning) | 72.6% | $0.0051 | 24.7s | 24%
114 | Gemma 3 27B | 67.4% | $0.0016 | 27.5s | 23%
115 | Qwen 3.5 27B | 90.0% | $0.029 | 3.7m | 40%
116 | Mistral NeMO | 64.6% | $0.0024 | 15.4s | 19%
117 | Gemini 2.5 Flash (Reasoning) | 62.8% | $0.023 | 37.3s | 29%
118 | ByteDance Seed 1.6 Flash | 65.1% | $0.0019 | 38.2s | 20%
119 | Gemma 4 26B (Reasoning) | 82.2% | $0.0041 | 3.1m | 29%
120 | Qwen 3.5 9B | 89.1% | $0.0040 | 7.8m | 71%
121 | Z.AI GLM 5 | 90.0% | $0.037 | 4.6m | 40%
122 | Cydonia 24B V4.1 | 63.8% | $0.0040 | 1.0m | 17%
123 | Llama 3.1 8B | 52.2% | $0.0005 | 51.5s | 26%
124 | Claude Opus 4 | 95.1% | $0.368 | 2.4m | 71%
125 | Qwen3.7 Max | 81.2% | $0.045 | 3.5m | 25%
126 | Qwen 3 32B | 63.3% | $0.0042 | 2.9m | 26%
127 | Gemma 4 31B (Reasoning) | 85.1% | $0.0043 | 5.8m | 36%
128 | Hermes 3 70B | 48.6% | $0.0045 | 42.7s | 14%
129 | DeepSeek V3 (2025-03-24) | 53.4% | $0.0037 | 50.3s | 8%
130 | Gemma 3 4B | 39.7% | $0.0007 | 27.5s | 14%
131 | Aion 2.0 | 70.0% | $0.024 | 3.0m | 8%
132 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.448 | 5.9m | 100%
133 | Skyfall 36B V2 | 36.4% | $0.0054 | 26.5s | 5%
134 | Rocinante 12B | 36.3% | $0.0033 | 32.1s | 4%
135 | Arcee AI: Trinity Mini | 30.4% | $0.0037 | 1.9m | 24%
136 | Cohere Command R+ (Aug. 2024) | 34.2% | $0.057 | 56.5s | 18%
137 | Nemotron 3 Super | 0.0% | $0.0000 | 3.2m | -
138 | Nemotron 3 Nano | 39.8% | $0.0039 | 2.8m | 8%
139 | LFM2 24B | 0.0% | $0.0004 | 7.8s | 0%
140 | Gemma 3 12B | 0.1% | $0.0014 | 30.4s | 0%

Individual Scenarios

Scenario 1

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.7 Max | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100.0%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Inception Mercury 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 73 | 94.5%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 73 | 94.5%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 73 | 94.5%
Ministral 3 3B | 100 | 100 | 100 | 100 | 73 | 94.5%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 51 | 90.3%
Claude Opus 4 | 100 | 100 | 100 | 100 | 51 | 90.3%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 51 | 90.3%
o4 Mini | 100 | 100 | 100 | 100 | 51 | 90.3%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 51 | 90.3%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 51 | 90.3%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 51 | 90.3%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 51 | 90.3%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 51 | 90.3%
Qwen 3.5 9B | 100 | 100 | 100 | 73 | 73 | 89.0%
GPT-5.4 Nano | 100 | 100 | 100 | 73 | 73 | 89.0%
Ministral 3B | 100 | 100 | 100 | 73 | 73 | 89.0%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 14 | 82.9%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 0 | 80.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 0 | 80.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 73 | 73 | 51 | 79.3%
MiniMax M2.7 | 100 | 100 | 73 | 51 | 51 | 75.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 73 | 51 | 51 | 75.0%
Mistral Small 4 | 100 | 100 | 100 | 73 | 1 | 74.7%
Gemma 3 4B | 73 | 73 | 73 | 73 | 73 | 72.6%
Qwen 3 32B | 100 | 100 | 51 | 51 | 35 | 67.5%
Llama 3.1 8B | 100 | 73 | 51 | 51 | 51 | 65.3%
ByteDance Seed 1.6 Flash | 100 | 100 | 51 | 51 | 23 | 65.1%
Nemotron 3 Nano | 100 | 100 | 73 | 51 | 0 | 64.8%
Rocinante 12B | 100 | 100 | 73 | 51 | 0 | 64.8%
Gemini 2.5 Flash (Reasoning) | 100 | 51 | 51 | 51 | 51 | 61.0%
Cydonia 24B V4.1 | 100 | 100 | 51 | 51 | 0 | 60.5%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 0 | 0 | 60.0%
Hermes 3 70B | 100 | 73 | 51 | 35 | 35 | 58.8%
Skyfall 36B V2 | 100 | 73 | 51 | 14 | 0 | 47.6%
Cohere Command R+ (Aug. 2024) | 73 | 51 | 51 | 35 | 0 | 42.1%
Arcee AI: Trinity Mini | 51 | 51 | 35 | 35 | 35 | 41.5%
Aion 2.0 | 100 | 100 | 0 | 0 | 0 | 40.0%
Gemma 3 12B | 0 | 0 | 0 | 0 | 0 | 0.0%
LFM2 24B | 0 | 0 | 0 | 0 | 0 | 0.0%

Scenario 2

Model | #1 | #2 | #3 | #4 | #5 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3.6 Max Preview | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.8 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
MiniMax M3 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-04-20) | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.6 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.5 Flash (Reasoning, Minimal) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 31B | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 4 26B | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Pro | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V4 Flash | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.3 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 86 | 97.2%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 86 | 97.2%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 86 | 97.2%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 86 | 97.2%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 86 | 97.2%
GPT-OSS 120B | 100 | 100 | 100 | 100 | 86 | 97.2%
Xiaomi MIMO v2.5 | 100 | 100 | 100 | 100 | 86 | 97.2%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 86 | 97.2%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 86 | 97.2%
GPT-5 Nano | 100 | 100 | 100 | 100 | 86 | 97.2%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 86 | 97.2%
Ministral 3 14B | 100 | 100 | 100 | 100 | 86 | 97.2%
Ministral 8B | 100 | 100 | 100 | 100 | 86 | 97.2%
o4 Mini | 100 | 100 | 100 | 100 | 74 | 94.7%
Xiaomi MIMO v2.5 Pro | 100 | 100 | 100 | 100 | 74 | 94.7%
GPT-5.4 Mini (Reasoning, Low) | 100 | 100 | 100 | 100 | 74 | 94.7%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 74 | 94.7%
Z.AI GLM 4.5 Air | 100 | 100 | 100 | 100 | 74 | 94.7%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 74 | 94.7%
GPT-4.1 | 100 | 100 | 100 | 86 | 86 | 94.4%
GPT-5.4 Nano | 100 | 100 | 100 | 86 | 86 | 94.4%
Claude 3 Haiku | 100 | 100 | 100 | 86 | 86 | 94.4%
Qwen 3.6 27B | 100 | 100 | 100 | 100 | 62 | 92.5%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 62 | 92.5%
Inception Mercury 2 | 100 | 100 | 100 | 86 | 74 | 91.9%
Ministral 3 8B | 100 | 100 | 100 | 86 | 74 | 91.9%
Mistral Small 3.2 24B | 100 | 100 | 86 | 86 | 86 | 91.6%
Z.AI GLM 4.7 Flash | 100 | 100 | 86 | 86 | 86 | 91.6%
GPT-5.4 Mini | 100 | 100 | 86 | 86 | 86 | 91.6%
Gemini 2.5 Flash Lite | 100 | 100 | 86 | 86 | 86 | 91.6%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 53 | 90.5%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 86 | 62 | 89.7%
o4 Mini High | 100 | 100 | 86 | 86 | 74 | 89.1%
Z.AI GLM 4.5 | 100 | 100 | 86 | 86 | 74 | 89.1%
Qwen 3.5 9B | 100 | 100 | 86 | 86 | 74 | 89.1%
MiniMax M2.7 | 100 | 100 | 86 | 86 | 62 | 86.9%
Ministral 3 3B | 100 | 100 | 86 | 86 | 53 | 84.9%
Mistral Small 4 | 100 | 86 | 86 | 74 | 74 | 83.8%
Llama 3.1 70B | 100 | 100 | 100 | 86 | 30 | 83.2%
GPT-5.5 (Reasoning, Low) | 100 | 100 | 100 | 100 | 0 | 80.0%
Gemma 4 31B (Reasoning) | 100 | 100 | 100 | 100 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen 3.6 35B | 100 | 100 | 100 | 100 | 0 | 80.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 0 | 80.0%
Ministral 3B | 100 | 86 | 74 | 74 | 53 | 77.1%
Qwen 2.5 72B | 86 | 86 | 62 | 62 | 44 | 68.2%
Cydonia 24B V4.1 | 100 | 100 | 74 | 62 | 0 | 67.2%
Mistral Small 4 (Reasoning) | 86 | 86 | 74 | 74 | 7 | 65.2%
Gemini 2.5 Flash (Reasoning) | 100 | 74 | 74 | 74 | 3 | 64.7%
Gemma 4 26B (Reasoning) | 100 | 100 | 100 | 12 | 10 | 64.4%
GPT-4o, May 13th (temp=1) | 74 | 74 | 62 | 62 | 44 | 63.2%
Qwen3.7 Max | 100 | 100 | 100 | 12 | 0 | 62.5%
GPT-4o, Aug. 6th (temp=0) | 62 | 62 | 62 | 62 | 62 | 62.4%
GPT-4o, Aug. 6th (temp=1) | 100 | 53 | 53 | 53 | 53 | 62.1%
Qwen 3 32B | 100 | 62 | 53 | 44 | 37 | 59.2%
GPT-4o, May 13th (temp=0) | 62 | 62 | 62 | 53 | 53 | 58.5%
Hermes 3 405B | 100 | 86 | 86 | 20 | 0 | 58.3%
Gemma 3 27B | 62 | 62 | 53 | 53 | 44 | 54.9%
DeepSeek V3 (2025-03-24) | 86 | 86 | 62 | 0 | 0 | 46.9%
GPT-4.1 Nano | 53 | 53 | 53 | 44 | 30 | 46.4%
Llama 3.1 8B | 74 | 53 | 30 | 30 | 10 | 39.2%
Hermes 3 70B | 100 | 62 | 30 | 0 | 0 | 38.5%
Mistral NeMO | 53 | 44 | 30 | 12 | 7 | 29.3%
Cohere Command R+ (Aug. 2024) | 74 | 53 | 5 | 0 | 0 | 26.3%
Skyfall 36B V2 | 86 | 20 | 20 | 0 | 0 | 25.1%
Arcee AI: Trinity Mini | 37 | 30 | 25 | 5 | 0 | 19.3%
Nemotron 3 Nano | 53 | 20 | 2 | 0 | 0 | 14.9%
Rocinante 12B | 37 | 3 | 0 | 0 | 0 | 7.9%
Gemma 3 4B | 12 | 5 | 5 | 5 | 5 | 6.8%
Gemma 3 12B | 1 | 0 | 0 | 0 | 0 | 0.2%
Nemotron 3 Super | 0 | - | - | - | - | 0.0%