Contains a count of nouns

Test: Novel outline

Avg. Score: 71.4%
Scenarios: 5

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | o4 Mini | 100.0% | $0.0023 | 5.1s | 100%
2 | o4 Mini High | 100.0% | $0.0028 | 6.5s | 100%
3 | GPT-5 Nano | 100.0% | $0.0004 | 13.0s | 100%
4 | Gemini 2.5 Pro | 100.0% | $0.0089 | 6.4s | 100%
5 | Claude Opus 4.6 | 100.0% | $0.012 | 4.1s | 100%
6 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0020 | 13.9s | 100%
7 | Gemini 3 Pro (Preview) | 100.0% | $0.010 | 6.5s | 100%
8 | GPT-5.1 | 98.0% | $0.0017 | 4.1s | 72%
9 | GPT-5 Mini | 98.0% | $0.0010 | 6.9s | 72%
10 | Gemini 3.1 Pro (Preview) | 100.0% | $0.011 | 9.2s | 100%
11 | Grok 4 | 100.0% | $0.011 | 12.0s | 100%
12 | GPT-5.2 | 94.0% | $0.0025 | 2.9s | 53%
13 | Minimax M2.5 | 96.0% | $0.0007 | 9.1s | 61%
14 | Grok 4.1 Fast | 92.0% | $0.0004 | 3.3s | 46%
15 | Stealth: Aurora Alpha | 94.0% | | 1.4s | 53%
16 | Claude Sonnet 4 | 94.0% | $0.0075 | 4.2s | 53%
17 | Llama 3.1 Nemotron 70B | 92.0% | $0.0006 | 9.2s | 46%
18 | Z.AI GLM 4.7 Flash | 92.0% | $0.0004 | 15.7s | 46%
19 | GPT-4.1 | 84.0% | $0.0022 | 3.5s | 27%
20 | Qwen 3.5 397B A17B | 100.0% | $0.0057 | 34.3s | 100%
21 | DeepSeek V3.2 | 88.0% | $0.0004 | 11.9s | 35%
22 | Claude Opus 4.5 | 90.0% | $0.013 | 3.8s | 40%
23 | Claude Sonnet 4.5 | 86.0% | $0.0074 | 3.4s | 31%
24 | ByteDance Seed 1.6 Flash | 80.0% | $0.0004 | 5.2s | 20%
25 | Claude 3.5 Haiku | 80.0% | $0.0020 | 3.7s | 20%
26 | Z.AI GLM 5 | 90.0% | $0.0024 | 15.8s | 40%
27 | Ministral 3 14B | 76.0% | $0.0003 | 3.0s | 15%
28 | Mistral Large 2 | 82.0% | $0.0044 | 6.2s | 23%
29 | Mistral NeMO | 74.0% | $0.0002 | 2.4s | 12%
30 | Claude Haiku 4.5 | 76.0% | $0.0026 | 2.4s | 15%
31 | Z.AI GLM 4.6 | 96.0% | $0.0019 | 31.1s | 61%
32 | Z.AI GLM 4.7 | 96.0% | $0.0017 | 32.6s | 61%
33 | Ministral 3 8B | 70.0% | $0.0002 | 1.7s | 8%
34 | ByteDance Seed 1.6 | 80.0% | $0.0015 | 13.5s | 20%
35 | GPT-4.1 Mini | 66.0% | $0.0002 | 1.8s | 5%
36 | Mistral Medium 3.1 | 68.0% | $0.0009 | 2.9s | 7%
37 | Qwen 3.5 Plus (2026-02-15) | 78.0% | $0.0016 | 11.7s | 17%
38 | Claude Sonnet 4.6 | 72.0% | $0.0064 | 2.1s | 10%
39 | Claude 3.7 Sonnet | 74.0% | $0.0068 | 4.4s | 12%
40 | DeepSeek V3 (2024-12-26) | 70.0% | $0.0007 | 6.4s | 8%
41 | Arcee AI: Trinity Mini | 64.0% | $0.0001 | 3.4s | 4%
42 | Arcee AI: Trinity Large (Preview) | 64.0% | $0.0000 | 3.8s | 4%
43 | Gemma 3 12B | 64.0% | $0.0001 | 3.8s | 4%
44 | DeepSeek V3 (2025-03-24) | 74.0% | $0.0006 | 12.1s | 12%
45 | Mistral Small Creative | 60.0% | $0.0002 | 2.0s | 2%
46 | Llama 3.1 70B | 66.0% | $0.0011 | 5.8s | 5%
47 | Ministral 3 3B | 58.0% | $0.0002 | 1.2s | 1%
48 | Llama 3.1 8B | 58.0% | $0.0002 | 1.8s | 1%
49 | Z.AI GLM 4.5 | 66.0% | $0.0007 | 7.2s | 5%
50 | Gemini 3 Flash (Preview) | 56.0% | $0.0012 | 1.6s | 1%
51 | DeepSeek-V2 Chat | 64.0% | $0.0003 | 8.0s | 4%
52 | Claude 3 Haiku | 54.0% | $0.0005 | 1.3s | 0%
53 | Ministral 8B | 52.0% | $0.0002 | 1.4s | 0%
54 | Writer: Palmyra X5 | 66.0% | $0.0025 | 9.1s | 5%
55 | Ministral 3B | 50.0% | $0.0001 | 1.2s | 0%
56 | Claude 3.5 Sonnet | 64.0% | $0.0067 | 4.6s | 4%
57 | Gemini 2.5 Flash Lite | 48.0% | $0.0001 | 482ms | 0%
58 | Gemini 2.5 Flash | 48.0% | $0.0002 | 535ms | 0%
59 | DeepSeek V3.1 | 64.0% | $0.0004 | 10.9s | 4%
60 | GPT-4o Mini (temp=1) | 46.0% | $0.0002 | 919ms | 0%
61 | Mistral Large 3 | 50.0% | $0.0010 | 2.6s | 0%
62 | Qwen 2.5 72B | 56.0% | $0.0006 | 6.5s | 1%
63 | Hermes 3 70B | 50.0% | $0.0006 | 3.6s | 0%
64 | GPT-4o, May 13th (temp=1) | 62.0% | $0.0095 | 3.4s | 3%
65 | Mistral Large | 54.0% | $0.0040 | 3.3s | 0%
66 | Gemma 3 4B | 42.0% | $0.0001 | 1.1s | 0%
67 | WizardLM 2 8x22b | 52.0% | $0.0015 | 6.3s | 0%
68 | GPT-4o Mini (temp=0) | 40.0% | $0.0002 | 811ms | 0%
69 | Gemma 3 27B | 46.0% | $0.0002 | 4.4s | 0%
70 | GPT-4.1 Nano | 40.0% | $0.0001 | 1.3s | 0%
71 | GPT-4o, Aug. 6th (temp=1) | 46.0% | $0.0038 | 1.3s | 0%
72 | Mistral Small 3.2 24B | 42.0% | $0.0002 | 3.0s | 0%
73 | GPT-5 | 58.0% | $0.0042 | 9.0s | 1%
74 | GPT-4o, Aug. 6th (temp=0) | 44.0% | $0.0037 | 1.2s | 0%
75 | Grok 4 Fast | 40.0% | $0.0003 | 2.1s | 0%
76 | GPT-4o, May 13th (temp=0) | 56.0% | $0.0096 | 4.1s | 1%
77 | Cohere Command R+ (Aug. 2024) | 44.0% | $0.0049 | 2.9s | 0%
78 | Rocinante 12B | 52.0% | $0.0004 | 11.8s | 0%
79 | Hermes 3 405B | 44.0% | $0.0000 | 8.1s | 0%
80 | Claude Opus 4 | 84.0% | $0.037 | 9.8s | 27%
Average: 71.42%

Individual Scenarios

outline-count

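Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%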
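
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 405B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o Mini (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Small 3.2 24B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Cohere Command R+ (Aug. 2024) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 4B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Ministral 3B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%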
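
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Arcee AI: Trinity Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Grok 4 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Opus 4 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Z.AI GLM 4.5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.5 Haiku | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Ministral 3 14B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Claude 3 Haiku | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Large 3 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4.1 Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Medium 3.1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
DeepSeek V3.1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Writer: Palmyra X5 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4o Mini (temp=1) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Llama 3.1 70B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Large | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small Creative | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Hermes 3 70B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 3 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Llama 3.1 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemini 3 Flash (Preview) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Hermes 3 405B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Qwen 2.5 72B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Rocinante 12B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%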

pov-count

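Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Mistral Small Creative | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude 3.5 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude 3 Haiku | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Ministral 3 3B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Ministral 8B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Rocinante 12B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-4.1 Mini | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Qwen 2.5 72B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Llama 3.1 8B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Ministral 3B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 4B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Grok 4 Fast | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, May 13th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 27B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small 3.2 24B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Cohere Command R+ (Aug. 2024) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%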
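
Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Stealth: Aurora Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Ministral 3 14B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
Grok 4.1 Fast | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
Ministral 3 8B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
DeepSeek V3.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 70.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude 3.7 Sonnet | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Mistral Large 2 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 60.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2025-03-24) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
Llama 3.1 8B | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 50.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Mistral NeMO | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Ministral 3B | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 3 Flash (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Mistral Medium 3.1 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Gemma 3 27B | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
Arcee AI: Trinity Large (Preview) | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 30.0%
GPT-5 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek-V2 Chat | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
DeepSeek V3.1 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
GPT-4o Mini (temp=1) | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Hermes 3 70B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
WizardLM 2 8x22b | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Rocinante 12B | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Grok 4 Fast | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
GPT-4o, May 13th (temp=1) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Hermes 3 405B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Gemma 3 12B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 3.2 24B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude 3 Haiku | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Arcee AI: Trinity Mini | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Cohere Command R+ (Aug. 2024) | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Ministral 8B | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Claude 3.5 Sonnet | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o, Aug. 6th (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4o Mini (temp=0) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemini 2.5 Flash Lite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Large | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Mistral Small Creative | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Ministral 3 3B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
GPT-4.1 Nano | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%
Gemma 3 4B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0%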
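
The page does not state how the overall Score is derived. For the rows checked, it is consistent with an unweighted mean of the five scenario averages above. The sketch below shows that arithmetic for three models; the aggregation rule and the `overall_score` helper are assumptions for illustration, not the site's documented method.

```python
from statistics import mean

# Scenario averages (percent) copied from the five scenario tables,
# in the order the tables appear on the page.
scenario_averages = {
    "GPT-5": [60.0, 90.0, 100.0, 20.0, 20.0],
    "Claude Opus 4": [100.0, 100.0, 30.0, 100.0, 90.0],
    "Grok 4 Fast": [80.0, 40.0, 70.0, 0.0, 10.0],
}


def overall_score(averages):
    """Unweighted mean of per-scenario averages (an assumed aggregation rule)."""
    return mean(averages)


overall = {model: overall_score(vals) for model, vals in scenario_averages.items()}

for model, score in overall.items():
    print(f"{model}: {score:.1f}%")
```

These three results match the Score column of the Overall Performance table (58.0%, 84.0%, and 40.0%). The Stability column follows some other rule that this sketch does not attempt to reproduce.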