Either/Or composite

Test: Novel outline

Avg. Score: 69.2%
Scenarios: 1

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 2.5 Flash Lite | 100.0% | $0.0003 | 1.0s | 100%
2 | GPT-5.4 Nano (Reasoning) | 100.0% | $0.0008 | 4.8s | 100%
3 | Z.AI GLM 4.5 | 100.0% | $0.0009 | 8.1s | 100%
4 | Claude 3.5 Haiku | 100.0% | $0.0025 | 4.9s | 100%
5 | Z.AI GLM 5 Turbo | 100.0% | $0.0030 | 7.2s | 100%
6 | Qwen3 235B A22B Instruct 2507 | 100.0% | $0.0004 | 16.4s | 100%
7 | ByteDance Seed 1.6 Flash | 100.0% | $0.0010 | 15.0s | 100%
8 | Gemini 3 Flash (Preview, Reasoning) | 100.0% | $0.0041 | 6.0s | 100%
9 | GPT-5.4 (Reasoning, Low) | 100.0% | $0.0051 | 5.3s | 100%
10 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0021 | 16.3s | 100%
11 | GPT-5.2 | 100.0% | $0.0055 | 6.4s | 100%
12 | Qwen 3.5 Flash | 100.0% | $0.0011 | 20.6s | 100%
13 | Writer: Palmyra X5 | 100.0% | $0.0039 | 12.4s | 100%
14 | Grok 4.20 (Beta, Reasoning) | 100.0% | $0.0072 | 4.3s | 100%
15 | Z.AI GLM 5 | 100.0% | $0.0032 | 21.5s | 100%
16 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0033 | 23.3s | 100%
17 | Aion 2.0 | 100.0% | $0.0028 | 24.9s | 100%
18 | GPT-5.4 (Reasoning) | 100.0% | $0.0062 | 18.8s | 100%
19 | Z.AI GLM 4.6 | 100.0% | $0.0023 | 34.1s | 100%
20 | GPT-4o, May 13th (temp=0) | 100.0% | $0.013 | 7.2s | 100%
21 | Qwen 3.5 35B | 100.0% | $0.0078 | 23.2s | 100%
22 | Gemini 2.5 Pro | 100.0% | $0.013 | 9.0s | 100%
23 | Qwen 3.5 122B | 100.0% | $0.0091 | 21.8s | 100%
24 | Claude Sonnet 4.6 (Reasoning) | 100.0% | $0.014 | 9.3s | 100%
25 | Gemini 3 Pro (Preview) | 100.0% | $0.015 | 8.7s | 100%
26 | GPT-5.4 Nano (Reasoning, Low) | 95.0% | $0.0007 | 3.2s | 70%
27 | Gemini 2.5 Flash Lite (Reasoning) | 95.0% | $0.0009 | 5.1s | 70%
28 | Gemini 3.1 Pro (Preview) | 100.0% | $0.016 | 13.7s | 100%
29 | Llama 3.1 Nemotron 70B | 95.0% | $0.0006 | 14.0s | 70%
30 | Claude Opus 4.6 (Reasoning) | 100.0% | $0.019 | 10.4s | 100%
31 | DeepSeek-V2 Chat | 95.0% | $0.0003 | 17.9s | 70%
32 | Qwen 3.5 397B A17B | 100.0% | $0.0077 | 48.3s | 100%
33 | Qwen 3.5 27B | 100.0% | $0.0062 | 59.6s | 100%
34 | o4 Mini High | 95.0% | $0.0054 | 13.6s | 70%
35 | MiniMax M2.7 | 90.0% | $0.0009 | 11.7s | 60%
36 | GPT-5 Mini | 90.0% | $0.0016 | 11.1s | 60%
37 | GPT-5 | 95.0% | $0.0093 | 14.9s | 70%
38 | Mistral Large | 85.0% | $0.0037 | 2.0s | 54%
39 | GPT-5.4 Mini (Reasoning) | 85.0% | $0.0032 | 5.8s | 54%
40 | ByteDance Seed 2.0 Lite | 95.0% | $0.0039 | 41.7s | 70%
41 | Stealth: Hunter Alpha | 80.0% | $0.0000 | 7.5s | 51%
42 | MiniMax M2.5 | 85.0% | $0.0010 | 16.4s | 54%
43 | Mistral Large 3 | 90.0% | $0.0012 | 4.8s | 40%
44 | Nemotron 3 Super | 80.0% | $0.0000 | 13.6s | 51%
45 | Gemini 3.1 Flash Lite (Preview) | 80.0% | $0.0010 | 1.7s | 34%
46 | Z.AI GLM 4.7 | 90.0% | $0.0029 | 51.1s | 60%
47 | GPT-4.1 | 85.0% | $0.0046 | 7.3s | 36%
48 | Gemini 2.5 Flash (Reasoning) | 80.0% | $0.0024 | 3.9s | 34%
49 | Gemma 3 12B | 75.0% | $0.0002 | 9.5s | 38%
50 | GPT-4o, Aug. 6th (temp=0) | 75.0% | $0.0037 | 1.1s | 38%
51 | GPT-5.1 | 85.0% | $0.0056 | 9.0s | 36%
52 | o4 Mini | 85.0% | $0.0049 | 11.6s | 36%
53 | Qwen 3.5 9B | 95.0% | $0.0011 | 1.5m | 70%
54 | ByteDance Seed 1.6 | 80.0% | $0.0034 | 34.0s | 51%
55 | GPT-4o Mini (temp=0) | 50.0% | $0.0002 | 1.5s | 50%
56 | GPT-4o, Aug. 6th (temp=1) | 75.0% | $0.0059 | 3.9s | 33%
57 | Stealth: Healer Alpha | 70.0% | $0.0000 | 5.0s | 25%
58 | GPT-4.1 Nano | 65.0% | $0.0001 | 4.1s | 27%
59 | GPT-4o Mini (temp=1) | 60.0% | $0.0003 | 1.9s | 30%
60 | Inception Mercury 2 | 55.0% | $0.0005 | 836ms | 35%
61 | Mistral Small 4 (Reasoning) | 70.0% | $0.0013 | 10.5s | 25%
62 | Qwen 2.5 72B | 70.0% | $0.0008 | 14.9s | 25%
63 | Mistral Small 4 | 65.0% | $0.0003 | 2.8s | 18%
64 | Qwen 3 32B | 75.0% | $0.0010 | 36.8s | 33%
65 | Claude Sonnet 4.6 | 80.0% | $0.0087 | 3.6s | 20%
66 | Gemini 2.5 Flash | 60.0% | $0.0005 | 886ms | 20%
67 | GPT-4.1 Mini | 65.0% | $0.0007 | 5.8s | 16%
68 | Llama 3.1 70B | 70.0% | $0.0012 | 17.6s | 20%
69 | GPT-5 Nano | 70.0% | $0.0009 | 26.9s | 26%
70 | Ministral 3 8B | 50.0% | $0.0003 | 2.3s | 28%
71 | Claude Sonnet 4 | 70.0% | $0.0099 | 5.6s | 25%
72 | Claude Haiku 4.5 | 65.0% | $0.0032 | 3.1s | 10%
73 | Z.AI GLM 4.7 Flash | 70.0% | $0.0010 | 39.3s | 25%
74 | Mistral Large 2 | 50.0% | $0.0048 | 7.9s | 28%
75 | ByteDance Seed 2.0 Mini | 95.0% | $0.0024 | 2.3m | 70%
76 | Claude Opus 4 | 95.0% | $0.047 | 11.4s | 70%
77 | Inception Mercury | 40.0% | $0.0001 | 1.2s | 20%
78 | Ministral 3 3B | 40.0% | $0.0002 | 1.5s | 20%
79 | WizardLM 2 8x22b | 55.0% | $0.0019 | 11.7s | 15%
80 | Mistral Small Creative | 40.0% | $0.0003 | 3.6s | 20%
81 | Arcee AI: Trinity Mini | 35.0% | $0.0002 | 6.7s | 27%
82 | Gemini 3 Flash (Preview) | 40.0% | $0.0013 | 1.7s | 20%
83 | Grok 4.1 Fast | 35.0% | $0.0006 | 7.0s | 27%
84 | DeepSeek V3.1 | 55.0% | $0.0005 | 13.6s | 8%
85 | Stealth: Aurora Alpha | 60.0% | | 937ms | 30%
86 | GPT-5.4 Nano | 45.0% | $0.0006 | 1.9s | 8%
87 | DeepSeek V3 (2024-12-26) | 55.0% | $0.0008 | 10.1s | 4%
88 | DeepSeek V3.2 | 50.0% | $0.0005 | 13.5s | 11%
89 | Hermes 3 70B | 55.0% | $0.0006 | 10.8s | 4%
90 | Ministral 3B | 45.0% | $0.0001 | 3.9s | 8%
91 | Grok 4.20 (Beta) | 55.0% | $0.0023 | 13.5s | 8%
92 | GPT-4o, May 13th (temp=1) | 65.0% | $0.014 | 7.2s | 16%
93 | Cohere Command R+ (Aug. 2024) | 40.0% | $0.0046 | 2.4s | 20%
94 | Ministral 3 14B | 35.0% | $0.0004 | 5.0s | 18%
95 | Hermes 3 405B | 30.0% | $0.0000 | 11.9s | 26%
96 | GPT-5.4 Mini (Reasoning, Low) | 35.0% | $0.0021 | 3.2s | 18%
97 | Grok 4 | 70.0% | $0.017 | 20.2s | 25%
98 | LFM2 24B | 35.0% | $0.0001 | 9.9s | 18%
99 | Gemma 3 27B | 40.0% | $0.0002 | 9.7s | 13%
100 | Rocinante 12B | 35.0% | $0.0005 | 13.5s | 18%
101 | Claude Opus 4.5 | 55.0% | $0.016 | 4.9s | 23%
102 | Llama 3.1 8B | 35.0% | $0.0002 | 2.6s | 5%
103 | Nemotron 3 Nano | 35.0% | $0.0005 | 32.9s | 27%
104 | Mistral NeMO | 35.0% | $0.0003 | 3.5s | 5%
105 | GPT-5.4 Mini | 25.0% | $0.0013 | 851ms | 13%
106 | Claude Sonnet 4.5 | 50.0% | $0.0094 | 4.5s | 5%
107 | Claude 3 Haiku | 35.0% | $0.0006 | 1.6s | 0%
108 | GPT-5.4 | 35.0% | $0.0035 | 2.7s | 5%
109 | Arcee AI: Trinity Large (Preview) | 25.0% | $0.0000 | 2.1s | 0%
110 | Grok 4 Fast | 25.0% | $0.0005 | 3.8s | 0%
111 | Gemma 3 4B | 20.0% | $0.0001 | 3.5s | 0%
112 | DeepSeek V3 (2025-03-24) | 25.0% | $0.0008 | 9.5s | 0%
113 | Claude Opus 4.6 | 40.0% | $0.012 | 5.5s | 0%
114 | Ministral 8B | 15.0% | $0.0002 | 8.4s | 0%
115 | Claude 3.7 Sonnet | 20.0% | $0.0062 | 3.5s | 0%
116 | Mistral Medium 3.1 | 10.0% | $0.0013 | 5.3s | 0%
117 | Mistral Small 3.2 24B | 5.0% | $0.0002 | 3.0s | 0%
118 | Claude 3.5 Sonnet | 10.0% | $0.0062 | 5.2s | 0%

Average score: 69.19%

Individual Scenarios

pov-count

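Model | # 1 | # 2 | # 3 | # 4 | # 5 | # 6 | # 7 | # 8 | # 9 | # 10 | Avg
Claude Opus 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 Turbo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Sonnet 4.6 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 122B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Grok 4.20 (Beta, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 27B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Flash (Preview, Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.2 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Aion 2.0 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 35B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5.4 Nano (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen3 235B A22B Instruct 2507 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
ByteDance Seed 2.0 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
Qwen 3.5 9B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
Gemini 2.5 Flash Lite (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
ByteDance Seed 2.0 Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
GPT-5.4 Nano (Reasoning, Low) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 90.0%
MiniMax M2.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 90.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
GPT-5.4 Mini (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 85.0%
MiniMax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 85.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 85.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 80.0%
Stealth: Hunter Alpha | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 80.0%
Gemini 2.5 Flash (Reasoning) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 0 | 80.0%
Gemini 3.1 Flash Lite (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 0 | 80.0%
Nemotron 3 Super | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 80.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 75.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 75.0%
Qwen 3 32B | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 75.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 75.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Grok 4 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Stealth: Healer Alpha | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 70.0%
Mistral Small 4 (Reasoning) | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 0 | 0 | 70.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 65.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 65.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 65.0%
Mistral Small 4 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 65.0%
GPT-4.1 Nano | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 65.0%
Stealth: Aurora Alpha | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 60.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 60.0%
GPT-4o Mini (temp=1) | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 60.0%
Claude Opus 4.5 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 55.0%
Grok 4.20 (Beta) | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 55.0%
Inception Mercury 2 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 55.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 55.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 55.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 55.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 55.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 50 | 50 | 0 | 0 | 0 | 0 | 50.0%
Mistral Large 2 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 50.0%
DeepSeek V3.2 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 50.0%
GPT-4o Mini (temp=0) | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50.0%
Ministral 3 8B | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 50.0%
GPT-5.4 Nano | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 45.0%
Ministral 3B | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 45.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 3 Flash (Preview) | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Inception Mercury | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Gemma 3 27B | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 40.0%
Mistral Small Creative | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Ministral 3 3B | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Grok 4.1 Fast | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 35.0%
GPT-5.4 Mini (Reasoning, Low) | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
GPT-5.4 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 35.0%
Nemotron 3 Nano | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 35.0%
Ministral 3 14B | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
Claude 3 Haiku | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 35.0%
Arcee AI: Trinity Mini | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 35.0%
Mistral NeMO | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 35.0%
Llama 3.1 8B | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 35.0%
LFM2 24B | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
Rocinante 12B | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
Hermes 3 405B | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 30.0%
Grok 4 Fast | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
GPT-5.4 Mini | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 25.0%
DeepSeek V3 (2025-03-24) | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
Arcee AI: Trinity Large (Preview) | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
Claude 3.7 Sonnet | 100 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 4B | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Ministral 8B | 100 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.0%
Claude 3.5 Sonnet | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Medium 3.1 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 3.2 24B | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.0%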
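
The page does not state how its averages are computed. The listed values are consistent with two simple assumptions: each row's Avg is the arithmetic mean of its ten run scores (each 100, 50, or 0), and the overall 69.19% is the unweighted mean of the 118 model averages. The sketch below checks both under those assumptions. The names `run_average`, `RUNS`, and `SCORE_COUNTS` are illustrative, and the counts per score were tallied by hand from the pov-count rows.

```python
# Sketch only: assumes Avg = mean of the ten run scores, and that the
# overall figure is the unweighted mean of the per-model averages.

def run_average(scores):
    """Arithmetic mean of per-run scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)

# Run scores copied from three pov-count rows.
RUNS = {
    "GPT-5": [100] * 9 + [50],
    "Claude Opus 4.6": [100] * 4 + [0] * 6,
    "Mistral Small 3.2 24B": [50] + [0] * 9,
}

# Hand tally: how many of the 118 models have each average score.
SCORE_COUNTS = {
    100.0: 29, 95.0: 10, 90.0: 4, 85.0: 6, 80.0: 6, 75.0: 4, 70.0: 8,
    65.0: 5, 60.0: 3, 55.0: 7, 50.0: 5, 45.0: 2, 40.0: 7, 35.0: 11,
    30.0: 1, 25.0: 4, 20.0: 2, 15.0: 1, 10.0: 2, 5.0: 1,
}

def overall_average(score_counts):
    """Unweighted mean over models, given a score -> model-count tally."""
    total = sum(score * n for score, n in score_counts.items())
    models = sum(score_counts.values())
    return round(total / models, 2)

if __name__ == "__main__":
    for model, scores in RUNS.items():
        print(f"{model}: {run_average(scores):.1f}%")   # 95.0%, 40.0%, 5.0%
    print(f"Overall: {overall_average(SCORE_COUNTS):.2f}%")  # 69.19%
```

Both results match the listed figures, which supports the assumptions but does not confirm the site's actual method. The Stability column follows a different formula that the page does not describe, so it is not reproduced here.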