Either/Or composite

Test: Novel outline

Avg. Score: 62.0%
Scenarios: 1

Overall Performance

Rank | Model | Score | Avg. Cost | Avg. Time | Stability
1 | Gemini 2.5 Flash Lite | 100.0% | $0.0003 | 1.0s | 100%
2 | Claude 3.5 Haiku | 100.0% | $0.0025 | 4.9s | 100%
3 | Z.AI GLM 4.5 | 100.0% | $0.0009 | 8.1s | 100%
4 | ByteDance Seed 1.6 Flash | 100.0% | $0.0010 | 15.0s | 100%
5 | Writer: Palmyra X5 | 100.0% | $0.0039 | 12.4s | 100%
6 | Qwen 3.5 Plus (2026-02-15) | 100.0% | $0.0021 | 16.3s | 100%
7 | GPT-4o, May 13th (temp=0) | 100.0% | $0.013 | 7.2s | 100%
8 | Gemini 2.5 Pro | 100.0% | $0.013 | 9.0s | 100%
9 | Gemini 3 Pro (Preview) | 100.0% | $0.015 | 8.7s | 100%
10 | Z.AI GLM 5 | 100.0% | $0.0032 | 21.5s | 100%
11 | MoonshotAI: Kimi K2.5 | 100.0% | $0.0033 | 23.3s | 100%
12 | Gemini 3.1 Pro (Preview) | 100.0% | $0.016 | 13.7s | 100%
13 | Llama 3.1 Nemotron 70B | 95.0% | $0.0006 | 14.0s | 70%
14 | DeepSeek-V2 Chat | 95.0% | $0.0003 | 17.9s | 70%
15 | Z.AI GLM 4.6 | 100.0% | $0.0023 | 34.1s | 100%
16 | Mistral Large | 85.0% | $0.0037 | 2.0s | 54%
17 | GPT-5 Mini | 90.0% | $0.0016 | 11.1s | 60%
18 | Mistral Large 3 | 90.0% | $0.0012 | 4.8s | 40%
19 | Minimax M2.5 | 85.0% | $0.0010 | 16.4s | 54%
20 | GPT-4o, Aug. 6th (temp=0) | 75.0% | $0.0037 | 1.1s | 38%
21 | GPT-4.1 | 85.0% | $0.0046 | 7.3s | 36%
22 | GPT-4o Mini (temp=0) | 50.0% | $0.0002 | 1.5s | 50%
23 | Gemma 3 12B | 75.0% | $0.0002 | 9.5s | 38%
24 | GPT-5.1 | 85.0% | $0.0056 | 9.0s | 36%
25 | Qwen 3.5 397B A17B | 100.0% | $0.0077 | 48.3s | 100%
26 | GPT-4o, Aug. 6th (temp=1) | 75.0% | $0.0059 | 3.9s | 33%
27 | GPT-4o Mini (temp=1) | 60.0% | $0.0003 | 1.9s | 30%
28 | GPT-4.1 Nano | 65.0% | $0.0001 | 4.1s | 27%
29 | Claude Sonnet 4.6 | 80.0% | $0.0087 | 3.6s | 20%
30 | GPT-5 | 85.0% | $0.0093 | 14.9s | 36%
31 | Ministral 3 8B | 50.0% | $0.0003 | 2.3s | 28%
32 | GPT-4.1 Mini | 65.0% | $0.0007 | 5.8s | 16%
33 | Gemini 2.5 Flash | 55.0% | $0.0005 | 886ms | 15%
34 | Qwen 2.5 72B | 70.0% | $0.0008 | 14.9s | 25%
35 | Claude Sonnet 4 | 70.0% | $0.0099 | 5.6s | 25%
36 | o4 Mini | 70.0% | $0.0049 | 11.6s | 25%
37 | Claude Haiku 4.5 | 65.0% | $0.0032 | 3.1s | 10%
38 | o4 Mini High | 70.0% | $0.0054 | 13.6s | 25%
39 | Ministral 3 3B | 40.0% | $0.0002 | 1.5s | 20%
40 | Gemini 3 Flash (Preview) | 40.0% | $0.0013 | 1.7s | 20%
41 | Mistral Large 2 | 50.0% | $0.0048 | 7.9s | 28%
42 | Mistral Small Creative | 40.0% | $0.0003 | 3.6s | 20%
43 | Grok 4.1 Fast | 35.0% | $0.0006 | 7.0s | 27%
44 | Cohere Command R+ (Aug. 2024) | 40.0% | $0.0046 | 2.4s | 20%
45 | Claude Opus 4 | 95.0% | $0.047 | 11.4s | 70%
46 | Z.AI GLM 4.7 | 90.0% | $0.0029 | 51.1s | 60%
47 | Ministral 3B | 40.0% | $0.0001 | 3.9s | 13%
48 | WizardLM 2 8x22b | 55.0% | $0.0019 | 11.7s | 15%
49 | Ministral 3 14B | 35.0% | $0.0004 | 5.0s | 18%
50 | GPT-5 Nano | 70.0% | $0.0009 | 26.9s | 26%
51 | Arcee AI: Trinity Mini | 30.0% | $0.0002 | 6.7s | 26%
52 | GPT-4o, May 13th (temp=1) | 65.0% | $0.014 | 7.2s | 16%
53 | ByteDance Seed 1.6 | 75.0% | $0.0034 | 34.0s | 38%
54 | Llama 3.1 70B | 65.0% | $0.0012 | 17.6s | 10%
55 | DeepSeek V3 (2024-12-26) | 55.0% | $0.0008 | 10.1s | 4%
56 | Hermes 3 70B | 55.0% | $0.0006 | 10.8s | 4%
57 | DeepSeek V3.1 | 55.0% | $0.0005 | 13.6s | 8%
58 | Claude Opus 4.5 | 55.0% | $0.016 | 4.9s | 23%
59 | Stealth: Aurora Alpha | 50.0% |  | 937ms | 28%
60 | Llama 3.1 8B | 35.0% | $0.0002 | 2.6s | 5%
61 | Gemma 3 27B | 40.0% | $0.0002 | 9.7s | 13%
62 | Mistral NeMO | 35.0% | $0.0003 | 3.5s | 5%
63 | Claude 3 Haiku | 35.0% | $0.0006 | 1.6s | 0%
64 | Claude Sonnet 4.5 | 50.0% | $0.0094 | 4.5s | 5%
65 | Rocinante 12B | 35.0% | $0.0005 | 13.5s | 18%
66 | Arcee AI: Trinity Large (Preview) | 25.0% | $0.0000 | 2.1s | 0%
67 | Z.AI GLM 4.7 Flash | 70.0% | $0.0010 | 39.3s | 25%
68 | Grok 4 Fast | 25.0% | $0.0005 | 3.8s | 0%
69 | DeepSeek V3.2 | 40.0% | $0.0005 | 13.5s | 3%
70 | Hermes 3 405B | 25.0% | $0.0000 | 11.9s | 13%
71 | Gemma 3 4B | 15.0% | $0.0001 | 3.5s | 0%
72 | DeepSeek V3 (2025-03-24) | 25.0% | $0.0008 | 9.5s | 0%
73 | Claude Opus 4.6 | 40.0% | $0.012 | 5.5s | 0%
74 | Claude 3.7 Sonnet | 20.0% | $0.0062 | 3.5s | 0%
75 | Mistral Small 3.2 24B | 5.0% | $0.0002 | 3.0s | 0%
76 | Grok 4 | 60.0% | $0.017 | 20.2s | 13%
77 | Ministral 8B | 15.0% | $0.0002 | 8.4s | 0%
78 | Mistral Medium 3.1 | 10.0% | $0.0013 | 5.3s | 0%
79 | GPT-5.2 | 20.0% | $0.0055 | 6.4s | 0%
80 | Claude 3.5 Sonnet | 10.0% | $0.0062 | 5.2s | 0%

Individual Scenarios

pov-count

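Model | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 | Avg
Gemini 3.1 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 397B A17B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
MoonshotAI: Kimi K2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Pro | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 3 Pro (Preview) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Qwen 3.5 Plus (2026-02-15) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
GPT-4o, May 13th (temp=0) | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Z.AI GLM 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude 3.5 Haiku | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
ByteDance Seed 1.6 Flash | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Writer: Palmyra X5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Gemini 2.5 Flash Lite | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100.0%
Claude Opus 4 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
DeepSeek-V2 Chat | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
Llama 3.1 Nemotron 70B | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 95.0%
GPT-5 Mini | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 90.0%
Z.AI GLM 4.7 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 90.0%
Mistral Large 3 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 90.0%
GPT-5.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
GPT-5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
Minimax M2.5 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 85.0%
GPT-4.1 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 85.0%
Mistral Large | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 85.0%
Claude Sonnet 4.6 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 0 | 0 | 80.0%
ByteDance Seed 1.6 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 75.0%
GPT-4o, Aug. 6th (temp=1) | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 75.0%
GPT-4o, Aug. 6th (temp=0) | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 75.0%
Gemma 3 12B | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 75.0%
o4 Mini High | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
o4 Mini | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Claude Sonnet 4 | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Z.AI GLM 4.7 Flash | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
GPT-5 Nano | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 70.0%
Qwen 2.5 72B | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 70.0%
Claude Haiku 4.5 | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 65.0%
GPT-4o, May 13th (temp=1) | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 65.0%
GPT-4.1 Mini | 100 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 65.0%
Llama 3.1 70B | 100 | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 65.0%
GPT-4.1 Nano | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 65.0%
Grok 4 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 0 | 60.0%
GPT-4o Mini (temp=1) | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 60.0%
Claude Opus 4.5 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 55.0%
DeepSeek V3 (2024-12-26) | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 55.0%
DeepSeek V3.1 | 100 | 100 | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 55.0%
Gemini 2.5 Flash | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 55.0%
Hermes 3 70B | 100 | 100 | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 55.0%
WizardLM 2 8x22b | 100 | 100 | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 55.0%
Claude Sonnet 4.5 | 100 | 100 | 100 | 100 | 50 | 50 | 0 | 0 | 0 | 0 | 50.0%
Stealth: Aurora Alpha | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 50.0%
Mistral Large 2 | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 50.0%
GPT-4o Mini (temp=0) | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50.0%
Ministral 3 8B | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 50.0%
Claude Opus 4.6 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemini 3 Flash (Preview) | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
DeepSeek V3.2 | 100 | 100 | 100 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 40.0%
Gemma 3 27B | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 40.0%
Mistral Small Creative | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Ministral 3 3B | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Cohere Command R+ (Aug. 2024) | 100 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 40.0%
Ministral 3B | 100 | 100 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 40.0%
Grok 4.1 Fast | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 35.0%
Ministral 3 14B | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
Claude 3 Haiku | 100 | 100 | 100 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 35.0%
Llama 3.1 8B | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 35.0%
Mistral NeMO | 100 | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 35.0%
Rocinante 12B | 100 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 35.0%
Arcee AI: Trinity Mini | 50 | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 30.0%
DeepSeek V3 (2025-03-24) | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
Grok 4 Fast | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
Hermes 3 405B | 50 | 50 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 25.0%
Arcee AI: Trinity Large (Preview) | 100 | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 25.0%
GPT-5.2 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Claude 3.7 Sonnet | 100 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20.0%
Gemma 3 4B | 50 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.0%
Ministral 8B | 100 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 15.0%
Claude 3.5 Sonnet | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Medium 3.1 | 50 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10.0%
Mistral Small 3.2 24B | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5.0%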
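
The Avg column in the pov-count table is consistent with a plain arithmetic mean of the ten per-run scores (each run scoring 100, 50, or 0). The page does not state the formula, so the sketch below is an assumption checked against three rows copied from the table; the names `runs` and `average_score` are illustrative only. How the Stability column is derived is not stated and is not modeled here.

```python
# Per-run scores copied from three rows of the pov-count table.
runs = {
    "Claude Opus 4": [100, 100, 100, 100, 100, 100, 100, 100, 100, 50],
    "Grok 4": [100, 100, 100, 100, 50, 50, 50, 50, 0, 0],
    "Mistral Small 3.2 24B": [50, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}


def average_score(scores):
    """Arithmetic mean of the per-run scores (assumed formula for Avg)."""
    return sum(scores) / len(scores)


averages = {model: average_score(scores) for model, scores in runs.items()}

for model, avg in averages.items():
    print(f"{model}: {avg:.1f}%")
```

These three results match the Avg values listed for the same models (95.0%, 60.0%, and 5.0%).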