Dialogue tags

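Tasks that test how well models follow instructions about dialogue and dialogue tags in generated text: writing unattributed dialogue, and writing passages of a given length with a given share of dialogue.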

Model | Total | Write unattributed dialogue | Write 200 words with 10% dialogue | Write 200 words with 50% dialogue | Write 200 words with 90% dialogue | Write 500 words with 30% dialogue | Write 500 words with 50% dialogue | Write 500 words with 70% dialogue
o4 Mini High | 78% | 96% | 99% | 85% | 74% | 79% | 57% | 58%
Claude Opus 4.6 | 71% | 100% | 80% | 79% | 96% | 34% | 71% | 39%
Claude Opus 4.5 | 69% | 100% | 96% | 68% | 83% | 59% | 45% | 28%
o4 Mini | 66% | 67% | 96% | 88% | 55% | 72% | 65% | 19%
MoonshotAI: Kimi K2.5 | 62% | 100% | 94% | 83% | 76% | 33% | 34% | 13%
GPT-4o, Aug. 6th (temp=0) | 57% | 100% | 47% | 82% | 68% | 30% | 15% | 56%
Claude Opus 4 | 54% | 100% | 81% | 34% | 79% | 29% | 29% | 27%
GPT-4o Mini (temp=0) | 54% | 100% | 49% | 45% | 98% | 24% | 27% | 32%
Z.AI GLM 4.7 | 53% | 100% | 84% | 75% | 63% | 27% | 17% | 8%
GPT-4 Turbo | 52% | 96% | 77% | 67% | 61% | 9% | 20% | 31%
Gemini 2.5 Pro | 50% | 100% | 76% | 78% | 46% | 25% | 16% | 12%
GPT-4o, Aug. 6th (temp=1) | 49% | 96% | 50% | 62% | 64% | 9% | 19% | 46%
Claude Haiku 4.5 | 49% | 92% | 79% | 53% | 73% | 22% | 18% | 10%
Gemini 3 Flash (Preview) | 49% | 100% | 71% | 43% | 60% | 27% | 11% | 29%
Claude 3.7 Sonnet | 45% | 100% | 20% | 18% | 64% | 33% | 30% | 53%
Showing models 1–15 of 93, sorted by Total in descending order.
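The page does not say how the Total column is derived. The listed figures are consistent with an unweighted mean of the seven task scores, so the sketch below checks that reading on three rows copied from the summary table. The function name `overall_total` and the rounding rule are assumptions made for illustration, not something the leaderboard documents.

```python
from statistics import mean

# Per-task scores (percent) copied from the summary table, in column order:
# unattributed, 200w/10%, 200w/50%, 200w/90%, 500w/30%, 500w/50%, 500w/70%.
task_scores = {
    "o4 Mini High": [96, 99, 85, 74, 79, 57, 58],
    "Claude Opus 4.6": [100, 80, 79, 96, 34, 71, 39],
    "o4 Mini": [67, 96, 88, 55, 72, 65, 19],
}


def overall_total(scores):
    """Unweighted mean of the task scores, rounded to a whole percent.

    Assumed rule: the page itself does not state how Total is computed.
    """
    return round(mean(scores))


totals = {model: overall_total(scores) for model, scores in task_scores.items()}
for model, total in totals.items():
    print(f"{model}: {total}%")
```

Not every row reproduces exactly from the displayed figures. Claude Opus 4.5 is listed at 69%, while its seven displayed task scores average about 68.4%. That gap is plausibly because the displayed task scores are already rounded, but the page does not confirm this.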

Write unattributed dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
Claude 3.5 Sonnet100%100%100%100%100%100%100%100%100%100%100%
Claude 3.7 Sonnet100%100%100%100%100%100%100%100%100%100%100%
Claude 3 Haiku100%100%100%100%100%100%100%100%100%100%100%
Hermes 3 405B100%100%100%100%100%100%100%100%100%100%100%
Mistral Large100%100%100%100%100%100%100%100%100%100%100%
Mistral Large 2100%100%100%100%100%100%100%100%100%100%100%
GPT-4o, May 13th (temp=0)100%100%100%100%100%100%100%100%100%100%100%
GPT-4o, May 13th (temp=1)100%100%100%100%100%100%100%100%100%100%100%
GPT-4o, Aug. 6th (temp=0)100%100%100%100%100%100%100%100%100%100%100%
DeepSeek-V2 Chat100%100%100%100%100%100%100%100%100%100%100%
Claude 2.0100%100%100%100%100%100%100%100%100%100%100%
Claude 2.1100%100%100%100%100%100%100%100%100%100%100%
Claude 3.0 Sonnet100%100%100%100%100%100%100%100%100%100%100%
GPT-4o Mini (temp=0)100%100%100%100%100%100%100%100%100%100%100%
GPT-4o Mini (temp=1)100%100%100%100%100%100%100%100%100%100%100%
Llama 3 70B100%100%100%100%100%100%100%100%100%100%100%
AI21 Jamba 1.5 Large100%100%100%100%100%100%100%100%100%100%100%
Inflection 3 (Productivity)100%100%100%100%100%100%100%100%100%100%100%
Llama 3 TenyxChat-DaybreakStorywriter 70B100%100%100%100%100%100%100%100%100%100%100%
Claude 3.5 Sonnet (new)100%100%100%100%100%100%100%100%100%100%100%
Claude 3.5 Haiku100%100%100%100%100%100%100%100%100%100%100%
GPT-4.1100%100%100%100%100%100%100%100%100%100%100%
Claude Sonnet 4100%100%100%100%100%100%100%100%100%100%100%
Claude Opus 4100%100%100%100%100%100%100%100%100%100%100%
Gemini 2.5 Pro100%100%100%100%100%100%100%100%100%100%100%
Gemini 3 Pro (Preview)100%100%100%100%100%100%100%100%100%100%100%
Gemini 3 Flash (Preview)100%100%100%100%100%100%100%100%100%100%100%
Claude Sonnet 4.5100%100%100%100%100%100%100%100%100%100%100%
Claude Opus 4.5100%100%100%100%100%100%100%100%100%100%100%
Claude Opus 4.6100%100%100%100%100%100%100%100%100%100%100%
Z.AI GLM 4.5100%100%100%100%100%100%100%100%100%100%100%
Z.AI GLM 4.6100%100%100%100%100%100%100%100%100%100%100%
Z.AI GLM 4.7100%100%100%100%100%100%100%100%100%100%100%
Z.AI GLM 4.7 Flash100%100%100%100%100%100%100%100%100%100%100%
Writer: Palmyra X5100%100%100%100%100%100%100%100%100%100%100%
MoonshotAI: Kimi K2.5100%100%100%100%100%100%100%100%100%100%100%
Cohere Command R+ (Apr. 2024)100%100%100%100%100%100%100%100%100%61%96%
GPT-4o, Aug. 6th (temp=1)100%100%100%100%100%100%100%100%100%61%96%
GPT-4 Turbo100%100%100%100%100%100%100%100%100%61%96%
Llama 3.1 70B100%100%100%100%100%100%100%100%100%61%96%
Llama 3.1 Nemotron 70B100%100%100%100%100%100%100%100%100%61%96%
o4 Mini High100%100%100%100%100%100%100%100%100%61%96%
Gemini Pro 1.5100%100%100%100%100%100%100%100%61%61%92%
Llama 3.2 90B (Vision)100%100%100%100%100%100%100%100%61%61%92%
Gemini 2.5 Flash Lite100%100%100%100%100%100%100%100%61%61%92%
Claude Haiku 4.5100%100%100%100%100%100%100%100%61%61%92%
Llama 3.1 405B100%100%100%100%100%100%100%100%100%14%91%
Cohere Command R+ (Aug. 2024)100%100%100%100%100%100%100%100%61%1%86%
Sao10K L3.1 70B Hanami x1100%100%100%100%100%100%61%61%61%61%84%
Mistral Small Creative100%100%100%100%100%100%100%61%61%14%83%
GPT-4.1 Mini100%100%100%100%100%100%100%100%14%14%83%
Inflection 3 (PI)100%100%100%100%100%100%100%100%1%0%80%
Gemini 2.5 Flash100%100%100%100%100%100%61%61%61%1%78%
Llama 3.1 Euryale 70B v2.2100%100%100%100%100%100%100%61%0%0%76%
Mistral NeMO100%100%100%100%100%61%61%61%61%14%76%
Hermes 3 70B100%100%100%100%100%100%61%61%14%0%73%
o4 Mini100%100%100%100%100%100%61%14%0%0%67%
Magnum v2 72B100%100%100%100%100%61%61%14%14%14%66%
Magnum 72B100%100%100%100%100%61%61%14%0%0%63%
GPT-4.1 Nano100%100%100%100%100%61%14%14%14%1%60%
Gemma 2 9B100%100%100%100%61%61%61%14%0%0%60%
Qwen 2.5 72B100%100%100%100%61%14%14%14%14%14%53%
Llama 3 Euryale 70B v2.1100%100%100%61%61%61%14%1%0%0%50%
Lumimaid v0.2 8B100%100%100%61%61%61%1%1%0%0%48%
Gemma 2 27B100%100%100%61%61%14%14%14%1%0%46%
Qwen 2 72B100%100%100%61%61%14%14%1%1%1%45%
Llama 3.2 3B100%100%100%61%61%14%14%1%0%0%45%
Goliath 120B100%100%100%100%14%14%1%0%0%0%43%
Rocinante 12B100%100%100%61%61%1%1%1%0%0%42%
MythoMist 7B100%61%61%61%61%14%14%1%0%0%37%
Llama 3.2 11B (Vision)100%61%61%61%61%14%1%1%1%0%36%
Fimbulvetr 11B v2100%61%61%61%14%14%14%14%1%0%34%
EVA Qwen 2.5 14B100%100%61%61%14%0%0%0%0%0%33%
Phi-3 Medium 128k100%100%61%14%14%14%1%1%0%0%30%
Llama 3.1 8B100%61%61%61%14%1%1%1%0%0%30%
Phi-3.5 Mini 128k100%61%61%14%14%14%14%14%1%1%29%
Ministral 3B100%61%61%14%1%1%1%0%0%0%24%
Mistral Nemo 12B Celeste100%100%14%1%1%0%0%0%0%0%22%
Mistral Medium100%100%14%1%0%0%0%0%0%0%21%
lzlv 70B100%100%14%0%0%0%0%0%0%0%21%
Llama 3.2 1B100%61%14%14%1%1%1%1%0%0%19%
Hermes 2 Theta 8B100%61%14%14%1%1%0%0%0%0%19%
AI21 Jamba 1.5 Mini61%61%14%14%14%1%1%0%0%0%16%
Gemini Flash 1.561%61%14%14%1%1%0%0%0%0%15%
Ministral 8B61%61%14%14%1%0%0%0%0%0%15%
MythoMax 13B100%1%1%1%1%1%1%0%0%0%11%
Phi-3 Mini 128k61%14%14%14%1%1%1%0%0%0%10%
Qwen 2 7B61%14%14%0%0%0%0%0%0%0%9%
WizardLM 2 8x22b61%14%1%1%0%0%0%0%0%0%8%
Liquid: LFM 40B MoE61%1%1%1%0%0%0%0%0%0%6%
AI21 Jamba14%14%14%1%1%1%0%0%0%0%4%
MN GRAND Gutenberg Lyra4 12B Madness14%14%1%1%1%1%0%0%0%0%3%
Toppy M 7B14%14%1%1%0%0%0%0%0%0%3%
Average Total across models: 69.92%
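Within each per-task table, the Total column appears to be the mean of the ten run scores. The sketch below checks that reading on three rows from the table above. The helper `run_average` is a hypothetical name, and the averaging rule is inferred from the numbers, not stated on the page.

```python
def run_average(runs):
    """Mean of the per-run scores, rounded to a whole percent (assumed rule)."""
    return round(sum(runs) / len(runs))


# Run scores (percent) copied from the "Write unattributed dialogue" table.
rows = {
    "Llama 3.1 405B": [100] * 9 + [14],                  # listed Total: 91%
    "Cohere Command R+ (Aug. 2024)": [100] * 8 + [61, 1],  # listed Total: 86%
    "GPT-4.1 Mini": [100] * 8 + [14, 14],                # listed Total: 83%
}

computed = {model: run_average(runs) for model, runs in rows.items()}
for model, total in computed.items():
    print(f"{model}: {total}%")
```

Some rows can differ from the listed Total by a point when recomputed this way, presumably because the displayed run scores are themselves rounded.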

Write 200 words with 10% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
o4 Mini High100%100%100%100%100%100%100%100%98%96%99%
Claude Opus 4.5100%100%100%100%100%100%97%97%89%82%96%
o4 Mini100%100%100%100%100%100%100%100%99%66%96%
MoonshotAI: Kimi K2.5100%100%100%100%100%100%99%97%84%65%94%
Claude Sonnet 4100%97%97%96%96%95%94%88%84%60%91%
Claude Sonnet 4.5100%98%97%96%94%85%84%80%76%68%88%
Z.AI GLM 4.7100%100%100%99%99%98%84%64%50%50%84%
Claude Opus 4100%97%96%95%85%80%76%70%67%49%81%
Claude Opus 4.6100%100%99%98%90%90%68%58%54%45%80%
Claude Haiku 4.5100%100%99%97%97%94%73%50%41%38%79%
Claude 3.5 Haiku100%95%91%87%83%81%73%72%58%44%78%
GPT-4 Turbo100%100%95%95%94%80%58%52%50%47%77%
Gemini 2.5 Pro100%99%98%98%80%72%57%56%50%50%76%
Gemini 3 Flash (Preview)98%78%68%68%68%68%66%65%65%65%71%
Gemini 3 Pro (Preview)100%100%97%95%72%68%68%64%26%19%71%
Llama 3.2 90B (Vision)99%93%62%60%50%50%50%49%47%40%60%
Llama 3.1 405B95%79%77%54%50%50%49%48%38%10%55%
GPT-4o, Aug. 6th (temp=1)66%52%50%50%50%50%49%47%43%43%50%
GPT-4o Mini (temp=0)50%50%50%50%50%50%50%50%49%45%49%
GPT-4o Mini (temp=1)50%50%50%50%50%50%49%48%45%43%49%
Claude 3.0 Sonnet50%50%50%49%49%49%49%47%47%45%48%
Z.AI GLM 4.7 Flash80%59%50%50%50%49%48%47%45%0%48%
GPT-4.150%50%50%50%50%50%50%49%43%34%48%
GPT-4o, Aug. 6th (temp=0)50%50%50%50%50%50%50%49%48%26%47%
Z.AI GLM 4.699%95%60%50%50%50%36%11%0%0%45%
Claude 3.5 Sonnet97%71%52%51%50%45%44%5%5%0%42%
GPT-4.1 Nano50%50%49%49%48%45%43%34%26%22%42%
Z.AI GLM 4.588%88%65%49%48%48%30%0%0%0%42%
Llama 3.1 70B50%50%50%49%47%43%41%38%34%14%41%
Llama 3 70B50%49%48%47%45%41%38%34%30%0%38%
Claude 3 Haiku49%49%49%49%49%47%45%26%7%0%37%
GPT-4.1 Mini50%50%50%47%47%45%45%14%14%0%36%
Claude 3.5 Sonnet (new)63%45%45%45%30%30%30%27%14%11%34%
Mistral Large88%59%55%52%50%22%8%7%0%0%34%
Goliath 120B59%50%47%39%36%34%26%18%10%7%33%
Gemini Pro 1.550%49%49%49%46%42%28%10%0%0%32%
Mistral Nemo 12B Celeste50%49%49%48%44%43%34%0%0%0%32%
Gemini 2.5 Flash50%50%49%45%34%30%30%26%0%0%32%
GPT-4o, May 13th (temp=1)50%50%49%48%41%41%33%3%0%0%31%
Llama 3.1 8B52%50%50%49%49%41%18%1%0%0%31%
Gemma 2 27B86%50%50%50%48%0%0%0%0%0%28%
AI21 Jamba 1.5 Mini65%50%50%49%48%18%3%0%0%0%28%
Claude 2.092%50%45%38%26%25%3%2%2%0%28%
Hermes 3 405B50%50%45%41%41%34%10%5%0%0%28%
Sao10K L3.1 70B Hanami x150%50%49%47%41%22%1%0%0%0%26%
MythoMist 7B50%49%48%41%41%23%6%0%0%0%26%
Phi-3 Mini 128k82%49%43%41%30%10%2%1%0%0%26%
Gemini 2.5 Flash Lite50%50%50%49%48%0%0%0%0%0%25%
Llama 3 TenyxChat-DaybreakStorywriter 70B69%50%43%41%34%5%1%1%0%0%24%
AI21 Jamba50%50%49%48%26%14%3%0%0%0%24%
Llama 3.2 11B (Vision)50%50%43%38%31%22%1%1%0%0%23%
Writer: Palmyra X550%49%49%47%36%1%1%0%0%0%23%
Lumimaid v0.2 8B85%49%45%45%5%0%0%0%0%0%23%
Gemma 2 9B50%50%47%26%22%14%9%3%0%0%22%
DeepSeek-V2 Chat50%49%47%26%26%18%5%0%0%0%22%
Hermes 3 70B75%50%49%38%7%1%0%0%0%0%22%
GPT-4o, May 13th (temp=0)50%49%49%41%22%7%0%0%0%0%22%
Rocinante 12B50%49%49%45%22%0%0%0%0%0%22%
Claude 2.149%18%18%18%18%18%18%18%18%18%21%
Qwen 2.5 72B50%50%45%43%22%0%0%0%0%0%21%
Llama 3.2 3B50%49%49%45%10%3%0%0%0%0%21%
Inflection 3 (Productivity)52%48%43%41%10%7%4%0%0%0%20%
Claude 3.7 Sonnet41%38%38%34%22%7%7%7%5%5%20%
Mistral NeMO69%50%50%34%0%0%0%0%0%0%20%
Cohere Command R+ (Apr. 2024)51%50%45%41%2%0%0%0%0%0%19%
Llama 3.2 1B96%40%18%14%5%3%1%0%0%0%18%
AI21 Jamba 1.5 Large49%49%43%10%7%5%3%1%0%0%17%
Liquid: LFM 40B MoE50%50%48%18%0%0%0%0%0%0%17%
Magnum 72B50%49%47%7%2%0%0%0%0%0%16%
Llama 3.1 Nemotron 70B50%47%34%18%5%0%0%0%0%0%15%
MythoMax 13B59%49%42%1%0%0%0%0%0%0%15%
Inflection 3 (PI)50%38%34%22%0%0%0%0%0%0%14%
EVA Qwen 2.5 14B45%38%22%19%18%2%0%0%0%0%14%
Qwen 2 72B59%48%34%1%1%0%0%0%0%0%14%
MN GRAND Gutenberg Lyra4 12B Madness50%50%38%1%0%0%0%0%0%0%14%
Mistral Medium51%45%18%14%7%2%0%0%0%0%14%
Phi-3 Medium 128k48%34%31%14%4%1%0%0%0%0%13%
Toppy M 7B48%37%28%0%0%0%0%0%0%0%11%
Ministral 3B45%18%18%18%3%0%0%0%0%0%10%
Llama 3 Euryale 70B v2.149%49%0%0%0%0%0%0%0%0%10%
Phi-3.5 Mini 128k49%45%1%0%0%0%0%0%0%0%10%
Ministral 8B49%18%0%0%0%0%0%0%0%0%7%
Qwen 2 7B50%1%0%0%0%0%0%0%0%0%5%
Llama 3.1 Euryale 70B v2.249%0%0%0%0%0%0%0%0%0%5%
Hermes 2 Theta 8B44%1%0%0%0%0%0%0%0%0%4%
Mistral Large 222%18%5%1%0%0%0%0%0%0%4%
Gemini Flash 1.531%1%0%0%0%0%0%0%0%0%3%
lzlv 70B13%8%5%0%0%0%0%0%0%0%3%
Cohere Command R+ (Aug. 2024)10%0%0%0%0%0%0%0%0%0%1%
Mistral Small Creative2%0%0%0%0%0%0%0%0%0%0%
Magnum v2 72B1%0%0%0%0%0%0%0%0%0%0%
WizardLM 2 8x22b0%0%0%0%0%0%0%0%0%0%0%
Fimbulvetr 11B v20%0%0%0%0%0%0%0%0%0%0%
Average Total across models: 33.74%

Write 200 words with 50% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
o4 Mini100%100%100%100%100%99%94%91%50%50%88%
o4 Mini High100%100%100%100%99%99%98%50%50%50%85%
MoonshotAI: Kimi K2.5100%100%93%93%92%87%76%72%64%50%83%
GPT-4o, Aug. 6th (temp=0)100%100%99%90%88%86%71%71%64%51%82%
Claude Opus 4.6100%100%100%100%93%93%51%51%50%49%79%
Gemini 2.5 Pro98%98%97%94%93%84%68%67%45%36%78%
Gemini 3 Pro (Preview)100%100%98%95%91%83%50%48%48%38%75%
Z.AI GLM 4.7100%100%100%99%95%94%57%50%48%5%75%
Z.AI GLM 4.6100%99%98%93%80%68%59%50%39%1%69%
Claude Opus 4.5100%96%94%79%69%50%50%49%49%47%68%
GPT-4o, May 13th (temp=0)100%97%83%82%81%72%64%50%41%1%67%
GPT-4 Turbo95%95%91%88%71%63%55%50%41%19%67%
GPT-4o, Aug. 6th (temp=1)100%95%83%50%50%50%50%50%49%48%62%
Mistral Large100%99%88%86%80%59%57%51%0%0%62%
Claude Haiku 4.596%88%76%70%68%48%41%30%16%0%53%
Hermes 3 405B96%85%82%58%55%55%41%26%26%2%53%
Llama 3.1 8B100%99%75%62%49%49%48%35%3%0%52%
GPT-4o Mini (temp=1)50%50%50%50%50%50%50%50%49%48%50%
Goliath 120B100%99%55%53%48%47%44%43%1%1%49%
Cohere Command R+ (Apr. 2024)85%84%83%50%50%44%37%36%7%0%47%
GPT-4.150%50%50%50%50%50%50%50%49%14%46%
Claude 3 Haiku74%71%61%51%49%41%41%38%22%14%46%
Hermes 3 70B100%64%62%54%52%47%42%25%10%2%46%
Llama 3.1 405B89%89%50%50%49%43%41%34%5%2%45%
GPT-4o Mini (temp=0)50%50%50%50%49%48%43%43%34%30%45%
Phi-3 Mini 128k100%48%48%47%46%44%36%34%27%13%44%
MythoMax 13B80%73%52%50%50%48%43%40%4%0%44%
DeepSeek-V2 Chat50%50%50%50%50%50%48%48%43%0%44%
AI21 Jamba 1.5 Large99%94%57%51%50%48%16%15%7%0%44%
Llama 3.1 70B93%66%51%49%47%43%35%22%18%14%44%
Gemini 3 Flash (Preview)50%50%50%50%49%48%47%34%30%25%43%
GPT-4.1 Mini83%50%49%49%48%47%45%43%18%0%43%
GPT-4o, May 13th (temp=1)88%50%50%50%49%49%45%44%0%0%43%
GPT-4.1 Nano50%50%50%50%49%49%47%45%30%1%42%
MythoMist 7B92%78%51%50%50%50%41%0%0%0%41%
Mistral Medium79%59%55%51%50%50%28%23%13%0%41%
Llama 3.2 3B95%52%51%48%47%43%24%22%22%0%40%
Llama 3.2 90B (Vision)67%57%50%50%49%38%34%34%18%0%40%
Liquid: LFM 40B MoE99%77%50%50%44%35%22%4%0%0%38%
Hermes 2 Theta 8B94%83%50%49%47%44%10%1%0%0%38%
Claude 3.5 Haiku87%50%50%48%40%38%19%8%7%3%35%
Claude Opus 450%49%49%48%47%45%34%18%3%1%34%
Cohere Command R+ (Aug. 2024)71%50%50%49%42%40%33%2%0%0%34%
Z.AI GLM 4.7 Flash82%64%53%50%47%28%0%0%0%0%32%
Ministral 3B74%50%50%48%46%26%18%0%0%0%31%
Llama 3 70B50%50%50%45%43%30%22%10%7%0%31%
Llama 3 Euryale 70B v2.150%50%49%46%44%32%27%9%0%0%31%
Sao10K L3.1 70B Hanami x150%50%50%49%48%41%14%2%1%0%30%
Inflection 3 (Productivity)79%66%50%49%47%12%0%0%0%0%30%
Claude 3.5 Sonnet (new)84%48%45%41%39%18%14%7%4%3%30%
Claude 2.052%50%49%41%40%33%18%14%4%0%30%
Magnum 72B71%51%50%49%47%30%0%0%0%0%30%
Llama 3.2 11B (Vision)50%49%47%45%42%34%22%6%2%0%30%
Llama 3.2 1B86%47%46%45%42%25%1%0%0%0%29%
AI21 Jamba65%50%49%37%26%22%21%13%6%0%29%
Phi-3 Medium 128k54%50%49%49%36%24%18%8%0%0%29%
Claude Sonnet 484%57%53%37%33%14%6%2%0%0%29%
Mistral Small Creative79%52%50%49%46%3%1%0%0%0%28%
Gemini 2.5 Flash Lite75%50%49%47%41%4%0%0%0%0%27%
Gemma 2 27B50%50%48%44%35%26%7%5%1%0%27%
Qwen 2.5 72B50%50%49%49%26%13%10%7%5%0%26%
Claude 3.0 Sonnet50%45%44%41%26%19%18%3%2%0%25%
MN GRAND Gutenberg Lyra4 12B Madness66%50%50%29%27%15%4%4%0%0%25%
AI21 Jamba 1.5 Mini49%49%45%45%34%18%1%1%0%0%24%
Claude Sonnet 4.569%51%46%22%18%10%8%6%5%3%24%
Qwen 2 72B59%57%50%42%11%9%1%0%0%0%23%
Toppy M 7B50%50%50%49%28%1%1%0%0%0%23%
Claude 3.5 Sonnet46%43%40%34%27%18%15%2%1%0%23%
Phi-3.5 Mini 128k93%50%47%33%2%0%0%0%0%0%22%
Llama 3 TenyxChat-DaybreakStorywriter 70B50%47%41%29%26%14%10%5%2%0%22%
Qwen 2 7B50%50%49%49%10%2%0%0%0%0%21%
Mistral Nemo 12B Celeste50%48%42%41%8%7%6%3%0%0%21%
Inflection 3 (PI)98%59%17%14%13%1%0%0%0%0%20%
Rocinante 12B47%44%40%25%23%17%0%0%0%0%20%
Lumimaid v0.2 8B91%50%50%0%0%0%0%0%0%0%19%
EVA Qwen 2.5 14B50%50%43%41%0%0%0%0%0%0%18%
Claude 3.7 Sonnet45%41%34%18%18%14%5%3%3%0%18%
lzlv 70B50%50%48%21%5%3%2%0%0%0%18%
Mistral NeMO53%50%40%19%1%0%0%0%0%0%16%
Llama 3.1 Nemotron 70B50%39%36%17%15%1%0%0%0%0%16%
Writer: Palmyra X550%47%38%16%1%1%0%0%0%0%15%
Gemini Pro 1.550%48%43%3%0%0%0%0%0%0%14%
Magnum v2 72B86%50%4%3%0%0%0%0%0%0%14%
Gemini 2.5 Flash50%50%22%18%1%1%0%0%0%0%14%
Gemma 2 9B50%45%30%7%0%0%0%0%0%0%13%
Ministral 8B49%48%16%8%3%2%0%0%0%0%13%
WizardLM 2 8x22b40%28%21%20%8%3%1%0%0%0%12%
Llama 3.1 Euryale 70B v2.262%33%18%0%0%0%0%0%0%0%11%
Gemini Flash 1.549%34%18%0%0%0%0%0%0%0%10%
Z.AI GLM 4.534%18%12%10%7%3%2%0%0%0%9%
Mistral Large 249%14%3%1%0%0%0%0%0%0%7%
Fimbulvetr 11B v250%9%0%0%0%0%0%0%0%0%6%
Claude 2.150%0%0%0%0%0%0%0%0%0%5%
Average Total across models: 36.25%

Write 200 words with 90% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
GPT-4o Mini (temp=0)100%100%100%100%100%100%99%98%97%90%98%
Claude Opus 4.6100%100%100%99%99%99%97%90%89%86%96%
Claude Opus 4.599%98%96%96%95%91%87%67%52%49%83%
Claude Opus 499%99%95%93%89%86%81%68%64%21%79%
MoonshotAI: Kimi K2.5100%97%95%87%84%81%60%53%51%50%76%
Gemini 2.5 Flash Lite95%88%88%88%84%76%67%67%50%49%75%
o4 Mini High99%97%89%82%77%75%68%53%50%49%74%
Claude Haiku 4.5100%98%93%87%87%60%55%52%50%44%73%
DeepSeek-V2 Chat99%92%92%87%83%72%50%50%49%49%72%
GPT-4o Mini (temp=1)95%68%68%68%68%68%68%68%66%44%68%
GPT-4o, Aug. 6th (temp=0)68%68%68%68%68%68%68%68%68%66%68%
Claude 3.7 Sonnet95%84%67%67%64%59%55%52%52%50%64%
GPT-4o, Aug. 6th (temp=1)68%68%68%68%66%66%65%62%56%49%64%
Z.AI GLM 4.798%93%93%68%66%65%59%50%18%18%63%
GPT-4o, May 13th (temp=0)68%68%68%68%68%68%68%68%52%18%62%
Llama 3.1 70B90%72%72%68%68%64%64%62%53%2%61%
GPT-4 Turbo91%80%79%76%67%54%49%47%44%27%61%
Llama 3.2 90B (Vision)99%94%80%80%77%51%50%50%13%3%60%
Gemini 3 Flash (Preview)94%68%68%65%64%62%59%52%32%32%60%
GPT-4.1 Nano100%99%96%92%66%63%41%18%14%2%59%
GPT-4.168%68%68%68%66%66%52%50%48%34%59%
Llama 3.2 3B100%99%71%67%50%50%49%45%44%10%58%
Qwen 2.5 72B100%100%79%72%53%50%50%50%20%0%57%
Gemma 2 9B95%85%75%59%57%51%50%47%20%19%56%
o4 Mini100%94%82%72%55%50%40%34%23%5%55%
Claude 3.5 Sonnet92%91%65%65%56%56%44%40%20%18%55%
Ministral 8B95%95%78%68%51%50%49%44%18%0%55%
Writer: Palmyra X593%93%53%51%51%50%50%50%49%0%54%
Claude Sonnet 4.586%61%57%50%50%50%50%49%49%36%54%
AI21 Jamba 1.5 Large99%99%97%68%59%47%34%18%14%0%54%
Llama 3 TenyxChat-DaybreakStorywriter 70B75%68%68%68%64%59%50%32%19%18%52%
AI21 Jamba 1.5 Mini100%84%80%80%54%49%35%31%5%1%52%
Gemini 2.5 Flash68%68%66%56%50%50%49%44%29%18%50%
Mistral NeMO50%50%50%50%50%50%50%49%47%46%49%
Llama 3.1 405B85%78%77%76%60%55%30%22%1%0%48%
Magnum 72B100%99%92%50%50%48%37%3%0%0%48%
Phi-3 Mini 128k99%68%53%51%50%49%45%43%14%0%47%
Gemini 2.5 Pro93%67%63%63%49%45%28%18%18%18%46%
GPT-4o, May 13th (temp=1)68%68%66%52%52%44%44%23%23%18%46%
Z.AI GLM 4.677%52%51%51%50%48%48%43%18%18%46%
Llama 3.1 Nemotron 70B97%52%52%51%51%50%50%26%19%0%45%
Z.AI GLM 4.591%76%50%50%50%50%28%18%18%10%44%
GPT-4.1 Mini68%68%68%67%44%40%23%19%18%18%43%
Z.AI GLM 4.7 Flash95%62%55%50%50%47%18%18%18%18%43%
Llama 3.2 11B (Vision)93%80%53%50%50%45%33%20%1%0%42%
Hermes 3 70B82%72%60%50%50%48%28%22%2%0%41%
Qwen 2 72B90%66%50%49%47%46%30%21%13%0%41%
Llama 3 70B68%68%68%52%50%26%20%18%18%18%41%
AI21 Jamba97%51%50%50%50%49%48%3%1%1%40%
Claude 3.0 Sonnet91%80%73%66%38%31%14%2%0%0%40%
Hermes 3 405B60%55%51%51%45%43%34%26%14%10%39%
Goliath 120B58%50%50%50%50%48%41%30%10%1%39%
Phi-3.5 Mini 128k91%50%49%49%48%45%38%0%0%0%37%
Cohere Command R+ (Apr. 2024)92%56%50%49%47%41%9%1%1%0%34%
Mistral Nemo 12B Celeste95%57%50%49%48%34%0%0%0%0%33%
Magnum v2 72B50%48%47%46%45%45%44%5%0%0%33%
Gemini Flash 1.554%51%50%50%48%47%15%0%0%0%31%
Sao10K L3.1 70B Hanami x194%50%48%41%32%29%18%1%0%0%31%
Llama 3.1 8B62%50%48%45%41%26%22%18%0%0%31%
EVA Qwen 2.5 14B50%50%45%45%33%30%30%25%2%0%31%
Claude 2.068%49%49%32%19%18%18%18%18%18%31%
Inflection 3 (PI)82%52%50%50%49%14%5%0%0%0%30%
Ministral 3B84%50%46%30%30%21%18%15%0%0%29%
Mistral Medium72%68%50%43%30%22%10%0%0%0%29%
MythoMist 7B88%55%50%49%34%7%5%0%0%0%29%
Claude 3 Haiku50%50%50%47%38%22%18%7%1%0%28%
Mistral Large50%50%48%41%41%38%12%1%0%0%28%
Claude Sonnet 451%44%40%21%20%20%19%19%19%18%27%
Claude 3.5 Sonnet (new)56%36%32%28%21%21%20%19%19%18%27%
Lumimaid v0.2 8B68%50%49%39%18%18%18%3%0%0%26%
Gemini 3 Pro (Preview)50%49%49%47%43%5%5%3%1%0%25%
Gemma 2 27B72%49%32%18%18%18%12%5%3%0%23%
Mistral Large 250%49%49%39%24%7%2%1%1%0%22%
Claude 3.5 Haiku54%53%27%21%16%8%5%3%1%0%19%
MythoMax 13B75%54%50%7%0%0%0%0%0%0%19%
Rocinante 12B50%49%46%22%10%4%0%0%0%0%18%
Llama 3.2 1B68%45%26%25%9%0%0%0%0%0%17%
Llama 3.1 Euryale 70B v2.253%51%50%4%2%0%0%0%0%0%16%
Qwen 2 7B50%49%48%2%0%0%0%0%0%0%15%
Fimbulvetr 11B v250%50%45%0%0%0%0%0%0%0%14%
Inflection 3 (Productivity)41%34%14%13%4%4%3%0%0%0%11%
WizardLM 2 8x22b50%44%6%4%1%0%0%0%0%0%11%
Mistral Small Creative47%23%18%4%3%0%0%0%0%0%10%
Phi-3 Medium 128k50%36%3%0%0%0%0%0%0%0%9%
Liquid: LFM 40B MoE41%26%14%6%2%0%0%0%0%0%9%
lzlv 70B68%11%0%0%0%0%0%0%0%0%8%
Cohere Command R+ (Aug. 2024)31%10%10%1%0%0%0%0%0%0%5%
MN GRAND Gutenberg Lyra4 12B Madness42%10%0%0%0%0%0%0%0%0%5%
Llama 3 Euryale 70B v2.118%4%2%1%0%0%0%0%0%0%3%
Hermes 2 Theta 8B17%0%0%0%0%0%0%0%0%0%2%
Gemini Pro 1.50%0%0%0%0%0%0%0%0%0%0%
Toppy M 7B0%0%0%0%0%0%0%0%0%0%0%
Claude 2.10%0%0%0%0%0%0%0%0%0%0%
Average Total across models: 40.48%

Write 500 words with 30% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
o4 Mini High100%100%100%100%89%89%85%50%50%25%79%
o4 Mini100%100%100%80%76%61%55%53%51%43%72%
Claude Opus 4.596%65%65%60%58%52%50%50%50%47%59%
Claude Opus 4.690%72%50%50%40%18%18%1%1%1%34%
MoonshotAI: Kimi K2.594%91%55%39%27%27%0%0%0%0%33%
Claude 3.7 Sonnet50%50%49%43%43%41%38%18%0%0%33%
GPT-4o, Aug. 6th (temp=0)50%43%43%43%43%41%34%5%0%0%30%
Inflection 3 (Productivity)98%97%52%26%10%8%7%0%0%0%30%
Claude Opus 450%50%50%49%34%34%22%5%0%0%29%
Claude 3.0 Sonnet52%51%50%49%42%26%15%0%0%0%28%
Claude Sonnet 487%54%50%42%17%9%8%5%0%0%27%
Toppy M 7B50%47%45%43%42%42%1%0%0%0%27%
Z.AI GLM 4.767%50%43%43%28%19%14%3%0%0%27%
lzlv 70B49%45%44%43%40%29%15%2%1%0%27%
Gemini 3 Flash (Preview)50%50%50%49%34%30%5%0%0%0%27%
Hermes 2 Theta 8B49%48%44%43%42%28%0%0%0%0%26%
Gemini 2.5 Pro98%94%41%14%2%0%0%0%0%0%25%
GPT-4o Mini (temp=0)50%50%47%22%22%18%14%14%10%0%24%
Mistral Large49%49%45%36%35%28%0%0%0%0%24%
Hermes 3 405B50%50%45%38%30%14%10%0%0%0%24%
Inflection 3 (PI)97%53%49%13%3%3%1%0%0%0%22%
Claude 3.5 Sonnet (new)49%47%47%41%18%7%3%3%1%1%22%
Claude Haiku 4.550%49%41%38%34%3%1%0%0%0%22%
AI21 Jamba 1.5 Large50%45%41%27%23%2%1%0%0%0%19%
Llama 3.2 11B (Vision)49%45%34%22%18%18%0%0%0%0%19%
Claude 2.050%50%38%26%10%0%0%0%0%0%17%
Gemini 2.5 Flash Lite50%47%45%30%0%0%0%0%0%0%17%
Mistral Medium50%48%47%18%4%1%1%0%0%0%17%
Claude 3 Haiku79%71%15%3%1%0%0%0%0%0%17%
Gemini 3 Pro (Preview)50%45%43%27%0%0%0%0%0%0%17%
Claude Sonnet 4.547%38%37%11%7%7%6%5%0%0%16%
Z.AI GLM 4.7 Flash50%50%45%1%0%0%0%0%0%0%15%
AI21 Jamba50%50%44%1%1%0%0%0%0%0%15%
GPT-4o, May 13th (temp=0)45%23%22%16%15%10%8%3%1%0%14%
Ministral 8B49%48%41%0%0%0%0%0%0%0%14%
Llama 3.2 90B (Vision)87%48%0%0%0%0%0%0%0%0%13%
Llama 3 Euryale 70B v2.154%51%12%4%3%1%1%0%0%0%13%
Phi-3.5 Mini 128k44%43%26%10%1%0%0%0%0%0%12%
Llama 3.2 3B45%41%34%0%0%0%0%0%0%0%12%
Phi-3 Medium 128k35%33%22%11%10%1%0%0%0%0%11%
Cohere Command R+ (Apr. 2024)50%49%7%3%0%0%0%0%0%0%11%
MythoMax 13B51%50%5%0%0%0%0%0%0%0%11%
Llama 3.1 Euryale 70B v2.250%42%14%0%0%0%0%0%0%0%11%
Gemma 2 27B52%50%0%0%0%0%0%0%0%0%10%
AI21 Jamba 1.5 Mini48%43%10%0%0%0%0%0%0%0%10%
GPT-4o, May 13th (temp=1)47%31%13%6%4%0%0%0%0%0%10%
Hermes 3 70B41%31%26%1%0%0%0%0%0%0%10%
MN GRAND Gutenberg Lyra4 12B Madness50%49%0%0%0%0%0%0%0%0%10%
GPT-4 Turbo38%25%21%7%2%1%0%0%0%0%9%
GPT-4o, Aug. 6th (temp=1)43%29%14%2%0%0%0%0%0%0%9%
Llama 3.1 405B50%34%0%0%0%0%0%0%0%0%8%
Qwen 2 72B45%34%0%0%0%0%0%0%0%0%8%
Mistral NeMO44%26%6%2%0%0%0%0%0%0%8%
DeepSeek-V2 Chat49%26%2%1%0%0%0%0%0%0%8%
Cohere Command R+ (Aug. 2024)41%30%5%0%0%0%0%0%0%0%8%
Llama 3.2 1B49%24%1%0%0%0%0%0%0%0%7%
Z.AI GLM 4.667%3%0%0%0%0%0%0%0%0%7%
Ministral 3B67%3%0%0%0%0%0%0%0%0%7%
GPT-4.1 Mini45%11%7%0%0%0%0%0%0%0%6%
Phi-3 Mini 128k48%7%1%0%0%0%0%0%0%0%6%
Llama 3.1 8B53%2%0%0%0%0%0%0%0%0%5%
Mistral Nemo 12B Celeste26%26%1%1%0%0%0%0%0%0%5%
WizardLM 2 8x22b49%2%2%0%0%0%0%0%0%0%5%
Gemini Flash 1.549%2%1%1%0%0%0%0%0%0%5%
MythoMist 7B49%3%0%0%0%0%0%0%0%0%5%
Gemini 2.5 Flash50%1%0%0%0%0%0%0%0%0%5%
Claude 3.5 Haiku20%19%9%1%0%0%0%0%0%0%5%
GPT-4o Mini (temp=1)41%7%2%0%0%0%0%0%0%0%5%
Qwen 2.5 72B47%2%1%0%0%0%0%0%0%0%5%
Rocinante 12B43%1%0%0%0%0%0%0%0%0%4%
Llama 3 70B43%0%0%0%0%0%0%0%0%0%4%
Qwen 2 7B39%2%1%0%0%0%0%0%0%0%4%
GPT-4.141%1%0%0%0%0%0%0%0%0%4%
Llama 3.1 Nemotron 70B38%0%0%0%0%0%0%0%0%0%4%
Magnum v2 72B36%1%0%0%0%0%0%0%0%0%4%
Llama 3.1 70B18%7%1%1%0%0%0%0%0%0%3%
Goliath 120B24%1%0%0%0%0%0%0%0%0%3%
Mistral Large 226%0%0%0%0%0%0%0%0%0%3%
Gemini Pro 1.521%0%0%0%0%0%0%0%0%0%2%
Liquid: LFM 40B MoE10%9%0%0%0%0%0%0%0%0%2%
Claude 3.5 Sonnet10%5%1%1%0%0%0%0%0%0%2%
Fimbulvetr 11B v214%1%0%0%0%0%0%0%0%0%2%
EVA Qwen 2.5 14B14%1%0%0%0%0%0%0%0%0%1%
Z.AI GLM 4.514%0%0%0%0%0%0%0%0%0%1%
Lumimaid v0.2 8B6%0%0%0%0%0%0%0%0%0%1%
Mistral Small Creative3%0%0%0%0%0%0%0%0%0%0%
Magnum 72B2%0%0%0%0%0%0%0%0%0%0%
Sao10K L3.1 70B Hanami x11%0%0%0%0%0%0%0%0%0%0%
Writer: Palmyra X50%0%0%0%0%0%0%0%0%0%0%
Llama 3 TenyxChat-DaybreakStorywriter 70B0%0%0%0%0%0%0%0%0%0%0%
Gemma 2 9B0%0%0%0%0%0%0%0%0%0%0%
GPT-4.1 Nano0%0%0%0%0%0%0%0%0%0%0%
Claude 2.10%0%0%0%0%0%0%0%0%0%0%
Average Total across models: 13.67%

Write 500 words with 50% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
Claude Opus 4.694%92%92%89%72%64%58%50%50%49%71%
o4 Mini100%100%90%71%59%50%50%49%45%38%65%
o4 Mini High100%91%91%62%52%50%50%50%22%0%57%
Claude Opus 4.568%67%66%66%54%50%49%22%10%0%45%
MoonshotAI: Kimi K2.5100%60%45%39%25%24%24%16%7%0%34%
Llama 3.2 11B (Vision)77%66%61%49%43%0%0%0%0%0%30%
Claude 3.7 Sonnet50%49%49%48%41%30%26%2%1%0%30%
Cohere Command R+ (Apr. 2024)50%50%49%48%47%43%0%0%0%0%29%
Claude Opus 459%50%50%47%47%18%10%7%1%0%29%
Claude 2.090%50%50%49%20%18%0%0%0%0%28%
GPT-4o Mini (temp=0)50%47%47%47%41%34%7%2%0%0%27%
Claude 3.5 Sonnet (new)49%49%49%43%41%22%10%1%0%0%26%
Claude Sonnet 448%47%45%44%23%4%4%1%0%0%22%
Cohere Command R+ (Aug. 2024)85%79%48%2%1%0%0%0%0%0%22%
Hermes 3 70B81%59%39%22%13%0%0%0%0%0%21%
Llama 3.1 8B59%50%30%30%22%22%0%0%0%0%21%
Qwen 2 7B50%42%41%29%23%22%0%0%0%0%21%
GPT-4 Turbo69%64%50%11%6%5%0%0%0%0%20%
Gemini 2.5 Flash Lite50%49%43%41%10%7%0%0%0%0%20%
Claude Sonnet 4.548%47%46%38%13%3%1%0%0%0%20%
Llama 3 70B58%41%35%32%26%0%0%0%0%0%19%
GPT-4o, Aug. 6th (temp=1)50%46%45%26%15%7%1%1%0%0%19%
GPT-4.150%48%34%26%22%3%0%0%0%0%18%
Phi-3 Medium 128k57%49%41%20%11%2%1%0%0%0%18%
Mistral Medium87%50%39%5%0%0%0%0%0%0%18%
Mistral Large51%50%26%24%15%9%3%2%0%0%18%
Llama 3.2 3B49%47%44%27%13%1%0%0%0%0%18%
Rocinante 12B100%33%24%16%3%1%0%0%0%0%18%
Claude Haiku 4.552%47%45%26%5%0%0%0%0%0%18%
Inflection 3 (Productivity)72%47%24%24%7%0%0%0%0%0%17%
Ministral 8B48%39%36%29%20%1%0%0%0%0%17%
Hermes 3 405B49%48%38%31%5%1%0%0%0%0%17%
Qwen 2 72B50%48%39%29%5%0%0%0%0%0%17%
Magnum 72B58%50%50%11%1%1%0%0%0%0%17%
Z.AI GLM 4.770%60%18%16%4%0%0%0%0%0%17%
Gemini 2.5 Pro67%50%34%7%0%0%0%0%0%0%16%
WizardLM 2 8x22b50%41%33%21%3%3%0%0%0%0%15%
Gemini 2.5 Flash49%45%43%14%0%0%0%0%0%0%15%
GPT-4o, Aug. 6th (temp=0)44%38%37%30%0%0%0%0%0%0%15%
Hermes 2 Theta 8B50%40%38%10%5%4%1%0%0%0%15%
Llama 3.1 Euryale 70B v2.242%40%37%18%3%0%0%0%0%0%14%
Gemini Flash 1.564%47%29%0%0%0%0%0%0%0%14%
Fimbulvetr 11B v249%29%26%15%15%3%1%0%0%0%14%
Inflection 3 (PI)48%48%26%10%5%0%0%0%0%0%14%
Phi-3 Mini 128k48%35%26%20%5%4%0%0%0%0%14%
AI21 Jamba 1.5 Large52%48%32%3%0%0%0%0%0%0%14%
Llama 3.1 405B50%50%32%0%0%0%0%0%0%0%13%
GPT-4o Mini (temp=1)49%41%34%1%0%0%0%0%0%0%12%
Llama 3 Euryale 70B v2.154%36%31%0%0%0%0%0%0%0%12%
Llama 3.1 70B50%34%34%4%0%0%0%0%0%0%12%
Mistral Large 257%50%4%4%3%0%0%0%0%0%12%
Ministral 3B48%48%16%3%3%1%0%0%0%0%12%
GPT-4o, May 13th (temp=1)47%44%18%8%0%0%0%0%0%0%12%
Gemma 2 27B49%38%19%7%3%0%0%0%0%0%12%
AI21 Jamba50%50%13%1%0%0%0%0%0%0%11%
Gemini 3 Flash (Preview)48%43%22%0%0%0%0%0%0%0%11%
Llama 3.2 1B43%42%26%1%0%0%0%0%0%0%11%
Mistral NeMO32%31%22%20%2%1%0%0%0%0%11%
Phi-3.5 Mini 128k46%22%21%14%2%2%0%0%0%0%11%
MythoMist 7B50%50%1%1%0%0%0%0%0%0%10%
MythoMax 13B50%30%18%3%0%0%0%0%0%0%10%
GPT-4.1 Mini43%38%7%5%3%2%1%0%0%0%10%
Liquid: LFM 40B MoE47%37%12%0%0%0%0%0%0%0%10%
EVA Qwen 2.5 14B71%20%4%0%0%0%0%0%0%0%10%
Gemini 3 Pro (Preview)48%41%5%0%0%0%0%0%0%0%9%
Goliath 120B43%31%14%4%1%0%0%0%0%0%9%
Llama 3.2 90B (Vision)49%22%12%3%3%2%2%1%0%0%9%
Claude 3.5 Haiku41%27%8%3%3%1%0%0%0%0%8%
Toppy M 7B50%31%1%0%0%0%0%0%0%0%8%
Lumimaid v0.2 8B43%23%10%6%0%0%0%0%0%0%8%
AI21 Jamba 1.5 Mini41%23%14%1%0%0%0%0%0%0%8%
Z.AI GLM 4.546%18%0%0%0%0%0%0%0%0%6%
Claude 3.5 Sonnet31%19%4%3%1%1%0%0%0%0%6%
Mistral Nemo 12B Celeste31%23%0%0%0%0%0%0%0%0%5%
Gemma 2 9B49%0%0%0%0%0%0%0%0%0%5%
DeepSeek-V2 Chat45%3%0%0%0%0%0%0%0%0%5%
Magnum v2 72B26%14%2%1%0%0%0%0%0%0%4%
Claude 3 Haiku38%1%1%0%0%0%0%0%0%0%4%
Gemini Pro 1.526%9%4%0%0%0%0%0%0%0%4%
Claude 3.0 Sonnet21%8%4%2%0%0%0%0%0%0%3%
Qwen 2.5 72B14%12%5%0%0%0%0%0%0%0%3%
Writer: Palmyra X530%0%0%0%0%0%0%0%0%0%3%
Z.AI GLM 4.7 Flash14%2%0%0%0%0%0%0%0%0%2%
lzlv 70B10%4%0%0%0%0%0%0%0%0%1%
MN GRAND Gutenberg Lyra4 12B Madness12%0%0%0%0%0%0%0%0%0%1%
Llama 3.1 Nemotron 70B11%1%0%0%0%0%0%0%0%0%1%
GPT-4.1 Nano11%0%0%0%0%0%0%0%0%0%1%
Sao10K L3.1 70B Hanami x18%1%0%0%0%0%0%0%0%0%1%
GPT-4o, May 13th (temp=0)7%0%0%0%0%0%0%0%0%0%1%
Z.AI GLM 4.61%0%0%0%0%0%0%0%0%0%0%
Llama 3 TenyxChat-DaybreakStorywriter 70B1%0%0%0%0%0%0%0%0%0%0%
Mistral Small Creative1%0%0%0%0%0%0%0%0%0%0%
Claude 2.10%0%0%0%0%0%0%0%0%0%0%
Average Total across models: 14.88%

Write 500 words with 70% dialogue

0-shot, Creative writing, Rule following
Model Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 Run 7 Run 8 Run 9 Run 10 Total
o4 Mini High100%95%83%64%60%56%46%44%29%0%58%
GPT-4o, Aug. 6th (temp=0)99%97%83%65%46%46%46%46%35%1%56%
Claude 3.7 Sonnet72%70%66%64%59%50%49%41%35%22%53%
GPT-4o, Aug. 6th (temp=1)99%93%86%50%44%43%43%2%0%0%46%
Claude 3.5 Sonnet98%92%73%50%46%44%41%3%3%0%45%
DeepSeek-V2 Chat89%84%50%44%44%40%19%17%11%5%40%
Claude Opus 4.656%50%50%50%50%45%30%26%18%18%39%
Claude 3.5 Sonnet (new)50%50%50%50%45%43%19%16%5%0%33%
GPT-4o Mini (temp=0)50%50%50%49%41%38%34%14%0%0%32%
GPT-4 Turbo100%91%59%49%14%0%0%0%0%0%31%
Llama 3.1 405B50%46%38%37%32%31%22%19%12%2%29%
Gemini 3 Flash (Preview)51%50%50%49%38%18%18%14%0%0%29%
Llama 3.1 Euryale 70B v2.275%64%50%47%32%6%2%0%0%0%28%
Claude Opus 4.550%50%48%45%41%14%10%10%5%3%28%
Cohere Command R+ (Aug. 2024)79%50%44%32%31%12%10%10%4%0%27%
Llama 3.1 70B71%47%47%45%45%11%3%1%0%0%27%
Claude Opus 448%47%43%38%38%18%18%17%1%0%27%
Claude 3.5 Haiku50%50%48%45%37%10%8%0%0%0%25%
Mistral Small Creative50%49%45%39%27%14%12%12%0%0%25%
EVA Qwen 2.5 14B83%50%50%37%6%3%0%0%0%0%23%
Sao10K L3.1 70B Hanami x149%49%47%47%20%3%2%0%0%0%22%
Mistral Large 250%49%45%36%17%9%4%4%0%0%21%
Qwen 2.5 72B50%47%37%35%30%12%0%0%0%0%21%
Llama 3.1 8B48%47%37%35%18%10%7%0%0%0%20%
Claude 3 Haiku49%44%30%25%22%14%13%1%0%0%20%
o4 Mini56%55%49%29%6%0%0%0%0%0%19%
Gemini 2.5 Flash Lite47%45%43%22%18%10%0%0%0%0%19%
Claude 3.0 Sonnet50%45%42%21%10%8%3%1%0%0%18%
Llama 3.2 3B43%38%36%26%23%11%0%0%0%0%18%
Claude Sonnet 4.546%43%37%18%17%16%1%0%0%0%18%
GPT-4o Mini (temp=1)50%49%48%24%1%0%0%0%0%0%17%
Qwen 2 72B37%34%29%27%24%16%4%1%0%0%17%
Gemma 2 27B50%35%33%32%14%3%2%2%0%0%17%
Llama 3.2 90B (Vision)99%35%21%7%6%1%1%0%0%0%17%
MythoMist 7B85%43%34%6%0%0%0%0%0%0%17%
Ministral 3B67%43%24%18%15%0%0%0%0%0%17%
GPT-4o, May 13th (temp=1)54%45%41%26%0%0%0%0%0%0%17%
Llama 3.1 Nemotron 70B49%43%38%23%5%2%2%1%1%0%16%
Inflection 3 (Productivity)48%43%40%27%2%1%0%0%0%0%16%
GPT-4.1 Mini50%48%47%12%2%1%0%0%0%0%16%
Rocinante 12B50%46%18%15%8%7%6%0%0%0%15%
Magnum v2 72B67%45%36%0%0%0%0%0%0%0%15%
Llama 3 TenyxChat-DaybreakStorywriter 70B49%45%43%5%3%1%0%0%0%0%15%
Hermes 3 70B45%39%23%22%8%4%2%1%0%0%14%
Mistral NeMO50%43%41%5%0%0%0%0%0%0%14%
Llama 3.2 1B50%46%23%13%1%0%0%0%0%0%13%
Gemini Pro 1.549%49%11%7%5%4%4%2%0%0%13%
GPT-4.150%43%38%1%0%0%0%0%0%0%13%
MoonshotAI: Kimi K2.548%45%15%8%7%6%0%0%0%0%13%
Llama 3 70B47%31%23%8%8%5%1%0%0%0%12%
Gemini Flash 1.541%39%26%11%5%1%0%0%0%0%12%
AI21 Jamba 1.5 Large50%49%22%0%0%0%0%0%0%0%12%
Phi-3 Medium 128k50%30%30%10%0%0%0%0%0%0%12%
Gemini 2.5 Pro47%45%14%14%0%0%0%0%0%0%12%
AI21 Jamba 1.5 Mini41%34%24%15%2%0%0%0%0%0%12%
Magnum 72B45%34%17%14%1%1%0%0%0%0%11%
Z.AI GLM 4.640%30%21%14%5%0%0%0%0%0%11%
Inflection 3 (PI)50%44%14%0%0%0%0%0%0%0%11%
Hermes 3 405B44%36%15%11%1%0%0%0%0%0%11%
Claude 2.058%25%19%2%1%0%0%0%0%0%11%
Llama 3.2 11B (Vision)43%26%25%2%2%1%1%1%0%0%10%
Qwen 2 7B47%37%7%5%4%1%0%0%0%0%10%
Phi-3.5 Mini 128k49%38%9%3%1%0%0%0%0%0%10%
Claude Haiku 4.564%19%6%5%4%1%1%0%0%0%10%
Lumimaid v0.2 8B50%47%0%0%0%0%0%0%0%0%10%
MythoMax 13B50%33%5%3%0%0%0%0%0%0%9%
WizardLM 2 8x22b49%19%13%3%0%0%0%0%0%0%8%
Writer: Palmyra X549%30%2%0%0%0%0%0%0%0%8%
Z.AI GLM 4.741%18%18%0%0%0%0%0%0%0%8%
Llama 3 Euryale 70B v2.143%30%2%0%0%0%0%0%0%0%8%
Gemini 3 Pro (Preview)71%1%0%0%0%0%0%0%0%0%7%
Cohere Command R+ (Apr. 2024)40%18%12%1%0%0%0%0%0%0%7%
GPT-4.1 Nano24%15%14%13%1%1%0%0%0%0%7%
Claude Sonnet 450%6%3%1%0%0%0%0%0%0%6%
Mistral Large34%18%4%0%0%0%0%0%0%0%6%
Mistral Nemo 12B Celeste48%2%1%0%0%0%0%0%0%0%5%
lzlv 70B50%0%0%0%0%0%0%0%0%0%5%
MN GRAND Gutenberg Lyra4 12B Madness48%1%0%0%0%0%0%0%0%0%5%
Ministral 8B49%0%0%0%0%0%0%0%0%0%5%
Hermes 2 Theta 8B49%0%0%0%0%0%0%0%0%0%5%
Toppy M 7B45%1%0%0%0%0%0%0%0%0%5%
Fimbulvetr 11B v246%0%0%0%0%0%0%0%0%0%5%
Gemini 2.5 Flash30%13%0%0%0%0%0%0%0%0%4%
Z.AI GLM 4.7 Flash38%0%0%0%0%0%0%0%0%0%4%
Gemma 2 9B31%3%1%0%0%0%0%0%0%0%4%
Phi-3 Mini 128k20%7%2%0%0%0%0%0%0%0%3%
AI21 Jamba12%9%1%0%0%0%0%0%0%0%2%
Z.AI GLM 4.514%4%0%0%0%0%0%0%0%0%2%
GPT-4o, May 13th (temp=0)13%0%0%0%0%0%0%0%0%0%1%
Goliath 120B10%1%0%0%0%0%0%0%0%0%1%
Liquid: LFM 40B MoE10%0%0%0%0%0%0%0%0%0%1%
Mistral Medium0%0%0%0%0%0%0%0%0%0%0%
Claude 2.10%0%0%0%0%0%0%0%0%0%0%
Average Total across models: 16.06%