Dialogue tags

Various tasks related to dialogue tags in text.

Write 500 words with 50% dialogue

0-shot, Creative writing, Rule following
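The page does not describe how the 50% dialogue target is scored. As a rough sketch only, assuming "dialogue" means the words inside double quotation marks (straight or curly), the dialogue share of a response could be estimated as below. The function name `dialogue_share` and the quote-matching rule are illustrative assumptions, not the benchmark's method.

```python
import re

# Assumption: dialogue is any span enclosed in straight or curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
# A word is a run of word characters, optionally joined by apostrophes (don't).
WORD = re.compile(r"\w+(?:['’]\w+)*")


def dialogue_share(text: str) -> float:
    """Return the fraction of words that fall inside quoted spans."""
    total = len(WORD.findall(text))
    if total == 0:
        return 0.0
    spoken = sum(len(WORD.findall(m.group(0))) for m in QUOTED.finditer(text))
    return spoken / total


sample = '"Are you coming?" she asked. He shook his head. "Not tonight."'
# 5 of the 11 words in the sample sit inside quotation marks.
print(round(dialogue_share(sample), 3))
```

A scorer along these lines would also need a rule for turning the distance from 0.5 (and from the 500-word target) into a percentage; that mapping is not specified here.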
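Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Claude Opus 4.6 | 94% | 92% | 92% | 89% | 72% | 64% | 58% | 50% | 50% | 49% | 71%
o4 Mini | 100% | 100% | 90% | 71% | 59% | 50% | 50% | 49% | 45% | 38% | 65%
o4 Mini High | 100% | 91% | 91% | 62% | 52% | 50% | 50% | 50% | 22% | 0% | 57%
Claude Opus 4.5 | 68% | 67% | 66% | 66% | 54% | 50% | 49% | 22% | 10% | 0% | 45%
MoonshotAI: Kimi K2.5 | 100% | 60% | 45% | 39% | 25% | 24% | 24% | 16% | 7% | 0% | 34%
Llama 3.2 11B (Vision) | 77% | 66% | 61% | 49% | 43% | 0% | 0% | 0% | 0% | 0% | 30%
Claude 3.7 Sonnet | 50% | 49% | 49% | 48% | 41% | 30% | 26% | 2% | 1% | 0% | 30%
Cohere Command R+ (Apr. 2024) | 50% | 50% | 49% | 48% | 47% | 43% | 0% | 0% | 0% | 0% | 29%
Claude Opus 4 | 59% | 50% | 50% | 47% | 47% | 18% | 10% | 7% | 1% | 0% | 29%
Claude 2.0 | 90% | 50% | 50% | 49% | 20% | 18% | 0% | 0% | 0% | 0% | 28%
GPT-4o Mini (temp=0) | 50% | 47% | 47% | 47% | 41% | 34% | 7% | 2% | 0% | 0% | 27%
Claude 3.5 Sonnet (new) | 49% | 49% | 49% | 43% | 41% | 22% | 10% | 1% | 0% | 0% | 26%
Claude Sonnet 4 | 48% | 47% | 45% | 44% | 23% | 4% | 4% | 1% | 0% | 0% | 22%
Cohere Command R+ (Aug. 2024) | 85% | 79% | 48% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 22%
Hermes 3 70B | 81% | 59% | 39% | 22% | 13% | 0% | 0% | 0% | 0% | 0% | 21%
Llama 3.1 8B | 59% | 50% | 30% | 30% | 22% | 22% | 0% | 0% | 0% | 0% | 21%
Qwen 2 7B | 50% | 42% | 41% | 29% | 23% | 22% | 0% | 0% | 0% | 0% | 21%
GPT-4 Turbo | 69% | 64% | 50% | 11% | 6% | 5% | 0% | 0% | 0% | 0% | 20%
Gemini 2.5 Flash Lite | 50% | 49% | 43% | 41% | 10% | 7% | 0% | 0% | 0% | 0% | 20%
Claude Sonnet 4.5 | 48% | 47% | 46% | 38% | 13% | 3% | 1% | 0% | 0% | 0% | 20%
Llama 3 70B | 58% | 41% | 35% | 32% | 26% | 0% | 0% | 0% | 0% | 0% | 19%
GPT-4o, Aug. 6th (temp=1) | 50% | 46% | 45% | 26% | 15% | 7% | 1% | 1% | 0% | 0% | 19%
GPT-4.1 | 50% | 48% | 34% | 26% | 22% | 3% | 0% | 0% | 0% | 0% | 18%
Phi-3 Medium 128k | 57% | 49% | 41% | 20% | 11% | 2% | 1% | 0% | 0% | 0% | 18%
Mistral Medium | 87% | 50% | 39% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 18%
Mistral Large | 51% | 50% | 26% | 24% | 15% | 9% | 3% | 2% | 0% | 0% | 18%
Llama 3.2 3B | 49% | 47% | 44% | 27% | 13% | 1% | 0% | 0% | 0% | 0% | 18%
Rocinante 12B | 100% | 33% | 24% | 16% | 3% | 1% | 0% | 0% | 0% | 0% | 18%
Claude Haiku 4.5 | 52% | 47% | 45% | 26% | 5% | 0% | 0% | 0% | 0% | 0% | 18%
Inflection 3 (Productivity) | 72% | 47% | 24% | 24% | 7% | 0% | 0% | 0% | 0% | 0% | 17%
Ministral 8B | 48% | 39% | 36% | 29% | 20% | 1% | 0% | 0% | 0% | 0% | 17%
Hermes 3 405B | 49% | 48% | 38% | 31% | 5% | 1% | 0% | 0% | 0% | 0% | 17%
Qwen 2 72B | 50% | 48% | 39% | 29% | 5% | 0% | 0% | 0% | 0% | 0% | 17%
Magnum 72B | 58% | 50% | 50% | 11% | 1% | 1% | 0% | 0% | 0% | 0% | 17%
Z.AI GLM 4.7 | 70% | 60% | 18% | 16% | 4% | 0% | 0% | 0% | 0% | 0% | 17%
Gemini 2.5 Pro | 67% | 50% | 34% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 16%
WizardLM 2 8x22b | 50% | 41% | 33% | 21% | 3% | 3% | 0% | 0% | 0% | 0% | 15%
Gemini 2.5 Flash | 49% | 45% | 43% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 15%
GPT-4o, Aug. 6th (temp=0) | 44% | 38% | 37% | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 15%
Hermes 2 Theta 8B | 50% | 40% | 38% | 10% | 5% | 4% | 1% | 0% | 0% | 0% | 15%
Llama 3.1 Euryale 70B v2.2 | 42% | 40% | 37% | 18% | 3% | 0% | 0% | 0% | 0% | 0% | 14%
Gemini Flash 1.5 | 64% | 47% | 29% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14%
Fimbulvetr 11B v2 | 49% | 29% | 26% | 15% | 15% | 3% | 1% | 0% | 0% | 0% | 14%
Inflection 3 (PI) | 48% | 48% | 26% | 10% | 5% | 0% | 0% | 0% | 0% | 0% | 14%
Phi-3 Mini 128k | 48% | 35% | 26% | 20% | 5% | 4% | 0% | 0% | 0% | 0% | 14%
AI21 Jamba 1.5 Large | 52% | 48% | 32% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 14%
Llama 3.1 405B | 50% | 50% | 32% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13%
GPT-4o Mini (temp=1) | 49% | 41% | 34% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Llama 3 Euryale 70B v2.1 | 54% | 36% | 31% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Llama 3.1 70B | 50% | 34% | 34% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Mistral Large 2 | 57% | 50% | 4% | 4% | 3% | 0% | 0% | 0% | 0% | 0% | 12%
Ministral 3B | 48% | 48% | 16% | 3% | 3% | 1% | 0% | 0% | 0% | 0% | 12%
GPT-4o, May 13th (temp=1) | 47% | 44% | 18% | 8% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Gemma 2 27B | 49% | 38% | 19% | 7% | 3% | 0% | 0% | 0% | 0% | 0% | 12%
AI21 Jamba | 50% | 50% | 13% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 11%
Gemini 3 Flash (Preview) | 48% | 43% | 22% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11%
Llama 3.2 1B | 43% | 42% | 26% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 11%
Mistral NeMO | 32% | 31% | 22% | 20% | 2% | 1% | 0% | 0% | 0% | 0% | 11%
Phi-3.5 Mini 128k | 46% | 22% | 21% | 14% | 2% | 2% | 0% | 0% | 0% | 0% | 11%
MythoMist 7B | 50% | 50% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 10%
MythoMax 13B | 50% | 30% | 18% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 10%
GPT-4.1 Mini | 43% | 38% | 7% | 5% | 3% | 2% | 1% | 0% | 0% | 0% | 10%
Liquid: LFM 40B MoE | 47% | 37% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10%
EVA Qwen 2.5 14B | 71% | 20% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10%
Gemini 3 Pro (Preview) | 48% | 41% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9%
Goliath 120B | 43% | 31% | 14% | 4% | 1% | 0% | 0% | 0% | 0% | 0% | 9%
Llama 3.2 90B (Vision) | 49% | 22% | 12% | 3% | 3% | 2% | 2% | 1% | 0% | 0% | 9%
Claude 3.5 Haiku | 41% | 27% | 8% | 3% | 3% | 1% | 0% | 0% | 0% | 0% | 8%
Toppy M 7B | 50% | 31% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Lumimaid v0.2 8B | 43% | 23% | 10% | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
AI21 Jamba 1.5 Mini | 41% | 23% | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Z.AI GLM 4.5 | 46% | 18% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6%
Claude 3.5 Sonnet | 31% | 19% | 4% | 3% | 1% | 1% | 0% | 0% | 0% | 0% | 6%
Mistral Nemo 12B Celeste | 31% | 23% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Gemma 2 9B | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
DeepSeek-V2 Chat | 45% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Magnum v2 72B | 26% | 14% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Claude 3 Haiku | 38% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Gemini Pro 1.5 | 26% | 9% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Claude 3.0 Sonnet | 21% | 8% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Qwen 2.5 72B | 14% | 12% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Writer: Palmyra X5 | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
Z.AI GLM 4.7 Flash | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2%
lzlv 70B | 10% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
MN GRAND Gutenberg Lyra4 12B Madness | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Llama 3.1 Nemotron 70B | 11% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
GPT-4.1 Nano | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Sao10K L3.1 70B Hanami x1 | 8% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
GPT-4o, May 13th (temp=0) | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Z.AI GLM 4.6 | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Mistral Small Creative | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%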
Average across all models: 14.88%
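
The Total column appears to be the mean of each model's ten run scores. This is an inference from the figures, not something the page states. A minimal sketch, using the Claude Opus 4.6 row and a hypothetical helper named `mean_score`:

```python
def mean_score(runs):
    """Arithmetic mean of per-run percentage scores."""
    return sum(runs) / len(runs)


# Run 1 to Run 10 for Claude Opus 4.6, copied from the table.
opus_46_runs = [94, 92, 92, 89, 72, 64, 58, 50, 50, 49]
print(mean_score(opus_46_runs))  # 71.0, matching the listed Total of 71%
```

Because the displayed run scores are already rounded, a recomputed mean can differ from the listed Total by about a point. For example, the GPT-4o Mini (temp=0) row averages to 27.5 but lists 27%.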