Dialogue tags

Various tasks related to dialogue tags in text.

Write 500 words with 70% dialogue

0-shot, Creative writing, Rule following
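The page does not describe how responses are scored. As a rough illustration only, the following sketch shows one way a "500 words with 70% dialogue" target could be checked. It assumes that dialogue means text enclosed in double quotes and that words are whitespace-separated tokens; the names `count_words` and `dialogue_share` are hypothetical and are not taken from the benchmark.

```python
import re

# Assumption: dialogue is any span enclosed in straight or curly double quotes.
DIALOGUE_RE = re.compile(r'"[^"]*"|“[^”]*”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def dialogue_share(text: str) -> float:
    """Return the fraction of words that fall inside quoted spans."""
    total = count_words(text)
    if total == 0:
        return 0.0
    quoted = sum(count_words(m.group(0)) for m in DIALOGUE_RE.finditer(text))
    return quoted / total


sample = '"Are you coming?" she asked. "Yes," he said.'
print(count_words(sample), dialogue_share(sample))  # 8 0.5
```

Under these assumptions, a response would meet the target when `count_words` is close to 500 and `dialogue_share` is close to 0.7. A real grader would also need to decide how to treat unquoted dialogue styles and nested quotes.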
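Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
o4 Mini High | 100% | 95% | 83% | 64% | 60% | 56% | 46% | 44% | 29% | 0% | 58%
GPT-4o, Aug. 6th (temp=0) | 99% | 97% | 83% | 65% | 46% | 46% | 46% | 46% | 35% | 1% | 56%
Claude 3.7 Sonnet | 72% | 70% | 66% | 64% | 59% | 50% | 49% | 41% | 35% | 22% | 53%
GPT-4o, Aug. 6th (temp=1) | 99% | 93% | 86% | 50% | 44% | 43% | 43% | 2% | 0% | 0% | 46%
Claude 3.5 Sonnet | 98% | 92% | 73% | 50% | 46% | 44% | 41% | 3% | 3% | 0% | 45%
DeepSeek-V2 Chat | 89% | 84% | 50% | 44% | 44% | 40% | 19% | 17% | 11% | 5% | 40%
Claude Opus 4.6 | 56% | 50% | 50% | 50% | 50% | 45% | 30% | 26% | 18% | 18% | 39%
Claude 3.5 Sonnet (new) | 50% | 50% | 50% | 50% | 45% | 43% | 19% | 16% | 5% | 0% | 33%
GPT-4o Mini (temp=0) | 50% | 50% | 50% | 49% | 41% | 38% | 34% | 14% | 0% | 0% | 32%
GPT-4 Turbo | 100% | 91% | 59% | 49% | 14% | 0% | 0% | 0% | 0% | 0% | 31%
Llama 3.1 405B | 50% | 46% | 38% | 37% | 32% | 31% | 22% | 19% | 12% | 2% | 29%
Gemini 3 Flash (Preview) | 51% | 50% | 50% | 49% | 38% | 18% | 18% | 14% | 0% | 0% | 29%
Llama 3.1 Euryale 70B v2.2 | 75% | 64% | 50% | 47% | 32% | 6% | 2% | 0% | 0% | 0% | 28%
Claude Opus 4.5 | 50% | 50% | 48% | 45% | 41% | 14% | 10% | 10% | 5% | 3% | 28%
Cohere Command R+ (Aug. 2024) | 79% | 50% | 44% | 32% | 31% | 12% | 10% | 10% | 4% | 0% | 27%
Llama 3.1 70B | 71% | 47% | 47% | 45% | 45% | 11% | 3% | 1% | 0% | 0% | 27%
Claude Opus 4 | 48% | 47% | 43% | 38% | 38% | 18% | 18% | 17% | 1% | 0% | 27%
Claude 3.5 Haiku | 50% | 50% | 48% | 45% | 37% | 10% | 8% | 0% | 0% | 0% | 25%
Mistral Small Creative | 50% | 49% | 45% | 39% | 27% | 14% | 12% | 12% | 0% | 0% | 25%
EVA Qwen 2.5 14B | 83% | 50% | 50% | 37% | 6% | 3% | 0% | 0% | 0% | 0% | 23%
Sao10K L3.1 70B Hanami x1 | 49% | 49% | 47% | 47% | 20% | 3% | 2% | 0% | 0% | 0% | 22%
Mistral Large 2 | 50% | 49% | 45% | 36% | 17% | 9% | 4% | 4% | 0% | 0% | 21%
Qwen 2.5 72B | 50% | 47% | 37% | 35% | 30% | 12% | 0% | 0% | 0% | 0% | 21%
Llama 3.1 8B | 48% | 47% | 37% | 35% | 18% | 10% | 7% | 0% | 0% | 0% | 20%
Claude 3 Haiku | 49% | 44% | 30% | 25% | 22% | 14% | 13% | 1% | 0% | 0% | 20%
o4 Mini | 56% | 55% | 49% | 29% | 6% | 0% | 0% | 0% | 0% | 0% | 19%
Gemini 2.5 Flash Lite | 47% | 45% | 43% | 22% | 18% | 10% | 0% | 0% | 0% | 0% | 19%
Claude 3.0 Sonnet | 50% | 45% | 42% | 21% | 10% | 8% | 3% | 1% | 0% | 0% | 18%
Llama 3.2 3B | 43% | 38% | 36% | 26% | 23% | 11% | 0% | 0% | 0% | 0% | 18%
Claude Sonnet 4.5 | 46% | 43% | 37% | 18% | 17% | 16% | 1% | 0% | 0% | 0% | 18%
GPT-4o Mini (temp=1) | 50% | 49% | 48% | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 17%
Qwen 2 72B | 37% | 34% | 29% | 27% | 24% | 16% | 4% | 1% | 0% | 0% | 17%
Gemma 2 27B | 50% | 35% | 33% | 32% | 14% | 3% | 2% | 2% | 0% | 0% | 17%
Llama 3.2 90B (Vision) | 99% | 35% | 21% | 7% | 6% | 1% | 1% | 0% | 0% | 0% | 17%
MythoMist 7B | 85% | 43% | 34% | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 17%
Ministral 3B | 67% | 43% | 24% | 18% | 15% | 0% | 0% | 0% | 0% | 0% | 17%
GPT-4o, May 13th (temp=1) | 54% | 45% | 41% | 26% | 0% | 0% | 0% | 0% | 0% | 0% | 17%
Llama 3.1 Nemotron 70B | 49% | 43% | 38% | 23% | 5% | 2% | 2% | 1% | 1% | 0% | 16%
Inflection 3 (Productivity) | 48% | 43% | 40% | 27% | 2% | 1% | 0% | 0% | 0% | 0% | 16%
GPT-4.1 Mini | 50% | 48% | 47% | 12% | 2% | 1% | 0% | 0% | 0% | 0% | 16%
Rocinante 12B | 50% | 46% | 18% | 15% | 8% | 7% | 6% | 0% | 0% | 0% | 15%
Magnum v2 72B | 67% | 45% | 36% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 15%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 49% | 45% | 43% | 5% | 3% | 1% | 0% | 0% | 0% | 0% | 15%
Hermes 3 70B | 45% | 39% | 23% | 22% | 8% | 4% | 2% | 1% | 0% | 0% | 14%
Mistral NeMO | 50% | 43% | 41% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 14%
Llama 3.2 1B | 50% | 46% | 23% | 13% | 1% | 0% | 0% | 0% | 0% | 0% | 13%
Gemini Pro 1.5 | 49% | 49% | 11% | 7% | 5% | 4% | 4% | 2% | 0% | 0% | 13%
GPT-4.1 | 50% | 43% | 38% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 13%
MoonshotAI: Kimi K2.5 | 48% | 45% | 15% | 8% | 7% | 6% | 0% | 0% | 0% | 0% | 13%
Llama 3 70B | 47% | 31% | 23% | 8% | 8% | 5% | 1% | 0% | 0% | 0% | 12%
Gemini Flash 1.5 | 41% | 39% | 26% | 11% | 5% | 1% | 0% | 0% | 0% | 0% | 12%
AI21 Jamba 1.5 Large | 50% | 49% | 22% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Phi-3 Medium 128k | 50% | 30% | 30% | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
Gemini 2.5 Pro | 47% | 45% | 14% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 12%
AI21 Jamba 1.5 Mini | 41% | 34% | 24% | 15% | 2% | 0% | 0% | 0% | 0% | 0% | 12%
Magnum 72B | 45% | 34% | 17% | 14% | 1% | 1% | 0% | 0% | 0% | 0% | 11%
Z.AI GLM 4.6 | 40% | 30% | 21% | 14% | 5% | 0% | 0% | 0% | 0% | 0% | 11%
Inflection 3 (PI) | 50% | 44% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11%
Hermes 3 405B | 44% | 36% | 15% | 11% | 1% | 0% | 0% | 0% | 0% | 0% | 11%
Claude 2.0 | 58% | 25% | 19% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 11%
Llama 3.2 11B (Vision) | 43% | 26% | 25% | 2% | 2% | 1% | 1% | 1% | 0% | 0% | 10%
Qwen 2 7B | 47% | 37% | 7% | 5% | 4% | 1% | 0% | 0% | 0% | 0% | 10%
Phi-3.5 Mini 128k | 49% | 38% | 9% | 3% | 1% | 0% | 0% | 0% | 0% | 0% | 10%
Claude Haiku 4.5 | 64% | 19% | 6% | 5% | 4% | 1% | 1% | 0% | 0% | 0% | 10%
Lumimaid v0.2 8B | 50% | 47% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10%
MythoMax 13B | 50% | 33% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 9%
WizardLM 2 8x22b | 49% | 19% | 13% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Writer: Palmyra X5 | 49% | 30% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Z.AI GLM 4.7 | 41% | 18% | 18% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Llama 3 Euryale 70B v2.1 | 43% | 30% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8%
Gemini 3 Pro (Preview) | 71% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7%
Cohere Command R+ (Apr. 2024) | 40% | 18% | 12% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 7%
GPT-4.1 Nano | 24% | 15% | 14% | 13% | 1% | 1% | 0% | 0% | 0% | 0% | 7%
Claude Sonnet 4 | 50% | 6% | 3% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 6%
Mistral Large | 34% | 18% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6%
Mistral Nemo 12B Celeste | 48% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
lzlv 70B | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
MN GRAND Gutenberg Lyra4 12B Madness | 48% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Ministral 8B | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Hermes 2 Theta 8B | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Toppy M 7B | 45% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Fimbulvetr 11B v2 | 46% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5%
Gemini 2.5 Flash | 30% | 13% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Z.AI GLM 4.7 Flash | 38% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Gemma 2 9B | 31% | 3% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
Phi-3 Mini 128k | 20% | 7% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3%
AI21 Jamba | 12% | 9% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2%
Z.AI GLM 4.5 | 14% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2%
GPT-4o, May 13th (temp=0) | 13% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Goliath 120B | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Liquid: LFM 40B MoE | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1%
Mistral Medium | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%
Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0%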
Average total across all models: 16.06%
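The Total column appears to be the mean of a model's ten run scores. This is an inference from the numbers, not something the page states. The sketch below recomputes the total for two rows of the table; the helper name `mean_score` is hypothetical.

```python
def mean_score(runs: list[float]) -> float:
    """Average a model's per-run scores (in percent)."""
    return sum(runs) / len(runs)


# Run scores copied from the table above.
o4_mini_high = [100, 95, 83, 64, 60, 56, 46, 44, 29, 0]
gpt4_turbo = [100, 91, 59, 49, 14, 0, 0, 0, 0, 0]

print(round(mean_score(o4_mini_high)))  # 58
print(round(mean_score(gpt4_turbo)))    # 31
```

A few rows differ from this recomputation by one point (for example, the displayed runs for GPT-4o Mini (temp=0) average 32.6% while the listed total is 32%), presumably because the displayed run scores are themselves rounded.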