Dialogue tags

Various tasks related to dialogue tags in text.

Write 200 words with 10% dialogue

0-shot, Creative writing, Rule following
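| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 96% | 99% |
| Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 89% | 82% | 96% |
| o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 66% | 96% |
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 97% | 84% | 65% | 94% |
| Claude Sonnet 4 | 100% | 97% | 97% | 96% | 96% | 95% | 94% | 88% | 84% | 60% | 91% |
| Claude Sonnet 4.5 | 100% | 98% | 97% | 96% | 94% | 85% | 84% | 80% | 76% | 68% | 88% |
| Z.AI GLM 4.7 | 100% | 100% | 100% | 99% | 99% | 98% | 84% | 64% | 50% | 50% | 84% |
| Claude Opus 4 | 100% | 97% | 96% | 95% | 85% | 80% | 76% | 70% | 67% | 49% | 81% |
| Claude Opus 4.6 | 100% | 100% | 99% | 98% | 90% | 90% | 68% | 58% | 54% | 45% | 80% |
| Claude Haiku 4.5 | 100% | 100% | 99% | 97% | 97% | 94% | 73% | 50% | 41% | 38% | 79% |
| Claude 3.5 Haiku | 100% | 95% | 91% | 87% | 83% | 81% | 73% | 72% | 58% | 44% | 78% |
| GPT-4 Turbo | 100% | 100% | 95% | 95% | 94% | 80% | 58% | 52% | 50% | 47% | 77% |
| Gemini 2.5 Pro | 100% | 99% | 98% | 98% | 80% | 72% | 57% | 56% | 50% | 50% | 76% |
| Gemini 3 Flash (Preview) | 98% | 78% | 68% | 68% | 68% | 68% | 66% | 65% | 65% | 65% | 71% |
| Gemini 3 Pro (Preview) | 100% | 100% | 97% | 95% | 72% | 68% | 68% | 64% | 26% | 19% | 71% |
| Llama 3.2 90B (Vision) | 99% | 93% | 62% | 60% | 50% | 50% | 50% | 49% | 47% | 40% | 60% |
| Llama 3.1 405B | 95% | 79% | 77% | 54% | 50% | 50% | 49% | 48% | 38% | 10% | 55% |
| GPT-4o, Aug. 6th (temp=1) | 66% | 52% | 50% | 50% | 50% | 50% | 49% | 47% | 43% | 43% | 50% |
| GPT-4o Mini (temp=0) | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 45% | 49% |
| GPT-4o Mini (temp=1) | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 48% | 45% | 43% | 49% |
| Claude 3.0 Sonnet | 50% | 50% | 50% | 49% | 49% | 49% | 49% | 47% | 47% | 45% | 48% |
| Z.AI GLM 4.7 Flash | 80% | 59% | 50% | 50% | 50% | 49% | 48% | 47% | 45% | 0% | 48% |
| GPT-4.1 | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 43% | 34% | 48% |
| GPT-4o, Aug. 6th (temp=0) | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 48% | 26% | 47% |
| Z.AI GLM 4.6 | 99% | 95% | 60% | 50% | 50% | 50% | 36% | 11% | 0% | 0% | 45% |
| Claude 3.5 Sonnet | 97% | 71% | 52% | 51% | 50% | 45% | 44% | 5% | 5% | 0% | 42% |
| GPT-4.1 Nano | 50% | 50% | 49% | 49% | 48% | 45% | 43% | 34% | 26% | 22% | 42% |
| Z.AI GLM 4.5 | 88% | 88% | 65% | 49% | 48% | 48% | 30% | 0% | 0% | 0% | 42% |
| Llama 3.1 70B | 50% | 50% | 50% | 49% | 47% | 43% | 41% | 38% | 34% | 14% | 41% |
| Llama 3 70B | 50% | 49% | 48% | 47% | 45% | 41% | 38% | 34% | 30% | 0% | 38% |
| Claude 3 Haiku | 49% | 49% | 49% | 49% | 49% | 47% | 45% | 26% | 7% | 0% | 37% |
| GPT-4.1 Mini | 50% | 50% | 50% | 47% | 47% | 45% | 45% | 14% | 14% | 0% | 36% |
| Claude 3.5 Sonnet (new) | 63% | 45% | 45% | 45% | 30% | 30% | 30% | 27% | 14% | 11% | 34% |
| Mistral Large | 88% | 59% | 55% | 52% | 50% | 22% | 8% | 7% | 0% | 0% | 34% |
| Goliath 120B | 59% | 50% | 47% | 39% | 36% | 34% | 26% | 18% | 10% | 7% | 33% |
| Gemini Pro 1.5 | 50% | 49% | 49% | 49% | 46% | 42% | 28% | 10% | 0% | 0% | 32% |
| Mistral Nemo 12B Celeste | 50% | 49% | 49% | 48% | 44% | 43% | 34% | 0% | 0% | 0% | 32% |
| Gemini 2.5 Flash | 50% | 50% | 49% | 45% | 34% | 30% | 30% | 26% | 0% | 0% | 32% |
| GPT-4o, May 13th (temp=1) | 50% | 50% | 49% | 48% | 41% | 41% | 33% | 3% | 0% | 0% | 31% |
| Llama 3.1 8B | 52% | 50% | 50% | 49% | 49% | 41% | 18% | 1% | 0% | 0% | 31% |
| Gemma 2 27B | 86% | 50% | 50% | 50% | 48% | 0% | 0% | 0% | 0% | 0% | 28% |
| AI21 Jamba 1.5 Mini | 65% | 50% | 50% | 49% | 48% | 18% | 3% | 0% | 0% | 0% | 28% |
| Claude 2.0 | 92% | 50% | 45% | 38% | 26% | 25% | 3% | 2% | 2% | 0% | 28% |
| Hermes 3 405B | 50% | 50% | 45% | 41% | 41% | 34% | 10% | 5% | 0% | 0% | 28% |
| Sao10K L3.1 70B Hanami x1 | 50% | 50% | 49% | 47% | 41% | 22% | 1% | 0% | 0% | 0% | 26% |
| MythoMist 7B | 50% | 49% | 48% | 41% | 41% | 23% | 6% | 0% | 0% | 0% | 26% |
| Phi-3 Mini 128k | 82% | 49% | 43% | 41% | 30% | 10% | 2% | 1% | 0% | 0% | 26% |
| Gemini 2.5 Flash Lite | 50% | 50% | 50% | 49% | 48% | 0% | 0% | 0% | 0% | 0% | 25% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 69% | 50% | 43% | 41% | 34% | 5% | 1% | 1% | 0% | 0% | 24% |
| AI21 Jamba | 50% | 50% | 49% | 48% | 26% | 14% | 3% | 0% | 0% | 0% | 24% |
| Llama 3.2 11B (Vision) | 50% | 50% | 43% | 38% | 31% | 22% | 1% | 1% | 0% | 0% | 23% |
| Writer: Palmyra X5 | 50% | 49% | 49% | 47% | 36% | 1% | 1% | 0% | 0% | 0% | 23% |
| Lumimaid v0.2 8B | 85% | 49% | 45% | 45% | 5% | 0% | 0% | 0% | 0% | 0% | 23% |
| Gemma 2 9B | 50% | 50% | 47% | 26% | 22% | 14% | 9% | 3% | 0% | 0% | 22% |
| DeepSeek-V2 Chat | 50% | 49% | 47% | 26% | 26% | 18% | 5% | 0% | 0% | 0% | 22% |
| Hermes 3 70B | 75% | 50% | 49% | 38% | 7% | 1% | 0% | 0% | 0% | 0% | 22% |
| GPT-4o, May 13th (temp=0) | 50% | 49% | 49% | 41% | 22% | 7% | 0% | 0% | 0% | 0% | 22% |
| Rocinante 12B | 50% | 49% | 49% | 45% | 22% | 0% | 0% | 0% | 0% | 0% | 22% |
| Claude 2.1 | 49% | 18% | 18% | 18% | 18% | 18% | 18% | 18% | 18% | 18% | 21% |
| Qwen 2.5 72B | 50% | 50% | 45% | 43% | 22% | 0% | 0% | 0% | 0% | 0% | 21% |
| Llama 3.2 3B | 50% | 49% | 49% | 45% | 10% | 3% | 0% | 0% | 0% | 0% | 21% |
| Inflection 3 (Productivity) | 52% | 48% | 43% | 41% | 10% | 7% | 4% | 0% | 0% | 0% | 20% |
| Claude 3.7 Sonnet | 41% | 38% | 38% | 34% | 22% | 7% | 7% | 7% | 5% | 5% | 20% |
| Mistral NeMO | 69% | 50% | 50% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 20% |
| Cohere Command R+ (Apr. 2024) | 51% | 50% | 45% | 41% | 2% | 0% | 0% | 0% | 0% | 0% | 19% |
| Llama 3.2 1B | 96% | 40% | 18% | 14% | 5% | 3% | 1% | 0% | 0% | 0% | 18% |
| AI21 Jamba 1.5 Large | 49% | 49% | 43% | 10% | 7% | 5% | 3% | 1% | 0% | 0% | 17% |
| Liquid: LFM 40B MoE | 50% | 50% | 48% | 18% | 0% | 0% | 0% | 0% | 0% | 0% | 17% |
| Magnum 72B | 50% | 49% | 47% | 7% | 2% | 0% | 0% | 0% | 0% | 0% | 16% |
| Llama 3.1 Nemotron 70B | 50% | 47% | 34% | 18% | 5% | 0% | 0% | 0% | 0% | 0% | 15% |
| MythoMax 13B | 59% | 49% | 42% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| Inflection 3 (PI) | 50% | 38% | 34% | 22% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| EVA Qwen 2.5 14B | 45% | 38% | 22% | 19% | 18% | 2% | 0% | 0% | 0% | 0% | 14% |
| Qwen 2 72B | 59% | 48% | 34% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 14% |
| MN GRAND Gutenberg Lyra4 12B Madness | 50% | 50% | 38% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Mistral Medium | 51% | 45% | 18% | 14% | 7% | 2% | 0% | 0% | 0% | 0% | 14% |
| Phi-3 Medium 128k | 48% | 34% | 31% | 14% | 4% | 1% | 0% | 0% | 0% | 0% | 13% |
| Toppy M 7B | 48% | 37% | 28% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Ministral 3B | 45% | 18% | 18% | 18% | 3% | 0% | 0% | 0% | 0% | 0% | 10% |
| Llama 3 Euryale 70B v2.1 | 49% | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| Phi-3.5 Mini 128k | 49% | 45% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| Ministral 8B | 49% | 18% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| Qwen 2 7B | 50% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Llama 3.1 Euryale 70B v2.2 | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Hermes 2 Theta 8B | 44% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Mistral Large 2 | 22% | 18% | 5% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Gemini Flash 1.5 | 31% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| lzlv 70B | 13% | 8% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Cohere Command R+ (Aug. 2024) | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Mistral Small Creative | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Magnum v2 72B | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| WizardLM 2 8x22b | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Fimbulvetr 11B v2 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |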
Average across all models: 33.74%
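The task "Write 200 words with 10% dialogue" implies two measurable quantities: the total word count and the share of words that are spoken dialogue. The page does not describe how responses are scored, so the following is only a rough sketch of how one might measure those quantities. It assumes that dialogue is whatever appears between double quotation marks (straight or curly) and that words are whitespace-separated tokens; the names `count_words` and `dialogue_share` are hypothetical helpers, not part of the benchmark.

```python
import re

# Assumption: dialogue is any text enclosed in straight or curly double quotes.
# Single-quoted or unquoted speech is not detected by this sketch.
QUOTED = re.compile(r'"([^"]*)"|“([^”]*)”')


def count_words(text: str) -> int:
    """Count whitespace-separated tokens in the text."""
    return len(text.split())


def dialogue_share(text: str) -> float:
    """Return the fraction of words that fall inside double quotes."""
    total = count_words(text)
    if total == 0:
        return 0.0
    # findall yields one (straight, curly) tuple per match; one side is empty.
    spoken = sum(count_words(a or b) for a, b in QUOTED.findall(text))
    return spoken / total


if __name__ == "__main__":
    sample = 'She waved. "Come in," he said. Then silence fell over the room.'
    print(count_words(sample), round(dialogue_share(sample), 3))
```

Under these assumptions, a response that hits the target would report roughly 200 words and a dialogue share near 0.10, that is, about 20 quoted words. How much deviation the benchmark tolerates is not stated on the page.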