Dialogue tags

Various tasks related to dialogue tags in text.

Write 200 words with 90% dialogue
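
The page does not describe how a response's dialogue share is measured. As a rough sketch only, assuming that "dialogue" means text inside double quotation marks and that words are whitespace-separated tokens, the proportion could be estimated as below. The function name `dialogue_fraction` is hypothetical and is not the benchmark's actual scorer.

```python
import re


def dialogue_fraction(text: str) -> float:
    """Estimate the share of words that sit inside double-quoted spans."""
    # Treat curly quotes like straight quotes so one pattern covers both.
    normalized = text.replace("\u201c", '"').replace("\u201d", '"')
    total_words = len(normalized.split())
    if total_words == 0:
        return 0.0
    # Each match is the text between one opening and one closing quote.
    quoted_spans = re.findall(r'"([^"]*)"', normalized)
    dialogue_words = sum(len(span.split()) for span in quoted_spans)
    return dialogue_words / total_words


sample = '"Are you coming?" she asked. "Yes, give me a minute."'
print(f"{dialogue_fraction(sample):.0%}")  # 8 of 10 words are quoted -> 80%
```

Under these assumptions, a 200-word response would need roughly 180 quoted words to reach the 90% target. Dialogue written without quotation marks, or with single quotes, would not be counted by this sketch.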

0-shot, Creative writing, Rule following
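| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o Mini (temp=0) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 97% | 90% | 98% |
| Claude Opus 4.6 | 100% | 100% | 100% | 99% | 99% | 99% | 97% | 90% | 89% | 86% | 96% |
| Claude Opus 4.5 | 99% | 98% | 96% | 96% | 95% | 91% | 87% | 67% | 52% | 49% | 83% |
| Claude Opus 4 | 99% | 99% | 95% | 93% | 89% | 86% | 81% | 68% | 64% | 21% | 79% |
| MoonshotAI: Kimi K2.5 | 100% | 97% | 95% | 87% | 84% | 81% | 60% | 53% | 51% | 50% | 76% |
| Gemini 2.5 Flash Lite | 95% | 88% | 88% | 88% | 84% | 76% | 67% | 67% | 50% | 49% | 75% |
| o4 Mini High | 99% | 97% | 89% | 82% | 77% | 75% | 68% | 53% | 50% | 49% | 74% |
| Claude Haiku 4.5 | 100% | 98% | 93% | 87% | 87% | 60% | 55% | 52% | 50% | 44% | 73% |
| DeepSeek-V2 Chat | 99% | 92% | 92% | 87% | 83% | 72% | 50% | 50% | 49% | 49% | 72% |
| GPT-4o Mini (temp=1) | 95% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 66% | 44% | 68% |
| GPT-4o, Aug. 6th (temp=0) | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 66% | 68% |
| Claude 3.7 Sonnet | 95% | 84% | 67% | 67% | 64% | 59% | 55% | 52% | 52% | 50% | 64% |
| GPT-4o, Aug. 6th (temp=1) | 68% | 68% | 68% | 68% | 66% | 66% | 65% | 62% | 56% | 49% | 64% |
| Z.AI GLM 4.7 | 98% | 93% | 93% | 68% | 66% | 65% | 59% | 50% | 18% | 18% | 63% |
| GPT-4o, May 13th (temp=0) | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 68% | 52% | 18% | 62% |
| Llama 3.1 70B | 90% | 72% | 72% | 68% | 68% | 64% | 64% | 62% | 53% | 2% | 61% |
| GPT-4 Turbo | 91% | 80% | 79% | 76% | 67% | 54% | 49% | 47% | 44% | 27% | 61% |
| Llama 3.2 90B (Vision) | 99% | 94% | 80% | 80% | 77% | 51% | 50% | 50% | 13% | 3% | 60% |
| Gemini 3 Flash (Preview) | 94% | 68% | 68% | 65% | 64% | 62% | 59% | 52% | 32% | 32% | 60% |
| GPT-4.1 Nano | 100% | 99% | 96% | 92% | 66% | 63% | 41% | 18% | 14% | 2% | 59% |
| GPT-4.1 | 68% | 68% | 68% | 68% | 66% | 66% | 52% | 50% | 48% | 34% | 59% |
| Llama 3.2 3B | 100% | 99% | 71% | 67% | 50% | 50% | 49% | 45% | 44% | 10% | 58% |
| Qwen 2.5 72B | 100% | 100% | 79% | 72% | 53% | 50% | 50% | 50% | 20% | 0% | 57% |
| Gemma 2 9B | 95% | 85% | 75% | 59% | 57% | 51% | 50% | 47% | 20% | 19% | 56% |
| o4 Mini | 100% | 94% | 82% | 72% | 55% | 50% | 40% | 34% | 23% | 5% | 55% |
| Claude 3.5 Sonnet | 92% | 91% | 65% | 65% | 56% | 56% | 44% | 40% | 20% | 18% | 55% |
| Ministral 8B | 95% | 95% | 78% | 68% | 51% | 50% | 49% | 44% | 18% | 0% | 55% |
| Writer: Palmyra X5 | 93% | 93% | 53% | 51% | 51% | 50% | 50% | 50% | 49% | 0% | 54% |
| Claude Sonnet 4.5 | 86% | 61% | 57% | 50% | 50% | 50% | 50% | 49% | 49% | 36% | 54% |
| AI21 Jamba 1.5 Large | 99% | 99% | 97% | 68% | 59% | 47% | 34% | 18% | 14% | 0% | 54% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 75% | 68% | 68% | 68% | 64% | 59% | 50% | 32% | 19% | 18% | 52% |
| AI21 Jamba 1.5 Mini | 100% | 84% | 80% | 80% | 54% | 49% | 35% | 31% | 5% | 1% | 52% |
| Gemini 2.5 Flash | 68% | 68% | 66% | 56% | 50% | 50% | 49% | 44% | 29% | 18% | 50% |
| Mistral NeMO | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 47% | 46% | 49% |
| Llama 3.1 405B | 85% | 78% | 77% | 76% | 60% | 55% | 30% | 22% | 1% | 0% | 48% |
| Magnum 72B | 100% | 99% | 92% | 50% | 50% | 48% | 37% | 3% | 0% | 0% | 48% |
| Phi-3 Mini 128k | 99% | 68% | 53% | 51% | 50% | 49% | 45% | 43% | 14% | 0% | 47% |
| Gemini 2.5 Pro | 93% | 67% | 63% | 63% | 49% | 45% | 28% | 18% | 18% | 18% | 46% |
| GPT-4o, May 13th (temp=1) | 68% | 68% | 66% | 52% | 52% | 44% | 44% | 23% | 23% | 18% | 46% |
| Z.AI GLM 4.6 | 77% | 52% | 51% | 51% | 50% | 48% | 48% | 43% | 18% | 18% | 46% |
| Llama 3.1 Nemotron 70B | 97% | 52% | 52% | 51% | 51% | 50% | 50% | 26% | 19% | 0% | 45% |
| Z.AI GLM 4.5 | 91% | 76% | 50% | 50% | 50% | 50% | 28% | 18% | 18% | 10% | 44% |
| GPT-4.1 Mini | 68% | 68% | 68% | 67% | 44% | 40% | 23% | 19% | 18% | 18% | 43% |
| Z.AI GLM 4.7 Flash | 95% | 62% | 55% | 50% | 50% | 47% | 18% | 18% | 18% | 18% | 43% |
| Llama 3.2 11B (Vision) | 93% | 80% | 53% | 50% | 50% | 45% | 33% | 20% | 1% | 0% | 42% |
| Hermes 3 70B | 82% | 72% | 60% | 50% | 50% | 48% | 28% | 22% | 2% | 0% | 41% |
| Qwen 2 72B | 90% | 66% | 50% | 49% | 47% | 46% | 30% | 21% | 13% | 0% | 41% |
| Llama 3 70B | 68% | 68% | 68% | 52% | 50% | 26% | 20% | 18% | 18% | 18% | 41% |
| AI21 Jamba | 97% | 51% | 50% | 50% | 50% | 49% | 48% | 3% | 1% | 1% | 40% |
| Claude 3.0 Sonnet | 91% | 80% | 73% | 66% | 38% | 31% | 14% | 2% | 0% | 0% | 40% |
| Hermes 3 405B | 60% | 55% | 51% | 51% | 45% | 43% | 34% | 26% | 14% | 10% | 39% |
| Goliath 120B | 58% | 50% | 50% | 50% | 50% | 48% | 41% | 30% | 10% | 1% | 39% |
| Phi-3.5 Mini 128k | 91% | 50% | 49% | 49% | 48% | 45% | 38% | 0% | 0% | 0% | 37% |
| Cohere Command R+ (Apr. 2024) | 92% | 56% | 50% | 49% | 47% | 41% | 9% | 1% | 1% | 0% | 34% |
| Mistral Nemo 12B Celeste | 95% | 57% | 50% | 49% | 48% | 34% | 0% | 0% | 0% | 0% | 33% |
| Magnum v2 72B | 50% | 48% | 47% | 46% | 45% | 45% | 44% | 5% | 0% | 0% | 33% |
| Gemini Flash 1.5 | 54% | 51% | 50% | 50% | 48% | 47% | 15% | 0% | 0% | 0% | 31% |
| Sao10K L3.1 70B Hanami x1 | 94% | 50% | 48% | 41% | 32% | 29% | 18% | 1% | 0% | 0% | 31% |
| Llama 3.1 8B | 62% | 50% | 48% | 45% | 41% | 26% | 22% | 18% | 0% | 0% | 31% |
| EVA Qwen 2.5 14B | 50% | 50% | 45% | 45% | 33% | 30% | 30% | 25% | 2% | 0% | 31% |
| Claude 2.0 | 68% | 49% | 49% | 32% | 19% | 18% | 18% | 18% | 18% | 18% | 31% |
| Inflection 3 (PI) | 82% | 52% | 50% | 50% | 49% | 14% | 5% | 0% | 0% | 0% | 30% |
| Ministral 3B | 84% | 50% | 46% | 30% | 30% | 21% | 18% | 15% | 0% | 0% | 29% |
| Mistral Medium | 72% | 68% | 50% | 43% | 30% | 22% | 10% | 0% | 0% | 0% | 29% |
| MythoMist 7B | 88% | 55% | 50% | 49% | 34% | 7% | 5% | 0% | 0% | 0% | 29% |
| Claude 3 Haiku | 50% | 50% | 50% | 47% | 38% | 22% | 18% | 7% | 1% | 0% | 28% |
| Mistral Large | 50% | 50% | 48% | 41% | 41% | 38% | 12% | 1% | 0% | 0% | 28% |
| Claude Sonnet 4 | 51% | 44% | 40% | 21% | 20% | 20% | 19% | 19% | 19% | 18% | 27% |
| Claude 3.5 Sonnet (new) | 56% | 36% | 32% | 28% | 21% | 21% | 20% | 19% | 19% | 18% | 27% |
| Lumimaid v0.2 8B | 68% | 50% | 49% | 39% | 18% | 18% | 18% | 3% | 0% | 0% | 26% |
| Gemini 3 Pro (Preview) | 50% | 49% | 49% | 47% | 43% | 5% | 5% | 3% | 1% | 0% | 25% |
| Gemma 2 27B | 72% | 49% | 32% | 18% | 18% | 18% | 12% | 5% | 3% | 0% | 23% |
| Mistral Large 2 | 50% | 49% | 49% | 39% | 24% | 7% | 2% | 1% | 1% | 0% | 22% |
| Claude 3.5 Haiku | 54% | 53% | 27% | 21% | 16% | 8% | 5% | 3% | 1% | 0% | 19% |
| MythoMax 13B | 75% | 54% | 50% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 19% |
| Rocinante 12B | 50% | 49% | 46% | 22% | 10% | 4% | 0% | 0% | 0% | 0% | 18% |
| Llama 3.2 1B | 68% | 45% | 26% | 25% | 9% | 0% | 0% | 0% | 0% | 0% | 17% |
| Llama 3.1 Euryale 70B v2.2 | 53% | 51% | 50% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 16% |
| Qwen 2 7B | 50% | 49% | 48% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| Fimbulvetr 11B v2 | 50% | 50% | 45% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Inflection 3 (Productivity) | 41% | 34% | 14% | 13% | 4% | 4% | 3% | 0% | 0% | 0% | 11% |
| WizardLM 2 8x22b | 50% | 44% | 6% | 4% | 1% | 0% | 0% | 0% | 0% | 0% | 11% |
| Mistral Small Creative | 47% | 23% | 18% | 4% | 3% | 0% | 0% | 0% | 0% | 0% | 10% |
| Phi-3 Medium 128k | 50% | 36% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9% |
| Liquid: LFM 40B MoE | 41% | 26% | 14% | 6% | 2% | 0% | 0% | 0% | 0% | 0% | 9% |
| lzlv 70B | 68% | 11% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Cohere Command R+ (Aug. 2024) | 31% | 10% | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| MN GRAND Gutenberg Lyra4 12B Madness | 42% | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Llama 3 Euryale 70B v2.1 | 18% | 4% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Hermes 2 Theta 8B | 17% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Gemini Pro 1.5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Toppy M 7B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |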

Average total across all models: 40.48%