Dialogue tags

Various tasks related to dialogue tags in text.

Write 500 words with 30% dialogue
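
The page does not say how a response is scored against this prompt. As a rough illustration only, the sketch below measures the share of words that fall inside double quotation marks. The names `count_words` and `dialogue_share` are hypothetical, and treating quoted text as dialogue is an assumption, not the benchmark's documented method.

```python
import re

# Assumption: dialogue is whatever sits inside straight or curly double quotes.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
# A word is a run of letters/digits, optionally joined by apostrophes (e.g. "don't").
WORD = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z0-9]+)*")


def count_words(text: str) -> int:
    """Count words in text using the WORD pattern."""
    return len(WORD.findall(text))


def dialogue_share(text: str) -> float:
    """Return the fraction of words that appear inside double quotes."""
    total = count_words(text)
    if total == 0:
        return 0.0
    spoken = sum(count_words(m.group(0)) for m in QUOTED.finditer(text))
    return spoken / total


sample = '"Stay right here," he said. Nobody moved at all tonight.'
print(count_words(sample), dialogue_share(sample))
```

A checker along these lines would compare `count_words` against the 500-word target and `dialogue_share` against 0.30. Any tolerance around those targets would be a further assumption.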

0-shot, Creative writing, Rule following
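| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| o4 Mini High | 100% | 100% | 100% | 100% | 89% | 89% | 85% | 50% | 50% | 25% | 79% |
| o4 Mini | 100% | 100% | 100% | 80% | 76% | 61% | 55% | 53% | 51% | 43% | 72% |
| Claude Opus 4.5 | 96% | 65% | 65% | 60% | 58% | 52% | 50% | 50% | 50% | 47% | 59% |
| Claude Opus 4.6 | 90% | 72% | 50% | 50% | 40% | 18% | 18% | 1% | 1% | 1% | 34% |
| MoonshotAI: Kimi K2.5 | 94% | 91% | 55% | 39% | 27% | 27% | 0% | 0% | 0% | 0% | 33% |
| Claude 3.7 Sonnet | 50% | 50% | 49% | 43% | 43% | 41% | 38% | 18% | 0% | 0% | 33% |
| GPT-4o, Aug. 6th (temp=0) | 50% | 43% | 43% | 43% | 43% | 41% | 34% | 5% | 0% | 0% | 30% |
| Inflection 3 (Productivity) | 98% | 97% | 52% | 26% | 10% | 8% | 7% | 0% | 0% | 0% | 30% |
| Claude Opus 4 | 50% | 50% | 50% | 49% | 34% | 34% | 22% | 5% | 0% | 0% | 29% |
| Claude 3.0 Sonnet | 52% | 51% | 50% | 49% | 42% | 26% | 15% | 0% | 0% | 0% | 28% |
| Claude Sonnet 4 | 87% | 54% | 50% | 42% | 17% | 9% | 8% | 5% | 0% | 0% | 27% |
| Toppy M 7B | 50% | 47% | 45% | 43% | 42% | 42% | 1% | 0% | 0% | 0% | 27% |
| Z.AI GLM 4.7 | 67% | 50% | 43% | 43% | 28% | 19% | 14% | 3% | 0% | 0% | 27% |
| lzlv 70B | 49% | 45% | 44% | 43% | 40% | 29% | 15% | 2% | 1% | 0% | 27% |
| Gemini 3 Flash (Preview) | 50% | 50% | 50% | 49% | 34% | 30% | 5% | 0% | 0% | 0% | 27% |
| Hermes 2 Theta 8B | 49% | 48% | 44% | 43% | 42% | 28% | 0% | 0% | 0% | 0% | 26% |
| Gemini 2.5 Pro | 98% | 94% | 41% | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 25% |
| GPT-4o Mini (temp=0) | 50% | 50% | 47% | 22% | 22% | 18% | 14% | 14% | 10% | 0% | 24% |
| Mistral Large | 49% | 49% | 45% | 36% | 35% | 28% | 0% | 0% | 0% | 0% | 24% |
| Hermes 3 405B | 50% | 50% | 45% | 38% | 30% | 14% | 10% | 0% | 0% | 0% | 24% |
| Inflection 3 (PI) | 97% | 53% | 49% | 13% | 3% | 3% | 1% | 0% | 0% | 0% | 22% |
| Claude 3.5 Sonnet (new) | 49% | 47% | 47% | 41% | 18% | 7% | 3% | 3% | 1% | 1% | 22% |
| Claude Haiku 4.5 | 50% | 49% | 41% | 38% | 34% | 3% | 1% | 0% | 0% | 0% | 22% |
| AI21 Jamba 1.5 Large | 50% | 45% | 41% | 27% | 23% | 2% | 1% | 0% | 0% | 0% | 19% |
| Llama 3.2 11B (Vision) | 49% | 45% | 34% | 22% | 18% | 18% | 0% | 0% | 0% | 0% | 19% |
| Claude 2.0 | 50% | 50% | 38% | 26% | 10% | 0% | 0% | 0% | 0% | 0% | 17% |
| Gemini 2.5 Flash Lite | 50% | 47% | 45% | 30% | 0% | 0% | 0% | 0% | 0% | 0% | 17% |
| Mistral Medium | 50% | 48% | 47% | 18% | 4% | 1% | 1% | 0% | 0% | 0% | 17% |
| Claude 3 Haiku | 79% | 71% | 15% | 3% | 1% | 0% | 0% | 0% | 0% | 0% | 17% |
| Gemini 3 Pro (Preview) | 50% | 45% | 43% | 27% | 0% | 0% | 0% | 0% | 0% | 0% | 17% |
| Claude Sonnet 4.5 | 47% | 38% | 37% | 11% | 7% | 7% | 6% | 5% | 0% | 0% | 16% |
| Z.AI GLM 4.7 Flash | 50% | 50% | 45% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 15% |
| AI21 Jamba | 50% | 50% | 44% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 15% |
| GPT-4o, May 13th (temp=0) | 45% | 23% | 22% | 16% | 15% | 10% | 8% | 3% | 1% | 0% | 14% |
| Ministral 8B | 49% | 48% | 41% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 14% |
| Llama 3.2 90B (Vision) | 87% | 48% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13% |
| Llama 3 Euryale 70B v2.1 | 54% | 51% | 12% | 4% | 3% | 1% | 1% | 0% | 0% | 0% | 13% |
| Phi-3.5 Mini 128k | 44% | 43% | 26% | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 12% |
| Llama 3.2 3B | 45% | 41% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 12% |
| Phi-3 Medium 128k | 35% | 33% | 22% | 11% | 10% | 1% | 0% | 0% | 0% | 0% | 11% |
| Cohere Command R+ (Apr. 2024) | 50% | 49% | 7% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| MythoMax 13B | 51% | 50% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Llama 3.1 Euryale 70B v2.2 | 50% | 42% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% |
| Gemma 2 27B | 52% | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| AI21 Jamba 1.5 Mini | 48% | 43% | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| GPT-4o, May 13th (temp=1) | 47% | 31% | 13% | 6% | 4% | 0% | 0% | 0% | 0% | 0% | 10% |
| Hermes 3 70B | 41% | 31% | 26% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| MN GRAND Gutenberg Lyra4 12B Madness | 50% | 49% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| GPT-4 Turbo | 38% | 25% | 21% | 7% | 2% | 1% | 0% | 0% | 0% | 0% | 9% |
| GPT-4o, Aug. 6th (temp=1) | 43% | 29% | 14% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 9% |
| Llama 3.1 405B | 50% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Qwen 2 72B | 45% | 34% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Mistral NeMO | 44% | 26% | 6% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| DeepSeek-V2 Chat | 49% | 26% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Cohere Command R+ (Aug. 2024) | 41% | 30% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% |
| Llama 3.2 1B | 49% | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| Z.AI GLM 4.6 | 67% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| Ministral 3B | 67% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 7% |
| GPT-4.1 Mini | 45% | 11% | 7% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6% |
| Phi-3 Mini 128k | 48% | 7% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6% |
| Llama 3.1 8B | 53% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Mistral Nemo 12B Celeste | 26% | 26% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| WizardLM 2 8x22b | 49% | 2% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Gemini Flash 1.5 | 49% | 2% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| MythoMist 7B | 49% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Gemini 2.5 Flash | 50% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Claude 3.5 Haiku | 20% | 19% | 9% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| GPT-4o Mini (temp=1) | 41% | 7% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Qwen 2.5 72B | 47% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Rocinante 12B | 43% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3 70B | 43% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Qwen 2 7B | 39% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| GPT-4.1 | 41% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3.1 Nemotron 70B | 38% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Magnum v2 72B | 36% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Llama 3.1 70B | 18% | 7% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Goliath 120B | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Mistral Large 2 | 26% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Gemini Pro 1.5 | 21% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Liquid: LFM 40B MoE | 10% | 9% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Claude 3.5 Sonnet | 10% | 5% | 1% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Fimbulvetr 11B v2 | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| EVA Qwen 2.5 14B | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Z.AI GLM 4.5 | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Lumimaid v0.2 8B | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Mistral Small Creative | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Magnum 72B | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Sao10K L3.1 70B Hanami x1 | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Writer: Palmyra X5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Gemma 2 9B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| GPT-4.1 Nano | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Claude 2.1 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |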
13.67%
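
The Total column looks consistent with the arithmetic mean of the ten runs, although the page does not state this. The sketch below checks that reading against the first two rows of the table. The variable names are illustrative, and the displayed run values may already be rounded, so some rows can differ from a recomputed mean by a point.

```python
# Per-run scores (percent) copied from the first two rows of the table.
runs = {
    "o4 Mini High": [100, 100, 100, 100, 89, 89, 85, 50, 50, 25],
    "o4 Mini": [100, 100, 100, 80, 76, 61, 55, 53, 51, 43],
}

# Assumption: Total is the plain mean of the ten runs, rounded to a whole percent.
totals = {model: round(sum(scores) / len(scores)) for model, scores in runs.items()}

for model, total in totals.items():
    print(f"{model}: {total}%")
```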