N-Length Sentences

Write sentences with exactly N words

Write sentences with 10 words each
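
The page does not show how responses are scored. As a minimal sketch of one way a check like this could work, the snippet below splits a response into sentences and reports the fraction that contain exactly N words. The function names, the punctuation-based sentence splitter, and the whitespace-based word count are all assumptions made for illustration, not the benchmark's actual method.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def n_length_score(text: str, n: int = 10) -> float:
    """Fraction of sentences with exactly n whitespace-separated words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if len(s.split()) == n)
    return hits / len(sentences)


# Hypothetical model output: one 10-word sentence and one 4-word sentence.
sample = (
    "The quick brown fox jumps over the lazy sleeping dog. "
    "This one is short."
)
print(f"{n_length_score(sample, n=10):.0%}")
```

A real scorer would need to decide how to treat hyphenated words, numerals, abbreviations such as "e.g.", and list markers, any of which can shift the count by one.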

0-shot Rule following
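Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Gemini 3 Flash (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 100%
Llama 3.1 405B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 99%
o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 96% | 96% | 99%
Claude Opus 4 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 88% | 99%
o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 99%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 95% | 95% | 99%
Llama 3.1 Nemotron 70B | 100% | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 95% | 95% | 98%
Claude 3.5 Sonnet (new) | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 97% | 95% | 98%
Llama 3.2 90B (Vision) | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 97% | 96% | 91% | 98%
Claude 3.5 Haiku | 100% | 100% | 100% | 100% | 97% | 97% | 95% | 95% | 95% | 92% | 97%
Claude Sonnet 4.5 | 100% | 100% | 100% | 100% | 100% | 98% | 97% | 96% | 90% | 87% | 97%
Llama 3.1 70B | 100% | 100% | 100% | 100% | 97% | 97% | 97% | 95% | 87% | 87% | 96%
Llama 3.1 8B | 100% | 100% | 100% | 98% | 97% | 96% | 95% | 93% | 91% | 77% | 95%
Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 76% | 74% | 95%
Llama 3.2 11B (Vision) | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 94% | 80% | 77% | 95%
Llama 3.2 3B | 100% | 97% | 97% | 97% | 95% | 95% | 92% | 92% | 90% | 87% | 94%
Llama 3 70B | 100% | 97% | 97% | 95% | 95% | 95% | 92% | 92% | 90% | 87% | 94%
GPT-4o, Aug. 6th (temp=1) | 100% | 97% | 95% | 95% | 95% | 92% | 92% | 90% | 90% | 90% | 94%
Gemini 2.5 Pro | 100% | 97% | 96% | 96% | 95% | 95% | 93% | 88% | 88% | 84% | 93%
GPT-4o, Aug. 6th (temp=0) | 100% | 95% | 95% | 95% | 92% | 92% | 92% | 92% | 90% | 90% | 93%
GPT-4.1 Mini | 97% | 97% | 96% | 95% | 92% | 92% | 92% | 90% | 88% | 84% | 92%
GPT-4o Mini (temp=1) | 97% | 95% | 95% | 95% | 92% | 92% | 92% | 92% | 90% | 83% | 92%
Claude 3.7 Sonnet | 100% | 94% | 94% | 93% | 93% | 93% | 93% | 91% | 91% | 81% | 92%
GPT-4 Turbo | 98% | 97% | 94% | 94% | 91% | 91% | 88% | 88% | 87% | 87% | 92%
GPT-4.1 | 100% | 95% | 95% | 95% | 92% | 92% | 92% | 90% | 83% | 79% | 91%
Claude 3.5 Sonnet | 100% | 100% | 97% | 95% | 92% | 90% | 90% | 86% | 83% | 78% | 91%
Claude Sonnet 4 | 100% | 100% | 100% | 100% | 97% | 95% | 90% | 88% | 74% | 64% | 91%
Z.AI GLM 4.7 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 90% | 74% | 39% | 90%
Z.AI GLM 4.7 Flash | 100% | 100% | 100% | 100% | 96% | 96% | 79% | 77% | 74% | 74% | 90%
GPT-4o Mini (temp=0) | 92% | 90% | 90% | 90% | 90% | 90% | 90% | 87% | 84% | 84% | 89%
GPT-4o, May 13th (temp=0) | 97% | 97% | 97% | 97% | 92% | 92% | 87% | 87% | 70% | 63% | 88%
Hermes 3 405B | 97% | 97% | 92% | 92% | 90% | 88% | 87% | 87% | 84% | 67% | 88%
GPT-4o, May 13th (temp=1) | 97% | 94% | 92% | 92% | 90% | 87% | 83% | 80% | 80% | 65% | 86%
Qwen 2.5 72B | 100% | 96% | 94% | 92% | 91% | 88% | 85% | 83% | 65% | 37% | 83%
Sao10K L3.1 70B Hanami x1 | 97% | 95% | 95% | 92% | 90% | 81% | 72% | 69% | 66% | 61% | 82%
Magnum v2 72B | 90% | 90% | 87% | 87% | 84% | 84% | 78% | 78% | 78% | 57% | 81%
Claude 3.0 Sonnet | 100% | 100% | 90% | 83% | 79% | 79% | 74% | 74% | 70% | 57% | 81%
Llama 3.1 Euryale 70B v2.2 | 95% | 93% | 93% | 86% | 79% | 77% | 75% | 75% | 70% | 60% | 80%
Claude Haiku 4.5 | 89% | 89% | 86% | 86% | 82% | 77% | 74% | 74% | 74% | 67% | 80%
Mistral Small Creative | 90% | 90% | 90% | 87% | 80% | 79% | 75% | 70% | 65% | 63% | 79%
Qwen 2 72B | 94% | 86% | 85% | 84% | 84% | 79% | 77% | 75% | 65% | 59% | 79%
Writer: Palmyra X5 | 90% | 89% | 87% | 79% | 78% | 77% | 73% | 73% | 54% | 48% | 75%
GPT-4.1 Nano | 83% | 83% | 82% | 80% | 79% | 77% | 72% | 68% | 66% | 55% | 74%
Claude 3 Haiku | 97% | 84% | 84% | 83% | 83% | 75% | 74% | 63% | 61% | 28% | 73%
AI21 Jamba 1.5 Large | 100% | 100% | 87% | 85% | 83% | 74% | 74% | 74% | 30% | 25% | 73%
Magnum 72B | 95% | 92% | 90% | 87% | 76% | 76% | 76% | 57% | 51% | 24% | 72%
Lumimaid v0.2 8B | 89% | 87% | 80% | 80% | 79% | 76% | 75% | 63% | 48% | 25% | 70%
Llama 3 Euryale 70B v2.1 | 88% | 87% | 87% | 87% | 83% | 83% | 74% | 65% | 42% | 0% | 70%
Z.AI GLM 4.6 | 100% | 96% | 90% | 80% | 77% | 70% | 55% | 42% | 35% | 18% | 66%
Phi-3 Medium 128k | 80% | 78% | 77% | 71% | 67% | 61% | 59% | 56% | 53% | 52% | 66%
Cohere Command R+ (Aug. 2024) | 95% | 84% | 81% | 70% | 70% | 63% | 56% | 56% | 18% | 10% | 60%
Phi-3.5 Mini 128k | 76% | 73% | 70% | 69% | 65% | 63% | 56% | 55% | 49% | 26% | 60%
Gemini 2.5 Flash Lite | 88% | 78% | 76% | 61% | 61% | 60% | 50% | 39% | 37% | 22% | 57%
WizardLM 2 8x22b | 77% | 74% | 66% | 66% | 59% | 57% | 54% | 52% | 46% | 19% | 57%
Gemma 2 9B | 83% | 62% | 60% | 59% | 57% | 55% | 43% | 42% | 40% | 25% | 52%
Inflection 3 (PI) | 68% | 64% | 63% | 58% | 56% | 52% | 50% | 48% | 29% | 17% | 50%
Qwen 2 7B | 84% | 74% | 74% | 62% | 61% | 44% | 29% | 20% | 19% | 16% | 48%
Gemini 2.5 Flash | 69% | 62% | 62% | 54% | 52% | 50% | 45% | 41% | 27% | 14% | 48%
Llama 3.2 1B | 100% | 79% | 74% | 61% | 46% | 40% | 32% | 30% | 7% | 1% | 47%
Cohere Command R+ (Apr. 2024) | 83% | 70% | 70% | 70% | 53% | 35% | 35% | 11% | 9% | 1% | 44%
Phi-3 Mini 128k | 80% | 59% | 57% | 56% | 45% | 40% | 34% | 31% | 23% | 0% | 42%
Gemma 2 27B | 82% | 48% | 45% | 43% | 43% | 40% | 39% | 32% | 21% | 21% | 41%
Claude 2.1 | 70% | 70% | 61% | 40% | 38% | 33% | 27% | 26% | 25% | 13% | 40%
AI21 Jamba 1.5 Mini | 74% | 74% | 68% | 59% | 52% | 30% | 30% | 7% | 7% | 0% | 40%
Mistral Large 2 | 81% | 70% | 57% | 46% | 43% | 28% | 19% | 19% | 17% | 14% | 39%
Claude 2.0 | 74% | 71% | 59% | 55% | 44% | 39% | 16% | 14% | 10% | 5% | 39%
Toppy M 7B | 62% | 57% | 48% | 42% | 42% | 37% | 32% | 23% | 21% | 0% | 36%
Gemini Pro 1.5 | 55% | 49% | 47% | 41% | 37% | 35% | 34% | 31% | 28% | 0% | 36%
Gemini Flash 1.5 | 86% | 57% | 45% | 39% | 31% | 30% | 28% | 23% | 13% | 5% | 36%
Hermes 3 70B | 81% | 57% | 42% | 41% | 40% | 33% | 24% | 8% | 7% | 6% | 34%
Mistral Large | 65% | 51% | 51% | 42% | 38% | 26% | 21% | 21% | 16% | 4% | 33%
Z.AI GLM 4.5 | 56% | 54% | 48% | 39% | 39% | 39% | 25% | 11% | 10% | 5% | 33%
Inflection 3 (Productivity) | 56% | 53% | 52% | 46% | 46% | 31% | 12% | 11% | 9% | 7% | 32%
EVA Qwen 2.5 14B | 67% | 61% | 61% | 34% | 31% | 31% | 10% | 6% | 4% | 0% | 30%
Mistral Medium | 73% | 53% | 50% | 35% | 34% | 18% | 9% | 5% | 2% | 2% | 28%
lzlv 70B | 60% | 52% | 52% | 45% | 28% | 26% | 9% | 5% | 1% | 0% | 28%
Mistral NeMO | 57% | 48% | 45% | 24% | 24% | 15% | 14% | 11% | 10% | 9% | 26%
Ministral 3B | 39% | 32% | 31% | 30% | 30% | 27% | 26% | 21% | 7% | 6% | 25%
AI21 Jamba | 62% | 53% | 52% | 32% | 22% | 6% | 5% | 3% | 3% | 0% | 24%
Mistral Nemo 12B Celeste | 34% | 31% | 28% | 27% | 23% | 22% | 19% | 19% | 19% | 15% | 24%
Rocinante 12B | 56% | 43% | 35% | 32% | 26% | 21% | 14% | 3% | 1% | 0% | 23%
Fimbulvetr 11B v2 | 50% | 46% | 32% | 28% | 24% | 21% | 14% | 2% | 1% | 0% | 22%
Ministral 8B | 43% | 36% | 34% | 28% | 28% | 16% | 15% | 10% | 0% | 0% | 21%
MythoMist 7B | 51% | 30% | 27% | 26% | 19% | 18% | 9% | 9% | 0% | 0% | 19%
Goliath 120B | 58% | 51% | 25% | 24% | 24% | 3% | 1% | 0% | 0% | 0% | 18%
MN GRAND Gutenberg Lyra4 12B Madness | 31% | 26% | 20% | 20% | 19% | 18% | 13% | 13% | 12% | 7% | 18%
Hermes 2 Theta 8B | 36% | 34% | 26% | 21% | 15% | 7% | 3% | 1% | 1% | 1% | 14%
DeepSeek-V2 Chat | 25% | 23% | 18% | 18% | 15% | 13% | 10% | 8% | 7% | 4% | 14%
Liquid: LFM 40B MoE | 46% | 30% | 20% | 15% | 7% | 4% | 1% | 1% | 1% | 0% | 13%
MythoMax 13B | 37% | 15% | 4% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 6%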
Average Total across all models: 65.02%
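
The page does not say how the Total column is derived from the ten runs. It tracks the unweighted mean of the displayed run scores closely but not always exactly, which suggests it is computed from unrounded or sentence-weighted data; that reading is an inference from the numbers, not a documented formula. The sketch below, using a hypothetical helper `mean_of_runs`, compares the two for a pair of rows copied from the table.

```python
def mean_of_runs(runs: list[float]) -> float:
    """Unweighted mean of per-run scores, in percent."""
    return sum(runs) / len(runs)


# Displayed run scores and listed Total, copied from two rows of the table.
rows = {
    "Z.AI GLM 4.7": ([100, 100, 100, 100, 100, 100, 100, 90, 74, 39], 90),
    "Goliath 120B": ([58, 51, 25, 24, 24, 3, 1, 0, 0, 0], 18),
}

for model, (runs, listed_total) in rows.items():
    mean = mean_of_runs(runs)
    gap = mean - listed_total
    print(f"{model}: mean {mean:.1f}%, listed {listed_total}%, gap {gap:+.1f}")
```

For Z.AI GLM 4.7 the mean rounds to the listed Total, while for Goliath 120B the mean of the displayed runs sits slightly above it, so the Total should not be treated as a simple average of the rounded run columns.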