N-Length Sentences

Write sentences with exactly N words

Write sentences with 5 words each
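The source does not say how responses are scored, so the following is only a minimal sketch of one plausible check. It assumes that a "word" is a whitespace-delimited token and that sentences end at `.`, `!`, or `?`. The helper names `split_sentences`, `count_words`, and `compliance` are hypothetical and do not come from the benchmark.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_words(sentence: str) -> int:
    """Count whitespace-delimited tokens (assumed definition of a word)."""
    return len(sentence.split())


def compliance(text: str, n: int = 5) -> float:
    """Return the fraction of sentences that contain exactly n words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    matching = sum(1 for s in sentences if count_words(s) == n)
    return matching / len(sentences)


sample = "The cat sat down quietly. Birds sing every single morning. Too short."
score = compliance(sample, n=5)  # two of the three sentences have 5 words
```

A real grader would also need to decide how to treat hyphenated words, numbers, and abbreviations, none of which the source specifies.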

0-shot Rule following
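Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
Llama 3.2 90B (Vision) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Llama 3.1 Nemotron 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Llama 3 TenyxChat-DaybreakStorywriter 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Gemini 3 Flash (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Z.AI GLM 4.7 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 100%
Llama 3.1 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
Llama 3.1 8B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
Llama 3.1 405B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
Llama 3.2 11B (Vision) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100%
MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 99%
GPT-4o, May 13th (temp=0) | 100% | 100% | 100% | 100% | 100% | 100% | 99% | 98% | 98% | 98% | 99%
Qwen 2 72B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 96% | 96% | 99%
Magnum v2 72B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 94% | 99%
Claude Opus 4.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 89% | 99%
Gemini 2.5 Pro | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 97% | 97% | 96% | 99%
Llama 3 70B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 91% | 99%
GPT-4o, May 13th (temp=1) | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 97% | 96% | 96% | 99%
Magnum 72B | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 96% | 96% | 96% | 98%
GPT-4o Mini (temp=0) | 100% | 98% | 98% | 98% | 98% | 98% | 98% | 98% | 98% | 96% | 98%
GPT-4.1 Mini | 100% | 100% | 100% | 100% | 99% | 98% | 98% | 97% | 96% | 93% | 98%
Claude Opus 4.6 | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 95% | 94% | 93% | 98%
Claude 3.5 Sonnet (new) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 79% | 98%
GPT-4o Mini (temp=1) | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 95% | 94% | 93% | 98%
Llama 3 Euryale 70B v2.1 | 100% | 100% | 100% | 100% | 100% | 98% | 96% | 96% | 94% | 94% | 98%
Claude Opus 4 | 100% | 100% | 100% | 100% | 100% | 100% | 97% | 95% | 94% | 93% | 98%
Claude 3.7 Sonnet | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 78% | 98%
Claude 3.5 Sonnet | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 96% | 93% | 87% | 97%
GPT-4.1 Nano | 100% | 100% | 100% | 100% | 100% | 98% | 97% | 96% | 91% | 89% | 97%
Z.AI GLM 4.7 Flash | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 87% | 80% | 96%
Llama 3.2 3B | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 92% | 91% | 83% | 96%
Sao10K L3.1 70B Hanami x1 | 100% | 100% | 99% | 98% | 97% | 97% | 96% | 96% | 94% | 84% | 96%
Z.AI GLM 4.6 | 100% | 98% | 98% | 98% | 97% | 96% | 94% | 93% | 92% | 90% | 96%
Claude Sonnet 4 | 100% | 100% | 100% | 100% | 97% | 97% | 93% | 92% | 91% | 86% | 96%
Claude Sonnet 4.5 | 100% | 100% | 100% | 100% | 99% | 96% | 94% | 91% | 89% | 86% | 95%
GPT-4o, Aug. 6th (temp=1) | 100% | 100% | 99% | 97% | 96% | 95% | 94% | 93% | 91% | 89% | 95%
GPT-4o, Aug. 6th (temp=0) | 98% | 96% | 96% | 93% | 93% | 93% | 93% | 93% | 92% | 92% | 94%
GPT-4 Turbo | 100% | 100% | 100% | 99% | 98% | 94% | 94% | 91% | 89% | 75% | 94%
Claude 3.5 Haiku | 100% | 98% | 98% | 96% | 92% | 92% | 91% | 91% | 91% | 91% | 94%
Z.AI GLM 4.5 | 100% | 100% | 100% | 98% | 98% | 98% | 89% | 87% | 82% | 81% | 93%
Claude Haiku 4.5 | 99% | 99% | 99% | 98% | 97% | 92% | 90% | 88% | 86% | 84% | 93%
GPT-4.1 | 100% | 98% | 98% | 98% | 97% | 96% | 96% | 89% | 88% | 70% | 93%
Gemini 2.5 Flash Lite | 100% | 98% | 96% | 95% | 93% | 91% | 89% | 86% | 86% | 72% | 91%
Lumimaid v0.2 8B | 99% | 97% | 96% | 96% | 94% | 91% | 89% | 85% | 79% | 72% | 90%
Llama 3.1 Euryale 70B v2.2 | 100% | 100% | 100% | 100% | 93% | 89% | 86% | 84% | 76% | 62% | 89%
Qwen 2.5 72B | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 93% | 0% | 89%
Mistral Large | 96% | 95% | 93% | 93% | 89% | 89% | 83% | 82% | 79% | 73% | 87%
DeepSeek-V2 Chat | 100% | 93% | 91% | 89% | 89% | 82% | 82% | 80% | 78% | 72% | 86%
Claude 2.1 | 100% | 100% | 96% | 96% | 96% | 96% | 96% | 91% | 52% | 28% | 85%
Gemini Pro 1.5 | 100% | 100% | 91% | 91% | 87% | 87% | 82% | 73% | 70% | 67% | 85%
Gemini 2.5 Flash | 98% | 98% | 95% | 93% | 93% | 92% | 88% | 84% | 64% | 40% | 84%
Hermes 3 70B | 100% | 100% | 98% | 92% | 90% | 87% | 87% | 69% | 58% | 56% | 84%
Hermes 3 405B | 90% | 90% | 88% | 87% | 85% | 84% | 83% | 80% | 76% | 69% | 83%
Claude 3.0 Sonnet | 100% | 96% | 82% | 78% | 78% | 78% | 78% | 78% | 78% | 78% | 82%
Cohere Command R+ (Apr. 2024) | 93% | 92% | 92% | 92% | 92% | 92% | 92% | 70% | 58% | 52% | 82%
Hermes 2 Theta 8B | 96% | 94% | 91% | 87% | 87% | 84% | 82% | 73% | 61% | 52% | 81%
Writer: Palmyra X5 | 100% | 98% | 95% | 90% | 83% | 82% | 78% | 76% | 70% | 11% | 78%
Qwen 2 7B | 100% | 100% | 96% | 89% | 83% | 82% | 79% | 78% | 57% | 2% | 77%
Phi-3.5 Mini 128k | 91% | 89% | 89% | 86% | 84% | 79% | 65% | 61% | 59% | 57% | 76%
Inflection 3 (Productivity) | 96% | 91% | 89% | 87% | 78% | 76% | 69% | 68% | 67% | 35% | 75%
Claude 2.0 | 100% | 98% | 95% | 87% | 81% | 72% | 65% | 59% | 42% | 40% | 74%
Mistral Small Creative | 80% | 80% | 78% | 72% | 72% | 72% | 72% | 72% | 70% | 63% | 73%
Mistral Large 2 | 91% | 91% | 91% | 89% | 87% | 82% | 70% | 60% | 52% | 16% | 73%
Mistral Medium | 96% | 94% | 91% | 84% | 83% | 83% | 76% | 62% | 46% | 12% | 73%
AI21 Jamba 1.5 Large | 100% | 96% | 89% | 82% | 79% | 78% | 78% | 78% | 40% | 2% | 72%
Cohere Command R+ (Aug. 2024) | 94% | 89% | 78% | 78% | 76% | 76% | 65% | 53% | 51% | 36% | 70%
Gemini Flash 1.5 | 86% | 85% | 82% | 78% | 78% | 77% | 76% | 63% | 43% | 14% | 68%
Inflection 3 (PI) | 89% | 87% | 87% | 87% | 82% | 81% | 70% | 64% | 26% | 0% | 67%
Gemma 2 9B | 96% | 87% | 79% | 72% | 67% | 60% | 52% | 52% | 52% | 47% | 66%
WizardLM 2 8x22b | 91% | 89% | 89% | 85% | 81% | 68% | 64% | 50% | 23% | 10% | 65%
Claude 3 Haiku | 89% | 81% | 78% | 77% | 72% | 72% | 68% | 65% | 47% | 1% | 65%
Phi-3 Medium 128k | 87% | 87% | 85% | 82% | 72% | 67% | 46% | 42% | 33% | 28% | 63%
Ministral 8B | 100% | 100% | 78% | 78% | 64% | 56% | 45% | 36% | 32% | 20% | 61%
Gemma 2 27B | 84% | 71% | 68% | 62% | 61% | 57% | 55% | 52% | 52% | 40% | 60%
Rocinante 12B | 88% | 78% | 72% | 66% | 62% | 59% | 59% | 40% | 34% | 22% | 58%
Llama 3.2 1B | 100% | 96% | 85% | 70% | 56% | 53% | 53% | 49% | 2% | 0% | 56%
lzlv 70B | 87% | 87% | 83% | 80% | 78% | 59% | 27% | 22% | 22% | 0% | 55%
Phi-3 Mini 128k | 91% | 73% | 69% | 63% | 60% | 59% | 59% | 27% | 23% | 12% | 54%
EVA Qwen 2.5 14B | 89% | 87% | 67% | 65% | 56% | 56% | 43% | 36% | 11% | 1% | 51%
Ministral 3B | 71% | 64% | 64% | 64% | 52% | 52% | 40% | 37% | 20% | 0% | 46%
Liquid: LFM 40B MoE | 92% | 80% | 78% | 78% | 67% | 40% | 18% | 1% | 1% | 0% | 45%
AI21 Jamba 1.5 Mini | 96% | 94% | 91% | 76% | 67% | 17% | 1% | 1% | 0% | 0% | 44%
AI21 Jamba | 91% | 78% | 72% | 64% | 59% | 32% | 2% | 1% | 0% | 0% | 40%
Goliath 120B | 94% | 78% | 52% | 47% | 39% | 17% | 1% | 1% | 1% | 0% | 33%
Mistral NeMO | 72% | 47% | 45% | 37% | 35% | 22% | 21% | 17% | 1% | 1% | 30%
Toppy M 7B | 87% | 40% | 40% | 40% | 33% | 20% | 16% | 12% | 0% | 0% | 29%
Mistral Nemo 12B Celeste | 68% | 54% | 39% | 21% | 15% | 7% | 1% | 0% | 0% | 0% | 21%
MythoMist 7B | 59% | 52% | 39% | 39% | 1% | 1% | 0% | 0% | 0% | 0% | 19%
Fimbulvetr 11B v2 | 76% | 53% | 39% | 2% | 1% | 1% | 0% | 0% | 0% | 0% | 17%
MN GRAND Gutenberg Lyra4 12B Madness | 52% | 32% | 21% | 15% | 9% | 8% | 8% | 5% | 4% | 3% | 16%
MythoMax 13B | 64% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 6%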
79.27%
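The Total column appears consistent with the mean of the ten run scores, rounded to a whole percent. The source does not state this, so the sketch below treats it as an assumption and checks it against three rows copied from the table. The function name `total_score` is hypothetical. Because the published run values are already rounded, rows whose mean lands exactly on a half percent cannot confirm the rounding mode and are left out.

```python
def total_score(run_scores: list[int]) -> int:
    """Mean of the per-run percentages, rounded to the nearest whole percent.

    Assumption: this is how the Total column is derived; the source does not say.
    """
    return round(sum(run_scores) / len(run_scores))


# Rows copied from the results table (Run 1 to Run 10).
qwen_2_5_72b = [100, 100, 100, 100, 100, 100, 100, 95, 93, 0]
claude_3_0_sonnet = [100, 96, 82, 78, 78, 78, 78, 78, 78, 78]
mythomax_13b = [64, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for name, runs in [
    ("Qwen 2.5 72B", qwen_2_5_72b),
    ("Claude 3.0 Sonnet", claude_3_0_sonnet),
    ("MythoMax 13B", mythomax_13b),
]:
    print(f"{name}: {total_score(runs)}%")
```

Under this assumption the three rows give 89%, 82%, and 6%, matching their listed totals.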