N-Length Sentences

Write sentences with exactly N words

Write sentences with 20 words each
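
This page does not describe how responses are graded. As a rough illustration of the rule, the sketch below scores a response as the fraction of its sentences that contain exactly N words. The function names, the naive sentence splitter, and the whitespace-based definition of a word are all assumptions made for illustration, not the benchmark's actual scorer.

```python
import re


def count_words(sentence: str) -> int:
    """Count whitespace-separated tokens containing at least one word character.

    Assumption: a stand-alone punctuation token (e.g. a lone dash) is not a word.
    """
    return sum(1 for tok in sentence.split() if re.search(r"\w", tok))


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def compliance(text: str, n: int = 20) -> float:
    """Return the fraction of sentences in `text` with exactly `n` words."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if count_words(s) == n)
    return hits / len(sentences)
```

For example, `compliance("One two three. Four five six seven.", n=3)` counts one matching sentence out of two. A real grader would have to decide how to treat abbreviations, hyphenated words, and numbers, which this sketch ignores.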

0-shot Rule following
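
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MoonshotAI: Kimi K2.5 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 94% | 87% | 98% |
| Gemini 3 Pro (Preview) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 93% | 80% | 97% |
| Gemini 3 Flash (Preview) | 100% | 100% | 100% | 100% | 100% | 95% | 95% | 95% | 94% | 90% | 97% |
| o4 Mini High | 100% | 100% | 100% | 100% | 100% | 100% | 95% | 92% | 87% | 83% | 96% |
| o4 Mini | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 84% | 83% | 95% |
| Z.AI GLM 4.7 | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 51% | 95% |
| Z.AI GLM 4.7 Flash | 100% | 100% | 100% | 94% | 93% | 90% | 84% | 74% | 74% | 60% | 87% |
| Llama 3.1 405B | 100% | 100% | 92% | 92% | 84% | 84% | 84% | 76% | 70% | 69% | 85% |
| Claude Opus 4.5 | 96% | 92% | 88% | 88% | 84% | 80% | 80% | 75% | 74% | 60% | 82% |
| GPT-4.1 | 93% | 92% | 92% | 89% | 87% | 85% | 84% | 60% | 59% | 57% | 80% |
| Llama 3.2 3B | 100% | 100% | 92% | 87% | 76% | 76% | 72% | 67% | 60% | 49% | 78% |
| GPT-4o, May 13th (temp=0) | 100% | 100% | 90% | 88% | 84% | 80% | 69% | 64% | 44% | 18% | 74% |
| Llama 3 70B | 100% | 93% | 84% | 83% | 76% | 70% | 69% | 69% | 47% | 32% | 72% |
| Claude 3.5 Sonnet (new) | 92% | 84% | 76% | 76% | 76% | 69% | 69% | 67% | 58% | 52% | 72% |
| GPT-4o Mini (temp=0) | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 72% | 68% | 71% |
| Gemini 2.5 Pro | 89% | 87% | 80% | 74% | 74% | 72% | 69% | 64% | 40% | 40% | 69% |
| Llama 3.2 90B (Vision) | 95% | 83% | 80% | 80% | 76% | 67% | 61% | 54% | 53% | 34% | 68% |
| Claude Opus 4 | 83% | 75% | 70% | 70% | 66% | 64% | 61% | 61% | 61% | 53% | 66% |
| GPT-4o Mini (temp=1) | 100% | 80% | 80% | 72% | 70% | 65% | 59% | 55% | 34% | 29% | 64% |
| GPT-4o, May 13th (temp=1) | 92% | 87% | 80% | 74% | 72% | 57% | 51% | 48% | 39% | 13% | 61% |
| Llama 3 TenyxChat-DaybreakStorywriter 70B | 76% | 70% | 69% | 69% | 59% | 59% | 58% | 55% | 50% | 38% | 60% |
| Qwen 2.5 72B | 100% | 61% | 61% | 61% | 61% | 61% | 61% | 61% | 40% | 35% | 60% |
| GPT-4.1 Nano | 76% | 76% | 76% | 75% | 64% | 59% | 59% | 57% | 30% | 26% | 60% |
| Claude Opus 4.6 | 100% | 93% | 80% | 61% | 55% | 47% | 47% | 37% | 37% | 32% | 59% |
| Llama 3.1 70B | 74% | 72% | 65% | 61% | 60% | 59% | 51% | 46% | 43% | 38% | 57% |
| Sao10K L3.1 70B Hanami x1 | 76% | 75% | 74% | 58% | 52% | 46% | 44% | 40% | 36% | 25% | 53% |
| Claude Sonnet 4.5 | 84% | 69% | 69% | 62% | 60% | 58% | 44% | 42% | 17% | 2% | 51% |
| GPT-4.1 Mini | 87% | 82% | 67% | 59% | 59% | 42% | 38% | 34% | 32% | 4% | 50% |
| Claude Sonnet 4 | 83% | 77% | 72% | 72% | 66% | 60% | 23% | 15% | 12% | 3% | 48% |
| Qwen 2 72B | 72% | 67% | 63% | 59% | 55% | 47% | 36% | 34% | 24% | 12% | 47% |
| Claude 3.5 Haiku | 84% | 83% | 76% | 54% | 49% | 47% | 27% | 16% | 15% | 3% | 45% |
| Llama 3.1 8B | 87% | 84% | 74% | 48% | 45% | 33% | 30% | 25% | 21% | 5% | 45% |
| Llama 3.1 Euryale 70B v2.2 | 83% | 74% | 66% | 64% | 59% | 57% | 47% | 1% | 0% | 0% | 45% |
| Claude 3.7 Sonnet | 59% | 53% | 52% | 52% | 50% | 47% | 45% | 27% | 24% | 4% | 41% |
| Llama 3.1 Nemotron 70B | 66% | 55% | 52% | 45% | 43% | 37% | 30% | 29% | 27% | 27% | 41% |
| GPT-4 Turbo | 57% | 49% | 45% | 43% | 43% | 38% | 38% | 37% | 27% | 22% | 40% |
| Writer: Palmyra X5 | 67% | 62% | 53% | 45% | 42% | 37% | 22% | 18% | 8% | 0% | 35% |
| GPT-4o, Aug. 6th (temp=1) | 66% | 52% | 50% | 41% | 36% | 34% | 29% | 22% | 14% | 11% | 35% |
| Llama 3.2 11B (Vision) | 86% | 59% | 42% | 40% | 39% | 39% | 14% | 12% | 9% | 2% | 34% |
| Llama 3 Euryale 70B v2.1 | 56% | 50% | 44% | 37% | 37% | 37% | 22% | 14% | 10% | 7% | 31% |
| Claude 3.5 Sonnet | 62% | 54% | 53% | 44% | 19% | 18% | 16% | 13% | 6% | 1% | 28% |
| Claude Haiku 4.5 | 71% | 66% | 52% | 50% | 33% | 10% | 0% | 0% | 0% | 0% | 28% |
| Gemma 2 9B | 47% | 45% | 44% | 42% | 31% | 18% | 15% | 14% | 3% | 0% | 26% |
| Magnum 72B | 61% | 57% | 43% | 41% | 29% | 12% | 9% | 1% | 0% | 0% | 25% |
| Qwen 2 7B | 100% | 61% | 29% | 27% | 14% | 14% | 1% | 1% | 0% | 0% | 25% |
| Magnum v2 72B | 49% | 49% | 39% | 38% | 32% | 19% | 14% | 4% | 0% | 0% | 24% |
| Gemma 2 27B | 44% | 43% | 39% | 28% | 27% | 20% | 14% | 12% | 5% | 0% | 23% |
| Mistral Small Creative | 44% | 40% | 30% | 28% | 22% | 21% | 16% | 13% | 3% | 0% | 22% |
| Llama 3.2 1B | 69% | 61% | 35% | 15% | 15% | 15% | 5% | 0% | 0% | 0% | 21% |
| Claude 2.0 | 35% | 33% | 28% | 27% | 26% | 14% | 12% | 12% | 10% | 10% | 21% |
| MythoMist 7B | 100% | 37% | 15% | 12% | 7% | 6% | 3% | 1% | 0% | 0% | 18% |
| Gemini Pro 1.5 | 44% | 30% | 28% | 25% | 25% | 15% | 4% | 3% | 0% | 0% | 18% |
| Inflection 3 (Productivity) | 40% | 30% | 24% | 20% | 18% | 13% | 12% | 11% | 0% | 0% | 17% |
| Inflection 3 (PI) | 37% | 33% | 32% | 23% | 16% | 14% | 7% | 3% | 0% | 0% | 16% |
| Gemini Flash 1.5 | 27% | 25% | 20% | 20% | 17% | 13% | 13% | 13% | 12% | 0% | 16% |
| GPT-4o, Aug. 6th (temp=0) | 25% | 21% | 17% | 15% | 13% | 13% | 12% | 12% | 12% | 11% | 15% |
| Z.AI GLM 4.6 | 59% | 36% | 13% | 6% | 5% | 5% | 2% | 1% | 0% | 0% | 13% |
| Toppy M 7B | 36% | 32% | 22% | 20% | 12% | 2% | 0% | 0% | 0% | 0% | 13% |
| Lumimaid v0.2 8B | 52% | 30% | 14% | 12% | 10% | 5% | 0% | 0% | 0% | 0% | 12% |
| Gemini 2.5 Flash Lite | 57% | 34% | 4% | 4% | 2% | 0% | 0% | 0% | 0% | 0% | 10% |
| WizardLM 2 8x22b | 41% | 21% | 19% | 5% | 5% | 1% | 1% | 0% | 0% | 0% | 9% |
| Phi-3 Medium 128k | 32% | 15% | 15% | 10% | 7% | 5% | 3% | 0% | 0% | 0% | 9% |
| MN GRAND Gutenberg Lyra4 12B Madness | 17% | 16% | 11% | 10% | 10% | 7% | 6% | 6% | 2% | 0% | 9% |
| AI21 Jamba 1.5 Mini | 61% | 20% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 9% |
| Ministral 3B | 32% | 19% | 15% | 11% | 4% | 0% | 0% | 0% | 0% | 0% | 8% |
| Hermes 3 70B | 28% | 16% | 15% | 8% | 7% | 7% | 0% | 0% | 0% | 0% | 8% |
| lzlv 70B | 25% | 20% | 18% | 7% | 7% | 0% | 0% | 0% | 0% | 0% | 8% |
| Hermes 2 Theta 8B | 32% | 18% | 15% | 8% | 3% | 2% | 0% | 0% | 0% | 0% | 8% |
| Claude 3 Haiku | 27% | 15% | 13% | 10% | 4% | 0% | 0% | 0% | 0% | 0% | 7% |
| Phi-3 Mini 128k | 23% | 20% | 8% | 8% | 3% | 1% | 1% | 0% | 0% | 0% | 7% |
| Fimbulvetr 11B v2 | 19% | 10% | 9% | 8% | 7% | 2% | 0% | 0% | 0% | 0% | 6% |
| Cohere Command R+ (Aug. 2024) | 25% | 19% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 5% |
| Mistral Nemo 12B Celeste | 12% | 12% | 11% | 9% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Liquid: LFM 40B MoE | 25% | 14% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Hermes 3 405B | 20% | 16% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% |
| Goliath 120B | 15% | 9% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| EVA Qwen 2.5 14B | 21% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Claude 2.1 | 15% | 10% | 5% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Phi-3.5 Mini 128k | 15% | 14% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Cohere Command R+ (Apr. 2024) | 16% | 12% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| Ministral 8B | 13% | 7% | 2% | 2% | 1% | 0% | 0% | 0% | 0% | 0% | 2% |
| AI21 Jamba | 15% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% |
| Mistral Medium | 8% | 3% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| MythoMax 13B | 3% | 3% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Mistral Large | 6% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| Mistral NeMO | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Claude 3.0 Sonnet | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Gemini 2.5 Flash | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Rocinante 12B | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| DeepSeek-V2 Chat | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Z.AI GLM 4.5 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| AI21 Jamba 1.5 Large | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Mistral Large 2 | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
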
Average Total across all models: 32.96%
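
The page does not say how the Total column is derived. The listed rows look consistent with a mean of the ten run scores, rounded to a whole percent, and the hypothetical helper below recomputes it under that assumption. Because the run values shown are themselves rounded, a recomputed total can differ from the listed one by a point.

```python
def total_from_runs(runs: list[int]) -> int:
    """Mean of per-run percentages, rounded half up to a whole percent.

    Assumption: Total is the plain average of the runs; the page does not confirm this.
    """
    return int(sum(runs) / len(runs) + 0.5)


# Run values copied from the Claude Opus 4.5 row.
opus_45_runs = [96, 92, 88, 88, 84, 80, 80, 75, 74, 60]
print(total_from_runs(opus_45_runs))  # 82, matching the listed Total of 82%
```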