Mechanical Style

Subcategory of Creative Writing. 155 models scored.

Model Leaderboard

All models ranked from highest to lowest Mechanical Style subcategory score. Creative Writing and Overall scores are shown alongside for comparison.

#	Model	Mechanical Style	Creative Writing	Overall
1	GPT-5.5 (Reasoning)	99.27%	90.26%	92.98%
2	GPT-5.5 (Reasoning, Low)	99.26%	90.24%	92.59%
3	GPT-5.5	99.17%	90.39%	89.09%
4	Qwen3.6 Max Preview	99.15%	88.42%	94.54%
5	GPT-5.4 Mini (Reasoning)	98.70%	88.66%	90.65%
6	GPT-5.4 Mini	98.21%	88.10%	82.43%
7	GPT-5.4 Mini (Reasoning, Low)	98.16%	87.72%	85.75%
8	GPT-5.4 (Reasoning)	97.97%	91.17%	93.24%
9	Qwen 3.6 Flash	97.83%	86.02%	90.65%
10	GPT-5.4 (Reasoning, Low)	97.75%	90.51%	91.41%
11	Qwen 3.5 397B A17B	97.50%	86.93%	91.73%
12	GPT-5.4	97.49%	90.94%	84.32%
13	Qwen 3.6 35B	97.44%	85.97%	89.05%
14	Gemini 3.1 Pro (Preview)	97.27%	85.44%	94.37%
15	Qwen 3.6 27B	96.84%	82.81%	89.72%
16	Qwen 3.5 Plus (2026-04-20)	96.76%	85.18%	91.51%
17	Qwen3.7 Max	96.29%	85.39%	95.75%
18	Qwen 3.5 9B	96.17%	84.35%	86.05%
19	GPT-4o, May 13th (temp=0)	95.74%	74.89%	85.36%
20	Qwen 3.5 122B	95.52%	83.02%	91.53%
21	Qwen 2.5 72B	95.34%	75.16%	75.46%
22	Qwen 3.5 Flash	95.18%	83.81%	86.38%
23	Gemini 2.5 Flash (Reasoning)	94.83%	76.30%	86.51%
24	Gemini 2.5 Flash	94.77%	77.57%	80.60%
25	Mistral NeMO	94.70%	76.72%	65.04%
26	Qwen 3.5 27B	94.50%	82.54%	90.85%
27	GPT-5.1	93.87%	87.20%	92.54%
28	Gemini 2.5 Flash Lite	93.78%	75.05%	81.08%
29	GPT-5	93.65%	86.87%	91.93%
30	Grok 4.3 (Reasoning)	93.58%	85.11%	93.60%
31	Qwen 3.5 35B	93.58%	83.51%	88.00%
32	GPT-4o, Aug. 6th (temp=0)	93.39%	73.65%	82.45%
33	Grok 4.20 (Reasoning)	92.86%	86.25%	91.39%
34	Grok 4.20 (Beta, Reasoning)	92.70%	84.50%	91.49%
35	Qwen 3.5 Plus (2026-02-15)	92.52%	77.07%	85.96%
36	GPT-4o Mini (temp=0)	92.33%	73.10%	78.29%
37	Hermes 3 405B	92.33%	80.92%	82.86%
38	MoonshotAI: Kimi K2.6	91.73%	85.47%	92.31%
39	Grok 4.20 (Beta)	91.63%	82.80%	83.85%
40	Grok 4.3	91.18%	84.51%	78.66%
41	GPT-5.2	90.94%	80.36%	90.26%
42	Gemini 2.5 Pro	90.61%	81.03%	88.53%
43	Inception Mercury 2	89.69%	68.31%	83.85%
44	Inception Mercury	89.59%	69.99%	79.50%
45	Skyfall 36B V2	89.59%	83.32%	65.76%
46	ByteDance Seed 2.0 Lite	89.48%	82.35%	84.80%
47	Grok 4.20	89.30%	83.44%	81.70%
48	Gemini 3.5 Flash (Reasoning)	89.17%	79.87%	94.08%
49	Cohere Command R+ (Aug. 2024)	89.16%	77.70%	69.03%
50	Gemma 3 27B	88.98%	78.79%	77.85%
51	Claude 3 Haiku	88.96%	74.53%	71.19%
52	GPT-5.4 Nano (Reasoning, Low)	88.80%	80.93%	79.48%
53	Stealth: Aurora Alpha	88.68%	67.54%	83.79%
54	GPT-5.4 Nano	88.53%	80.50%	74.40%
55	o4 Mini High	88.52%	82.72%	90.29%
56	Mistral Small 3.2 24B	88.45%	71.87%	78.58%
57	GPT-5.4 Nano (Reasoning)	88.12%	80.97%	81.36%
58	Hermes 3 70B	88.06%	77.41%	72.57%
59	Z.AI GLM 4.6	87.76%	78.86%	89.11%
60	o4 Mini	87.50%	82.04%	88.35%
61	DeepSeek V3.2	87.42%	79.95%	82.25%
62	Gemini 2.5 Flash Lite (Reasoning)	87.33%	71.64%	85.75%
63	Claude Sonnet 4.5	86.97%	84.19%	88.03%
64	GPT-4o, May 13th (temp=1)	86.96%	75.88%	83.80%
65	WizardLM 2 8x22b	86.84%	79.06%	71.06%
66	Gemini 3.5 Flash (Reasoning, Minimal)	86.81%	78.55%	86.47%
67	Aion 2.0	86.75%	80.24%	89.21%
68	Xiaomi MIMO v2.5 Pro	86.42%	81.08%	87.36%
69	Z.AI GLM 5 Turbo	86.19%	84.66%	94.27%
70	MoonshotAI: Kimi K2.5	86.00%	81.35%	91.04%
71	DeepSeek V4 Flash	85.70%	83.42%	82.02%
72	Claude Opus 4.8 (Reasoning, Low)	85.67%	85.86%	92.14%
73	Claude 3.5 Sonnet	85.61%	78.69%	84.24%
74	Claude Opus 4.8 (Reasoning)	84.99%	85.25%	92.22%
75	Gemma 4 31B (Reasoning)	84.94%	78.13%	91.71%
76	MiniMax M2.7	84.86%	81.70%	89.10%
77	Claude Opus 4	84.86%	83.79%	87.69%
78	GPT-5 Mini	84.52%	80.48%	92.62%
79	DeepSeek V3.1	84.49%	77.45%	82.39%
80	DeepSeek V4 Pro (Reasoning)	84.41%	82.99%	90.10%
81	MiniMax M3	84.36%	84.57%	90.88%
82	Z.AI GLM 4.7	84.23%	78.89%	88.69%
83	Grok 4.1 Fast	84.23%	82.14%	89.55%
84	Gemma 3 12B	84.22%	75.38%	78.41%
85	Nemotron 3 Nano	84.19%	65.87%	77.73%
86	GPT-OSS 120B	84.17%	67.85%	86.44%
87	Z.AI GLM 4.5 Air	84.17%	74.61%	83.12%
88	Claude Sonnet 4	84.00%	79.21%	88.72%
89	Llama 3.1 Nemotron 70B	83.82%	71.71%	74.70%
90	Z.AI GLM 5	83.75%	83.63%	91.23%
91	LFM2 24B	83.73%	78.10%	58.77%
92	Z.AI GLM 5.1	83.70%	84.05%	94.37%
93	Arcee AI: Trinity Large (Preview)	83.61%	75.26%	73.33%
94	Claude Opus 4.5	83.45%	81.71%	89.69%
95	DeepSeek V3 (2024-12-26)	83.43%	77.88%	83.68%
96	Arcee AI: Trinity Mini	83.32%	74.01%	70.90%
97	DeepSeek V4 Pro	83.03%	83.70%	82.63%
98	Z.AI GLM 4.5	83.00%	76.56%	86.27%
99	Llama 3.1 70B	82.99%	72.78%	78.40%
100	DeepSeek-V2 Chat	82.94%	77.20%	84.83%
101	Claude Opus 4.6 (Reasoning)	82.92%	84.55%	95.02%
102	Gemma 4 26B (Reasoning)	82.87%	76.38%	91.49%
103	Claude Opus 4.6	82.67%	83.59%	92.35%
104	Rocinante 12B	82.57%	81.94%	54.54%
105	Xiaomi MIMO v2.5	82.42%	79.16%	85.05%
106	Stealth: Healer Alpha	82.39%	78.28%	85.93%
107	Gemma 4 26B	81.90%	75.17%	85.84%
108	Stealth: Hunter Alpha	81.86%	79.18%	87.34%
109	Gemma 4 31B	81.81%	75.59%	86.91%
110	Nemotron 3 Super	81.62%	69.75%	84.56%
111	Gemini 3 Flash (Preview, Reasoning)	81.52%	75.87%	90.50%
112	DeepSeek V4 Flash (Reasoning)	81.27%	83.03%	89.01%
113	GPT-4o Mini (temp=1)	81.11%	74.37%	79.08%
114	Mistral Large	81.07%	82.02%	80.15%
115	Llama 3.1 8B	81.03%	76.54%	63.35%
116	GPT-4.1	80.93%	81.24%	88.68%
117	Grok 4	80.92%	77.34%	88.12%
118	GPT-4o, Aug. 6th (temp=1)	80.80%	75.50%	82.62%
119	Mistral Large 2	80.68%	81.86%	82.41%
120	GPT-4.1 Mini	80.59%	74.52%	83.20%
121	Claude 3.7 Sonnet	80.55%	76.31%	83.39%
122	Gemini 3 Pro (Preview)	80.39%	77.77%	88.79%
123	Claude Opus 4.7	80.25%	84.74%	89.93%
124	Grok 4 Fast	80.12%	77.03%	86.15%
125	ByteDance Seed 2.0 Mini	80.08%	80.11%	86.91%
126	Mistral Large 3	80.06%	81.21%	85.43%
127	Qwen3 235B A22B Instruct 2507	79.99%	84.81%	80.10%
128	Writer: Palmyra X5	79.92%	83.95%	79.57%
129	DeepSeek V3 (2025-03-24)	79.89%	82.34%	81.99%
130	Claude Opus 4.7 (Reasoning)	79.84%	84.73%	93.23%
131	Qwen 3 32B	79.28%	81.30%	82.21%
132	Mistral Small 4 (Reasoning)	79.13%	81.67%	82.39%
133	Claude Sonnet 4.6 (Reasoning)	78.68%	83.09%	93.66%
134	Z.AI GLM 4.7 Flash	78.53%	77.36%	84.82%
135	Gemini 3.1 Flash Lite (Reasoning)	78.50%	76.31%	86.41%
136	MiniMax M2.5	78.48%	81.21%	88.71%
137	Gemini 3.1 Flash Lite (Preview)	78.41%	75.78%	85.87%
138	Claude Haiku 4.5	78.41%	78.96%	85.14%
139	Claude Sonnet 4.6	78.39%	83.31%	91.15%
140	Mistral Small 4	78.11%	81.12%	76.46%
141	Gemma 3 4B	78.08%	72.10%	68.57%
142	Cydonia 24B V4.1	77.93%	74.19%	75.09%
143	GPT-5 Nano	77.89%	67.04%	82.60%
144	Gemini 3.1 Flash Lite	77.70%	76.01%	85.75%
145	Ministral 3 3B	77.57%	75.45%	67.22%
146	ByteDance Seed 1.6 Flash	77.28%	81.51%	73.27%
147	Mistral Medium 3.1	77.16%	81.70%	77.83%
148	Mistral Small Creative	76.63%	80.29%	73.27%
149	Gemini 3 Flash (Preview)	76.32%	75.04%	85.35%
150	GPT-4.1 Nano	76.30%	71.81%	71.94%
151	ByteDance Seed 1.6	74.93%	78.43%	90.70%
152	Ministral 3 8B	74.38%	77.26%	71.76%
153	Ministral 8B	73.33%	76.87%	64.87%
154	Ministral 3B	73.22%	75.49%	61.29%
155	Ministral 3 14B	73.18%	79.11%	72.54%
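
The ranking above can be reproduced from the tab-separated rows with a short script. This is a minimal sketch, not the leaderboard's own tooling: the function names `parse_percent` and `rank_by` are hypothetical, the sample rows are copied from the table, and ties are assumed to keep their input order.

```python
import csv
import io

# A few rows copied from the table above (tab-separated, as in the source).
# They are deliberately listed out of order to show the sort.
SAMPLE = (
    "Model\tMechanical Style\tCreative Writing\tOverall\n"
    "GPT-5.5\t99.17%\t90.39%\t89.09%\n"
    "Qwen3.6 Max Preview\t99.15%\t88.42%\t94.54%\n"
    "GPT-5.5 (Reasoning)\t99.27%\t90.26%\t92.98%\n"
    "Ministral 3 14B\t73.18%\t79.11%\t72.54%\n"
)


def parse_percent(text):
    """Turn a string such as '99.27%' into the float 99.27."""
    return float(text.strip().rstrip("%"))


def rank_by(rows_text, column="Mechanical Style"):
    """Return (rank, model, score) tuples sorted by `column`, highest first.

    Assumption: rows with equal scores keep their input order, because
    Python's sort is stable. The leaderboard's real tie-break is not stated.
    """
    reader = csv.DictReader(io.StringIO(rows_text), delimiter="\t")
    rows = [{**row, column: parse_percent(row[column])} for row in reader]
    rows.sort(key=lambda r: r[column], reverse=True)
    return [(i, r["Model"], r[column]) for i, r in enumerate(rows, start=1)]


ranking = rank_by(SAMPLE)
for rank, model, score in ranking:
    print(f"{rank}\t{model}\t{score:.2f}%")
```

Passing `column="Creative Writing"` or `column="Overall"` to `rank_by` re-sorts the same rows by one of the other two score columns.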