Categories

NC Bench evaluates models across 8 categories and 23 subcategories.

Category Distribution

The number of scenarios in each category. A scenario may belong to more than one category, so the counts can overlap.

Tooling (13)
Creative Writing (18)
Language (9)
Utility (32)
Reasoning (20)
Text Editing (18)
Rule Following (12)
Hallucination (28)
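
Because a scenario can sit in more than one category, the counts above need not add up to the number of distinct scenarios. A minimal Python sketch, using only the counts listed above (the variable names are illustrative and not part of NC Bench), sums them to get an upper bound on the distinct scenario count and each category's share of that total:

```python
# Scenario counts per category, copied from the list above.
category_counts = {
    "Tooling": 13,
    "Creative Writing": 18,
    "Language": 9,
    "Utility": 32,
    "Reasoning": 20,
    "Text Editing": 18,
    "Rule Following": 12,
    "Hallucination": 28,
}

# Total category memberships. Scenarios may be shared between categories,
# so this is only an upper bound on the number of distinct scenarios.
total_memberships = sum(category_counts.values())

# Each category's share of all memberships, as a percentage.
shares = {
    name: 100 * count / total_memberships
    for name, count in category_counts.items()
}

print(total_memberships)  # 150
for name, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.1f}%")
```

The true number of distinct scenarios cannot be recovered from these counts alone; that would require the per-scenario category assignments.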

Creative Writing

18 scenarios · 6 subcategories

Top Models

87.20% GPT-5.1
86.93% Qwen 3.5 397B A17B
86.87% GPT-5

Subcategories

79.49% AI-isms
67.74% Prose Variety
74.35% Dialogue
87.65% Purple Prose
84.74% Mechanical Style
77.40% Clichés

Tooling

13 scenarios · 1 subcategory

Subcategories

95.16% XML

Language

9 scenarios · 2 subcategories

Subcategories

81.76% Comprehension
84.10% Generation

Utility

32 scenarios · 5 subcategories

Reasoning

20 scenarios · 2 subcategories

Text Editing

18 scenarios · 3 subcategories

Top Models

99.13% Claude Sonnet 4
99.02% Claude Sonnet 4.5
98.90% GPT-5

Subcategories

81.25% Transformation
92.40% Preservation
98.02% Structural Integrity

Rule Following

12 scenarios · 1 subcategory

Hallucination

28 scenarios · 3 subcategories