openai/gpt-oss-120b

GPT-OSS 120B

Release Date: Aug 5th, 2025
Context Size: 131k tokens
Reasoning: Yes
Benchmark Cost: $0.72
Speed: 96.8 tok/s

Categories

Creative Writing: 67.9%
Tooling: 100.0%
Language: 97.2%
Utility: 92.0%
Reasoning: 92.4%
Text Editing: 91.7%
Rule Following: 55.0%
Hallucination: 95.3%

Subcategories

[Chart: score per subcategory on a 20% to 100% axis. Subcategories shown: AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption.]

Bad Writing Habits

Detects common prose-quality anti-patterns in AI-generated creative writing, including passive voice, past progressive overuse, weak dialogue tags, filter words, purple prose, clichés, AI-ism words/adverbs/names, and more.
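As a rough illustration of how one such anti-pattern might be detected, the sketch below counts filter words per 100 words of text. The word list and the `filter_word_rate` function are hypothetical; the benchmark's actual detectors, word lists, and scoring formula are not described here.

```python
import re

# Hypothetical word list for illustration only; the benchmark's real
# filter-word list is not given in this report.
FILTER_WORDS = {"saw", "heard", "felt", "noticed", "realized", "seemed"}


def filter_word_rate(text: str) -> float:
    """Return the number of filter words per 100 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in FILTER_WORDS)
    return 100.0 * hits / len(words)


sample = "She felt the cold and noticed the door. It seemed open."
rate = filter_word_rate(sample)
print(f"{rate:.1f} filter words per 100 words")
```

A real detector would likely need part-of-speech information to tell a filter verb from an ordinary use of the same word; this sketch only matches surface forms.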

Scenario                       #1   #2   #3   #4   #5   Total
Detailed Writing Rules
                               74   72   68   63   62   68%
                               71   68   66   65   63   67%
                               77   74   73   71   69   73%
                               75   71   69   67   65   69%
                               72   71   70   70   68   70%
                               74   68   68   67   65   68%
Detailed Writing Rules                                  69.20%
genre
                               66   65   63   63   60   63%
                               65   64   61   60   59   62%
                               72   68   66   65   63   67%
                               70   67   66   65   64   67%
                               68   68   67   65   60   65%
                               67   65   63   63   57   63%
genre                                                   64.43%
Novelcrafter Default Prompt
                               69   67   66   63   60   65%
                               66   65   62   61   58   63%
                               73   72   70   67   66   70%
                               74   73   72   70   69   72%
                               74   71   70   70   68   71%
                               72   71   69   67   64   68%
Novelcrafter Default Prompt                             68.00%
Overall                                                 67.21%
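The final figure for this section is consistent with an unweighted mean of the three prompt-group scores. The sketch below checks that arithmetic; treating the final score as a plain mean is an assumption inferred from the numbers, not a documented rule.

```python
# Group scores as reported in the table above.
group_scores = {
    "Detailed Writing Rules": 69.20,
    "genre": 64.43,
    "Novelcrafter Default Prompt": 68.00,
}

# Assumption: the section score is the unweighted mean of the group scores.
overall = sum(group_scores.values()) / len(group_scores)
print(f"{overall:.2f}%")  # prints 67.21%
```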

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
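A minimal sketch of a well-formedness check for such output, using Python's standard `xml.etree.ElementTree`. The `<codex>`/`<entry>` element names and the `type` and `name` attributes are assumed for illustration; the schema the benchmark actually expects is not shown in this report.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_codex(xml_text: str) -> Optional[list]:
    """Parse model output; return None if the XML is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    # Hypothetical schema: <codex><entry type="..." name="...">text</entry></codex>
    return [
        {
            "type": entry.get("type", ""),
            "name": entry.get("name", ""),
            "description": (entry.text or "").strip(),
        }
        for entry in root.iter("entry")
    ]


good = '<codex><entry type="character" name="Mara">A smuggler.</entry></codex>'
bad = '<codex><entry type="character" name="Mara">A smuggler.</codex>'

print(parse_codex(good))
print(parse_codex(bad))
```

Here the second input is rejected because its `<entry>` element is never closed, which is the kind of structural failure a well-formedness check catches before any content comparison.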

Scenario    #1   #2   #3   #4   #5   Total
            95   94   92   92   91   93%
            97   97   96   92   91   95%
            93   93   90   89   88   91%
            97   97   95   95   95   96%
Overall                              93.48%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
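Scoring each expected change independently could look like the sketch below, where every check passes or fails on its own and the score is the percentage passed. The `ExpectedChange` structure and `score_replacement` function are hypothetical names for illustration; the benchmark's actual checker is not described here.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    """One independently checked expectation (hypothetical structure)."""
    must_contain: str = ""      # text that should appear in the output
    must_not_contain: str = ""  # text that should no longer appear


def score_replacement(output: str, checks: list) -> float:
    """Return the percentage of expected changes that the output satisfies."""
    if not checks:
        return 100.0
    passed = 0
    for check in checks:
        # An empty must_contain is trivially satisfied.
        has_required = check.must_contain in output
        lacks_forbidden = (
            not check.must_not_contain or check.must_not_contain not in output
        )
        if has_required and lacks_forbidden:
            passed += 1
    return 100.0 * passed / len(checks)


# Task: rename "Anna" to "Mara" and expand the contraction "didn't".
output = "Mara did not open the door. Anna waited outside."
checks = [
    ExpectedChange(must_contain="Mara"),
    ExpectedChange(must_not_contain="Anna"),  # fails: one "Anna" was missed
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),
]
print(f"{score_replacement(output, checks):.1f}%")
```

Because each check is evaluated separately, a single missed rename lowers the score without zeroing it, which fits the partial scores such as 89 or 52 in the table.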

Scenario           #1    #2    #3    #4    #5    #6    #7    Total
Generic Prompt
                   100   100   100   100   89    89    89    95%
                   100   100   100   100   100   100   100   100%
                   99    99    99    96    96    96    95    97%
                   100   100   100   100   100   100   99    100%
                   100   100   100   100   100   100   100   100%
                   99    99    99    99    99    99    99    99%
                   94    92    92    92    91    90    79    90%
                   100   100   100   100   100   100   99    100%
                   97    97    95    95    95    89    88    94%
Generic Prompt                                               97.23%
Specific Prompt
                   100   100   100   100   100   100   100   100%
                   100   100   100   100   100   100   100   100%
                   100   100   100   99    99    97    96    99%
                   100   100   100   100   100   100   99    100%
                   100   100   100   100   100   100   100   100%
                   95    95    95    95    95    95    95    95%
                   98    98    97    96    95    52    52    84%
                   100   100   100   100   100   100   100   100%
                   100   96    96    89    89    89    85    92%
Specific Prompt                                              96.61%
Overall                                                      96.92%

Tool usage within Novelcrafter

Evaluates the output messages the model produces for tool usage within Novelcrafter.

Scenario    #1    #2    #3    #4    #5    #6    #7    #8    #9    #10   Total
            100   100   100   100   100   100   100   100   100   100   100%
Overall                                                                 100.00%