microsoft/wizardlm-2-8x22b

WizardLM 2 8x22b

Release Date: Apr 15th, 2024
Context Size: 65k
Reasoning: No
Benchmark Cost: $2.37
Speed: 206.8 tok/s

Categories

Creative Writing: 79.1%
Tooling: 90.3%
Language: 78.1%
Utility: 67.1%
Reasoning: 67.4%
Text Editing: 88.1%
Rule Following: 28.3%
Hallucination: 70.2%

Subcategories

Subcategory chart (values not preserved): AI-isms, Prose Variety, Dialogue, Purple Prose, Mechanical Style, Clichés, XML, Comprehension, Generation, Word Counting, Sentence Counting, Paragraph Counting, Structural Counting, Data Extraction, Deduction, Attention, Transformation, Preservation, Structural Integrity, Constraint Adherence, False Positives, Content Invention, Output Corruption.

Bad Writing Habits

Detects common prose anti-patterns in AI-generated creative writing, including passive voice, overuse of the past progressive, weak dialogue tags, filter words, purple prose, clichés, and AI-isms (characteristic words, adverbs, and names).
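The kind of detection described above could be approximated with simple pattern matching. The following is a minimal sketch, not the benchmark's actual detector: the word lists, the regular expression, and the function name `count_habits` are all hypothetical and deliberately tiny.

```python
import re

# Hypothetical, illustrative word lists; the benchmark's real lists are not shown on this page.
FILTER_WORDS = {"saw", "felt", "heard", "noticed", "realized", "seemed"}
WEAK_DIALOGUE_TAGS = {"exclaimed", "interjected", "opined"}

# Rough heuristic for past progressive: "was"/"were" followed by an "-ing" word.
PAST_PROGRESSIVE = re.compile(r"\b(?:was|were)\s+\w+ing\b", re.IGNORECASE)


def count_habits(text: str) -> dict[str, int]:
    """Count occurrences of a few prose anti-patterns in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "filter_words": sum(w in FILTER_WORDS for w in words),
        "weak_dialogue_tags": sum(w in WEAK_DIALOGUE_TAGS for w in words),
        "past_progressive": len(PAST_PROGRESSIVE.findall(text)),
    }


sample = 'She felt the cold and noticed he was shivering. "Stop," she exclaimed.'
print(count_habits(sample))
```

A real detector would need far larger lists and some grammatical awareness; for example, this heuristic would also flag "was nothing", which is not a past progressive.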

Scenario      #1    #2    #3    #4    #5  Total
Detailed Writing Rules
              80    79    76    75    75    77%
              87    84    84    81    81    83%
              85    85    84    82    82    84%
              90    87    85    83    82    85%
              85    84    84    81    81    83%
              90    83    82    82    79    83%
Detailed Writing Rules: 82.57%
genre
              72    70    66    62    60    66%
              70    69    69    69    66    69%
              82    77    76    70    69    75%
              78    77    76    69    66    73%
              69    68    67    66    65    67%
              90    82    80    76    62    78%
genre: 71.32%
Novelcrafter Default Prompt
              83    80    77    75    70    77%
              88    84    84    77    74    81%
              81    80    80    79    79    80%
              85    82    81    81    76    81%
              86    82    80    79    79    81%
              85    83    81    80    75    81%
Novelcrafter Default Prompt: 80.18%
Overall: 78.02%

Codex Extraction

Evaluates a model's ability to extract structured codex entries (characters, locations, objects, lore) from prose passages and return them as well-formed XML.
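A well-formedness check of the kind this test implies can be sketched with Python's standard `xml.etree.ElementTree` parser. The element names (`codex`, `entry`, `name`), the `type` attribute, and the function `parse_codex` are assumptions made for illustration; the page does not show the benchmark's actual schema.

```python
import xml.etree.ElementTree as ET

# Entry kinds named in the description above; the attribute layout is hypothetical.
ALLOWED_TYPES = {"character", "location", "object", "lore"}


def parse_codex(xml_text: str) -> list[tuple[str, str]]:
    """Return (type, name) pairs, or raise ValueError if the output is malformed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc

    entries = []
    for entry in root.iter("entry"):
        kind = entry.get("type", "")
        name = (entry.findtext("name") or "").strip()
        if kind not in ALLOWED_TYPES or not name:
            raise ValueError(f"invalid entry: type={kind!r}, name={name!r}")
        entries.append((kind, name))
    return entries


good = (
    '<codex><entry type="character"><name>Mira</name></entry>'
    '<entry type="location"><name>Harbor Gate</name></entry></codex>'
)
bad = '<codex><entry type="character"><name>Mira</entry></codex>'  # unclosed <name>

print(parse_codex(good))
try:
    parse_codex(bad)
except ValueError as err:
    print("rejected:", err)
```

Separating the parse step from the content check keeps the two failure modes distinct: output that is not XML at all versus XML that parses but contains unusable entries.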

Scenario      #1    #2    #3    #4    #5  Total
              97    95    95    93    89    94%
              96    96    94    93    93    94%
              96    92    92    89    88    91%
              99    98    96    93    89    95%
Overall: 93.62%

Text Replacement

Tests deterministic text transformations: renaming characters/locations, expanding contractions, tense rewriting, POV shifts, gender swaps, combined transformations, and word avoidance. Scored by checking each expected change independently.
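Scoring each expected change independently can be illustrated as follows. This is a sketch under stated assumptions, not the benchmark's scorer: the `ExpectedChange` structure, its substring-based checks, and the `score` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpectedChange:
    must_contain: str      # text that should appear after the transformation
    must_not_contain: str  # original text that should no longer appear


def score(output: str, changes: list[ExpectedChange]) -> float:
    """Percentage of expected changes satisfied; each one is checked on its own."""
    if not changes:
        raise ValueError("at least one expected change is required")
    passed = sum(
        c.must_contain in output and c.must_not_contain not in output
        for c in changes
    )
    return 100.0 * passed / len(changes)


# Task: rename "Anna" to "Elena" and expand the contraction "didn't".
output = "Elena did not see the harbor. Anna's coat was gone."
changes = [
    ExpectedChange(must_contain="Elena", must_not_contain="Anna"),
    ExpectedChange(must_contain="did not", must_not_contain="didn't"),
]
print(score(output, changes))  # 50.0: the rename missed one "Anna"
```

Because each change is scored separately, a partially correct output earns partial credit instead of failing outright, which fits the intermediate values in the results table.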

Scenario      #1    #2    #3    #4    #5    #6    #7  Total
Generic Prompt
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
              99    98    97    96    96    95    93    96%
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
             100    74    74    74    74    74    74    77%
              94    93    92    86    72    27     3    67%
             100   100   100   100    68    66     0    76%
              96    95    92    89    85    85    82    89%
Generic Prompt: 89.56%
Specific Prompt
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
              99    99    99    99    99    99    99    99%
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
              98    97    96    96    95    94    93    95%
             100   100   100   100   100   100   100   100%
             100   100   100   100   100   100   100   100%
Specific Prompt: 99.43%
Overall: 94.49%
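The section-level figures on this page appear consistent with an unweighted mean of their prompt-group scores. That is an inference from the numbers, not something the page states. The short check below uses only values reported on this page; the tolerance and the helper name `consistent` are assumptions.

```python
from statistics import mean

# (prompt-group scores, reported section score), copied from this page.
reported = {
    "Bad Writing Habits": ([82.57, 71.32, 80.18], 78.02),
    "Text Replacement": ([89.56, 99.43], 94.49),
}


def consistent(groups: list[float], total: float, tol: float = 0.01) -> bool:
    """True if the unweighted mean of `groups` is within `tol` of `total`."""
    return abs(mean(groups) - total) <= tol


for name, (groups, total) in reported.items():
    print(name, round(mean(groups), 3), consistent(groups, total))
```

The 0.01 tolerance allows for the page rounding or truncating to two decimal places.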

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario      #1    #2    #3    #4    #5    #6    #7    #8    #9   #10  Total
             100   100   100   100   100   100   100   100   100    67    97%
Overall: 96.67%