Run: 2026-02-17-01KHMME9GF16ZWQDVCZ5W9YSMY
Run Details
- Model: WizardLM 2 8x22b
- Test: Bad Writing Habits
- Scenario: Thriller: chase through city streets
- Duration: 1.6s
- Cost: $0.000659
- Provider: Novita
Evaluation details
| Result | Evaluator | Details | Meta Data |
|---|---|---|---|
| 100.00% | Adverbs in dialogue tags | Target: ≤10% dialogue tags with adverbs | |
| 100.00% | AI-ism adverb frequency | Target: <2% AI-ism adverbs (58 tracked) | |
| 100.00% | AI-ism character names | Target: 0 AI-default names (17 tracked, −20% each) | |
| 100.00% | AI-ism location names | Target: 0 AI-default location names (33 tracked, −20% each) | |
| 100.00% | AI-ism word frequency | Target: <2% AI-ism words (290 tracked) | |
| 100.00% | Cliché density | Target: ≤1 cliché(s) per 800-word window | |
| 100.00% | Emotion telling (show vs. tell) | Target: ≤3% sentences with emotion telling | |
| 100.00% | Filter word density | Target: ≤3% sentences with filter/hedge words | |
| 100.00% | Gibberish response detection | Target: ≤1% gibberish-like sentences (hard fail if a sentence exceeds 800 words) | |
| 100.00% | Markdown formatting overuse | Target: ≤5% words in markdown formatting | |
| 100.00% | Missing dialogue indicators (quotation marks) | Target: ≤10% speech attributions without quotation marks | |
| 100.00% | Name drop frequency | Target: ≤1.0 per-name mentions per 100 words | n/a |
| 100.00% | Narrator intent-glossing | Target: ≤2% narration sentences with intent-glossing patterns | |
| 100.00% | "Not X but Y" pattern overuse | Target: ≤1 "not X but Y" per 1000 words | |
| 100.00% | Overuse of "that" (subordinate clause padding) | Target: ≤2% sentences with "that" clauses | |
| 100.00% | Paragraph length variance | Target: CV ≥0.5 for paragraph word counts | |
| 100.00% | Passive voice overuse | Target: ≤2% passive sentences | |
| 100.00% | Past progressive (was/were + -ing) overuse | Target: ≤2% past progressive verbs | |
| 100.00% | Em-dash & semicolon overuse | Target: ≤2% sentences with em-dashes/semicolons | |
| 100.00% | Purple prose (modifier overload) | Target: <4% adverbs, <2% -ly adverbs, no adj stacking | |
| 100.00% | Repeated phrase echo | Target: ≤20% sentences with echoes (window: 2) | |
| 100.00% | Sentence length variance | Target: CV ≥0.4 for sentence word counts | |
| 100.00% | Sentence opener variety | Target: ≥60% unique sentence openers | |
| 0.00% | Adverb-first sentence starts | Target: ≥3% sentences starting with an adverb | |
| 0.00% | Pronoun-first sentence starts | Target: ≤30% sentences starting with a pronoun | |
| 100.00% | Subject-first sentence starts | Target: ≤72% sentences starting with a subject | |
| 0.00% | Subordinate conjunction sentence starts | Target: ≥2% sentences starting with a subordinating conjunction | |
| 100.00% | Technical jargon density | Target: ≤6% sentences with technical-jargon patterns | |
| 100.00% | Useless dialogue additions | Target: ≤5% dialogue tags with trailing filler fragments | |
| 100.00% | Dialogue tag variety (said vs. fancy) | Target: ≤10% fancy dialogue tags | |
| 90.00% | Overall (mean of the 30 evaluator results above) | | |
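
The two variance evaluators set targets on a CV (coefficient of variation, the standard deviation divided by the mean) of word counts. The report does not show how the evaluator computes it, so the following is only a minimal sketch of one plausible calculation. The function names, the regex-based sentence splitter, the use of the population standard deviation, and the sample text are all assumptions made for illustration.

```python
import re
import statistics


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_count_cv(units):
    """Coefficient of variation of word counts across text units.

    Uses the population standard deviation (assumption); the real
    evaluator may use the sample standard deviation instead.
    """
    counts = [len(u.split()) for u in units if u.split()]
    if len(counts) < 2:
        return 0.0  # not enough units to measure variation
    return statistics.pstdev(counts) / statistics.mean(counts)


# Hypothetical sample text, not taken from the model response.
sample = (
    "He ran. The alley narrowed until his shoulders scraped brick "
    "on both sides. Sirens."
)

sentence_cv = word_count_cv(split_sentences(sample))
passes = sentence_cv >= 0.4  # threshold from the "Sentence length variance" row
print(f"sentence CV = {sentence_cv:.2f}, passes = {passes}")
```

The same helper could be applied to paragraphs (split on blank lines) and compared against the 0.5 threshold from the "Paragraph length variance" row.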