alpindale/goliath-120b
Goliath 120B via OpenRouter
Release Date: Nov 5th, 2023
Parameters: 120B
Context Size: 6k
Creative writing: 25.20%
Rule following: 25.42%
Utility: 63.17%
Mathematics: 50.00%
Tooling: 48.46%
Language: 64.95%
Logic: 74.38%

Data extraction
Extract key details from a given block of text.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 95% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 50% | 50% | 50% | 80% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 50% | 90% |
| | 100% | 100% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 30% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 95% |
| | 100% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 0% | 50% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 50% | 50% | 85% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | | | | | | | | | | | 81.25% |
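
The page does not state how the figures are aggregated, but the Data extraction numbers are consistent with a simple reading: each scenario's Total is the mean of its ten runs, and the final figure (81.25%) is the mean of the scenario totals. The sketch below checks that reading against the run values copied from the table above. The function names `scenario_total` and `section_score` are hypothetical, and the averaging rule is an assumption inferred from the numbers, not a documented method.

```python
# Run scores (percent) copied from the Data extraction table, one list per scenario row.
DATA_EXTRACTION_RUNS = [
    [100] * 10,
    [100] * 9 + [50],
    [100] * 6 + [50] * 4,
    [100] * 8 + [50] * 2,
    [100] * 3 + [0] * 7,
    [100] * 10,
    [100] * 9 + [50],
    [100] + [50] * 8 + [0],
    [100] * 10,
    [50] * 10,
    [100] * 7 + [50] * 3,
    [100] * 10,
]


def scenario_total(runs):
    """Assumed rule: a scenario's Total is the plain mean of its runs."""
    return sum(runs) / len(runs)


def section_score(rows):
    """Assumed rule: the section figure is the mean of the scenario totals."""
    totals = [scenario_total(runs) for runs in rows]
    return sum(totals) / len(totals)


if __name__ == "__main__":
    for runs in DATA_EXTRACTION_RUNS:
        print(f"{scenario_total(runs):.0f}%")
    print(f"Section: {section_score(DATA_EXTRACTION_RUNS):.2f}%")
```

For sections whose run values are shown rounded to whole percentages (Dialogue tags, for example), recomputing from the displayed values gives a figure that differs from the published one by a few hundredths of a point, so the published figures were presumably computed from unrounded scores.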
Dialogue tags
Various tasks related to dialogue tags in text.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 14% | 14% | 1% | 0% | 0% | 0% | 43% |
| | 59% | 50% | 47% | 39% | 36% | 34% | 26% | 18% | 10% | 7% | 33% |
| | 100% | 99% | 55% | 53% | 48% | 47% | 44% | 43% | 1% | 1% | 49% |
| | 58% | 50% | 50% | 50% | 50% | 48% | 41% | 30% | 10% | 1% | 39% |
| | 24% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| | 43% | 31% | 14% | 4% | 1% | 0% | 0% | 0% | 0% | 0% | 9% |
| | 10% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% |
| | | | | | | | | | | | 25.20% |
Language Comprehension
Does the model understand more than just English?
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total |
|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 0% | 80% |
| | 100% | 100% | 0% | 0% | 0% | 40% |
| | 0% | 0% | 0% | 0% | 0% | 0% |
| | | | | | | 55.00% |
Language Writing
Can the model generate text in different languages?
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total |
|---|---|---|---|---|---|---|
| | 100% | 73% | 71% | 50% | 38% | 66% |
| | 100% | 83% | 67% | 56% | 43% | 70% |
| | 100% | 88% | 86% | 82% | 55% | 82% |
| | 100% | 70% | 67% | 60% | 57% | 71% |
| | 100% | 100% | 67% | 63% | 50% | 76% |
| | | | | | | 72.90% |
Novel outline
Handle questions about the outline of a novel in various formats.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 100% | 100% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 30% |
| | 100% | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 0% | 0% | 50% |
| | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 40% |
| | 100% | 50% | 50% | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 25% |
| | 100% | 100% | 100% | 50% | 50% | 50% | 0% | 0% | 0% | 0% | 45% |
| | | | | | | | | | | | 49.17% |
Tool usage within Novelcrafter
Output messages related to tool usage within Novelcrafter.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 67% | 67% | 67% | 67% | 33% | 33% | 33% | 33% | 0% | 0% | 40% |
| | | | | | | | | | | | 40.00% |
N-Length Sentences
Write sentences with exactly N words.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 94% | 78% | 52% | 47% | 39% | 17% | 1% | 1% | 1% | 0% | 33% |
| | 58% | 51% | 25% | 24% | 24% | 3% | 1% | 0% | 0% | 0% | 18% |
| | 15% | 9% | 5% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 3% |
| | | | | | | | | | | | 18.21% |
Voice/dialogue sheets
Extract dialogue from given text as voice sheets.
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| | | | | | | | | | | | 58.00% |
Write N of X
Write exactly N words/sentences/paragraphs...
| Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 100% | 27% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 77% | 54% | 54% | 54% | 54% | 27% | 27% | 27% | 27% | 9% | 41% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| | | | | | | | | | | | 27.21% |