deepseek/deepseek-chat

DeepSeek V3 (2024-12-26)

Release Date: Dec 26th, 2024
Parameters: not listed
Context Size: 163.8k
Benchmark Cost: $0.39
Speed: 24.9 tok/s

Creative writing: 41.63%
Rule following: 72.59%
Utility: 82.95%
Mathematics: 100.00%
Tooling: 80.63%
Language: 86.94%
Logic: 86.53%

Codex Violation Detection

Detects factual inconsistencies between a story bible and prose passages. The model must output structured XML identifying each violation with paragraph number and substring.

Scenario                                          Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
matrix
0-shot (Tooling, Utility, Logic, Rule following)    91%    90%    88%    87%    86%    85%    81%    77%    66%    58%    81%
0-shot (Tooling, Utility, Logic, Rule following)    95%    93%    92%    85%    85%    85%    83%    83%    81%    80%    86%
0-shot (Tooling, Utility, Logic, Rule following)    96%    93%    93%    93%    93%    93%    89%    89%    89%    89%    91%
0-shot (Tooling, Utility, Logic, Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%    97%   100%
tiers
0-shot (Tooling, Utility, Logic, Rule following)   100%   100%   100%   100%   100%   100%    93%    93%    93%    76%    96%
0-shot (Tooling, Utility, Logic, Rule following)    94%    92%    89%    86%    86%    86%    79%    79%    72%    72%    84%
0-shot (Tooling, Utility, Logic, Rule following)    94%    91%    91%    86%    86%    86%    82%    73%    73%    65%    83%
0-shot (Tooling, Utility, Logic, Rule following)    97%    92%    92%    84%    84%    83%    80%    79%    76%    75%    84%

Benchmark average: 88.05%
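
As a rough illustration of the output format this benchmark describes (structured XML naming each violation by paragraph number and substring), the sketch below parses such a response and checks that every reported substring really occurs in the cited paragraph. The page does not show the actual schema, so the `<violations>`, `<violation paragraph="…">`, and `<substring>` names are assumptions, as are the sample response and passages.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    paragraph: int  # 1-based paragraph number in the prose passage
    substring: str  # text span reported as contradicting the story bible


def parse_violations(xml_text: str) -> list[Violation]:
    """Parse a response written in the hypothetical <violations> schema."""
    root = ET.fromstring(xml_text)
    found = []
    for node in root.iter("violation"):
        paragraph = int(node.attrib["paragraph"])
        substring = (node.findtext("substring") or "").strip()
        found.append(Violation(paragraph, substring))
    return found


def substrings_present(violations: list[Violation], paragraphs: list[str]) -> bool:
    """True if every reported substring occurs in the paragraph it cites."""
    return all(
        1 <= v.paragraph <= len(paragraphs)
        and v.substring in paragraphs[v.paragraph - 1]
        for v in violations
    )


# Hypothetical model response and prose passage, for illustration only.
SAMPLE_RESPONSE = """
<violations>
  <violation paragraph="2">
    <substring>her green eyes</substring>
  </violation>
</violations>
"""
SAMPLE_PARAGRAPHS = [
    "Mara crossed the yard.",
    "She fixed her green eyes on the gate.",
]

violations = parse_violations(SAMPLE_RESPONSE)
print(violations, substrings_present(violations, SAMPLE_PARAGRAPHS))
```

A check like this only confirms that the response is well-formed and points at real text; deciding whether a reported span truly contradicts the story bible would still need reference answers.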

Dialogue tags

Various tasks related to dialogue tags in text.

Scenario                                   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
dialogue-200
0-shot (Creative writing, Rule following)    74%    50%    49%    47%    26%    17%     5%     2%     0%     0%    27%
0-shot (Creative writing, Rule following)    75%    68%    68%    53%    51%    50%    47%    41%    41%     0%    49%
0-shot (Creative writing, Rule following)   100%   100%    80%    69%    50%    47%    34%    30%     6%     0%    52%
dialogue-500
0-shot (Creative writing, Rule following)    50%    47%    45%    41%     0%     0%     0%     0%     0%     0%    18%
0-shot (Creative writing, Rule following)    49%    47%    43%    18%    10%     7%     2%     0%     0%     0%    18%
0-shot (Creative writing, Rule following)    85%    43%    39%    28%    19%    16%    15%    14%    14%     2%    28%
Ungrouped
0-shot (Creative writing, Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%

Benchmark average: 41.63%

Language Comprehension

Does the model understand more than just English?

Scenario           Run 1  Run 2  Run 3  Run 4  Run 5  Total
0-shot (Language)   100%   100%   100%   100%   100%   100%
0-shot (Language)   100%   100%   100%   100%   100%   100%
0-shot (Language)   100%   100%   100%   100%   100%   100%
0-shot (Language)   100%   100%   100%   100%   100%   100%

Benchmark average: 100.00%

Language Writing

Can the model generate text in different languages?

Scenario           Run 1  Run 2  Run 3  Run 4  Run 5  Total
0-shot (Language)   100%    93%    91%    83%    71%    88%
0-shot (Language)   100%    92%    89%    85%     0%    73%
0-shot (Language)   100%   100%   100%     0%     0%    60%
0-shot (Language)   100%   100%    93%    80%     0%    75%
0-shot (Language)    92%    91%    89%    85%    80%    87%

Benchmark average: 76.49%

N-Length Sentences

Write sentences with exactly N words.

Scenario                 Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
0-shot (Rule following)   100%   100%   100%   100%   100%   100%    98%    93%    86%    70%    95%
0-shot (Rule following)    92%    90%    89%    87%    87%    82%    77%    68%    65%    59%    80%
0-shot (Rule following)    53%    49%    32%    21%    21%    20%    18%    10%     4%     0%    23%

Benchmark average: 65.74%
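
To make the task concrete, here is a minimal sketch of how "exactly N words" might be verified. The benchmark's real counting rule is not given on this page, so the tokenization below (whitespace-separated tokens that contain at least one letter or digit) and the `pass_rate` helper are assumptions for illustration.

```python
import re


def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens containing a letter or digit.

    Stand-alone punctuation such as a spaced dash is not counted as a word.
    """
    return sum(1 for token in sentence.split() if re.search(r"\w", token))


def has_exactly_n_words(sentence: str, n: int) -> bool:
    """True if the sentence has exactly n words under the rule above."""
    return word_count(sentence) == n


def pass_rate(sentences: list, n: int) -> float:
    """Fraction of sentences that have exactly n words (0.0 for no input)."""
    if not sentences:
        return 0.0
    return sum(has_exactly_n_words(s, n) for s in sentences) / len(sentences)


print(word_count("The quick brown fox jumps."))
print(pass_rate(["One two three.", "One two."], 3))
```

Different counting rules (for example, how hyphenated words or contractions are treated) would change which outputs pass, so scores from a checker like this are not directly comparable to the figures above.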

Novel outline

Handle questions about the outline of a novel in various formats.

Scenario                   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
outline-count
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Tooling, Utility)   100%   100%   100%   100%     0%     0%     0%     0%     0%     0%    40%
pov-count
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%    50%     0%     0%     0%     0%    55%
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%   100%     0%     0%     0%     0%    60%
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%     0%     0%     0%     0%     0%    50%

Benchmark average: 67.50%
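
The figures in this table are consistent with a simple aggregation: each row's Total matches the mean of its ten runs, and the closing 67.50% matches the mean of the six row totals. That relationship is inferred from the numbers rather than stated on the page, so the sketch below is a sanity check under that assumption, using the first pov-count row and this table's totals.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Runs of the first pov-count row (percent), copied from the table above.
pov_count_runs = [100, 100, 100, 100, 100, 50, 0, 0, 0, 0]
row_total = mean(pov_count_runs)

# Totals of all six Novel outline rows (percent), in table order.
novel_outline_totals = [100, 100, 40, 55, 60, 50]
benchmark_average = mean(novel_outline_totals)

print(f"row total: {row_total:.0f}%")                   # row total: 55%
print(f"benchmark average: {benchmark_average:.2f}%")   # benchmark average: 67.50%
```

In other tables the recomputed average can differ from the displayed one by a few hundredths, which would be expected if the displayed row totals are rounded before being shown.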

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario                   Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
0-shot (Tooling, Utility)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%

Benchmark average: 100.00%

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

Scenario            Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
0-shot (Utility)     100%   100%   100%   100%   100%     0%     0%     0%     0%     0%    50%
0-shot (Utility)     100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
1-shot (Utility)     100%   100%   100%   100%   100%   100%   100%     0%     0%     0%    70%
Few-shot (Utility)   100%   100%   100%   100%   100%   100%   100%   100%   100%     0%    90%
0-shot (Utility)     100%   100%   100%   100%   100%   100%     0%     0%     0%     0%    60%

Benchmark average: 74.00%

Write N of X

Write exactly N words/sentences/paragraphs...

Scenario                 Run 1  Run 2  Run 3  Run 4  Run 5  Run 6  Run 7  Run 8  Run 9 Run 10  Total
paragraphs
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
sentences
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%    92%     2%    89%
0-shot (Rule following)   100%   100%   100%   100%    98%    98%    92%    92%    27%    27%    84%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%    98%     0%     0%     0%    70%
words
0-shot (Rule following)   100%   100%   100%   100%    98%    98%    98%    98%    92%    92%    98%
0-shot (Rule following)   100%   100%   100%   100%   100%   100%   100%   100%    98%    98%   100%
0-shot (Rule following)   100%    98%    92%    77%    54%     9%     9%     0%     0%     0%    44%
0-shot (Rule following)   100%    92%    92%    54%    54%    27%     9%     9%     2%     0%    44%
0-shot (Rule following)   100%    98%    92%     0%     0%     0%     0%     0%     0%     0%    29%

Benchmark average: 81.32%