openai/gpt-4.1

GPT-4.1

Release Date: Apr 14th, 2025
Parameters: (not listed)
Context Size: 1M tokens

Creative writing: 41.20%
Rule following: 74.44%
Utility: 89.33%
Mathematics: 100.00%
Tooling: 87.69%
Language: 89.65%
Logic: 82.50%

Dialogue tags

Various tasks related to dialogue tags in text.

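Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Creative writing, Rule following) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Creative writing, Rule following) | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 43% | 34% | 48%
0-shot (Creative writing, Rule following) | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 50% | 49% | 14% | 46%
0-shot (Creative writing, Rule following) | 68% | 68% | 68% | 68% | 66% | 66% | 52% | 50% | 48% | 34% | 59%
0-shot (Creative writing, Rule following) | 41% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4%
0-shot (Creative writing, Rule following) | 50% | 48% | 34% | 26% | 22% | 3% | 0% | 0% | 0% | 0% | 18%
0-shot (Creative writing, Rule following) | 50% | 43% | 38% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 13%

Section average: 41.20%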

Language Comprehension

Does the model understand more than just English?

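Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total
0-shot (Language) | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Language) | 100% | 100% | 100% | 100% | 0% | 80%
0-shot (Language) | 100% | 100% | 100% | 100% | 0% | 80%
0-shot (Language) | 100% | 100% | 100% | 100% | 100% | 100%

Section average: 90.00%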
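
The 90.00% figure that closes this section is consistent with a simple two-level average: each scenario's Total matches the mean of its five runs, and the section figure matches the mean of the four Totals. The page does not state how the numbers are aggregated, so the sketch below is an assumption checked only against the values shown here; the function names are hypothetical.

```python
# Per-run scores (percent) for the four Language Comprehension scenarios,
# copied from the table in this section.
runs = [
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 0],
    [100, 100, 100, 100, 0],
    [100, 100, 100, 100, 100],
]


def scenario_total(scores):
    """Mean of one scenario's per-run scores (assumed aggregation)."""
    return sum(scores) / len(scores)


def section_score(rows):
    """Mean of the scenario totals (assumed aggregation)."""
    totals = [scenario_total(r) for r in rows]
    return sum(totals) / len(totals)


totals = [scenario_total(r) for r in runs]
print(totals)                          # [100.0, 80.0, 80.0, 100.0]
print(f"{section_score(runs):.2f}%")   # 90.00%
```

In other sections the displayed Totals are rounded to whole percentages, so recomputing a section figure from them can differ from the published value in the last decimal places.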

Language Writing

Can the model generate text in different languages?

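Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Total
0-shot (Language) | 100% | 100% | 100% | 93% | 57% | 90%
0-shot (Language) | 100% | 91% | 85% | 82% | 79% | 87%
0-shot (Language) | 100% | 100% | 92% | 85% | 83% | 92%
0-shot (Language) | 100% | 92% | 85% | 85% | 82% | 89%
0-shot (Language) | 100% | 100% | 100% | 100% | 45% | 89%

Section average: 89.37%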

Tool usage within Novelcrafter

Output messages related to tool usage within Novelcrafter.

Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Tooling, Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%

Section average: 100.00%

N-Length Sentences

Write sentences with exactly N words.

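Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Rule following) | 100% | 98% | 98% | 98% | 97% | 96% | 96% | 89% | 88% | 70% | 93%
0-shot (Rule following) | 100% | 95% | 95% | 95% | 92% | 92% | 92% | 90% | 83% | 79% | 91%
0-shot (Rule following) | 93% | 92% | 92% | 89% | 87% | 85% | 84% | 60% | 59% | 57% | 80%

Section average: 88.02%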
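
To make the task concrete, here is a minimal sketch of how "exactly N words" might be checked and turned into a percentage. The benchmark's actual grader is not described on this page; the word-splitting rule (whitespace-separated tokens containing at least one word character) and the names `word_count`, `has_exactly_n_words`, and `score` are all assumptions for illustration.

```python
import re


def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens that contain a letter, digit, or underscore.

    Assumed rule: punctuation attached to a word does not add a word,
    and a stand-alone punctuation token is not counted.
    """
    return sum(1 for token in sentence.split() if re.search(r"\w", token))


def has_exactly_n_words(sentence: str, n: int) -> bool:
    """True when the sentence contains exactly n words under the rule above."""
    return word_count(sentence) == n


def score(sentences: list[str], n: int) -> float:
    """Percentage of sentences that contain exactly n words (hypothetical scoring)."""
    if not sentences:
        return 0.0
    hits = sum(has_exactly_n_words(s, n) for s in sentences)
    return 100.0 * hits / len(sentences)


samples = [
    "The cat sat on the mat.",                 # 6 words
    "Rain fell.",                              # 2 words
    "He left before the sun rose.",            # 6 words
    "She opened the old wooden door slowly.",  # 7 words
]
print(score(samples, 6))  # 50.0
```

A real grader would also have to decide how to treat hyphenated compounds, contractions, and numerals, which this sketch counts as single words.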

Voice/dialogue sheets

Extract dialogue from given text as voice sheets.

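Scenario | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Run 6 | Run 7 | Run 8 | Run 9 | Run 10 | Total
0-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90%
1-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
Few-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%
0-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 90%
0-shot (Utility) | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100%

Section average: 96.00%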