NC Bench
Run: 2025-04-14-01JRTSTZ5EC9VMQ1QF8Y44THXA
Run Details
Model:
GPT-4.1
Test:
Language Writing
Scenario:
Character dialogue (German) in a story
Duration:
6.0s
Cost:
$0.004760
Provider:
OpenAI
Evaluation summary
Score:
100.00%
Evaluator:
Parse dialogue
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account:

<character name="Melanie">
Melanie is a 30-year old college student on a study abroad program. Born in Germany, she is fluent in German and does not yet know any English. She enjoys landscape photography and is always seen with her camera.
</character>

<character name="Hunter">
Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of German, not even basic phrases.
</character>

<location name="Alborn, MN">
Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival.
</location>
User/Human
Write the following scene: Melanie is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in German, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave.

Return the dialogue between Melanie and Hunter as a voice sheet:
Melanie: "..."
Hunter: "..."
Model response
Messages
Meta
AI
Melanie: (studying the camera’s display, quietly to herself in German) "Das darf doch nicht wahr sein... Ich habe wirklich das gesehen?" (This can’t be true… Did I really see that?)
Hunter: (approaching, holding a small notebook) "Excuse me, ma’am. I noticed you were at the park last night. I’d like to ask you a few questions about what you might have seen."
Melanie: (startled, looks up, responds in rapid German) "Entschuldigung, ich spreche kein Englisch... Was wollen Sie?" (Sorry, I don’t speak English… What do you want?)
Hunter: (frowning, noticing her distress) "Uh... I’m Detective Hunter. Did you see anything unusual last night? People… trouble?"
Melanie: (shaking her head, voice trembling) "Ich weiß nicht… Es war so dunkel, und dann… jemand ist verschwunden. Ich…", trails off, gripping her camera tight (I don’t know… it was so dark, and then… someone disappeared. I…)
Hunter: (confused, tries again) "Do you… understand me? Disappearances? Missing people?" (uses hand gestures as if describing something lost)
Melanie: (nods, eyes wide, more slowly in German) "Ja, ja… jemand war da und dann… weg. Ich habe Fotos. Fotos gemacht." (Yes, yes… someone was there and then… gone. I took pictures. Pictures.)
Hunter: (scribbles in notebook, frustrated but persistent) "Pictures? Can I… see?" (points to the camera, mimicking taking a photo)
Melanie: (hesitating, then flipping through photos, showing him the blurry images, speaking softly in German) "Hier… schauen Sie… Ich habe Angst." (Here… look… I’m scared.)
Hunter: (leans in, squinting at the images, gives her a reassuring nod though he doesn’t understand her words) "Thank you… uh, danke, I think. I’ll have someone take a look at these."
Melanie: (nods, still clutching the camera, anxiety etched on her face)
Hunter: (his phone rings, he answers gruffly) "Hunter here… yeah, I’m coming." (turns to Melanie, apologetic) "Sorry, got to go. Stay safe, okay?"
Melanie: (softly, uncertainly in German) "Viel Glück…" (Good luck…)
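The response above follows the requested voice-sheet format (speaker label, optional stage direction, quoted line, optional translation). The evaluator's own parsing code is not shown on this page, so the following is only a minimal sketch of how such a response could be split into name/dialogue pairs. The function name `parse_voice_sheet`, the regular expressions, and the rule of keeping the last quoted span per turn are all assumptions; that rule was chosen because it agrees with the entries listed under Evaluation details, where a turn with two quotes is recorded by its second quote and a turn with no quote is absent.

```python
import re

def parse_voice_sheet(text, speakers):
    """Split a voice sheet into {"name", "dialogue"} entries (hypothetical helper)."""
    names = "|".join(re.escape(s) for s in speakers)
    # Split at every "<Speaker>:" label; the capture group keeps the label.
    parts = re.split(rf"\b({names}):\s*", text)
    turns = []
    # parts looks like [preamble, name1, body1, name2, body2, ...]
    for name, body in zip(parts[1::2], parts[2::2]):
        # Accept straight or curly double quotes; apostrophes are left alone.
        quotes = re.findall(r'["“](.+?)["”]', body, re.S)
        if quotes:  # a turn with only a stage direction carries no dialogue
            turns.append({"name": name, "dialogue": quotes[-1]})
    return turns

# Shortened sample in the same shape as the response above.
sample = (
    'Melanie: (startled) "Entschuldigung, ich spreche kein Englisch... Was wollen Sie?" '
    "(Sorry, I don’t speak English… What do you want?) "
    "Melanie: (nods, still clutching the camera) "
    'Hunter: (his phone rings) "Hunter here… yeah, I’m coming." '
    '(turns to Melanie) "Sorry, got to go. Stay safe, okay?"'
)

turns = parse_voice_sheet(sample, ["Melanie", "Hunter"])
for turn in turns:
    print(turn["name"], "->", turn["dialogue"])
```

Passing the speaker names explicitly avoids treating a colon inside a quoted line as a new turn, at the cost of needing the character list up front.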
id
"gen-1744656236-oXkiDqoBZXTqx0DPtgUy"
upstream_id
"chatcmpl-BMJ4enqIzu85HWSJ7L1EFSIfrCKft"
total_cost
0.00476
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:44:02.218491+00:00"
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
315
moderation_latency
132
generation_time
4784
tokens_prompt
272
tokens_completion
526
native_tokens_prompt
272
native_tokens_completion
527
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00476
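As a sanity check on the figures above, the reported cost can be reproduced from the native token counts, and a rough generation throughput can be derived from the timing fields. This is a sketch under stated assumptions: the per-million-token prices (2.00 USD input, 8.00 USD output) are not given on this page, and `generation_time` is assumed to be in milliseconds. Agreement with `total_cost` is consistent with these assumptions but does not confirm how the provider actually billed the request.

```python
# Values copied from the Meta fields above.
native_tokens_prompt = 272
native_tokens_completion = 527
generation_time_ms = 4784  # assumption: the field is reported in milliseconds
reported_cost = 0.00476

# Assumption: list prices in USD per million tokens (not stated on this page).
PRICE_IN_PER_M = 2.00
PRICE_OUT_PER_M = 8.00

estimated_cost = (
    native_tokens_prompt * PRICE_IN_PER_M
    + native_tokens_completion * PRICE_OUT_PER_M
) / 1_000_000

# Completion tokens produced per second of generation time.
tokens_per_second = native_tokens_completion / (generation_time_ms / 1000)

print(f"estimated cost: ${estimated_cost:.6f} (reported: ${reported_cost:.6f})")
print(f"throughput: {tokens_per_second:.1f} tokens/s")
```

Under these assumptions the estimate comes to 272 × 2 + 527 × 8 = 4760 millionths of a dollar, the same value as the reported 0.00476.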
Evaluation details
Result:
100.00%
Evaluator:
Parse dialogue
Details:
n/a
Meta Data:
dialogue
0
name
"Melanie"
dialogue
"Das darf doch nicht wahr sein... Ich habe wirklich das gesehen?"
detectedLang
"de"
heavyLang
"de"
scores
de
0.8480902777777778
pt
0.3217054263565891
nl
0.29292929292929293
sk
0.26160337552742613
eu
0.24731182795698922
pl
0.24731182795698922
fr
0.24242424242424243
cs
0.2222222222222222
fi
0.2009132420091324
reliable
true
passes
true
1
name
"Hunter"
dialogue
"Excuse me, ma’am. I noticed you were at the park last night. I’d like to ask you a few questions about what you might have seen."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8158803222094362
tl
0.5914198161389173
no
0.5189416716776909
da
0.49653870358716173
fr
0.4943109987357775
it
0.44367176634214184
yo
0.3957703927492447
nl
0.38128383604021654
pt
0.3730407523510972
cs
0.3579454253611557
sq
0.35064935064935066
ca
0.34210526315789475
sk
0.33774834437086093
es
0.3327773144286906
lv
0.3185689948892675
hr
0.3144815766923736
pl
0.313893653516295
sl
0.27206551410373064
et
0.27140255009107467
sv
0.25857275254865614
lt
0.23736892278360344
ro
0.22254616132167152
ms
0.21951219512195122
hu
0.2179863147605083
tr
0.15522703273495247
de
0.14984059511158343
fi
0.14712153518123666
is
0.14712153518123666
az
0.14255091103965703
vi
0.14255091103965703
be
0.08045977011494253
eu
0.08045977011494253
uk
0.07300115874855156
reliable
true
passes
true
2
name
"Melanie"
dialogue
"Entschuldigung, ich spreche kein Englisch... Was wollen Sie?"
detectedLang
"de"
heavyLang
"de"
scores
de
0.8170173833485819
nl
0.6610169491525424
en
0.48586118251928023
no
0.3902439024390244
da
0.3620414673046252
pl
0.3620414673046252
sk
0.3254637436762226
ms
0.3150684931506849
sv
0.3079584775086505
ro
0.30675909878682844
hr
0.27140255009107467
sl
0.21568627450980393
hu
0.1902834008097166
lt
0.1836734693877551
yo
0.17864476386036962
fi
0.16666666666666666
sq
0.15074309978768577
is
0.14712153518123666
it
0.14712153518123666
reliable
true
passes
true
3
name
"Hunter"
dialogue
"Uh... I’m Detective Hunter. Did you see anything unusual last night? People… trouble?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8025412087912088
tl
0.6633489461358314
fr
0.4833782569631626
et
0.37971952535059333
ro
0.35101580135440186
pt
0.3502824858757062
nl
0.34285714285714286
es
0.3072289156626506
sq
0.3047158403869408
de
0.2996345919610231
da
0.29447852760736193
ca
0.2936117936117936
it
0.2909987669543773
no
0.2874845105328377
ms
0.27122940430925224
cs
0.20689655172413793
sv
0.19241573033707865
sl
0.1797432239657632
yo
0.11402157164869028
hr
0.1043613707165109
tr
0.09305993690851734
reliable
true
passes
true
4
name
"Melanie"
dialogue
"Ich weiß nicht… Es war so dunkel, und dann… jemand ist verschwunden. Ich…"
detectedLang
"de"
heavyLang
"de"
scores
de
0.8325105782792667
nl
0.5324803149606299
en
0.4444444444444444
no
0.40176322418136023
da
0.3863049095607235
es
0.3790849673202614
hu
0.36918990703851257
sk
0.326241134751773
tr
0.3115942028985507
yo
0.31059506531204645
ca
0.302496328928047
lt
0.26127527216174184
sv
0.2531446540880503
sl
0.24121405750798725
sq
0.2263843648208469
fr
0.22512234910277326
ro
0.22003284072249588
lv
0.20168067226890754
pl
0.19491525423728812
hi
0.18244406196213425
cs
0.1739130434782609
vi
0.17247386759581881
ms
0.15630550621669628
tl
0.15630550621669628
it
0.1316270566727605
is
0.12361623616236161
pt
0.1171003717472119
az
0.11048689138576778
reliable
true
passes
true
5
name
"Hunter"
dialogue
"Do you… understand me? Disappearances? Missing people?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.7995724211651524
nl
0.5915032679738562
fr
0.5593419506462984
no
0.5567375886524822
pt
0.5562130177514794
da
0.5398773006134969
tl
0.5300751879699248
sv
0.45014662756598234
cs
0.44029850746268656
ms
0.4377811094452773
de
0.4140625
sq
0.4140625
it
0.3822075782537067
is
0.3760399334442596
ca
0.37290969899665555
pl
0.37080536912751677
sk
0.37080536912751677
sl
0.3567753001715266
es
0.33745583038869253
et
0.29906542056074764
fi
0.28977272727272724
hr
0.24242424242424243
vi
0.22839506172839508
yo
0.17218543046357618
lv
0.1648106904231626
ro
0.1648106904231626
az
0.14772727272727273
reliable
true
passes
true
6
name
"Melanie"
dialogue
"Ja, ja… jemand war da und dann… weg. Ich habe Fotos. Fotos gemacht."
detectedLang
"de"
heavyLang
"de"
scores
de
0.8226950354609929
nl
0.6380558428128231
da
0.48377581120943947
pl
0.47526236881559225
en
0.44
sk
0.44
sl
0.43548387096774194
pt
0.4345718901453958
lv
0.41569282136894825
ca
0.40978077571669475
yo
0.40476190476190477
et
0.38271604938271603
es
0.37050359712230213
ro
0.36479128856624315
hr
0.34579439252336447
it
0.328214971209213
tr
0.328214971209213
ms
0.3217054263565891
az
0.32038834951456313
fr
0.3069306930693069
fi
0.30417495029821073
no
0.29718875502008035
lt
0.29006085192697767
sq
0.2827868852459016
sv
0.2768595041322314
eu
0.26160337552742613
cs
0.1606714628297362
tl
0.15254237288135594
vi
0.15254237288135594
reliable
true
passes
true
7
name
"Hunter"
dialogue
"Pictures? Can I… see?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8091603053435115
ca
0.719626168224299
et
0.6881496881496882
ro
0.6173469387755102
tl
0.6021220159151194
no
0.5912806539509536
da
0.5844875346260388
sv
0.5833333333333334
fr
0.5614035087719298
tr
0.494949494949495
lv
0.49324324324324326
az
0.4827586206896552
hr
0.4809688581314879
sq
0.471830985915493
pl
0.45652173913043476
it
0.43396226415094336
cs
0.4117647058823529
yo
0.3670886075949367
de
0.32432432432432434
nl
0.32432432432432434
sk
0.32432432432432434
be
0.3181818181818182
hu
0.3087557603686636
is
0.29577464788732394
lt
0.29577464788732394
uk
0.29577464788732394
vi
0.29577464788732394
reliable
true
passes
true
8
name
"Melanie"
dialogue
"Hier… schauen Sie… Ich habe Angst."
detectedLang
"de"
heavyLang
"de"
scores
de
0.8411437648927721
nl
0.6737357259380098
pl
0.531615925058548
sk
0.38271604938271603
fr
0.3055555555555556
da
0.2779783393501805
no
0.2779783393501805
reliable
true
passes
true
9
name
"Hunter"
dialogue
"Thank you… uh, danke, I think. I’ll have someone take a look at these."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8103905953735305
da
0.5483288166214996
tl
0.5340167753960857
no
0.46977730646871685
sk
0.414519906323185
hr
0.40828402366863903
it
0.40476190476190477
cs
0.38650306748466257
sq
0.3796526054590571
vi
0.3573264781491003
sv
0.3548387096774194
ca
0.3540051679586563
sl
0.3438320209973753
nl
0.3315508021390375
pl
0.3160054719562244
ro
0.31412894375857336
hu
0.3084370677731674
de
0.3045897079276773
pt
0.30362116991643456
es
0.29577464788732394
yo
0.2917847025495751
tr
0.22720247295208654
lt
0.2248062015503876
is
0.2163009404388715
az
0.21011058451816747
lv
0.21011058451816747
et
0.20127795527156547
fr
0.18962722852512154
be
0.12280701754385964
eu
0.12280701754385964
fi
0.11190053285968028
ms
0.11190053285968028
uk
0.11190053285968028
reliable
true
passes
true
10
name
"Hunter"
dialogue
"Sorry, got to go. Stay safe, okay?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.7757847533632287
tl
0.6491228070175439
pl
0.5485327313769752
yo
0.5169082125603864
sk
0.39759036144578314
cs
0.3957703927492447
lv
0.37888198757763975
sl
0.37694704049844235
hr
0.3670886075949367
no
0.35275080906148865
da
0.34210526315789475
lt
0.34210526315789475
it
0.3333333333333333
pt
0.3333333333333333
reliable
true
passes
true
11
name
"Melanie"
dialogue
"Viel Glück…"
detectedLang
"de"
heavyLang
"de"
scores
de
0.8369565217391305
nl
0.5714285714285714
reliable
true
passes
true
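All twelve entries above have `passes` set to true, which matches the 100.00% result. The page does not show how `passes` is computed, so the following is a minimal sketch of one plausible rule: each speaker's detected language must equal the language expected for that character. The `EXPECTED_LANG` mapping is an assumption inferred from the character descriptions in the system prompt, and `score_dialogue` is a hypothetical name; the roles of `heavyLang` and `reliable` in the real evaluator are unknown and are ignored here.

```python
# Assumption: expected language per speaker, inferred from the character sheets.
EXPECTED_LANG = {"Melanie": "de", "Hunter": "en"}

def score_dialogue(entries, expected=EXPECTED_LANG):
    """Return (per-entry pass flags, fraction of entries that pass)."""
    flags = [entry["detectedLang"] == expected.get(entry["name"]) for entry in entries]
    score = sum(flags) / len(flags) if flags else 0.0
    return flags, score

# First three entries from the listing above, reduced to the fields used here.
entries = [
    {"name": "Melanie", "detectedLang": "de"},
    {"name": "Hunter", "detectedLang": "en"},
    {"name": "Melanie", "detectedLang": "de"},
]

flags, score = score_dialogue(entries)
print(f"{score:.2%}")

# A hypothetical failing case: Hunter answering in German would lower the score.
_, mixed_score = score_dialogue(entries + [{"name": "Hunter", "detectedLang": "de"}])
print(f"{mixed_score:.2%}")
```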