NC Bench
Run: 2025-04-14-01JRTSWD3XBVRWNG715A3MC8W4
Run Details
Model:
GPT-4.1
Test:
Language Writing
Scenario:
Character dialogue (Hindi) in a story
Duration:
9.1s
Cost:
$0.004090
Provider:
OpenAI
Evaluation summary
Score:
45.45%
Evaluator:
Parse dialogue
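The 45.45% score is consistent with the per-line results listed under Evaluation details, where 5 of the 11 parsed dialogue lines have passes set to true. A minimal sketch of that arithmetic, assuming the score is simply the fraction of passing lines (the function name `pass_rate` is hypothetical and not taken from NC Bench):

```python
def pass_rate(passes: list[bool]) -> float:
    """Return the share of passing lines as a percentage."""
    if not passes:
        return 0.0
    return 100.0 * sum(passes) / len(passes)


# passes flags for dialogue lines 0..10, copied from Evaluation details:
# Selvi's lines (even indices) fail, Hunter's lines (odd indices) pass.
flags = [False, True, False, True, False, True, False, True, False, True, False]
print(f"{pass_rate(flags):.2f}%")  # 45.45%
```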
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account: <character name="Selvi"> Selvi is a 30-year old college student on a study abroad program. Born in India, she is fluent in Hindi and does not yet know any English. She enjoys landscape photography and is always seen with her camera. </character> <character name="Hunter"> Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of Hindi, not even basic phrases. </character> <location name="Alborn, MN"> Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival. </location>
User/Human
Write the following scene: Selvi is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in Hindi, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave. Return the dialogue between Selvi and Hunter as a voice sheet: Selvi: "..." Hunter: "..."
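The voice sheet format requested here (`Name: "..."`) lends itself to a simple parser. The sketch below is an assumption about how such a sheet could be split into per-speaker lines, not NC Bench's actual evaluator; `parse_voice_sheet` and both regular expressions are hypothetical. It skips parenthesized stage directions and keeps only the last quoted segment of each turn, which matches what Evaluation details records for Hunter's final turn (line 9).

```python
import re

# Speaker label that opens a turn, e.g. `Selvi:` or `Hunter:`.
TURN = re.compile(r"\b(Selvi|Hunter):")
# Text between straight double quotes.
QUOTED = re.compile(r'"([^"]*)"')


def parse_voice_sheet(text: str) -> list[dict]:
    """Split a voice sheet into {"name", "dialogue"} records."""
    parts = TURN.split(text)  # ['', name1, body1, name2, body2, ...]
    lines = []
    for name, body in zip(parts[1::2], parts[2::2]):
        quotes = QUOTED.findall(body)
        if quotes:  # ignore turns that contain no quoted speech
            lines.append({"name": name, "dialogue": quotes[-1]})
    return lines


# Abridged sample: stage directions shortened for illustration.
sheet = (
    'Selvi: (quietly) "Nahin, bilkul pata nahi... bas andhera tha." '
    'Hunter: (exhales) "I wish I understood you." (His phone rings) '
    '"Hunter. Yeah? On my way." (stands) "Stay here, okay? I’ll be back."'
)
for record in parse_voice_sheet(sheet):
    print(record["name"], "->", record["dialogue"])
```

Keeping the last quoted segment is a design choice made only to mirror the recorded result; a parser that joined all segments of a turn would be equally plausible.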
Model response
Messages
Meta
AI
Selvi: (turning her camera over in her hands, voice soft) "Yeh tasveer theek se nahi aayi... par main darr gayi thi, sach mein." Hunter: (pulls out his badge, sits across from her) "Ma’am, I’m Detective Hunter. I saw you here last night, around the time of the incident. Did you see anything unusual?" Selvi: (eyes wide, struggling to find English words, answers in quick Hindi) "Maine who aadmi dekha—kaale coat mein, raat ko... woh aise chhup raha tha. Mujhe samajh nahi aaya kya ho raha hai..." Hunter: (leans forward, picks up on her distress but not her words) "I’m sorry, I don’t understand. Do you speak English at all?" Selvi: (voice trembling, pointing at her camera, still in Hindi) "Camera… photo... main try ki kiya, par... sabh kuch itna jaldi hua." Hunter: (nods slowly, tries to use hand gestures) "Camera? Did you take a picture? Can I see?" Selvi: (nods, hands camera with shaking fingers, says in Hindi) "Yeh rakh lo, dekh sakte hain... mujhe bahut darr lag raha hai." Hunter: (scrolls through images, frowns, then looks up) "Is this from last night? This man—did you see where he went?" Selvi: (shakes her head, quietly) "Nahin, bilkul pata nahi... bas andhera tha." Hunter: (exhales, puts camera down gently) "I wish I understood you. Thank you—I’ll get someone to help translate. You’re safe here." (His phone rings, he picks it up) "Hunter. Yeah? On my way." (stands) "Stay here, okay? I’ll be back." Selvi: (nods faintly, hugs her bag to her chest, whispering to herself in Hindi) "Bhagwan, madad karo..."
id
"gen-1744656283-jgLThhNogjEqTdcvr12O"
upstream_id
"chatcmpl-BMJ5P3wjXC5OWYXtnKZeh6Q7ngwdj"
total_cost
0.00409
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:44:52.751031+00:00"
model
"openai/gpt-4.1-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
368
moderation_latency
140
generation_time
8030
tokens_prompt
273
tokens_completion
442
native_tokens_prompt
273
native_tokens_completion
443
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.00409
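The total_cost of 0.00409 can be reproduced from the native token counts above. The sketch below assumes list prices of $2.00 per million prompt tokens and $8.00 per million completion tokens for GPT-4.1; those prices are not stated on this page and are an assumption, as is the helper name `estimate_cost`.

```python
# Assumed prices in USD per one million tokens (not given in this report).
PROMPT_PRICE_PER_M = 2.00
COMPLETION_PRICE_PER_M = 8.00


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from token counts."""
    total = (prompt_tokens * PROMPT_PRICE_PER_M
             + completion_tokens * COMPLETION_PRICE_PER_M)
    return total / 1_000_000


# native_tokens_prompt = 273, native_tokens_completion = 443
print(f"${estimate_cost(273, 443):.6f}")  # $0.004090
```

Under these assumed prices the prompt contributes $0.000546 and the completion $0.003544, so output tokens account for most of the cost of this run.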
Evaluation details
Result:
45.45%
Evaluator:
Parse dialogue
Details:
n/a
Meta Data:
dialogue
0
name
"Selvi"
dialogue
"Yeh tasveer theek se nahi aayi... par main darr gayi thi, sach mein."
detectedLang
"fr"
heavyLang
"ber"
scores
fr
0.4762569832402235
nl
0.3718592964824121
de
0.35121107266435986
da
0.33035714285714285
sv
0.3231046931407942
eu
0.3055555555555556
lv
0.29245283018867924
vi
0.29245283018867924
cs
0.2661448140900196
hr
0.2618110236220473
sl
0.26035502958579887
es
0.25742574257425743
ro
0.24242424242424243
pt
0.24089068825910928
sq
0.2299794661190965
yo
0.2219917012448133
fi
0.21383647798742136
en
0.20886075949367086
it
0.20550847457627117
ca
0.20382165605095542
no
0.16666666666666666
reliable
false
passes
false
1
name
"Hunter"
dialogue
"Ma’am, I’m Detective Hunter. I saw you here last night, around the time of the incident. Did you see anything unusual?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8046875
tl
0.6754606365159129
no
0.4974059662775616
sq
0.46808510638297873
da
0.45766270118964314
fr
0.449182658137882
nl
0.44879089615931717
ro
0.44603288062902074
ca
0.43306510607168985
et
0.41243366186504926
pt
0.4029275808936826
it
0.3902439024390244
de
0.367862969004894
sv
0.3492863140218304
es
0.33760683760683763
ms
0.33590402742073694
cs
0.32958477508650524
yo
0.32958477508650524
hr
0.31049822064056937
sl
0.2475728155339806
az
0.2115971515768057
sk
0.20594262295081966
pl
0.20185375901132854
is
0.16396979503775622
tr
0.1426991150442478
hu
0.13984461709211987
lt
0.13600891861761427
be
0.08284023668639054
lv
0.07517899761336516
uk
0.07517899761336516
vi
0.07517899761336516
reliable
true
passes
true
2
name
"Selvi"
dialogue
"Maine who aadmi dekha—kaale coat mein, raat ko... woh aise chhup raha tha. Mujhe samajh nahi aaya kya ho raha hai..."
detectedLang
"et"
heavyLang
"tl"
scores
et
0.4748687171792948
tl
0.3710691823899371
en
0.3137254901960784
lv
0.3055555555555556
de
0.279835390946502
eu
0.27835051546391754
fi
0.27611168562564636
no
0.23161361141602635
fr
0.2290748898678414
it
0.2273730684326711
vi
0.18319719953325553
ro
0.17355371900826447
yo
0.15764139590854392
sl
0.15254237288135594
ca
0.14634146341463414
sq
0.14320685434516522
sk
0.13686806411837238
cs
0.1315136476426799
is
0.12718204488778054
lt
0.11949685534591195
da
0.10941475826972011
sv
0.10256410256410256
ms
0.0920881971465629
reliable
false
passes
false
3
name
"Hunter"
dialogue
"I’m sorry, I don’t understand. Do you speak English at all?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8101265822784809
tl
0.6951219512195121
no
0.6280991735537189
da
0.604221635883905
sv
0.5227995758218452
cs
0.47058823529411764
pl
0.4353826850690088
de
0.42820838627700125
hr
0.4238156209987196
sq
0.4178525226390685
sk
0.3943472409152086
it
0.3894165535956581
pt
0.3626062322946175
ro
0.3440233236151603
nl
0.2834394904458599
vi
0.27884615384615385
yo
0.2682926829268293
et
0.25124792013311154
ca
0.23469387755102045
is
0.23339011925042585
sl
0.21052631578947367
ms
0.13793103448275862
be
0.1346153846153846
tr
0.1346153846153846
hu
0.12959381044487428
az
0.12280701754385964
lt
0.12280701754385964
lv
0.12280701754385964
uk
0.12280701754385964
reliable
true
passes
true
4
name
"Selvi"
dialogue
"Camera… photo... main try ki kiya, par... sabh kuch itna jaldi hua."
detectedLang
"fr"
heavyLang
"is"
scores
fr
0.570446735395189
en
0.5432399512789281
nl
0.3194192377495462
sv
0.3181818181818182
it
0.2964352720450282
ro
0.2964352720450282
az
0.2937853107344633
lv
0.29245283018867924
sl
0.28977272727272724
de
0.2857142857142857
tl
0.28435114503816794
hu
0.24395161290322584
yo
0.23155737704918034
eu
0.2203742203742204
tr
0.21711899791231734
ca
0.21383647798742136
es
0.20886075949367086
is
0.20042643923240938
da
0.19871794871794873
cs
0.19006479481641467
et
0.17218543046357618
hi
0.17035398230088497
ms
0.1610738255033557
reliable
false
passes
false
5
name
"Hunter"
dialogue
"Camera? Did you take a picture? Can I see?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8062015503875969
tl
0.6596534653465347
it
0.6376811594202898
ro
0.6283783783783783
ca
0.6222527472527472
hr
0.5506535947712418
cs
0.5338983050847458
fr
0.5258620689655172
et
0.5183887915936953
sv
0.507168458781362
sq
0.5018115942028986
yo
0.5018115942028986
sl
0.47418738049713194
es
0.45652173913043476
pl
0.45652173913043476
tr
0.44889779559118237
hu
0.44779116465863456
no
0.4387755102040817
da
0.434156378600823
az
0.43298969072164945
de
0.43298969072164945
sk
0.4234800838574424
nl
0.34679334916864607
lt
0.34523809523809523
pt
0.3373493975903614
is
0.3341404358353511
ms
0.32926829268292684
lv
0.32598039215686275
vi
0.32598039215686275
be
0.20289855072463767
eu
0.20289855072463767
fi
0.1863905325443787
uk
0.1863905325443787
reliable
true
passes
true
6
name
"Selvi"
dialogue
"Yeh rakh lo, dekh sakte hain... mujhe bahut darr lag raha hai."
detectedLang
"sq"
heavyLang
"no"
scores
sq
0.43267776096822996
vi
0.3980738362760835
it
0.375
nl
0.3206521739130435
no
0.3169398907103825
de
0.31066176470588236
et
0.2732558139534884
es
0.27184466019417475
eu
0.2690058479532163
yo
0.267578125
sv
0.19871794871794873
ca
0.19527896995708155
pt
0.1830065359477124
da
0.16851441241685144
reliable
false
passes
false
7
name
"Hunter"
dialogue
"Is this from last night? This man—did you see where he went?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8425551057130004
tl
0.6783088235294117
nl
0.4897959183673469
yo
0.4582043343653251
de
0.4336569579288026
et
0.41373534338358464
da
0.3670886075949367
no
0.32432432432432434
hu
0.3055555555555556
sv
0.26624737945492666
lv
0.23580786026200873
lt
0.21171171171171171
is
0.2081447963800905
ca
0.20634920634920634
fi
0.20634920634920634
es
0.1935483870967742
cs
0.1822429906542056
sq
0.1822429906542056
fr
0.17452830188679244
reliable
true
passes
true
8
name
"Selvi"
dialogue
"Nahin, bilkul pata nahi... bas andhera tha."
detectedLang
"eu"
heavyLang
"id"
scores
eu
0.636098981077147
tl
0.4802494802494803
sq
0.31880108991825606
fr
0.3093922651933701
ms
0.2857142857142857
reliable
true
passes
false
9
name
"Hunter"
dialogue
"Stay here, okay? I’ll be back."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8076923076923077
tl
0.6354166666666666
hu
0.4006849315068493
lt
0.3902439024390244
yo
0.38596491228070173
sq
0.36363636363636365
reliable
true
passes
true
10
name
"Selvi"
dialogue
"Bhagwan, madad karo..."
detectedLang
"tl"
heavyLang
"hu"
scores
tl
0.5155038759689923
et
0.47257383966244726
lt
0.45652173913043476
sv
0.45652173913043476
reliable
false
passes
false
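In every entry above, detectedLang equals the highest-scoring key under scores, each Hunter line detected as en passes, and each Selvi line fails. A minimal sketch of that relationship, assuming a line passes only when the top-scoring language equals the language expected for the character (hi for Selvi, en for Hunter). The `EXPECTED` mapping and the `check_line` helper are hypothetical, and the roles of the reliable and heavyLang fields are not modeled.

```python
# Assumed target language per character (not stated explicitly in the report).
EXPECTED = {"Selvi": "hi", "Hunter": "en"}


def check_line(name: str, scores: dict[str, float]) -> dict:
    """Pick the top-scoring language and compare it with the expected one."""
    detected = max(scores, key=scores.get)
    return {"detectedLang": detected, "passes": detected == EXPECTED[name]}


# Scores for line 8 (Selvi), copied from the entry above.
line8 = {
    "eu": 0.636098981077147,
    "tl": 0.4802494802494803,
    "sq": 0.31880108991825606,
    "fr": 0.3093922651933701,
    "ms": 0.2857142857142857,
}
# Scores for line 9 (Hunter), copied from the entry above.
line9 = {
    "en": 0.8076923076923077,
    "tl": 0.6354166666666666,
    "hu": 0.4006849315068493,
    "lt": 0.3902439024390244,
    "yo": 0.38596491228070173,
    "sq": 0.36363636363636365,
}

print(check_line("Selvi", line8))   # {'detectedLang': 'eu', 'passes': False}
print(check_line("Hunter", line9))  # {'detectedLang': 'en', 'passes': True}
```

Selvi's lines are Hindi written in Latin script, and hi appears among the candidate scores only once (line 4, at about 0.17), so under this assumed rule none of her lines can pass.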