NC Bench
Run: 2025-04-14-01JRTSQG02NXMVYHDAHFZC978B
Run Details
Model:
GPT-4.1 Nano
Test:
Language Writing
Scenario:
Character dialogue (Hindi) in a story
Duration:
6.5s
Cost:
$0.000178
Provider:
OpenAI
Evaluation summary
Score:
50.00%
Evaluator:
Parse dialogue
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account:
<character name="Selvi"> Selvi is a 30-year old college student on a study abroad program. Born in India, she is fluent in Hindi and does not yet know any English. She enjoys landscape photography and is always seen with her camera. </character>
<character name="Hunter"> Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of Hindi, not even basic phrases. </character>
<location name="Alborn, MN"> Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival. </location>
User/Human
Write the following scene: Selvi is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in Hindi, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave. Return the dialogue between Selvi and Hunter as a voice sheet: Selvi: "..." Hunter: "..."
Model response
Messages
Meta
AI
Selvi: (her voice trembling, staring at her camera, speaking in Hindi) "Mujhe lagta hai ki kuch bahut kharab hua hai... Maine jo bhi dekha, uske baare mein main soch nahi pa rahi hoon."
Hunter: (frowning, glancing confused) "I'm sorry, I didn't catch that. You said what now?"
Selvi: (in Hindi, voice strained) "Main... Main kuch aisa mehsoos kar rahi hoon, jo theek nahi lag raha. Kahi kuch galat hone wala hai... someone is missing something, and I saw it."
Hunter: (scratches his head, trying to understand) "Umm… I don’t speak Hindi. Could you tell me what’s going on? Are you from here?"
Selvi: (shaking her head, voice shaky) "Nahi, main yahan sirf padhne aayi hoon. Yeh shehar achha lagta hai, lekin... kuch toh hai, kuch chhupa hua hai."
Hunter: (nodding slowly, trying to piece things together) "Alright, I’ll need you to repeat that in English if you can. Anything you saw or noticed that could be helpful?"
Selvi: (hesitates, then sighs) "Mujhe lagta hai ki jo bhi hua hai, usse sirf mein hi nahi, si shayad koi aur bhi jaanta hai... par shabdon mein nahi keh sakti hoon."
Hunter: (glancing at his phone as it rings) "Sorry, I’ve got to take this. Thanks for talking with me, Selvi. If you think of anything else, let me know, okay?"
Selvi: (nods silently, looking back at her camera as Hunter stands to leave)
id
"gen-1744656124-FznTUAKKvjXZwYIUZx77"
upstream_id
"chatcmpl-BMJ2qJbFSOWcczDJaRWuH3LgO3iBG"
total_cost
0.0001781
cache_discount
(null)
provider_name
"OpenAI"
created_at
"2025-04-14T18:42:09.112213+00:00"
model
"openai/gpt-4.1-nano-2025-04-14"
app_id
182717
streamed
true
cancelled
false
latency
303
moderation_latency
105
generation_time
3811
tokens_prompt
273
tokens_completion
378
native_tokens_prompt
273
native_tokens_completion
377
native_tokens_reasoning
0
num_media_prompt
(null)
num_media_completion
(null)
num_search_results
(null)
origin
"https://ncbench.com/"
is_byok
false
finish_reason
"stop"
native_finish_reason
"stop"
usage
0.0001781
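The reported total_cost appears to follow directly from the native token counts. The sketch below is a rough check, not the provider's billing logic. It assumes prices of $0.10 per million input tokens and $0.40 per million output tokens, which are not stated on this page, and it assumes generation_time is in milliseconds. The function names are hypothetical.

```python
# Values copied from the Meta fields above.
NATIVE_TOKENS_PROMPT = 273
NATIVE_TOKENS_COMPLETION = 377
GENERATION_TIME_MS = 3811  # assumption: generation_time is in milliseconds

# Assumed prices in USD per million tokens (not stated on this page).
PRICE_IN_PER_M = 0.10
PRICE_OUT_PER_M = 0.40


def estimate_cost(prompt_tokens, completion_tokens,
                  price_in=PRICE_IN_PER_M, price_out=PRICE_OUT_PER_M):
    """Return the estimated request cost in USD."""
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000


def tokens_per_second(completion_tokens, generation_time_ms):
    """Return the output throughput in tokens per second."""
    return completion_tokens / (generation_time_ms / 1000)


cost = estimate_cost(NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION)
tps = tokens_per_second(NATIVE_TOKENS_COMPLETION, GENERATION_TIME_MS)

print(f"estimated cost: ${cost:.7f}")   # compare with total_cost above
print(f"throughput: {tps:.1f} tokens/s")
```

Under these assumptions the estimate agrees with the reported total_cost of 0.0001781, which suggests billing uses native_tokens_completion (377) rather than tokens_completion (378).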
Evaluation details
Result:
50.00%
Evaluator:
Parse dialogue
Details:
n/a
Meta Data:
dialogue
0
name
"Selvi"
dialogue
"Mujhe lagta hai ki kuch bahut kharab hua hai... Maine jo bhi dekha, uske baare mein main soch nahi pa rahi hoon."
detectedLang
"de"
heavyLang
"tl"
scores
de
0.39759036144578314
sl
0.3688007213706041
lv
0.3076162215628091
yo
0.3055555555555556
sq
0.30417495029821073
fr
0.29718875502008035
et
0.29149797570850206
fi
0.2805755395683453
en
0.2723492723492723
eu
0.2623814541622761
tl
0.24487594390507011
lt
0.21612541993281076
vi
0.18319719953325553
az
0.1822429906542056
ro
0.17355371900826447
hu
0.14738124238733252
hr
0.13366336633663367
tr
0.12935323383084577
da
0.12718204488778054
it
0.125
ca
0.11280101394169835
no
0.10025706940874037
reliable
false
passes
false
1
name
"Hunter"
dialogue
"I'm sorry, I didn't catch that. You said what now?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8217224355458036
tl
0.5683930942895086
it
0.4715447154471545
sv
0.4133574007220217
cs
0.3602362204724409
et
0.35000000000000003
no
0.30851063829787234
da
0.2995689655172413
hr
0.2995689655172413
ca
0.29805615550755943
sq
0.2919389978213508
pl
0.2793791574279379
pt
0.23529411764705882
fr
0.21875
ro
0.21686746987951808
yo
0.2111650485436893
de
0.1875
sk
0.181360201511335
be
0.17721518987341772
tr
0.17721518987341772
hu
0.1709183673469388
az
0.16237113402061853
is
0.16237113402061853
lt
0.16237113402061853
lv
0.16237113402061853
uk
0.16237113402061853
vi
0.16237113402061853
reliable
true
passes
true
2
name
"Selvi"
dialogue
"Main... Main kuch aisa mehsoos kar rahi hoon, jo theek nahi lag raha. Kahi kuch galat hone wala hai... someone is missing something, and I saw it."
detectedLang
"en"
heavyLang
"de"
scores
en
0.6486854034451497
tl
0.5025673940949936
nl
0.4001547987616099
yo
0.3795036028823059
no
0.3426632739609839
sv
0.34154630416312654
da
0.329004329004329
ca
0.32550043516100957
ms
0.31957857769973663
et
0.3074173369079536
lv
0.30617726051924804
sq
0.2889908256880734
is
0.2837338262476894
it
0.26540284360189575
sl
0.25265188042430087
eu
0.24316406250000003
tr
0.24019607843137253
ro
0.23720472440944884
de
0.22654690618762477
hu
0.22188755020080322
vi
0.22110552763819097
fi
0.1952232606438214
lt
0.18248945147679324
fr
0.1711229946524064
es
0.16576964477933262
pt
0.15577342047930284
hr
0.1520787746170678
az
0.14174972314507198
pl
0.13984461709211987
cs
0.11931818181818182
sk
0.08500590318772137
be
0.08284023668639054
uk
0.07517899761336516
reliable
true
passes
false
3
name
"Hunter"
dialogue
"Umm… I don’t speak Hindi. Could you tell me what’s going on? Are you from here?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8189598420013167
tl
0.6463022508038584
sq
0.4602551521099117
et
0.4246861924686192
it
0.41921858500527986
fr
0.36194895591647336
cs
0.3193069306930693
ro
0.30203045685279184
ca
0.301143583227446
no
0.3002544529262086
hr
0.29214929214929214
fi
0.29032258064516125
pl
0.2763157894736842
nl
0.24965893587994542
hi
0.22752808988764048
yo
0.22752808988764048
is
0.22644163150492264
sk
0.2109038737446198
da
0.20174165457184323
sv
0.2005813953488372
es
0.15644171779141106
pt
0.13793103448275862
sl
0.13793103448275862
ms
0.13249211356466878
be
0.11290322580645161
tr
0.11290322580645161
hu
0.1085899513776337
az
0.10277324632952692
lt
0.10277324632952692
lv
0.10277324632952692
uk
0.10277324632952692
vi
0.10277324632952692
reliable
true
passes
true
4
name
"Selvi"
dialogue
"Nahi, main yahan sirf padhne aayi hoon. Yeh shehar achha lagta hai, lekin... kuch toh hai, kuch chhupa hua hai."
detectedLang
"eu"
heavyLang
"tl"
scores
eu
0.4480220791168353
fi
0.23273657289002556
vi
0.2073976221928666
ms
0.18699186991869918
tl
0.1792065663474692
it
0.14285714285714285
en
0.14163090128755365
fr
0.12790697674418602
is
0.12790697674418602
de
0.11504424778761062
reliable
false
passes
false
5
name
"Hunter"
dialogue
"Alright, I’ll need you to repeat that in English if you can. Anything you saw or noticed that could be helpful?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8156241464080851
tl
0.5236414961185604
yo
0.3773062730627306
fr
0.3336623889437315
ms
0.32499999999999996
ro
0.30769230769230765
de
0.30268595041322316
sv
0.29245283018867924
sl
0.2894736842105263
no
0.28191489361702127
da
0.2702702702702703
nl
0.26550598476605003
lv
0.26310043668122274
it
0.25905598243688255
pl
0.24496644295302009
lt
0.24242424242424243
cs
0.23642533936651583
sk
0.23382519863791143
hr
0.21784472769409038
es
0.19642857142857142
ca
0.1945107398568019
az
0.19258373205741627
pt
0.1818181818181818
et
0.17076167076167076
hu
0.14772727272727273
sq
0.12903225806451613
fi
0.12790697674418602
tr
0.10239361702127658
reliable
true
passes
true
6
name
"Selvi"
dialogue
"Mujhe lagta hai ki jo bhi hua hai, usse sirf mein hi nahi, si shayad koi aur bhi jaanta hai... par shabdon mein nahi keh sakti hoon."
detectedLang
"sq"
heavyLang
"la"
scores
sq
0.41373534338358464
ca
0.38704028021015763
sl
0.38217122683142096
tl
0.381078691423519
lv
0.32367149758454106
ro
0.3210475266731329
yo
0.31906614785992216
it
0.3027888446215139
eu
0.2623814541622761
lt
0.25925925925925924
fr
0.2576882290562036
da
0.21787709497206703
es
0.20454545454545453
fi
0.20454545454545453
no
0.18604651162790697
vi
0.18319719953325553
az
0.1822429906542056
sv
0.1822429906542056
et
0.15764139590854392
de
0.15151515151515152
sk
0.149453219927096
hu
0.14738124238733252
cs
0.1452991452991453
tr
0.12935323383084577
pt
0.0967741935483871
en
0.08256880733944955
hr
0.08256880733944955
reliable
false
passes
false
7
name
"Hunter"
dialogue
"Sorry, I’ve got to take this. Thanks for talking with me, Selvi. If you think of anything else, let me know, okay?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.812709894084216
tl
0.6127136752136751
no
0.465339233038348
nl
0.41953562850280224
da
0.38402718776550554
sl
0.35094001790510293
pt
0.32242990654205606
cs
0.3068833652007648
yo
0.25641025641025644
sq
0.25488180883864336
ms
0.24870466321243526
fi
0.2032967032967033
it
0.19889502762430938
sv
0.19084821428571427
de
0.18355855855855857
fr
0.1826381059751973
is
0.1807909604519774
vi
0.16666666666666666
pl
0.1569767441860465
sk
0.15402567094515754
lv
0.14403778040141677
hr
0.13793103448275862
lt
0.12545235223160436
es
0.12333736396614269
et
0.11585365853658537
ca
0.09600997506234414
reliable
true
passes
true
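The 50.00% result is consistent with the share of dialogue lines whose passes flag is true (4 of 8). The sketch below reproduces that figure. It assumes the evaluator averages the per-line flags, and that a line passes when detectedLang equals the speaker's expected language. Neither rule is stated on this page, and the EXPECTED_LANG mapping and function names are hypothetical.

```python
# (speaker, detectedLang, passes) copied from the eight entries above.
ENTRIES = [
    ("Selvi", "de", False),
    ("Hunter", "en", True),
    ("Selvi", "en", False),
    ("Hunter", "en", True),
    ("Selvi", "eu", False),
    ("Hunter", "en", True),
    ("Selvi", "sq", False),
    ("Hunter", "en", True),
]

# Hypothetical rule: Selvi should speak Hindi, Hunter English.
EXPECTED_LANG = {"Selvi": "hi", "Hunter": "en"}


def line_passes(name, detected_lang):
    """Assumed check: detected language matches the speaker's expected one."""
    return detected_lang == EXPECTED_LANG[name]


def score(entries):
    """Assumed aggregation: percentage of lines with passes == True."""
    flags = [passed for _, _, passed in entries]
    return 100 * sum(flags) / len(flags)


# The assumed rule agrees with every recorded passes flag on this page.
rule_matches = all(line_passes(n, lang) == p for n, lang, p in ENTRIES)

print(f"{score(ENTRIES):.2f}%")
print("rule matches recorded flags:", rule_matches)
```

If this reading is right, every Selvi line fails because the model wrote Hindi in Latin transliteration rather than Devanagari. The detector never returns "hi" for those lines, and it flags three of the four as unreliable.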