NC Bench
Run: 2024-08-09-01J4W2D1ERTXJBNEY326SEKKY8
Run Details
Model:
GPT-4o, Aug. 6th (temp=0)
Test:
Language Writing
Scenario:
Character dialogue (Hindi) in a story
Duration:
5.8s
Cost:
$0.005032
Provider:
OpenAI
Evaluation summary
Score:
100.00%
Evaluator:
Parse dialogue
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account: <character name="Selvi"> Selvi is a 30-year old college student on a study abroad program. Born in India, she is fluent in Hindi and does not yet know any English. She enjoys landscape photography and is always seen with her camera. </character> <character name="Hunter"> Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of Hindi, not even basic phrases. </character> <location name="Alborn, MN"> Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival. </location>
User/Human
Write the following scene: Selvi is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in Hindi, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave. Return the dialogue between Selvi and Hunter as a voice sheet: Selvi: "..." Hunter: "..."
Model response
Messages
Meta
AI
**Voice Sheet:**
**[Scene: A quaint cafe in Alborn, MN. The aroma of freshly brewed coffee fills the air. Selvi sits at a small table by the window, her camera resting in her hands. She appears deep in thought, her eyes scanning through the photos she has taken. Hunter, the town's detective, approaches her table with a determined stride.]**
**Hunter:** "Excuse me, miss. I'm Detective Hunter. I noticed you were at the scene of the recent event. Can I ask you a few questions?"
**Selvi:** "मुझे नहीं पता कि क्या हो रहा है। यह सब बहुत डरावना था।" (I don't know what's happening. It was all very scary.)
**[Hunter furrows his brow, clearly not understanding the language but sensing her distress.]**
**Hunter:** "I'm sorry, I don't understand. Do you speak English?"
**Selvi:** "मैं अंग्रेजी नहीं बोलती।" (I don't speak English.)
**[Hunter sighs, trying to think of a way to communicate. He gestures to her camera, hoping to bridge the language gap.]**
**Hunter:** "Your camera... photos? Can I see?"
**Selvi:** "तस्वीरें? हाँ, हाँ।" (Photos? Yes, yes.)
**[Selvi nods and turns the camera around, showing Hunter the images she captured. He leans in, examining them closely, trying to piece together any clues.]**
**Hunter:** "These are good. Did you see anything unusual?"
**Selvi:** "मैंने कुछ अजीब देखा, लेकिन समझ नहीं पाई।" (I saw something strange, but I couldn't understand.)
**[Hunter nods, still confused but appreciative of her willingness to help. Just then, his phone rings. He glances at the screen, recognizing the number.]**
**Hunter:** "I'm sorry, I have to take this. Thank you for your time."
**[Selvi nods, still looking a bit shaken but grateful for the interaction. Hunter steps away, answering the call as he exits the cafe.]**
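The "Parse dialogue" evaluator has to turn a response like the one above into speaker/dialogue pairs. The page does not show how NC Bench does this, so the following is only a minimal sketch under assumptions: the regular expression, the function name `parse_voice_sheet`, and the shortened `sample` string are all hypothetical. It matches the `**Name:** "..."` pattern and skips headings, stage directions in brackets, and the parenthesised translations.

```python
import re

# Hypothetical pattern: a bold speaker label followed by a double-quoted line.
# Excluding "[" and "]" from the name keeps bracketed stage directions out,
# and requiring an opening quote skips the "**Voice Sheet:**" heading.
LINE_RE = re.compile(r'\*\*(?P<name>[^*\[\]:]+):\*\*\s*"(?P<dialogue>[^"]*)"')


def parse_voice_sheet(text: str) -> list[dict[str, str]]:
    """Return speaker/dialogue pairs in order of appearance."""
    return [
        {"name": m.group("name").strip(), "dialogue": m.group("dialogue")}
        for m in LINE_RE.finditer(text)
    ]


# Shortened excerpt of the response, for illustration only.
sample = (
    '**Voice Sheet:** **[Scene: A quaint cafe in Alborn, MN.]** '
    '**Hunter:** "Do you speak English?" '
    '**Selvi:** "मैं अंग्रेजी नहीं बोलती।" (I don\'t speak English.)'
)
entries = parse_voice_sheet(sample)
```

A sketch like this would break on dialogue that itself contains double quotes, so a real evaluator probably needs something more tolerant.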
id
"gen-UFIt117iYHS6lTzxEkdcNOZFxAnt"
total_cost
0.0050325
upstream_id
"chatcmpl-9uNYMHi0KPOzreS2DLuwgXYsZEQGA"
created_at
"2024-08-09T17:19:00.343018+00:00"
model
"openai/gpt-4o-2024-08-06"
app_id
182717
streamed
false
cancelled
false
provider_name
"OpenAI"
latency
5461
moderation_latency
169
generation_time
0
finish_reason
"stop"
tokens_prompt
273
tokens_completion
531
native_tokens_prompt
273
native_tokens_completion
435
num_media_prompt
(null)
num_media_completion
(null)
origin
"https://ncbench.com/"
usage
0.0050325
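The reported cost can be cross-checked against the token counts. The sketch below assumes per-token list prices of $2.50 per million prompt tokens and $10.00 per million completion tokens for `openai/gpt-4o-2024-08-06`. These prices are not stated on this page, and the helper name `estimate_cost` is hypothetical. Under that assumption, the native token counts (273 prompt, 435 completion) reproduce the `total_cost` value, while the non-native completion count (531) does not.

```python
from decimal import Decimal

# Assumed list prices in USD per 1M tokens; not taken from this page.
PROMPT_PRICE_PER_M = Decimal("2.50")
COMPLETION_PRICE_PER_M = Decimal("10.00")


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the request cost in USD from token counts."""
    total = (prompt_tokens * PROMPT_PRICE_PER_M
             + completion_tokens * COMPLETION_PRICE_PER_M)
    return total / Decimal(1_000_000)


# native_tokens_prompt and native_tokens_completion from the Meta fields.
cost = estimate_cost(273, 435)
# 273 * 2.50 + 435 * 10.00 = 5032.50, i.e. 0.0050325 USD after scaling.
```

`Decimal` is used instead of floats so the comparison with the reported value is exact.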
Evaluation details
Result:
100.00%
Evaluator:
Parse dialogue
Details:
n/a
Meta Data:
dialogue
0
name
"Hunter"
dialogue
"Excuse me, miss. I'm Detective Hunter. I noticed you were at the scene of the recent event. Can I ask you a few questions?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8069738480697386
tl
0.6603856266432954
fr
0.61423593827775
it
0.5879851143009038
ca
0.5502031340684852
no
0.548894062863795
pt
0.5356500898741762
nl
0.5265729993891264
da
0.5262836185819071
ro
0.520420792079208
sv
0.4816053511705686
es
0.4805630026809652
sq
0.4297277409860191
cs
0.4070390206579954
sk
0.38880126182965297
hr
0.3704305442729488
sl
0.358974358974359
yo
0.3547044129891757
de
0.3353344768439108
pl
0.3260869565217392
ms
0.26817752596789424
lt
0.2674858223062382
tr
0.2674858223062382
lv
0.24537487828627072
is
0.22654690618762477
hu
0.22344689378757512
et
0.22188755020080322
az
0.2131979695431472
fi
0.15115005476451257
vi
0.14647577092511013
be
0.08284023668639054
eu
0.08284023668639054
uk
0.07517899761336516
reliable
true
passes
true
1
name
"Selvi"
dialogue
"मुझे नहीं पता कि क्या हो रहा है। यह सब बहुत डरावना था।"
detectedLang
"hi"
heavyLang
"hi"
scores
hi
0.8806558738065587
mr
0.8748094927062922
reliable
true
passes
true
2
name
"Hunter"
dialogue
"I'm sorry, I don't understand. Do you speak English?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.7994652406417112
no
0.5828698553948832
da
0.5772266065388951
tl
0.5300751879699248
sv
0.5288944723618091
cs
0.5161290322580645
pl
0.4806094182825485
hr
0.4688385269121813
sk
0.438622754491018
de
0.4140625
pt
0.40570522979397783
sq
0.4009584664536741
ro
0.3862520458265139
it
0.3644067796610169
nl
0.3218806509945751
vi
0.3169398907103825
yo
0.3055555555555556
ca
0.2690058479532163
is
0.267578125
sl
0.24242424242424243
ms
0.1610738255033557
be
0.15730337078651685
tr
0.15730337078651685
hu
0.1515837104072398
az
0.14383561643835618
et
0.14383561643835618
lt
0.14383561643835618
lv
0.14383561643835618
uk
0.14383561643835618
reliable
true
passes
true
3
name
"Selvi"
dialogue
"मैं अंग्रेजी नहीं बोलती।"
detectedLang
"hi"
heavyLang
"hi"
scores
hi
0.88558352402746
mr
0.8801534036433365
reliable
true
passes
true
4
name
"Hunter"
dialogue
"Your camera... photos? Can I see?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.7830802603036876
ca
0.6598639455782312
tl
0.601593625498008
it
0.5771670190274841
fr
0.5633187772925764
ro
0.5535714285714286
et
0.5381062355658198
sv
0.5238095238095238
es
0.47229551451187335
ms
0.42857142857142855
nl
0.42363112391930835
tr
0.42363112391930835
no
0.42028985507246375
de
0.41690962099125367
lt
0.41690962099125367
az
0.4117647058823529
da
0.41002949852507375
hr
0.41002949852507375
lv
0.40119760479041916
sq
0.40119760479041916
pl
0.38650306748466257
cs
0.3442622950819672
yo
0.30313588850174217
pt
0.2857142857142857
hi
0.2779783393501805
sk
0.2647058823529412
be
0.25925925925925924
hu
0.250936329588015
is
0.23954372623574144
uk
0.23954372623574144
vi
0.23954372623574144
reliable
true
passes
true
5
name
"Selvi"
dialogue
"तस्वीरें? हाँ, हाँ।"
detectedLang
"mr"
heavyLang
"hi"
scores
mr
0.890061565523307
hi
0.8861566484517304
reliable
true
passes
true
6
name
"Hunter"
dialogue
"These are good. Did you see anything unusual?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8265895953757225
et
0.47826086956521735
tl
0.4755244755244755
sq
0.39999999999999997
ms
0.34065934065934067
ro
0.33035714285714285
it
0.25
cs
0.20634920634920634
es
0.20424403183023873
pt
0.20424403183023873
reliable
true
passes
true
7
name
"Selvi"
dialogue
"मैंने कुछ अजीब देखा, लेकिन समझ नहीं पाई।"
detectedLang
"hi"
heavyLang
"hi"
scores
hi
0.8848238482384824
mr
0.8757309941520468
reliable
true
passes
true
8
name
"Hunter"
dialogue
"I'm sorry, I have to take this. Thank you for your time."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8258706467661693
tl
0.6916299559471366
da
0.6406570841889117
no
0.5818399044205496
hr
0.49201741654571846
cs
0.4728915662650602
pt
0.43548387096774194
pl
0.42716857610474634
yo
0.3902439024390244
it
0.3805309734513274
vi
0.37275985663082434
sk
0.36823104693140796
sl
0.36247723132969034
lv
0.34579439252336447
lt
0.3230174081237911
ca
0.2827868852459016
sv
0.2827868852459016
sq
0.2768595041322314
ro
0.20454545454545453
ms
0.1822429906542056
be
0.16666666666666666
tr
0.16666666666666666
hu
0.1606714628297362
az
0.15254237288135594
et
0.15254237288135594
is
0.15254237288135594
uk
0.15254237288135594
reliable
true
passes
true
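Entry 5 is worth a closer look: `detectedLang` is "mr" (Marathi) while `heavyLang` is "hi", and the line still passes. One reading consistent with the metadata above is that a line passes when the speaker's expected language matches either detector, and that the score is the share of passing lines. This is a guess at the rule, not NC Bench's documented logic; `EXPECTED_LANG`, `line_passes`, and `score` are hypothetical names.

```python
# Hypothetical expectations, derived from the character descriptions
# in the system prompt (Selvi speaks Hindi, Hunter speaks English).
EXPECTED_LANG = {"Selvi": "hi", "Hunter": "en"}


def line_passes(entry: dict) -> bool:
    """Assumed rule: either detector agreeing with the expectation is enough."""
    expected = EXPECTED_LANG[entry["name"]]
    return expected in (entry["detectedLang"], entry["heavyLang"])


def score(entries: list[dict]) -> float:
    """Percentage of dialogue lines that pass."""
    return 100.0 * sum(line_passes(e) for e in entries) / len(entries)


# Language fields copied from entries 4 and 5 of the metadata above.
entries = [
    {"name": "Hunter", "detectedLang": "en", "heavyLang": "en"},
    {"name": "Selvi", "detectedLang": "mr", "heavyLang": "hi"},
]
result = score(entries)
```

Short Devanagari strings such as "तस्वीरें? हाँ, हाँ।" score almost identically for Hindi and Marathi (0.886 vs. 0.890 here), which is presumably why a second detector is consulted.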