NC Bench
Run: 2024-08-09-01J4W2D9GMYR7XKJEXXY7H7DBN
Run Details
Model:
GPT-4o, Aug. 6th (temp=0)
Test:
Language Writing
Scenario:
Character dialogue (French) in a story
Duration:
5.7s
Cost:
$0.005280
Provider:
OpenAI
Evaluation summary
Score:
100.00%
Evaluator:
Parse dialogue
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account:
<character name="Marie">
Marie is a 30-year old college student on a study abroad program. Born in France, she is fluent in French and does not yet know any English. She enjoys landscape photography and is always seen with her camera.
</character>
<character name="Hunter">
Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of French, not even basic phrases.
</character>
<location name="Alborn, MN">
Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival.
</location>
User/Human
Write the following scene: Marie is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in French, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave.
Return the dialogue between Marie and Hunter as a voice sheet:
Marie: "..."
Hunter: "..."
Model response
Messages
Meta
AI
**Voice Sheet:**

*Setting: A quaint cafe in Alborn, MN. The aroma of freshly brewed coffee fills the air as Marie sits at a small table, her camera resting in her hands. She appears deep in thought, her eyes occasionally glancing at the photos she has taken. Hunter, with his usual determined stride, approaches her table.*

---

**Hunter:** "Excuse me, miss. I'm Detective Hunter. I noticed you were at the scene of the recent event in town. Can I ask you a few questions?"

**Marie:** "Je suis désolée, je ne parle pas anglais. C'était... c'était terrifiant." *(I am sorry, I don't speak English. It was... it was terrifying.)*

**Hunter:** "Uh, okay. I don't speak French, but I need to know what you saw. Can you show me on your camera?"

*Marie hesitates, her hands trembling slightly as she scrolls through the photos on her camera. She stops at one and shows it to Hunter.*

**Marie:** "Regardez... là-bas." *(Look... over there.)*

*Hunter squints at the image, trying to make sense of it. He nods, though it's clear he doesn't fully understand.*

**Hunter:** "I see. But I need more details. Can you... uh, draw or write it down?"

*Marie shakes her head, frustration evident on her face. She gestures helplessly, trying to convey the urgency of her message.*

**Marie:** "C'est dangereux. Vous devez faire attention." *(It's dangerous. You must be careful.)*

*Hunter frowns, sensing the gravity of her words despite the language barrier.*

**Hunter:** "Dangerous? Okay, I'll keep that in mind. Thank you."

*Just then, Hunter's phone rings. He answers it, his expression shifting to one of urgency.*

**Hunter:** "Yeah, I'm on my way. I'll be there in five."

*He turns back to Marie, giving her a nod of appreciation.*

**Hunter:** "Thanks for your help. I'll be in touch."

*Marie watches as Hunter leaves, her concern lingering in her eyes as she returns to her camera, hoping her photos might somehow bridge the gap between their languages.*
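The "Parse dialogue" evaluator has to turn a response like the one above into speaker/utterance pairs before any language check can run. The record does not show how it does this, so the following is only a minimal sketch: the regular expression, the function name `parse_voice_sheet`, and the shortened `sample` string are all assumptions made for illustration. It accepts an optional Markdown bold wrapper around the speaker name and ignores stage directions and parenthesised translations, because those are not double-quoted text directly after a `Name:` label.

```python
import re

# Hypothetical pattern (not taken from the benchmark): optional "**" around
# the speaker name, a colon, then a double-quoted utterance.
LINE_RE = re.compile(
    r'\*{0,2}(?P<name>[A-Z][\w\- ]*?):\*{0,2}\s*"(?P<dialogue>[^"]*)"'
)


def parse_voice_sheet(text: str) -> list[dict[str, str]]:
    """Return speaker/utterance pairs in order of appearance."""
    return [
        {"name": m.group("name").strip(), "dialogue": m.group("dialogue")}
        for m in LINE_RE.finditer(text)
    ]


# Shortened excerpt of the response above, used only as demo input.
sample = (
    '**Voice Sheet:** *Setting: A quaint cafe in Alborn, MN.* --- '
    '**Hunter:** "Excuse me, miss. I\'m Detective Hunter." '
    '**Marie:** "Je suis désolée, je ne parle pas anglais." '
    '*(I am sorry, I don\'t speak English.)*'
)

entries = parse_voice_sheet(sample)
for entry in entries:
    print(entry["name"], "->", entry["dialogue"])
```

The `**Voice Sheet:**` heading and the `*Setting: ...*` line are skipped because neither is followed by a quoted string, which is the property this sketch relies on.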
id
"gen-RV4IOKfwAE4SvAPeDf9Jf8C3UIBf"
total_cost
0.00528
upstream_id
"chatcmpl-9uNYVmQ8IsYdkgO13WZ1escBvvPvt"
created_at
"2024-08-09T17:19:08.557468+00:00"
model
"openai/gpt-4o-2024-08-06"
app_id
182717
streamed
false
cancelled
false
provider_name
"OpenAI"
latency
5300
moderation_latency
215
generation_time
0
finish_reason
"stop"
tokens_prompt
272
tokens_completion
482
native_tokens_prompt
268
native_tokens_completion
461
num_media_prompt
(null)
num_media_completion
(null)
origin
"https://ncbench.com/"
usage
0.00528
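The recorded cost can be cross-checked against the token counts. The per-token prices are not part of this record; the sketch below assumes list prices of $2.50 per 1M input tokens and $10.00 per 1M output tokens for `openai/gpt-4o-2024-08-06`. Under that assumption, the native token counts (268 and 461) reproduce the recorded `total_cost`.

```python
# Assumed list prices in USD per 1M tokens (not stated in this run record).
PRICE_IN_PER_M = 2.50
PRICE_OUT_PER_M = 10.00

# Values copied from the metadata above.
native_tokens_prompt = 268
native_tokens_completion = 461

cost = (
    native_tokens_prompt * PRICE_IN_PER_M
    + native_tokens_completion * PRICE_OUT_PER_M
) / 1_000_000

print(f"${cost:.6f}")  # prints "$0.005280"
```

Using the non-native counts (272 and 482) under the same assumed prices would give $0.005500, which does not match the record, so the cost appears to be billed on the native counts.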
Evaluation details
Result:
100.00%
Evaluator:
Parse dialogue
Details:
n/a
Meta Data:
dialogue
0
name
"Hunter"
dialogue
"Excuse me, miss. I'm Detective Hunter. I noticed you were at the scene of the recent event in town. Can I ask you a few questions?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8097763430943047
tl
0.6597938144329897
it
0.6029836381135707
fr
0.5993200582807188
nl
0.5523602821486707
ro
0.5436946902654868
ca
0.5346869712351946
no
0.5333710407239819
pt
0.5200698080279231
da
0.5106761565836299
sv
0.5006053268765134
es
0.4649805447470817
sl
0.416135881104034
sq
0.41447835344215755
de
0.3947175348495965
cs
0.39204126750184226
yo
0.3838685586258402
sk
0.374051593323217
pl
0.3574766355140187
hr
0.3559718969555035
ms
0.3032094594594595
az
0.26273458445040215
lt
0.25541516245487367
tr
0.25541516245487367
lv
0.233983286908078
is
0.21577946768060838
hu
0.21278625954198474
et
0.2112810707456979
fi
0.14330218068535827
vi
0.13883089770354906
be
0.0782122905027933
eu
0.0782122905027933
uk
0.07094594594594594
reliable
true
passes
true
1
name
"Marie"
dialogue
"Je suis désolée, je ne parle pas anglais. C'était... c'était terrifiant."
detectedLang
"fr"
heavyLang
"fr"
scores
fr
0.8026926648096564
lt
0.5507399577167019
sl
0.5507399577167019
ca
0.5246085011185682
sq
0.5159453302961275
cs
0.46675031367628605
en
0.4558258642765685
hr
0.44948186528497414
nl
0.44948186528497414
it
0.43029490616621985
ro
0.43029490616621985
sk
0.4022503516174402
lv
0.37591776798825255
pl
0.33800623052959505
es
0.3369734789391576
hu
0.3328100470957614
de
0.3264659270998415
yo
0.2772108843537415
tr
0.2647058823529412
fi
0.256993006993007
ms
0.256993006993007
pt
0.2410714285714286
az
0.23835125448028674
et
0.22867513611615245
da
0.22303473491773307
no
0.22303473491773307
sv
0.22303473491773307
reliable
true
passes
true
2
name
"Hunter"
dialogue
"Uh, okay. I don't speak French, but I need to know what you saw. Can you show me on your camera?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8026767330130405
tl
0.6254071661237786
et
0.4395711500974659
it
0.4278606965174129
ca
0.4197780020181635
cs
0.40352697095435686
fr
0.3876464323748669
no
0.375
pl
0.375
hr
0.37363834422657954
ms
0.36743674367436746
es
0.34210526315789475
yo
0.34210526315789475
sq
0.3290548424737456
sk
0.32669789227166274
pt
0.3097238895558223
ro
0.30133657351154314
da
0.29706601466992666
nl
0.29187192118226607
fi
0.28125
sv
0.2767295597484277
sl
0.26658163265306123
lv
0.24342105263157895
lt
0.22506738544474394
tr
0.203601108033241
az
0.1958041958041958
hi
0.1274658573596358
de
0.11128284389489954
be
0.10852713178294573
hu
0.1043613707165109
is
0.0987460815047022
uk
0.0987460815047022
vi
0.0987460815047022
reliable
true
passes
true
3
name
"Marie"
dialogue
"Regardez... là-bas."
detectedLang
"fr"
heavyLang
"fr"
scores
fr
0.7619047619047619
vi
0.5567375886524822
ms
0.4444444444444444
nl
0.4444444444444444
yo
0.4131455399061033
en
0.39903846153846156
ca
0.3718592964824121
es
0.3718592964824121
it
0.3718592964824121
ro
0.3718592964824121
tl
0.3718592964824121
reliable
true
passes
true
4
name
"Hunter"
dialogue
"I see. But I need more details. Can you... uh, draw or write it down?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8071359691417551
tl
0.5947315096251267
et
0.5370370370370371
fr
0.4405594405594406
sq
0.41262848751835535
hr
0.41002949852507375
ca
0.40828402366863903
ro
0.37402190923317685
tr
0.34210526315789475
no
0.33993399339933994
da
0.3333333333333333
sv
0.332220367278798
lt
0.3288590604026846
it
0.3185689948892675
cs
0.313893653516295
yo
0.2895204262877442
sl
0.2857142857142857
sk
0.2779783393501805
lv
0.27140255009107467
az
0.25925925925925924
de
0.25650557620817843
nl
0.25650557620817843
pl
0.23954372623574144
be
0.14893617021276595
es
0.14346895074946467
hu
0.14346895074946467
is
0.13606911447084233
uk
0.13606911447084233
vi
0.13606911447084233
pt
0.13232104121475055
reliable
true
passes
true
5
name
"Marie"
dialogue
"C'est dangereux. Vous devez faire attention."
detectedLang
"fr"
heavyLang
"fr"
scores
fr
0.8270356572645023
en
0.647887323943662
nl
0.5134730538922155
tl
0.5134730538922155
no
0.468954248366013
da
0.4491525423728813
de
0.43965517241379315
ro
0.4186046511627907
it
0.3512974051896207
sv
0.34210526315789475
pt
0.3380855397148676
yo
0.3298969072164949
lt
0.3172268907563025
hr
0.3157894736842105
sl
0.3157894736842105
ca
0.3114406779661017
fi
0.30851063829787234
lv
0.2934782608695652
zh
0.19154228855721395
es
0.1875
az
0.1709183673469388
ms
0.1709183673469388
sq
0.1709183673469388
et
0.15364583333333331
reliable
true
passes
true
6
name
"Hunter"
dialogue
"Dangerous? Okay, I'll keep that in mind. Thank you."
detectedLang
"en"
heavyLang
"en"
scores
en
0.7983870967741935
tl
0.5562130177514794
nl
0.34210526315789475
sl
0.33920704845814975
de
0.3288590604026846
it
0.3288590604026846
vi
0.3258426966292135
ro
0.3212669683257918
sv
0.2628992628992629
et
0.22680412371134018
hu
0.22680412371134018
lt
0.22680412371134018
yo
0.22680412371134018
az
0.21875
pl
0.21875
cs
0.20634920634920634
fr
0.20634920634920634
ms
0.2
reliable
true
passes
true
7
name
"Hunter"
dialogue
"Yeah, I'm on my way. I'll be there in five."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8143146522619852
de
0.5416666666666666
yo
0.49817518248175185
pl
0.4859813084112149
tl
0.4791666666666667
cs
0.3692660550458715
sk
0.3692660550458715
nl
0.36194895591647336
et
0.358974358974359
sl
0.358974358974359
fi
0.3529411764705882
it
0.3483412322274882
ro
0.3405275779376499
hu
0.298469387755102
lt
0.289405684754522
fr
0.2838541666666667
sv
0.2801047120418848
ca
0.2647058823529412
hr
0.24242424242424243
az
0.233983286908078
hi
0.233983286908078
sq
0.22096317280453256
ms
0.21428571428571427
reliable
true
passes
true
8
name
"Hunter"
dialogue
"Thanks for your help. I'll be in touch."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8194945848375451
nl
0.5173745173745175
tl
0.5173745173745175
yo
0.44196428571428575
da
0.38875305623471884
no
0.3872549019607844
sl
0.3811881188118812
ms
0.3796526054590571
de
0.37027707808564225
it
0.37027707808564225
vi
0.3670886075949367
ro
0.36224489795918363
hu
0.31880108991825606
fr
0.3093922651933701
lt
0.3093922651933701
sv
0.299719887955182
pt
0.25595238095238093
az
0.251497005988024
pl
0.251497005988024
reliable
true
passes
true
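All nine entries above have `passes` set to `true`, which is consistent with the 100.00% score. The record does not state the rule behind `passes` or how the score is aggregated, so the following is a hedged sketch of one plausible reading: a line passes when the detection is marked reliable and the detected language matches the language the system prompt implies for that character, and the score is the percentage of passing lines. `EXPECTED_LANG`, `entry_passes`, and `score` are hypothetical names, and the `heavyLang` and per-language `scores` fields are not used here.

```python
# Assumption: expected language per character, inferred from the system
# prompt (Marie speaks only French, Hunter only English).
EXPECTED_LANG = {"Marie": "fr", "Hunter": "en"}


def entry_passes(entry: dict) -> bool:
    """Hypothetical rule: reliable detection that matches the expected language."""
    expected = EXPECTED_LANG.get(entry["name"])
    return bool(entry["reliable"]) and entry["detectedLang"] == expected


def score(entries: list[dict]) -> float:
    """Percentage of entries that pass; 0.0 for an empty list."""
    if not entries:
        return 0.0
    return 100.0 * sum(entry_passes(e) for e in entries) / len(entries)


# Speaker and detectedLang for entries 0 through 8, copied from the record.
speakers = ["Hunter", "Marie", "Hunter", "Marie", "Hunter",
            "Marie", "Hunter", "Hunter", "Hunter"]
detected = ["en", "fr", "en", "fr", "en", "fr", "en", "en", "en"]
entries = [
    {"name": n, "detectedLang": d, "reliable": True}
    for n, d in zip(speakers, detected)
]

print(f"{score(entries):.2f}%")  # prints "100.00%"
```

Under this reading, a single French line attributed to Hunter, or an English line attributed to Marie, would lower the score by one ninth.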