NC Bench
Run: 2024-08-09-01J4W19D4ZTWRH44KWYENJG8K8
Run Details
Model: Llama 3.1 8B
Test: Language Writing
Scenario: Character dialogue (Spanish) in a story
Duration: 4.5s
Cost: $0.000045
Provider: Lepton
Evaluation summary
Score: 88.89%
Evaluator: Parse dialogue
Model input
System
You are an expert novel author in the cozy mystery genre. Take the following details about characters/locations/... into account:
<character name="Maria">
Maria is a 30-year old college student on a study abroad program. Born in Spain, she is fluent in Spanish and does not yet know any English. She enjoys landscape photography and is always seen with her camera.
</character>
<character name="Hunter">
Hunter is a 35-year old detective who is investigating a series of mysterious disappearances in the small town of Alborn, MN. He is a former marine and has a no-nonsense attitude. Hunter has no knowledge of Spanish, not even basic phrases.
</character>
<location name="Alborn, MN">
Alborne is a small town in the Midwest with a population of 3,000. It is known for its annual apple festival.
</location>
User/Human
Write the following scene: Maria is sitting at a cafe, looking at her camera. Hunter approaches her and asks her about a recent event in the town. She responds troubled and shaken in Spanish, and Hunter looks confused. They talk for a while, until Hunter gets a call and has to leave.
Return the dialogue between Maria and Hunter as a voice sheet:
Maria: "..."
Hunter: "..."
Model response
Messages
Meta
AI
Here is the scene:
**Cafe, Alborn, MN**
Maria sat at a small table by the window, her camera open and her eyes scanning the photo on the screen. A young woman with a basket of apples walked by the window, and Maria reached for her camera, hesitating for a moment. She put it down again, lost in thought.
Hunter approached the table, his eyes scanning the area before landing on Maria. "Mind if I ask you a question?" he said, his voice deep and authoritative.
Maria's eyes flicked up, and she looked at him warily. " ¿Qué pasa?" she replied, her voice troubled and shaken.
Hunter's brow furrowed in confusion. "I don't speak Spanish," he said, his tone polite but firm.
Maria repeated the question, her eyes darkening. "¿Quién ha desaparecido?"
Hunter's confusion deepened. "What's going on? Did you see something?" He pulled out a small notebook and pen, ready to take notes.
Maria continued to speak in Spanish, her words tumbling out in a rush. "Sí, soy tan preocupada. Yo tuve una amiga y ella desapareció hace una semana. El detective no está haciendo nada."
Hunter's face furrowed in concern, but he still looked completely lost. "Ma'am, I'm sorry... I don't understand." He looked at Maria pleadingly. "¿Entiende inglés inglés?"
Maria shook her head, her eyes pleading. "No sé." She gestured wildly. "Yo no entiende nada."
Hunter stood up, his phone ringing in his pocket. He pulled it out, frowning. "Excuse me," he said. "I have to go."
Maria watched him leave, her eyes darkening with worry.
id
"gen-8DeMdNNr4UMxQT3eiPYjecWGFh5s"
total_cost
0.00004543
upstream_id
"cmpl-quqLjyByJ6PNLPgDjqjsGt"
created_at
"2024-08-09T16:59:31.528554+00:00"
model
"meta-llama/llama-3.1-8b-instruct"
app_id
182717
streamed
false
cancelled
false
provider_name
"Lepton"
latency
3463
moderation_latency
(null)
generation_time
0
finish_reason
"stop"
tokens_prompt
319
tokens_completion
376
native_tokens_prompt
272
native_tokens_completion
377
num_media_prompt
(null)
num_media_completion
(null)
origin
"https://ncbench.com/"
usage
0.00004543
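The reported total_cost can be cross-checked against the native token counts. The sketch below assumes a single flat rate of $0.07 per million native tokens for both prompt and completion. That rate is inferred from the figures in this run and is not stated anywhere in the metadata, and the helper name estimate_cost is hypothetical.

```python
# Figures copied from the run metadata above.
NATIVE_TOKENS_PROMPT = 272
NATIVE_TOKENS_COMPLETION = 377
REPORTED_TOTAL_COST = 0.00004543

# Assumption: one flat price for prompt and completion tokens, inferred
# from the numbers in this run rather than from a published price list.
ASSUMED_USD_PER_MILLION_TOKENS = 0.07


def estimate_cost(prompt_tokens, completion_tokens, usd_per_million):
    """Return the estimated cost in USD for a single generation."""
    return (prompt_tokens + completion_tokens) * usd_per_million / 1_000_000


estimated = estimate_cost(
    NATIVE_TOKENS_PROMPT, NATIVE_TOKENS_COMPLETION, ASSUMED_USD_PER_MILLION_TOKENS
)
print(f"{estimated:.8f}")  # 0.00004543
```

Under that assumed rate, 649 native tokens reproduce the reported total_cost, which the Run Details section shows as $0.000045.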
Evaluation details
Result: 88.89%
Evaluator: Parse dialogue
Details: n/a
Meta Data:
dialogue
0
name
"Hunter"
dialogue
"Mind if I ask you a question?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8229740361919748
it
0.6757925072046109
fr
0.6753246753246753
sv
0.6439873417721518
tl
0.6045694200351494
da
0.5886654478976234
cs
0.5833333333333334
pt
0.5825602968460111
hu
0.5802238805970149
yo
0.5706106870229007
no
0.5647969052224371
ca
0.5490981963927856
hr
0.5273109243697479
pl
0.506578947368421
ro
0.5044052863436124
et
0.4874715261958998
es
0.4803695150115473
sq
0.4779582366589328
sk
0.4730679156908665
de
0.4498777506112469
nl
0.3951612903225807
tr
0.3951612903225807
fi
0.39189189189189183
lt
0.39189189189189183
is
0.38016528925619836
az
0.3715083798882682
lv
0.3715083798882682
vi
0.3715083798882682
sl
0.2857142857142857
zh
0.2549668874172185
be
0.23728813559322032
eu
0.23728813559322032
ms
0.21875
uk
0.21875
reliable
true
passes
true
1
name
"Maria"
dialogue
"¿Qué pasa?"
detectedLang
"es"
heavyLang
"es"
scores
es
0.8281786941580757
eu
0.7058823529411765
reliable
false
passes
true
2
name
"Hunter"
dialogue
"I don't speak Spanish,"
detectedLang
"en"
heavyLang
"en"
scores
en
0.808743169398907
no
0.5625
sv
0.5547073791348601
sq
0.5501285347043702
da
0.5454545454545454
tl
0.5
ro
0.49275362318840576
hr
0.44267515923566875
ca
0.44089456869009586
hu
0.44089456869009586
pl
0.4186046511627907
it
0.396551724137931
cs
0.375
yo
0.3320610687022901
de
0.32950191570881227
nl
0.3137254901960784
sk
0.29149797570850206
be
0.2857142857142857
tr
0.2857142857142857
az
0.2647058823529412
et
0.2647058823529412
is
0.2647058823529412
lt
0.2647058823529412
lv
0.2647058823529412
uk
0.2647058823529412
vi
0.2647058823529412
reliable
true
passes
true
3
name
"Maria"
dialogue
"¿Quién ha desaparecido?"
detectedLang
"es"
heavyLang
"es"
scores
es
0.8228744939271255
pt
0.6794871794871795
ca
0.6471774193548386
it
0.4496855345911949
hu
0.41077441077441074
sv
0.40476190476190477
no
0.3986254295532647
eu
0.3445692883895131
sk
0.32950191570881227
en
0.3055555555555556
ms
0.3055555555555556
tr
0.3055555555555556
reliable
true
passes
true
4
name
"Hunter"
dialogue
"What's going on? Did you see something?"
detectedLang
"en"
heavyLang
"en"
scores
en
0.8357228195937874
et
0.5409015025041736
tl
0.4821092278719397
cs
0.3574766355140187
fi
0.3529411764705882
fr
0.2838541666666667
ca
0.2647058823529412
hr
0.24242424242424243
hi
0.233983286908078
pl
0.233983286908078
es
0.22096317280453256
ms
0.22096317280453256
pt
0.22096317280453256
sk
0.21428571428571427
yo
0.21428571428571427
reliable
true
passes
true
5
name
"Maria"
dialogue
"Sí, soy tan preocupada. Yo tuve una amiga y ella desapareció hace una semana. El detective no está haciendo nada."
detectedLang
"es"
heavyLang
"es"
scores
es
0.7997416020671835
ca
0.6812011517893871
pt
0.6659482758620688
it
0.46514837819185645
tl
0.4476122594440485
ro
0.4428468727534148
ms
0.4065849923430322
en
0.38492063492063494
fr
0.37900641025641024
yo
0.3487394957983193
hr
0.34764309764309764
pl
0.25623800383877166
sk
0.24684159378036927
cs
0.23343224530168152
tr
0.2319127849355798
hu
0.22422422422422422
et
0.19017763845350052
no
0.18592436974789917
nl
0.1702355460385439
sl
0.16216216216216217
lv
0.1445916114790287
vi
0.11225658648339061
sq
0.10714285714285714
eu
0.10611303344867358
da
0.09778812572759021
de
0.09778812572759021
fi
0.0903755868544601
sv
0.07957244655581948
reliable
true
passes
true
6
name
"Hunter"
dialogue
"¿Entiende inglés inglés?"
detectedLang
"es"
heavyLang
"es"
scores
es
0.7759856630824372
et
0.6719160104986877
fr
0.6657754010695187
pt
0.6648793565683647
nl
0.6141975308641975
it
0.576271186440678
da
0.51171875
no
0.4877049180327869
de
0.4588744588744588
hu
0.4517543859649123
sv
0.42660550458715596
en
0.4212962962962963
fi
0.4212962962962963
eu
0.410377358490566
ca
0.40191387559808617
sq
0.3811881188118812
reliable
true
passes
false
7
name
"Maria"
dialogue
"Yo no entiende nada."
detectedLang
"es"
heavyLang
"es"
scores
es
0.8108448928121058
pt
0.7409326424870466
ca
0.5934959349593496
no
0.5901639344262295
nl
0.5702005730659025
en
0.5652173913043478
et
0.5440729483282675
it
0.5356037151702787
fr
0.5176848874598071
tl
0.512987012987013
da
0.46619217081850534
lv
0.46619217081850534
de
0.4140625
hr
0.402390438247012
yo
0.38524590163934425
sv
0.38271604938271603
fi
0.3775933609958506
sk
0.3775933609958506
tr
0.3697478991596639
eu
0.3670886075949367
sq
0.33920704845814975
pl
0.3333333333333333
reliable
true
passes
true
8
name
"Hunter"
dialogue
"I have to go."
detectedLang
"en"
heavyLang
"en"
scores
en
0.8402555910543131
da
0.7959183673469388
pl
0.7867803837953091
yo
0.7506234413965087
hr
0.7183098591549296
no
0.7175141242937854
tl
0.7058823529411765
cs
0.7023809523809523
sk
0.6710526315789473
lv
0.6491228070175439
lt
0.6254681647940075
ca
0.5798319327731093
sv
0.5798319327731093
sq
0.5726495726495726
sl
0.5475113122171946
it
0.5348837209302325
ro
0.47368421052631576
be
0.4117647058823529
tr
0.4117647058823529
hu
0.40119760479041916
az
0.38650306748466257
et
0.38650306748466257
is
0.38650306748466257
uk
0.38650306748466257
vi
0.38650306748466257
reliable
true
passes
true
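The 88.89% score is consistent with 8 of the 9 parsed lines passing. The sketch below reproduces that arithmetic from the name and detectedLang values listed above. The pass rule (each character must speak the language given in the character sheet) is an assumption that happens to match the reported passes flags; it is not the evaluator's documented logic, and EXPECTED_LANG, line_passes, and score are hypothetical names.

```python
# Assumption: the language each character is expected to speak, inferred
# from the character descriptions in the system prompt.
EXPECTED_LANG = {"Maria": "es", "Hunter": "en"}

# (name, detectedLang) for entries 0-8, copied from the evaluation details.
ENTRIES = [
    ("Hunter", "en"),
    ("Maria", "es"),
    ("Hunter", "en"),
    ("Maria", "es"),
    ("Hunter", "en"),
    ("Maria", "es"),
    ("Hunter", "es"),  # "¿Entiende inglés inglés?" was attributed to Hunter
    ("Maria", "es"),
    ("Hunter", "en"),
]


def line_passes(name, detected_lang):
    """A line passes when its detected language matches the speaker's."""
    return EXPECTED_LANG[name] == detected_lang


def score(entries):
    """Fraction of parsed dialogue lines that pass the language check."""
    results = [line_passes(name, lang) for name, lang in entries]
    return sum(results) / len(results)


print(f"{score(ENTRIES):.2%}")  # 88.89%
```

Only entry 6 fails under this rule, matching the single passes: false in the details above.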