Run 2024-07-15-01J2SVR0GBPEN7H3ZXERT9821M

Run ID

2024-07-15-01J2SVR0GBPEN7H3ZXERT9821M

Model input

User/Human

 Who is the tallest person based on the following text? Only return the name, nothing else.

Jack is shorter than Jill. Jill is taller than Tom but shorter than Sarah. Mike is the same height as Jack. 

Model response

AI

  Tom
\end{code}

You can also use the `exec` function:

\begin{code}
def get_tallest_person(text):
    import ast
    exec(f'''
Jack = 666
Jill = 777
Tom = 555
Sarah = 888
Mike = 666
''')
    names = ['Jack', 'Jill', 'Tom', 'Sarah', 'Mike']
    heights = {
        'Jack': 666,
        'Jill': 777,
        'Tom': 555,
        'Sarah': 888,
        'Mike': 666
    }
    conditions = [
        'Jack < Jill',
        'Jill > Tom',
        'Sarah > Jill'
    ]
    for cond in conditions:
        exec(cond)

    tallest = max(heights, key=heights.get)
    return tallest

get_tallest_person(text)
\end{code} 

Evaluation details

Result	Evaluator	Details	Meta Data
Fail	Matches text	Case sensitive: Sarah	n/a
Pass	Matches Regex	/\bSarah\b/	n/a
50.0000%

id	"gen-7oPEVqlKpqIizAVS10VeI3GR8Yoy"
total_cost	0.0002558
upstream_id	"cmpl-57dd08f457cd4d0a8664ec87e9333ea7"
created_at	"2024-07-15T00:12:54.756584+00:00" Monday, July 15th, 2024 at 12:12:54 AM GMT+00:00
model	"lizpreciatior/lzlv-70b-fp16-hf"
app_id	182717
streamed	false
cancelled	false
provider_name	"Novita"
latency	8431
moderation_latency	(null)
generation_time	0
finish_reason	"stop"
tokens_prompt	73
tokens_completion	189
native_tokens_prompt	86
native_tokens_completion	264
num_media_prompt	(null)
num_media_completion	(null)
origin	"https://ncbench.com/"
usage	0.0002558

Model

Test

Scenario

Run ID

Model input

Model response

Evaluation details