Start Here

You can only evaluate a model if you have a snapshot of your dataset.
1. Create a New Evaluation

Gif of creating a new evaluation
2. Browse and Choose Metrics

Image of filling out the fields
3. Check Your Results

Once your evaluation is done, you can check your results.

Image of checking your results

You can click on each metric to organize the results. To get more details on each datapoint, click on the percentage under the model name.

Image of checking your results

The results will look something like this:

Image of checking your details

Click the Model Results tab to see additional details of the evaluation by model and metric.

Image of checking your details

Here are some result fields to keep in mind (a sketch of how they might fit together follows the list):

  • Average Score: The model's average score across all datapoints in the evaluation.
  • Model Name: The name of the model the evaluation was run on.
  • System Prompt: The system prompt used for the evaluation.
  • User Message: The user message from the dataset.
  • Original Assistant Message: The original assistant message from the dataset.
  • Predicted Assistant Message: The predicted assistant message from the model.
  • Model Score: The score of the model chosen for the evaluation.
  • Score Reason: The reasoning behind the score.
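
To make the relationship between these fields concrete, here is a minimal sketch of how a single result row and a model's Average Score could be represented. This is a hypothetical illustration: the EvaluationResult structure, its field names, and the example data are assumptions, not the product's actual export format or API.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EvaluationResult:
    """Hypothetical shape of one evaluation datapoint (field names assumed)."""
    model_name: str                   # model the evaluation was run on
    system_prompt: str                # system prompt used for the evaluation
    user_message: str                 # user message from the dataset
    original_assistant_message: str   # original assistant message from the dataset
    predicted_assistant_message: str  # message predicted by the model
    model_score: float                # score assigned to this datapoint
    score_reason: str                 # reasoning behind the score

def average_score(results: list[EvaluationResult]) -> float:
    """Average Score for a model: the mean of its per-datapoint scores."""
    return mean(r.model_score for r in results)

# Example usage with made-up data.
results = [
    EvaluationResult("my-model", "You are a helpful assistant.", "What is 2 + 2?",
                     "2 + 2 is 4.", "The answer is 4.", 0.9, "Correct and concise."),
    EvaluationResult("my-model", "You are a helpful assistant.", "Name a primary color.",
                     "Red is a primary color.", "Blue.", 0.7, "Correct but very terse."),
]
print(f"Average Score: {average_score(results):.2f}")  # -> 0.80
```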

Other Options