Final Evaluations
After models are trained, final evaluations can be triggered.
Run an evaluation:
```sh
task eval -- --config taskcluster/configs/eval.yml
```
Make sure to update the `eval.yml` file for your particular model. The evaluations will be logged to `trigger-eval.log` and uploaded to the specified bucket.
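Before triggering a run, it can help to sanity-check the config you are pointing at. The snippet below is a minimal sketch that loads the YAML and prints its top-level keys so you can confirm the model-specific values are the ones you edited; it assumes PyYAML is installed and makes no assumptions about the actual schema of `eval.yml`.

```python
# Minimal sketch: inspect the eval config before triggering a run.
# Assumes PyYAML is installed; the schema of eval.yml is not assumed here,
# only that the file is a YAML mapping.
from pathlib import Path

import yaml

config_path = Path("taskcluster/configs/eval.yml")
config = yaml.safe_load(config_path.read_text())

# Print the top-level keys and values so model-specific settings can be verified.
for key, value in config.items():
    print(f"{key}: {value}")
```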
LLM Evaluation
An LLM can provide an evaluation using the OpenAI API. This produces an analysis of an evaluation dataset against the following metrics, each with a score of 1-5 and an explanation of the score:
- adequacy
- fluency
- terminology
- hallucination
- punctuation
See [pipeline/eval/eval-batch-instructions.md](pipeline/eval/eval-batch-instructions.md) for the full prompt used for this analysis.
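As a rough illustration of the kind of request involved (not the pipeline's actual implementation, whose prompt lives in the file above), the sketch below sends a single source/translation pair to the OpenAI chat completions API and asks for 1-5 scores on the five metrics. The model name, prompt wording, and expected JSON response shape are all assumptions for this example.

```python
# Illustrative sketch only: score one translation with an LLM.
# The real pipeline prompt is in pipeline/eval/eval-batch-instructions.md;
# the model name, prompt text, and response shape below are assumptions.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

source = "El gato se sentó en la alfombra."
translation = "The cat sat on the mat."

prompt = (
    "Evaluate the translation below. For each of the metrics adequacy, fluency, "
    "terminology, hallucination, and punctuation, return a score from 1 to 5 and a "
    "one-sentence explanation. Respond with a JSON object keyed by metric.\n\n"
    f"Source: {source}\nTranslation: {translation}"
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model; use whatever your config specifies
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)

scores = json.loads(response.choices[0].message.content)
print(scores)
```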
This evaluation can be viewed in the LLM Evals dashboard by providing the root URL where the JSON result files are located.
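If you want a quick summary outside the dashboard, a small script can aggregate scores across the JSON output files. The sketch below assumes each file maps metric names to objects containing a numeric `score` field; adjust the parsing to the actual structure of your results.

```python
# Sketch: average per-metric scores across a directory of LLM evaluation JSON files.
# Assumption: each file maps metric names to objects containing a numeric "score".
import json
from collections import defaultdict
from pathlib import Path

results_dir = Path("llm-eval-results")  # assumed local copy of the uploaded JSON files
totals: dict[str, list[float]] = defaultdict(list)

for path in results_dir.glob("*.json"):
    data = json.loads(path.read_text())
    for metric, entry in data.items():
        if isinstance(entry, dict) and "score" in entry:
            totals[metric].append(float(entry["score"]))

for metric, scores in sorted(totals.items()):
    print(f"{metric}: {sum(scores) / len(scores):.2f} (n={len(scores)})")
```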