Evaluates different Text to SQL models and pipelines with standard benchmarks and metrics.
premsql provides evaluators that let you evaluate models or pipelines using these metrics, such as execution accuracy and VES (Valid Efficiency Score).
Let's set up everything required for an evaluation. We need an evaluation / validation dataset, a generator that generates and saves the model's SQL responses, and an executor that we plug into our evaluator to execute the predicted SQL queries so they can be compared with the ground truth SQL queries.
Let's start by importing the required libraries and setting up our Datasets and Generators. If you are not familiar with Datasets and Generators, you can check out the Datasets and Generators guide.
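Here is a minimal sketch of that setup, assuming the BIRD validation split and the premai-io/prem-1B-SQL model; the dataset folder, experiment name, and generation parameters below are placeholders you would adapt to your own run:

```python
from premsql.datasets import Text2SQLDataset
from premsql.generators import Text2SQLGeneratorHF

# Load a validation split of a Text-to-SQL benchmark (BIRD is assumed here)
dataset = Text2SQLDataset(
    dataset_name="bird",
    split="validation",
    dataset_folder="/path/to/dataset",
).setup_dataset(num_rows=10, num_fewshot=3)

# A HuggingFace-backed generator that produces SQL and saves the results
# inside an experiment folder
generator = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="evaluation_demo",
    type="test",
    device="cuda:0",
)

# Generate the predictions and persist them to disk
responses = generator.generate_and_save_results(
    dataset=dataset,
    temperature=0.1,
    max_new_tokens=256,
)
```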
Next, set up the executor and the evaluator and run the evaluation, as shown in the sketch below. You can set metric_name to either accuracy or ves. There is also a filter_by option to filter the results, for example by db_id. This will not only give you the overall accuracy but will also break the accuracy down over all the available databases. If your data has a key called difficulty, then it will show the result distribution over the different available difficulty levels. Make sure these keys are available in the dataset, because only if they are present there will they be reflected in the model responses and hence in the results.
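A sketch of that step might look like the following, assuming an SQLite-backed executor and the experiment path created by the generator above; meta_time_out (the per-query execution timeout, in seconds) is an illustrative value:

```python
from premsql.executors import SQLiteExecutor
from premsql.evaluator import Text2SQLEvaluator

# The executor runs both the predicted and the ground-truth SQL against the databases
executor = SQLiteExecutor()

# The evaluator compares the execution results and writes the result files
# into the generator's experiment folder
evaluator = Text2SQLEvaluator(
    executor=executor,
    experiment_path=generator.experiment_path,
)

# metric_name can be "accuracy" or "ves"; filter_by="db_id" additionally
# breaks the score down per database
results = evaluator.execute(
    metric_name="accuracy",
    model_responses=responses,
    filter_by="db_id",
    meta_time_out=10,
)
```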
Once the evaluation is finished, it will write three files:

- predict.json: contains the model responses along with the evaluation results.
- accuracy.json: contains the accuracy results.
- ves.json: contains the VES results. This file is only written if you have set metric_name to ves.
After evaluation, each entry in predict.json holds the model's response together with its evaluation result and the corresponding db_id, so you can inspect individual predictions.
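A quick way to inspect these artefacts is to load them from the experiment folder; the path below is illustrative and depends on your experiment name and type:

```python
import json
from pathlib import Path

# Illustrative path: premsql writes the result files inside the generator's
# experiment directory (generator.experiment_path points to it).
experiment_path = Path("./experiments/test/evaluation_demo")

with open(experiment_path / "predict.json") as f:
    predictions = json.load(f)

print(len(predictions), "evaluated samples")
print(predictions[0])  # a single response with its evaluation result
```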
Having all of this information in predict.json, together with the ability to plot graphs over different filters, makes it very useful for analysing the results and evaluating pipelines empirically.
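For example, here is a small matplotlib sketch that plots the per-database accuracy; it assumes accuracy.json maps each db_id (plus an overall entry) to a numeric score, so adapt the path and keys to the actual structure of your results:

```python
import json
import matplotlib.pyplot as plt

# Assumed structure: accuracy.json maps filter values (e.g. db_id) to
# numeric accuracy scores; non-numeric entries are skipped defensively.
with open("./experiments/test/evaluation_demo/accuracy.json") as f:
    accuracy = json.load(f)

per_db = {k: v for k, v in accuracy.items() if isinstance(v, (int, float))}

plt.bar(list(per_db.keys()), list(per_db.values()))
plt.ylabel("Execution accuracy")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.savefig("accuracy_by_db.png")
```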