PremSQL generators are responsible for converting natural language questions into SQL queries. Think of them as modular inference APIs specific to text-to-SQL. You can integrate various third-party APIs, models, or custom pipelines.

In this guide, we’ll explore how to use both the Hugging Face and PremAI providers to work with local and hosted models. Finally, we’ll demonstrate how to create custom generators. Let’s get started by importing the necessary packages.

from premsql.generators import Text2SQLGeneratorHF
from premsql.datasets import Text2SQLDataset

Let’s load a sample dataset:

bird_dataset = Text2SQLDataset(
    dataset_name='bird', split="train", force_download=False,
    dataset_folder="/path/to/dataset"
)
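
Each item in a loaded dataset is a plain dictionary. As a quick sanity check, you can inspect one record like this (the exact keys depend on the dataset, but prompt and db_path are the ones generators consume):

sample = bird_dataset[0]
print(sample.keys())           # expect keys like "prompt" and "db_path"
print(sample["prompt"][:200])  # preview the beginning of the prompt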

If you are not familiar with datasets, you can check out our Datasets guide.

How Generators Work

PremSQL generators offer two main generation strategies:

  1. Simple Generation: Generates SQL from a prompt that includes the table schema, user question, few-shot examples, etc.
  2. Execution-Guided Decoding: Enhances performance by executing the generated SQL on the DB. If an error occurs, the generator uses the error message to self-correct and retries until it succeeds or reaches the maximum trials.

Input and Output for Generators

The input for generators is a data_blob, which should contain the following (see the example after this list):

  • prompt: The prompt to be passed to the model.
  • db_path: The database path.
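
For illustration, a data_blob might look like this (the schema and question below are made up):

data_blob = {
    "prompt": "CREATE TABLE users (id INTEGER, name TEXT);\n-- Question: How many users are there?",
    "db_path": "/path/to/database.sqlite",
}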

The prompt should contain the table schemas and all the required instructions to generate the SQL query. The output is a SQL query. Let’s define our generator using the Prem-1B-SQL model available on Hugging Face.

generator = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="test_generators",
    device="cuda:0",
    type="test"
)

Arguments:

  • model_or_name_or_path: Union[str, transformers.PreTrainedModel] The model name or path to be used for the text-to-SQL generation. It can be a string specifying the model path or a pre-loaded model object.

  • experiment_name: str The name of the experiment. It will create a folder with this name in the ./experiments directory to store the outputs.

  • type: str The type of experiment. This can be either train or test, and based on that it will create an experiment folder. So suppose your experiment name is “test_generators”; then it will create a folder ./experiments/test_generators_test or ./experiments/test_generators_train based on the type.

  • experiment_folder: Optional[str], default=None The folder where experiment outputs are stored. By default, it will be created in the ./experiments directory.

  • hf_token: Optional[str], default=None Hugging Face token for loading private models from the Hugging Face Hub.

  • device: Optional[str], default=None The device on which the model will be loaded (‘cpu’, ‘cuda’, etc.). If None, it defaults to the available CUDA device if present, otherwise ‘cpu’.

  • kwargs: dict Additional arguments passed to the model or any other configurations.

Using HuggingFace Transformers

Text2SQLGeneratorHF utilizes Hugging Face Transformers. You can use any model from the Hugging Face model hub. Now let’s generate an answer for a single data point.

sample = bird_dataset[0]
response = generator.generate(
    data_blob={
        "prompt": sample["prompt"],
    },
    temperature=0.1,
    max_new_tokens=256
)
print(response)

Arguments:

  • data_blob: dict The data blob containing the prompt and other required information.
  • temperature: Optional[float], default=0.0 The temperature value for sampling. Higher values increase diversity.
  • max_new_tokens: Optional[int], default=256 The maximum number of tokens to generate.
  • postprocess: Optional[bool], default=True Whether to postprocess the generated SQL query.
  • kwargs: dict Additional arguments passed to the model or any other configurations.

The generate method runs a single inference without saving results. To generate and save multiple responses inside the experiment_path, use the generate_and_save_results method.

responses = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256
)

print(responses)

Results are saved in the experiment_path as predict.json. On subsequent runs, the cached results are reused unless force=True is specified.

response = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256,
    force=True
)

This will force the generator to generate the SQL queries again and overwrite them in the predict.json file.
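
If you want to inspect the saved results directly, you can read the file yourself. A minimal sketch, assuming the ./experiments/<experiment_name>_<type> folder convention described above and that predict.json holds a list of prediction records:

import json

# Path follows the ./experiments/<experiment_name>_<type> convention
with open("./experiments/test_generators_test/predict.json") as f:
    predictions = json.load(f)

print(len(predictions))  # number of saved responses
print(predictions[0])    # inspect the first record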

Execution-Guided Decoding

This strategy executes the generated SQL against the DB and, if it fails, uses the error message for correction, repeating until it gets a valid result or the retries run out.

To use this, you’ll need an executor like SQLiteExecutor. You can use other executors as well. Here’s how to use it with the generate_and_save_results method:

from premsql.executors import SQLiteExecutor

executor = SQLiteExecutor()
response = generator.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256,
    force=True,
    executor=executor,
    max_retries=5 # this is optional (default is already set to 5)
)

This approach employs execution-guided decoding, enhancing the reliability of the generated SQL through iterative correction. Conceptually, the retry loop looks like the sketch below.
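
Here is a minimal illustrative sketch of that loop, not the library’s exact implementation. It assumes the executor exposes an execute_sql(sql, dbpath) method returning a dict with an error field; check your executor’s actual interface:

def execution_guided_generate(generator, executor, data_blob, max_retries=5):
    sql = generator.generate(data_blob=data_blob)
    for _ in range(max_retries):
        result = executor.execute_sql(sql=sql, dbpath=data_blob["db_path"])
        if result["error"] is None:
            return sql  # the query executed successfully
        # Feed the error message back so the model can self-correct
        retry_prompt = (
            data_blob["prompt"]
            + f"\n\nThe previous query failed with this error: {result['error']}\n"
            + "Please return a corrected SQL query."
        )
        sql = generator.generate(data_blob={**data_blob, "prompt": retry_prompt})
    return sql  # retries exhausted; return the last attempt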

Please note that Text2SQLGeneratorHF is limited to causal (decoder-only) models, so sequence-to-sequence models are not supported out of the box. However, you can still use them by creating a custom generator or by passing a pre-loaded model object when initializing the generator.
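
For example, here is a sketch of passing a pre-loaded model object instead of a model name, which the model_or_name_or_path argument described above accepts (depending on your setup, you may also need to handle the tokenizer yourself):

from transformers import AutoModelForCausalLM

# Load the model yourself, then hand the object to the generator
model = AutoModelForCausalLM.from_pretrained("premai-io/prem-1B-SQL")

generator = Text2SQLGeneratorHF(
    model_or_name_or_path=model,  # pre-loaded model object instead of a string
    experiment_name="preloaded_model",
    type="test",
)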

More Supported Generators

Besides Hugging Face models, we also support the Prem AI, Apple MLX, Ollama, and OpenAI backends. Here is how you can use them:

Prem AI Generator

The Prem AI SDK provides access to all the best LLMs out there at a very reasonable price. You can use them in the same way as the Hugging Face generators.

Before getting started, make sure you have an account on the Prem AI Platform, a valid project ID, and an API key. Additionally, you need at least one repository ID as a final requirement.

from premsql.generators import Text2SQLGeneratorPremAI

client = Text2SQLGeneratorPremAI(
    project_id=12234,  # Get your project ID at app.premai.io
    model_name="gpt-4o",
    experiment_name="testing_prem_sdk",
    type="test",
    premai_api_key="xxx-xxx-xxx-xxx" # Get your API key at https://app.premai.io
)
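
Once initialized, the client exposes the same generation API as the Hugging Face generator, so a single inference might look like this (reusing the bird_dataset sample from earlier):

response = client.generate(
    data_blob={"prompt": bird_dataset[0]["prompt"]},
    temperature=0.1,
    max_new_tokens=256,
)
print(response)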

Apple MLX Generator

Please note that this generator supports Apple Silicon devices only.

To use this generator, make sure the mlx library is installed. If not, you can install it with:

pip install mlx mlx-lm

Here’s how you can use Apple MLX as your generation backend for PremSQL.

from premsql.generators import Text2SQLGeneratorMLX
client = Text2SQLGeneratorMLX(
    model_name_or_path="premai-io/prem-1B-SQL",
    experiment_name="testing_mlx",
    type="test",
)

Ollama Generator

For using Ollama integration, first install Ollama, then install the Python client:

pip install ollama

Now you can go to the Ollama models page, choose a model of your choice, and run/pull it. For example, we are going to use Prem-1B-SQL and pull it with this command:

ollama pull anindya/prem1b-sql-ollama-fp116

Now you can use the Ollama generator in PremSQL with a few lines of code:

from premsql.generators import Text2SQLGeneratorOllama
client = Text2SQLGeneratorOllama(
    model_name_or_path="anindya/prem1b-sql-ollama-fp116",
    experiment_name="testing_ollama",
    type="test",
)

Open AI Generator

It has a very similar interface to the above:

from premsql.generators import Text2SQLGeneratorOpenAI

client = Text2SQLGeneratorOpenAI(
    model_name="gpt-4o",
    experiment_name="testing_openai",
    type="test",
    openai_api_key="xxxx-xxx-xxxx-xxx"
)
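
Since it shares the same interface, batch generation also works exactly as before; for example, reusing bird_dataset from earlier:

responses = client.generate_and_save_results(
    dataset=bird_dataset,
    temperature=0.1,
    max_new_tokens=256,
)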

Custom Generators

You can also create custom generators by extending the Text2SQLGeneratorBase class. This allows you to define your own generation strategies, models, and other configurations while still using the same API as the built-in generators. Here is an example of how you can do that.

To define a custom generator, you need to implement four mandatory methods:

  1. load_client: Load the model and other configurations. This method acts as a property and should return the model object.

  2. load_tokenizer: Load the tokenizer for the model. This method acts as a property and should return the tokenizer object. If your generator does not use a tokenizer, just return None.

  3. generate: Generate the SQL query from the prompt. This method should return the SQL query; whatever logic is needed to run the model should live here, such that the output is a single SQL query.

  4. model_name_or_path: Return a string that gives either the name of the model used or the path from which to load it.

Now let’s see how you can write a custom generator. Let’s use OpenAI for this.

from typing import Optional 
import os
from openai import OpenAI 
from premsql.generators.base import Text2SQLGeneratorBase

class Text2SQLGeneratorOpenAI(Text2SQLGeneratorBase):
    def __init__(
        self,
        model_name: str,
        experiment_name: str, 
        type: str,
        experiment_folder: Optional[str]=None,
        openai_api_key: Optional[str]=None,
    ):
        self._api_key = openai_api_key or os.environ.get("OPENAI_API_KEY")
        self.model_name = model_name 
        super().__init__(
            experiment_folder=experiment_folder,
            experiment_name=experiment_name,
            type=type,
        )
    
    # Define your openai client 
    @property
    def load_client(self):
        client = OpenAI(api_key=self._api_key)
        return client  

    # Define the tokenizer (the OpenAI API does not need one, so return None)
    @property
    def load_tokenizer(self):
        return None

    # Define the model name and path
    @property
    def model_name_or_path(self):
        return self.model_name
    
    # Write the generate function
    def generate(
        self,
        data_blob: dict,
        temperature: Optional[float] = 0.0,
        max_new_tokens: Optional[int] = 256,
        postprocess: Optional[bool] = True,
        **kwargs
    ) -> str:
        prompt = data_blob["prompt"]
        max_tokens = max_new_tokens
        generation_config = {
            **kwargs,
            **{"temperature": temperature, "max_tokens": max_tokens},
        }
        completion = (
            self.client.chat.completions.create(
                model=self.model_name,
                messages=[{"role": "user", "content": prompt}],
                **generation_config,
            )
            .choices[0]
            .message.content
        )
        return self.postprocess(output_string=completion) if postprocess else completion

That’s it! That’s how you write a custom generator. Now you can use this generator in the same way as you use the built-in generators, as the quick smoke test below shows.
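
For example (the prompt here is a made-up sample; make sure OPENAI_API_KEY is set in your environment or pass openai_api_key explicitly):

generator = Text2SQLGeneratorOpenAI(
    model_name="gpt-4o",
    experiment_name="custom_openai",
    type="test",
)

response = generator.generate(
    data_blob={"prompt": "CREATE TABLE users (id INTEGER);\n-- Question: Count all users."},
    temperature=0.0,
)
print(response)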