Welcome to the PremSQL library, a powerful tool for building self-hosted, end-to-end autonomous data analysis pipelines powered by Text to SQL. PremSQL offers a modular design where each component functions independently, enabling you to create fully customized workflows. Watch our Quick Demo of the latest PremSQL Agent Server and Playground.

Core Components

Each component works independently and is designed to accomplish a specific task. While we recommend exploring the components sequentially to gain a comprehensive understanding, it’s not mandatory. Since components operate independently, you can focus on those that meet your immediate needs and return later for a deeper dive into others.

PremSQL GitHub

Star the project to stay updated with our rapid development of the best local Text-to-SQL solution.

News

  • [Sep 10th 2024] Initial release of PremSQL
  • [Sep 10th 2024] Launch of Prem-1B-SQL (fully local Text to SQL model)
  • [Oct 30th 2024] Prem-1B-SQL surpassed 5K+ downloads
  • [Nov 5th 2024] Release of PremSQL Playground, Agents, and AgentServer
  • [Nov 10th 2024] Release of Prem-1B-SQL Ollama model with Ollama support

Installation

Start by creating a virtual environment and installing PremSQL:

Note: We currently recommend using Python virtualenv instead of conda, as some users have reported compatibility issues with conda environments.
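For example, a minimal setup, assuming the package is published on PyPI under the name premsql:

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -U premsql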

Note

The latest PremSQL update doesn’t include pre-installed dependencies to accommodate backend variations and maintain a lighter package. Choose your preferred backend:

For Hugging Face Transformers:

pip install torch transformers

For Apple MLX backend:

pip install mlx mlx-lm

For Ollama integration, first install Ollama, then install the Python client:

pip install ollama
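If Ollama itself is not installed yet, its documented one-line installer for Linux at the time of writing is shown below (on macOS and Windows, download the installer from ollama.com):

curl -fsSL https://ollama.com/install.sh | sh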

PremSQL is designed to be versatile and hackable, with a simple code structure and decoupled components. Here are the main ways to use it:

Quick Start

Let’s explore how to use PremSQL’s latest baseline agent with Ollama. We’ve chosen Ollama for this guide because it’s easy to set up, requires minimal computational resources, and runs everything locally at no cost. However, you can also use Apple MLX, Hugging Face Transformers, or other supported backends.
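Only the generator construction changes when you swap backends; the rest of the pipeline stays the same. As a rough sketch for the Hugging Face backend (assumptions: the class name Text2SQLGeneratorHF, its constructor arguments, and the premai-io/prem-1B-SQL model ID are not confirmed here, so check the generators documentation for the exact signature):

from premsql.generators import Text2SQLGeneratorHF  # assumption: HF backend class name

# Hypothetical equivalent of the Ollama generator used in the starter code below
text2sql_model = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",  # assumption: Hugging Face model ID
    experiment_name="quickstart_hf",
    type="test",
)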

Step 1: PremSQL installation with Ollama and model downloads

First, make sure PremSQL is installed with the Ollama client; if you haven't done so, follow the installation instructions above. We'll use two models: Prem-1B-SQL and Llama 3.2 1B. Download both with the following commands (ollama run pulls a model on first use):

ollama run anindya/prem1b-sql-ollama-fp116
ollama run llama3.2:1b

Step 2: Launch PremSQL Server and Agent UI

PremSQL includes a CLI tool for managing the backend API server and Agent UI. Running premsql in your terminal displays:

Usage: premsql [OPTIONS] COMMAND [ARGS]...
  PremSQL CLI to manage API servers and Streamlit app
Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
Commands:
  launch  Launch PremSQL services
  stop    Stop all PremSQL services

This confirms that PremSQL is installed correctly. Verify you have version 0.1.11 or higher. Launch both the backend API server and playground with:

On Linux, Windows, and macOS:
premsql launch all

On first run, it executes database migrations before starting the server and the Streamlit agent UI.

Once both services are up, you can use pre-built datasets, import CSVs, or import datasets from Kaggle. Let's try analyzing a student performance dataset from Kaggle.

Step 3: Import a dataset from Kaggle

To import a Kaggle dataset into PremSQL, make sure it contains only CSV files (multiple files are supported). Copy the dataset ID (here, spscientist/students-performance-in-exams) and paste it into the Upload csvs or use Kaggle field in the PremSQL navigation. After submission, you'll see a starter code template specific to your chosen backend.

Step 4: Start a PremSQL analysis session

For this demo, we’ll use the Ollama starter code. Create a new file anywhere and add this code:

starter_server.py
from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
from premsql.generators import Text2SQLGeneratorOllama
from premsql.agents.tools import SimpleMatplotlibTool
from premsql.executors import ExecutorUsingLangChain

text2sql_model = Text2SQLGeneratorOllama(
    model_name="anindya/prem1b-sql-ollama-fp116",
    experiment_name="ollama",
    type="test"
)

analyser_plotter_model = Text2SQLGeneratorOllama(
    model_name="llama3.2:1b",
    experiment_name="ollama",
    type="test"
)

db_connection_uri = "sqlite:////Users/anindya/Library/Caches/premsql/kaggle/student.sqlite"
baseline = BaseLineAgent(
    session_name="student",                     # Required unique session name
    db_connection_uri=db_connection_uri,        # Target database connection
    specialized_model1=text2sql_model,          # Text to SQL model
    specialized_model2=analyser_plotter_model,  # Secondary analysis model
    executor=ExecutorUsingLangChain(),         # Database executor
    auto_filter_tables=False,                  # Table filtering option
    plot_tool=SimpleMatplotlibTool()           # Visualization tool
)

agent_server = AgentServer(agent=baseline, port=8162)
agent_server.launch()
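The db_connection_uri above is the author's local SQLite file created by the Kaggle import; point it at your own database. Since the executor goes through LangChain, any standard SQLAlchemy-style URI should work (an assumption worth verifying against the executors documentation), for example:

db_connection_uri = "sqlite:////absolute/path/to/student.sqlite"  # local SQLite file (note the four slashes for an absolute path)
# db_connection_uri = "postgresql+psycopg2://user:password@localhost:5432/mydb"  # hypothetical Postgres example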

Run this code in your terminal within your PremSQL environment:

python starter_server.py

You should see FastAPI server output similar to:

2024-11-10 14:11:51,874 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,879 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,884 - [PIPELINE-MEMORY] - INFO - /Users/anindya/test_apps/premsql/premsql_pipeline_memory.db
2024-11-10 14:11:51,904 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting server on port 8162
INFO:     Started server process [55869]
INFO:     Waiting for application startup.
2024-11-10 14:11:51,908 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting up the application
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8162 (Press CTRL+C to quit)
INFO:     ::1:62209 - "GET /session_info HTTP/1.1" 200 OK

Copy the localhost URL (http://localhost:8162) and paste it into the PremSQL Playground to start the session.
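If you want to confirm the server is reachable first, you can hit the /session_info endpoint that appears in the log above. A minimal check (the response schema isn't documented here, so it is printed raw):

from urllib.request import urlopen

# Ping the running agent server; expect an HTTP 200 as in the log output above
with urlopen("http://localhost:8162/session_info") as resp:
    print(resp.status, resp.read().decode())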

Note

This is a starter implementation using our baseline agent for autonomous analysis. You can create custom agents with different functionality (within the data-analysis scope) by extending this code.

Step 5: You're all set!

You can now perform analysis on various data sources such as CSVs, databases, and Kaggle CSV datasets.

That's how simple it is! From here, explore the many other features PremSQL offers.

Why PremSQL? The Vision

PremSQL is focused on creating local Text-to-SQL workflows. In many scenarios, organizations need to maintain data privacy while leveraging generative AI solutions for productivity and innovation. PremSQL addresses this need by keeping your data entirely local.

Key Use Cases:

  • Interactive database querying and analysis
  • RAG systems with database integration
  • Intelligent SQL autocompletion
  • Self-hosted AI-powered data analysis
  • Autonomous agentic pipelines with secure database access

How is it different?

While many libraries excel at building general AI workflows, they often present a steep learning curve for customization. PremSQL simplifies this process, giving you complete control over your data while seamlessly integrating with existing LangChain, Llama-Index, or DSPy workflows.

Join Our Community

We invite you to participate in our open-source initiative! Your contributions, feedback, and issue reports are crucial to our growth. For more information on how to contribute, please check our contributing guidelines.

Stay connected and follow our GitHub repository for the latest updates and improvements!