Welcome to the PremSQL library, a powerful tool for building self-hosted, end-to-end autonomous data analysis pipelines powered by Text-to-SQL. PremSQL offers a modular design where each component functions independently, enabling you to create fully customized workflows. Watch our Quick Demo of the latest PremSQL Agent Server and Playground:
Each component works independently and is designed to accomplish a specific task. While we recommend exploring the components sequentially to gain a comprehensive understanding, it’s not mandatory. Since components operate independently, you can focus on those that meet your immediate needs and return later for a deeper dive into others.
Star the project to stay updated with our rapid development of the best local Text-to-SQL solution.
Start by creating a virtual environment and installing PremSQL:
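A typical setup looks like this (assuming Python 3.8+ and that the package is published on PyPI as `premsql`):

```shell
# Create and activate an isolated environment
python3 -m venv .venv
source .venv/bin/activate

# Install the core PremSQL package
pip install -U premsql
```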
Note: We currently recommend using Python virtualenv instead of conda, as some users have reported compatibility issues with conda environments.
The latest PremSQL release no longer bundles backend dependencies; this keeps the package light and lets you install only what your chosen backend needs. Choose your preferred backend:
For Hugging Face Transformers:
For Apple MLX backend:
For Ollama integration, first install Ollama, then install the Python client:
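The commands below install the upstream packages each backend relies on. Note that PremSQL may also expose its own optional extras for these, so treat the exact package set as an assumption and check the PremSQL docs if your installation differs:

```shell
# Hugging Face Transformers backend
pip install torch transformers

# Apple MLX backend (Apple Silicon only)
pip install mlx-lm

# Ollama Python client (install the Ollama app itself first)
pip install ollama
```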
PremSQL is designed to be versatile and hackable, with a simple code structure and decoupled components. Here are the main ways to use it:
Use PremSQL’s pre-built Agent UI with our baseline agent to analyze CSVs, databases, or Kaggle datasets (as demonstrated in the demo video)
Leverage PremSQL as a Python library to:
Run the PremSQL backend API server and integrate it with your preferred programming language
Let’s explore how to use PremSQL’s latest baseline agent with Ollama. We’ve chosen Ollama for this guide because it’s easy to set up, requires minimal computational resources, and runs everything locally at no cost. However, you can also use Apple MLX, Hugging Face Transformers, or other supported backends.
First, ensure PremSQL is installed with the Ollama client. If you haven't done so, follow the installation instructions above. We'll use two models: `Prem-1B-SQL` and `Llama3.2 1B`. Download both models using these commands:
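For example, with the Ollama CLI (the `Prem-1B-SQL` tag below is an assumption; verify the exact name on the model's Ollama page):

```shell
# Small general-purpose model used by the agent
ollama pull llama3.2:1b

# Community build of Prem-1B-SQL (verify the exact tag)
ollama pull anindya/prem1b-sql-ollama-fp116
```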
Optional optimization
By default, Ollama runs one model at a time. To optimize PremSQL agent performance with multiple models, configure these environment variables:
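For example (the values below are suggestions; see the Ollama FAQ for details on these variables):

```shell
# Keep two models loaded at once and serve requests in parallel
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_NUM_PARALLEL=2
```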
Remember to restart Ollama after making these changes.
PremSQL includes a CLI tool for managing the backend API server and Agent UI. Running `premsql` in your terminal displays:
This confirms that PremSQL is installed correctly. Verify you have version `0.1.11` or higher. Launch both the backend API server and playground with:
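For instance (the `launch all` subcommand is based on the current CLI; run `premsql --help` to confirm the subcommands available in your version):

```shell
premsql launch all
```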
On first run, it will execute database migrations before starting the server and Streamlit agent UI. A successful launch looks like this:
You can now use pre-built datasets, import CSVs, or import from Kaggle. Let’s try analyzing this student performance dataset from Kaggle.
To import a Kaggle dataset into PremSQL, ensure it contains only CSV files (multiple files are supported). Simply copy the dataset ID (in this case, `spscientist/students-performance-in-exams`) and paste it into the `Upload csvs or use Kaggle` field in the PremSQL navigation. After submission, you'll see:
You’ll now see a starter code template specific to your chosen backend.
For this demo, we’ll use the Ollama starter code. Create a new file anywhere and add this code:
Run this code in your terminal within your PremSQL environment:
You should see FastAPI server output similar to:
Copy the localhost URL (`http://localhost:8162`) and paste it here:
This is a starter implementation using our baseline agent. You can create custom agents with different functionality (within the scope of data analysis) by extending this code. The snippet above demonstrates our baseline implementation for Autonomous Analysis agents.
You’re all set! You can now perform analysis on a variety of data sources, including CSVs, databases, and Kaggle CSV datasets.
That’s how simple it is! From here, explore the many features PremSQL offers:
Pre-processed datasets hosted on HuggingFace for Text-to-SQL tasks. Ideal for evaluation, fine-tuning, and creating custom datasets.
Models that transform natural language input into SQL queries based on your database schema.
Connects to databases and executes generated SQL queries to fetch results.
Evaluates Text-to-SQL models using metrics like execution accuracy and Valid Efficiency Score (VES).
Creates error handling prompts and datasets to enhance inference reliability and self-correction capabilities.
Fine-tunes open-source models on Text-to-SQL datasets with custom evaluation methods for optimal performance.
End-to-end agentic workflows for querying, analyzing, and visualizing database insights using natural language. Supports custom implementations for specialized use cases.
A ChatGPT-like interface specialized for database interactions. Deploy PremSQL agents with customized configurations for an interactive experience.
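To make the evaluator's execution-accuracy metric concrete, here is a minimal stdlib-only sketch of the idea (a conceptual illustration, not PremSQL's actual API): run each predicted query and its gold query against the same database and count a hit when the result sets match.

```python
import sqlite3

def execution_accuracy(db, pairs):
    """Fraction of (predicted, gold) SQL pairs whose result sets match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        try:
            pred = set(db.execute(predicted_sql).fetchall())
        except sqlite3.Error:
            continue  # invalid predicted SQL counts as a miss
        gold = set(db.execute(gold_sql).fetchall())
        if pred == gold:
            correct += 1
    return correct / len(pairs)

# Toy database in the spirit of the student-performance example
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scores (student TEXT, math INTEGER)")
db.executemany("INSERT INTO scores VALUES (?, ?)",
               [("a", 72), ("b", 90), ("c", 90)])

pairs = [
    # Differently written but equivalent query: counts as correct
    ("SELECT student FROM scores WHERE math > 80",
     "SELECT student FROM scores WHERE math >= 81"),
    # Wrong prediction: result sets differ
    ("SELECT COUNT(*) FROM scores",
     "SELECT MAX(math) FROM scores"),
]
print(execution_accuracy(db, pairs))  # 0.5
```

PremSQL's evaluator builds on the same principle while adding metrics like VES, which also weighs query efficiency.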
PremSQL is focused on creating local Text-to-SQL workflows. In many scenarios, organizations need to maintain data privacy while leveraging generative AI solutions for productivity and innovation. PremSQL addresses this need by keeping your data entirely local.
Key Use Cases:
While many libraries excel at building general AI workflows, they often present a steep learning curve for customization. PremSQL simplifies this process, giving you complete control over your data while integrating seamlessly with existing LangChain, LlamaIndex, or DSPy workflows.
We invite you to participate in our open-source initiative! Your contributions, feedback, and issue reports are crucial to our growth. For more information on how to contribute, please check our contributing guidelines.
Stay connected and follow our GitHub repository for the latest updates and improvements!