Introduction

Welcome to the PremSQL library, a powerful tool for building self-hosted, end-to-end autonomous data analysis pipelines powered by Text to SQL. PremSQL offers a modular design where each component functions independently, enabling you to create fully customized workflows. Watch our Quick Demo of the latest PremSQL Agent Server and Playground:

Core Components

Each component works independently and is designed to accomplish a specific task. While we recommend exploring the components sequentially to gain a comprehensive understanding, it’s not mandatory. Since components operate independently, you can focus on those that meet your immediate needs and return later for a deeper dive into others.

PremSQL GitHub

Star the project to stay updated with our rapid development of the best local Text-to-SQL solution.

News

[Sep 10th 2024] Initial release of PremSQL
[Sep 10th 2024] Launch of Prem-1B-SQL (fully local Text to SQL model)
[Oct 30th 2024] Prem-1B-SQL surpassed 5K downloads
[Nov 5th 2024] Release of PremSQL Playground, Agents, and AgentServer
[Nov 10th 2024] Release of Prem-1B-SQL Ollama model with Ollama support

Installation

Start by creating a virtual environment and installing PremSQL:

python -m venv .venv
source .venv/bin/activate
pip install -U premsql

Note: We currently recommend using Python virtualenv instead of conda, as some users have reported compatibility issues with conda environments.
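If you want to confirm that the environment is set up as described, a small standard-library script can help. This is only a sketch: it checks whether the interpreter is running inside a virtual environment and whether a package named premsql is visible to it. The helper names are illustrative and not part of PremSQL.

```python
import sys
from importlib import metadata
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(package: str = "premsql") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("premsql version:", installed_version())
```

Run it with the same interpreter you used for pip install; a None version usually means the package went into a different environment.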

Note

The latest PremSQL update doesn’t include pre-installed dependencies to accommodate backend variations and maintain a lighter package. Choose your preferred backend:

For Hugging Face Transformers:

pip install torch transformers

For Apple MLX backend:

pip install mlx mlx-lm

For Ollama integration, first install Ollama, then install the Python client:

pip install ollama
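Since the backends are optional, it can be handy to see which ones your environment can actually import. The sketch below uses only the standard library and assumes the usual import names for the packages installed above (torch, transformers, mlx, mlx_lm, ollama). It does not call any PremSQL API.

```python
from importlib.util import find_spec

# Import names assumed for each backend's pip packages.
BACKENDS = {
    "transformers": ("torch", "transformers"),
    "mlx": ("mlx", "mlx_lm"),
    "ollama": ("ollama",),
}


def available_backends() -> dict:
    """Map each backend name to True if all of its modules can be found."""
    return {
        name: all(find_spec(module) is not None for module in modules)
        for name, modules in BACKENDS.items()
    }


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```

Note that the ollama entry only covers the Python client; the Ollama application itself is installed separately.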

PremSQL is designed to be versatile and hackable, with a simple code structure and decoupled components. Here are the main ways to use it:

Use PremSQL’s pre-built Agent UI with our baseline agent to analyze CSVs, databases, or Kaggle datasets (as demonstrated in the demo video)
Leverage PremSQL as a Python library to:
Run the PremSQL backend API server and integrate it with your preferred programming language

Quick Start

Let’s explore how to use PremSQL’s latest baseline agent with Ollama. We’ve chosen Ollama for this guide because it’s easy to set up, requires minimal computational resources, and runs everything locally at no cost. However, you can also use Apple MLX, Hugging Face Transformers, or other supported backends.

PremSQL installation with Ollama and model downloads

First, ensure PremSQL is installed with the Ollama client. If you haven’t done so, follow the installation instructions above. We’ll use two models: Prem-1B-SQL and Llama3.2 1B. Download both models using these commands:

ollama run anindya/prem1b-sql-ollama-fp116
ollama run llama3.2:1b
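To double-check that both models are available locally, you can ask the Ollama server for its model list. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/tags endpoint. The helper names are illustrative.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

REQUIRED = ["anindya/prem1b-sql-ollama-fp116", "llama3.2:1b"]


def normalize(name: str) -> str:
    """Append ':latest' when the model name carries no explicit tag."""
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else name + ":latest"


def missing_models(tags_payload: dict, required=REQUIRED) -> list:
    """Return the required models that are absent from an /api/tags payload."""
    local = {normalize(m["name"]) for m in (tags_payload.get("models") or [])}
    return [name for name in required if normalize(name) not in local]


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 2.0):
    """Return the parsed /api/tags response, or None if Ollama is unreachable."""
    try:
        with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    payload = fetch_tags()
    if payload is None:
        print("Could not reach Ollama; is it running?")
    else:
        print("Missing models:", missing_models(payload) or "none")
```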

Optional optimization

By default, Ollama runs one model at a time. To optimize PremSQL agent performance with multiple models, configure these environment variables:

export OLLAMA_NUM_PARALLEL=3
export OLLAMA_MAX_LOADED_MODELS=3

Remember to restart Ollama after making these changes.

Launch PremSQL Server and Agent UI

PremSQL includes a CLI tool for managing the backend API server and Agent UI. Running premsql in your terminal displays:

Usage: premsql [OPTIONS] COMMAND [ARGS]...
  PremSQL CLI to manage API servers and Streamlit app
Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
Commands:
  launch  Launch PremSQL services
  stop    Stop all PremSQL services

This confirms that PremSQL is installed correctly. Verify you have version 0.1.11 or higher. Launch both the backend API server and playground with:

Linux, Windows, and macOS:

premsql launch all
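The version requirement above (0.1.11 or higher) can also be checked from Python. This is a rough sketch that compares only the numeric release segment, so pre-release suffixes are ignored. Comparing the raw strings would be misleading, because "0.1.9" sorts after "0.1.11" as text.

```python
import re
from importlib import metadata

MINIMUM = "0.1.11"


def version_tuple(version: str) -> tuple:
    """Parse the leading dotted-number part, e.g. '0.1.11' -> (0, 1, 11)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare versions numerically rather than as plain strings."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        installed = metadata.version("premsql")
    except metadata.PackageNotFoundError:
        print("premsql is not installed in this environment")
    else:
        print(installed, "OK" if meets_minimum(installed) else "too old")
```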

On first run, it will execute database migrations before starting the server and Streamlit agent UI. A successful launch looks like this:

You can now use pre-built datasets, import CSVs, or import from Kaggle. Let’s try analyzing this student performance dataset from Kaggle.

Import a dataset from Kaggle

To import a Kaggle dataset into PremSQL, ensure it contains only CSV files (multiple files are supported). Simply copy the dataset ID (in this case, spscientist/students-performance-in-exams) and paste it into the Upload csvs or use Kaggle field in the PremSQL navigation. After submission, you’ll see:

You’ll now see a starter code template specific to your chosen backend.
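If you usually copy the full Kaggle page URL instead of the dataset ID, a small helper can extract the owner/dataset part for you. This sketch assumes dataset pages follow the /datasets/<owner>/<dataset> URL layout and that IDs contain only letters, digits, dots, hyphens, and underscores. The function is illustrative and not part of PremSQL.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a dataset ID: "<owner>/<dataset-slug>".
_ID_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


def kaggle_dataset_id(text: str) -> str:
    """Return 'owner/dataset' from either a dataset ID or a dataset page URL."""
    text = text.strip()
    if "://" in text:
        parts = [p for p in urlparse(text).path.split("/") if p]
        # Expected path layout: /datasets/<owner>/<dataset>[/...]
        if len(parts) < 3 or parts[0] != "datasets":
            raise ValueError(f"not a Kaggle dataset URL: {text!r}")
        text = "/".join(parts[1:3])
    if not _ID_PATTERN.fullmatch(text):
        raise ValueError(f"not a valid dataset ID: {text!r}")
    return text


if __name__ == "__main__":
    url = "https://www.kaggle.com/datasets/spscientist/students-performance-in-exams"
    print(kaggle_dataset_id(url))
```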

Start a PremSQL analysis session

For this demo, we’ll use the Ollama starter code. Create a new file anywhere and add this code:

starter_server.py

from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
from premsql.generators import Text2SQLGeneratorOllama
from premsql.agents.tools import SimpleMatplotlibTool
from premsql.executors import ExecutorUsingLangChain

text2sql_model = Text2SQLGeneratorOllama(
    model_name="anindya/prem1b-sql-ollama-fp116",
    experiment_name="ollama",
    type="test"
)

analyser_plotter_model = Text2SQLGeneratorOllama(
    model_name="llama3.2:1b",
    experiment_name="ollama",
    type="test"
)

db_connection_uri = "sqlite:////Users/anindya/Library/Caches/premsql/kaggle/student.sqlite"
baseline = BaseLineAgent(
    session_name="student",                     # Required unique session name
    db_connection_uri=db_connection_uri,        # Target database connection
    specialized_model1=text2sql_model,          # Text to SQL model
    specialized_model2=analyser_plotter_model,  # Secondary analysis model
    executor=ExecutorUsingLangChain(),          # Database executor
    auto_filter_tables=False,                   # Table filtering option
    plot_tool=SimpleMatplotlibTool()            # Visualization tool
)

agent_server = AgentServer(agent=baseline, port=8162)
agent_server.launch()

Run this code in your terminal within your PremSQL environment:

python starter_server.py
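One detail in starter_server.py that is easy to get wrong is the db_connection_uri: it has four slashes because the sqlite:/// prefix is followed by an absolute path that itself begins with a slash. The sketch below builds such a URI from a path. It assumes the SQLAlchemy-style URL convention, which is our reading of the starter code rather than a documented PremSQL guarantee. The path shown is the one from the example, so replace it with the one from your own starter template.

```python
from pathlib import Path, PurePosixPath


def sqlite_uri(db_path: str) -> str:
    """Build a SQLAlchemy-style SQLite URI from an absolute POSIX path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    # "sqlite:///" plus a path starting with "/" yields four slashes in total.
    return "sqlite:///" + str(path)


if __name__ == "__main__":
    # Placeholder path taken from the example above.
    db_path = "/Users/anindya/Library/Caches/premsql/kaggle/student.sqlite"
    print(sqlite_uri(db_path))
    print("file exists:", Path(db_path).exists())
```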

You should see FastAPI server output similar to:

2024-11-10 14:11:51,874 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,879 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,884 - [PIPELINE-MEMORY] - INFO - /Users/anindya/test_apps/premsql/premsql_pipeline_memory.db
2024-11-10 14:11:51,904 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting server on port 8162
INFO:     Started server process [55869]
INFO:     Waiting for application startup.
2024-11-10 14:11:51,908 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting up the application
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8162 (Press CTRL+C to quit)
INFO:     ::1:62209 - "GET /session_info HTTP/1.1" 200 OK
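The last log line shows a GET /session_info request answered with 200 OK, so that route can serve as a quick reachability check. The sketch below calls it with the standard library and prints the raw response. The response schema is not shown in this guide, so the body is left unparsed. The helper names are illustrative.

```python
from urllib.error import URLError
from urllib.request import urlopen


def session_info_url(base_url: str) -> str:
    """Join the server base URL with the /session_info route seen in the log."""
    return base_url.rstrip("/") + "/session_info"


def fetch_session_info(base_url: str = "http://localhost:8162", timeout: float = 2.0):
    """Return (status, body text), or None if unreachable or an error status."""
    try:
        with urlopen(session_info_url(base_url), timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except (URLError, OSError):
        return None


if __name__ == "__main__":
    result = fetch_session_info()
    if result is None:
        print("Agent server not reachable on port 8162")
    else:
        status, body = result
        print(status, body)
```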

Copy the localhost URL (http://localhost:8162) and paste it here:

Note

This is a starter implementation using our baseline agent. You can create custom agents with different functionalities (within data analysis scope) by extending this code. The snippet above demonstrates our baseline implementation for Autonomous Analysis agents.

You’re all set! You can now analyze various data sources, including CSV files, databases, and Kaggle CSV datasets.

That’s how simple it is! From here, explore the many features PremSQL offers:

PremSQL Datasets

Pre-processed datasets hosted on HuggingFace for Text-to-SQL tasks. Ideal for evaluation, fine-tuning, and creating custom datasets.

PremSQL Generators

Models that transform natural language input into SQL queries based on your database schema.

PremSQL Executors

Connects to databases and executes generated SQL queries to fetch results.

PremSQL Evaluators

Evaluates Text-to-SQL models using metrics like execution accuracy and Valid Efficiency Score (VES).

PremSQL Error Handling

Creates error handling prompts and datasets to enhance inference reliability and self-correction capabilities.

PremSQL Tuner

Fine-tunes open-source models on Text-to-SQL datasets with custom evaluation methods for optimal performance.

PremSQL Agents

End-to-end agentic workflows for querying, analyzing, and visualizing database insights using natural language. Supports custom implementations for specialized use cases.

PremSQL Playground

A ChatGPT-like interface specialized for database interactions. Deploy PremSQL agents with customized configurations for an interactive experience.
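To make the Executors and Evaluators descriptions above more concrete, here is a minimal sketch of execution accuracy using only Python's built-in sqlite3 module. It does not use PremSQL's own executor or evaluator classes. The table, the queries, and the order-insensitive row comparison are all assumptions chosen for illustration.

```python
import sqlite3


def run_query(conn, sql):
    """Execute SQL and return its rows sorted, or None if execution fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return sorted(rows, key=repr)  # order-insensitive comparison (assumption)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        expected = run_query(conn, gold_sql)
        if expected is not None and run_query(conn, predicted_sql) == expected:
            correct += 1
    return correct / len(pairs)


def demo() -> float:
    """Score three hypothetical predictions against a toy in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, math_score INTEGER)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [("Ana", 90), ("Ben", 72), ("Cy", 85)],
    )
    pairs = [
        # Different SQL text, same result rows: counts as correct.
        ("SELECT name FROM students WHERE math_score > 80",
         "SELECT name FROM students WHERE math_score >= 81"),
        # Runs fine but returns a different count: incorrect.
        ("SELECT COUNT(*) FROM students",
         "SELECT COUNT(*) FROM students WHERE math_score > 80"),
        # Misspelled column, so execution fails: incorrect.
        ("SELECT nme FROM students", "SELECT name FROM students"),
    ]
    return execution_accuracy(conn, pairs)


if __name__ == "__main__":
    print(f"execution accuracy: {demo():.2f}")
```

The point of the metric is that queries are judged by what they return rather than by how they are written, which is why the first pair counts as correct even though the SQL text differs.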

Why PremSQL? The Vision

PremSQL is focused on creating local Text-to-SQL workflows. In many scenarios, organizations need to maintain data privacy while leveraging generative AI solutions for productivity and innovation. PremSQL addresses this need by keeping your data entirely local.

Key Use Cases:

Interactive database querying and analysis
RAG systems with database integration
Intelligent SQL autocompletion
Self-hosted AI-powered data analysis
Autonomous agentic pipelines with secure database access

How is it different?

While many libraries excel at building general AI workflows, they often present a steep learning curve for customization. PremSQL simplifies this process, giving you complete control over your data while seamlessly integrating with existing LangChain, LlamaIndex, or DSPy workflows.

Join Our Community

We invite you to participate in our open-source initiative! Your contributions, feedback, and issue reports are crucial to our growth. For more information on how to contribute, please check our contributing guidelines.

Stay connected and follow our GitHub repository for the latest updates and improvements!
