Welcome to the PremSQL library, a powerful tool for building self-hosted, end-to-end autonomous data analysis pipelines powered by Text to SQL. PremSQL offers a modular design where each component functions independently, enabling you to create fully customized workflows. Watch our Quick Demo of the latest PremSQL Agent Server and Playground.

Core Components

Each component works independently and is designed to accomplish a specific task. While we recommend exploring the components sequentially to gain a comprehensive understanding, it’s not mandatory. Since components operate independently, you can focus on those that meet your immediate needs and return later for a deeper dive into others.

PremSQL GitHub

Star the project to stay updated with our rapid development of the best local Text-to-SQL solution.

News

  • [Sep 10th 2024] Initial release of PremSQL
  • [Sep 10th 2024] Launch of Prem-1B-SQL (fully local Text to SQL model)
  • [Oct 30th 2024] Prem-1B-SQL surpassed 5K+ downloads
  • [Nov 5th 2024] Release of PremSQL Playground, Agents, and AgentServer
  • [Nov 10th 2024] Release of Prem-1B-SQL Ollama model with Ollama support

Installation

Start by creating a virtual environment and installing PremSQL:

Note: We currently recommend using Python virtualenv instead of conda, as some users have reported compatibility issues with conda environments.
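For example, a minimal setup, assuming the package is published on PyPI under the name premsql:

python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -U premsql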

Note

The latest PremSQL update doesn’t include pre-installed dependencies to accommodate backend variations and maintain a lighter package. Choose your preferred backend:

For Hugging Face Transformers:

pip install torch transformers

For Apple MLX backend:

pip install mlx mlx-lm

For Ollama integration, first install Ollama, then install the Python client:

pip install ollama
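If Ollama itself is not installed yet, its documented one-line installer for Linux at the time of writing is shown below (on macOS and Windows, download the installer from ollama.com):

curl -fsSL https://ollama.com/install.sh | sh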

PremSQL is designed to be versatile and hackable, with a simple code structure and decoupled components. Here are the main ways to use it:

Quick Start

Let’s explore how to use PremSQL’s latest baseline agent with Ollama. We’ve chosen Ollama for this guide because it’s easy to set up, requires minimal computational resources, and runs everything locally at no cost. However, you can also use Apple MLX, Hugging Face Transformers, or other supported backends.
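Only the generator construction changes when you swap backends; the rest of the pipeline stays the same. As a rough sketch for the Hugging Face backend (assumptions: the class name Text2SQLGeneratorHF, its constructor arguments, and the premai-io/prem-1B-SQL model ID are not confirmed here, so check the generators documentation for the exact signature):

from premsql.generators import Text2SQLGeneratorHF  # assumption: HF backend class name

# Hypothetical equivalent of the Ollama generator used in the starter code below
text2sql_model = Text2SQLGeneratorHF(
    model_or_name_or_path="premai-io/prem-1B-SQL",  # assumption: Hugging Face model ID
    experiment_name="quickstart_hf",
    type="test",
)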

Step 1: PremSQL installation with Ollama and model downloads

First, make sure PremSQL is installed with the Ollama client; if you haven't done so, follow the installation instructions above. We'll use two models: Prem-1B-SQL and Llama 3.2 1B. Download both with the following commands (ollama run pulls a model on first use):

ollama run anindya/prem1b-sql-ollama-fp116
ollama run llama3.2:1b

Step 2: Launch PremSQL Server and Agent UI

PremSQL includes a CLI tool for managing the backend API server and Agent UI. Running premsql in your terminal displays:

Usage: premsql [OPTIONS] COMMAND [ARGS]...
  PremSQL CLI to manage API servers and Streamlit app
Options:
  --version  Show the version and exit.
  --help     Show this message and exit.
Commands:
  launch  Launch PremSQL services
  stop    Stop all PremSQL services

This confirms that PremSQL is installed correctly. Verify you have version 0.1.11 or higher. Launch both the backend API server and playground with:

On Linux, Windows, and macOS:
premsql launch all

On first run, it executes database migrations before starting the server and the Streamlit agent UI.

Once both services are up, you can use pre-built datasets, import CSVs, or import datasets from Kaggle. Let's try analyzing a student performance dataset from Kaggle.

Step 3: Import a dataset from Kaggle

To import a Kaggle dataset into PremSQL, make sure it contains only CSV files (multiple files are supported). Copy the dataset ID (here, spscientist/students-performance-in-exams) and paste it into the Upload csvs or use Kaggle field in the PremSQL navigation. After submission, you'll see a starter code template specific to your chosen backend.

Step 4: Start a PremSQL analysis session

For this demo, we’ll use the Ollama starter code. Create a new file anywhere and add this code:

starter_server.py
from premsql.playground import AgentServer
from premsql.agents import BaseLineAgent
from premsql.generators import Text2SQLGeneratorOllama
from premsql.agents.tools import SimpleMatplotlibTool
from premsql.executors import ExecutorUsingLangChain

text2sql_model = Text2SQLGeneratorOllama(
    model_name="anindya/prem1b-sql-ollama-fp116",
    experiment_name="ollama",
    type="test"
)

analyser_plotter_model = Text2SQLGeneratorOllama(
    model_name="llama3.2:1b",
    experiment_name="ollama",
    type="test"
)

db_connection_uri = "sqlite:////Users/anindya/Library/Caches/premsql/kaggle/student.sqlite"
baseline = BaseLineAgent(
    session_name="student",                     # Required unique session name
    db_connection_uri=db_connection_uri,        # Target database connection
    specialized_model1=text2sql_model,          # Text to SQL model
    specialized_model2=analyser_plotter_model,  # Secondary analysis model
    executor=ExecutorUsingLangChain(),         # Database executor
    auto_filter_tables=False,                  # Table filtering option
    plot_tool=SimpleMatplotlibTool()           # Visualization tool
)

agent_server = AgentServer(agent=baseline, port=8162)
agent_server.launch()
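The db_connection_uri above is the author's local SQLite file created by the Kaggle import; point it at your own database. Since the executor goes through LangChain, any standard SQLAlchemy-style URI should work (an assumption worth verifying against the executors documentation), for example:

db_connection_uri = "sqlite:////absolute/path/to/student.sqlite"  # local SQLite file (note the four slashes for an absolute path)
# db_connection_uri = "postgresql+psycopg2://user:password@localhost:5432/mydb"  # hypothetical Postgres example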

Run this code in your terminal within your PremSQL environment:

python starter_server.py

You should see FastAPI server output similar to:

2024-11-10 14:11:51,874 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,879 - [GENERATOR] - INFO - Experiment folder found in: /Users/anindya/Library/Caches/premsql/experiments/test/ollama
2024-11-10 14:11:51,884 - [PIPELINE-MEMORY] - INFO - /Users/anindya/test_apps/premsql/premsql_pipeline_memory.db
2024-11-10 14:11:51,904 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting server on port 8162
INFO:     Started server process [55869]
INFO:     Waiting for application startup.
2024-11-10 14:11:51,908 - [FASTAPI-INFERENCE-SERVICE] - INFO - Starting up the application
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8162 (Press CTRL+C to quit)
INFO:     ::1:62209 - "GET /session_info HTTP/1.1" 200 OK

Copy the localhost URL (http://localhost:8162) and paste it into the PremSQL Playground to start the session.
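If you want to confirm the server is reachable first, you can hit the /session_info endpoint that appears in the log above. A minimal check (the response schema isn't documented here, so it is printed raw):

from urllib.request import urlopen

# Ping the running agent server; expect an HTTP 200 as in the log output above
with urlopen("http://localhost:8162/session_info") as resp:
    print(resp.status, resp.read().decode())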

Note

This is a starter implementation using our baseline agent for autonomous analysis. You can create custom agents with different functionality (within the data-analysis scope) by extending this code.

Step 5: You're all set!

You can now perform analysis on various data sources such as CSVs, databases, and Kaggle CSV datasets.

That's how simple it is! From here, explore the many other features PremSQL offers.

Why PremSQL? The Vision

PremSQL is focused on creating local Text-to-SQL workflows. In many scenarios, organizations need to maintain data privacy while leveraging generative AI solutions for productivity and innovation. PremSQL addresses this need by keeping your data entirely local.

Key Use Cases:

  • Interactive database querying and analysis
  • RAG systems with database integration
  • Intelligent SQL autocompletion
  • Self-hosted AI-powered data analysis
  • Autonomous agentic pipelines with secure database access

How is it different?

While many libraries excel at building general AI workflows, they often present a steep learning curve for customization. PremSQL simplifies this process, giving you complete control over your data while seamlessly integrating with existing LangChain, Llama-Index, or DSPy workflows.

Join Our Community

We invite you to participate in our open-source initiative! Your contributions, feedback, and issue reports are crucial to our growth. For more information on how to contribute, please check our contributing guidelines.

Stay connected and follow our GitHub repository for the latest updates and improvements!