Introduction
Welcome to the PremSQL library, a powerful tool for building self-hosted, end-to-end Text-to-SQL pipelines. PremSQL offers a modular design where each component functions independently, enabling you to create fully customized workflows.
PremSQL GitHub
Star the project on GitHub to follow our rapid development toward the best local Text-to-SQL solution.
News
- [Sep 10th 2024] First release of PremSQL
- [Sep 10th 2024] First release of Prem-1B-SQL (fully local Text to SQL model)
Core Components
Each component works independently and is built to do a single task. We recommend going through the components in order to get an overall picture and learn how to use them to the fullest.
PremSQL Datasets
Pre-processed datasets hosted on HuggingFace for Text-to-SQL tasks. Useful for evaluation, fine-tuning, and creating custom datasets. Learn more.
PremSQL Generators
Models that generate SQL queries from user input and a specified database source. Learn more.
PremSQL Executors
Connects to databases and executes generated SQL queries to fetch results. Learn more.
PremSQL Evaluators
Evaluates Text-to-SQL models using metrics like execution accuracy and Valid Efficiency Score (VES). Learn more.
PremSQL Error Handling
Helps you create error-handling prompts and datasets for error-free inference, as well as fine-tuning datasets that instill self-correction in models.
PremSQL Tuner
Fine-tunes open-source models on Text-to-SQL datasets with custom evaluation methods to ensure optimal training performance. Learn more.
PremSQL Pipelines
End-to-end workflows that integrate generation, execution, and further processing for tasks like database Q&A. Learn more.
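To make the execute-then-evaluate flow described above concrete, here is a minimal sketch of execution accuracy using only Python's standard `sqlite3` module. It is not PremSQL's API: `execute_sql` and `execution_accuracy` are hypothetical helper names, the `schools` table is toy data, and comparing result rows as sets is one common convention for this metric that is assumed here.

```python
import sqlite3


def execute_sql(conn, sql):
    """Run a query and return (rows, error); error is None on success."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        pred_rows, pred_err = execute_sql(conn, predicted_sql)
        gold_rows, gold_err = execute_sql(conn, gold_sql)
        # Assumption: rows are compared as sets, ignoring order and duplicates.
        if pred_err is None and gold_err is None and set(pred_rows) == set(gold_rows):
            correct += 1
    return correct / len(pairs)


# Toy in-memory database (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schools (name TEXT, phone TEXT, opened INTEGER);
    INSERT INTO schools VALUES ('A', '555-0100', 1999), ('B', '555-0101', 2005);
    """
)

pairs = [
    # Different SQL, same result: counts as correct.
    ("SELECT phone FROM schools WHERE opened > 2000",
     "SELECT phone FROM schools WHERE opened >= 2001"),
    # Predicted query references a missing table: counts as incorrect.
    ("SELECT phone FROM school", "SELECT phone FROM schools"),
]

score = execution_accuracy(conn, pairs)
print(score)  # 0.5
```

The error string returned by `execute_sql` is the kind of signal an error-handling prompt could feed back to a model for a second attempt.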
Why PremSQL? The Vision
PremSQL focuses on creating local Text-to-SQL workflows. Often you cannot or do not want to share data with third-party systems, yet you still need generative AI solutions for productivity and innovation. PremSQL is designed to solve this by keeping your data local.
Use Cases:
- Database Q&A
- RAG with database integration
- SQL autocompletion
- AI-powered, self-hosted data analysis
- Autonomous agentic pipelines with database access
How is it different from LangChain, Llama-Index, or DSPy?
While these libraries excel at building general AI workflows, they often come with a steep learning curve for customization. PremSQL simplifies this by giving you full control over your data while integrating smoothly with existing LangChain, Llama-Index, or DSPy workflows.
Getting Started
Install PremSQL with:
pip install -U premsql
Here’s a quick starter snippet to chat with a sample database:
from premsql.pipelines import SimpleText2SQLAgent
from premsql.generators import Text2SQLGeneratorHF
from premsql.executors import SQLiteExecutor

# Path to a local SQLite database file
dsn_or_db_path = "./data/db/california_schools.sqlite"

# Build an agent that pairs the database with a local generator model
agent = SimpleText2SQLAgent(
    dsn_or_db_path=dsn_or_db_path,
    generator=Text2SQLGeneratorHF(
        model_or_name_or_path="premai-io/prem-1B-SQL",
        experiment_name="simple_pipeline",
        device="cuda:0",
        type="test"
    ),
)

question = "please list the phone numbers of the direct charter-funded schools that are opened after 2000/1/1"

# Ask the question in natural language and read the resulting table
response = agent.query(question)
response["table"]
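Before running the starter code, it can help to confirm that the database path really points at an existing SQLite file, because `sqlite3.connect` silently creates an empty database when the file is missing. The sketch below uses only the Python standard library; `list_tables` is a hypothetical helper, not part of PremSQL, and the demo runs against a throwaway database so it works without the sample data.

```python
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path):
    """Return the table names in an existing SQLite file, sorted by name."""
    path = Path(db_path)
    if not path.is_file():
        # Fail early instead of letting sqlite3 create an empty database.
        raise FileNotFoundError(f"No SQLite file at {path}")
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo on a throwaway database (hypothetical schema, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.sqlite"
    conn = sqlite3.connect(str(demo_path))
    conn.execute("CREATE TABLE schools (name TEXT, phone TEXT)")
    conn.commit()
    conn.close()
    tables = list_tables(demo_path)

print(tables)  # ['schools']
```

With the sample data in place, the same helper could be called as `list_tables("./data/db/california_schools.sqlite")`.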
Explore more detailed tutorials and learn about PremSQL’s offerings and future plans below.
Roadmap
We are excited to announce the successful rollout of the first release of the PremSQL library. Alongside the release, we are committed to continuously improving the existing documentation to enhance the overall developer experience.
- Synthesizer Component: A significant feature of PremSQL is the synthesizer component, designed to generate synthetic datasets from private data. This capability allows for fine-tuning smaller language models, enabling fully private text-to-SQL workflows that safeguard sensitive data.
- Agentic Pipelines with Function-Calling Features: Future releases will incorporate advanced agentic methods with new features, including graph plotting capabilities, natural language analysis, and other enhancements to increase the system’s versatility and power.
- Training Better Small Language Models: We are actively training small language models tailored specifically to PremSQL’s unique requirements. These models will be continually refined and optimized, ensuring they become more efficient and effective in handling designated tasks.
- Optimization of Generators and Executors: Efforts are underway to optimize existing components, such as generators and executors, to enhance their robustness. Planned improvements include parallel processing, significantly speeding up generation and execution times, making the overall system more efficient.
- Stability and UI Enhancements: As we move forward, we aim to include comprehensive stability tests for the entire library. A simple UI will also be rolled out to further improve user interaction and accessibility.
We invite you to join us in our open-source initiative! Your contributions, feedback, and issue submissions are invaluable in helping us grow. For more details on how to contribute, please refer to our contributing guidelines.
Stay tuned and follow our GitHub repository for the latest updates and improvements!