End-to-end workflows customizable Text to SQL Agentic workflows for querying, analysing and plotting charts in databases all in natural language.
/query
: This will tell agent to go, write a SQL query and execute to the DataBase./analyse
: This will analyse the dataframe and based on userβs query will give an concise natural language answer./plot
: This will plot basic charts / graphs from the output dataframes and userβs input query./followup
: If the user do not start with the above three markers or
when the wokers (LLM / SLM) of the agent fails then it goes into this route. Here the worker tells what to do next.baseline
agent. In other words, Baseline means the minimal agentic implementation that can do all the following tasks. However using the premsql library you can make task specific Database RAG agents or extend the existing ones.
analyse
/ plot
and followup
, we are going to use this model.
Text2SQLGeneratorHF
/ Text2SQLGeneratorOpenAI
or Text2SQLGeneratorPremAI
/ Text2SQLGeneratorOllama
to use other open or closed source models as well. Now, letβs proceed with instantiating our BaseLine agent.
session_name
: A string identifier for the current session, likely used for tracking or logging purposes.
db_connection_uri
: A string containing the database connection URI that specifies how to connect to the database.
specialized_model1
: A Text2SQL model instance used primarily for SQL generation tasks. Based on the code, this is used in the text2sql_worker.
specialized_model2
: A second Text2SQL model instance used for analysis, plotting, and followup tasks. This model is shared among multiple workers.
executor
: An instance of BaseExecutor that handles the execution of SQL queries against the database.
plot_tool
: An instance of BasePlotTool that provides plotting capabilities for data visualization.
session_db_path
: An optional string parameter specifying the path to store session-related data.
include_tables
: An optional list of tables to include in the database operations, used for filtering available tables.
exclude_tables
: An optional list of tables to exclude from database operations, used for filtering available tables.
auto_filter_tables
: An optional boolean flag that likely controls automatic table filtering behavior.
route_worker_kwargs
: An optional dictionary containing additional keyword arguments for different worker routes (query, analyse, plot, etc.).
Note about route_worker_kwargs
route_worker_kwargs
parameter lets you customize how different parts of the agent behave. Think of it as a control panel for fine-tuning each workerβs settings.Hereβs a simple example:/analyse
):
/query
):
/query
route. It searches for the latest output
and feed that as the input and then provide the analysis.
For plotting charts
/query
route. It searches for the latest output
and feed that as the input and then go for selecting the columns of the latest input/output dataframe and selects which columns and which plot type needs to be chosen from userβs input. Current supported plot types are: scatter
, line
, area
, histogram
and bar
plot.
Everything else followup route
ExitWorkerOutput
meaning, when one conversation cycle finishes it accumulates all the information and outputs it in a structural format. Here are all the information ExitWorkerOutput
provides you:
server_mode
to be on while calling the agent which looks like this:
AgentOutput
Which is a less granular form of ExitWorkerOutput
which contains
the following attributes:
agent(...)
, for routes like /analyse
and /plot
then it assumes that it would do
the analysis on the latest output dataframe which was output from /query
plot. So suppose if you want an analysis from
what how are the main topics of the comments table. Then you need to previously query the agent to output some dataframe that
has the comments information./query
./analyse
.plot
./query
. However if you use this worker independently then there is no need to do that. Here is how we define the Text to SQL worker inside BaseLine agent.
db_connection_uri (str)
: Database connection string to connect to the target database.generator (Text2SQLGeneratorBase)
: Main model for generating SQL queries from text.helper_model (Optional[Text2SQLGeneratorBase])
: Secondary model for table filtering and error correction.
Defaults to None.executor (Optional[BaseExecutor])
: Custom executor for running SQL queries. If None, uses default.include_tables (Optional[list])
: List of tables to specifically include in schema context.
If None, includes all tables.exclude_tables (Optional[list])
: List of tables to exclude from schema context.
If None, no tables are excluded.auto_filter_tables (Optional[bool])
: Whether to automatically filter relevant tables for each query.
Defaults to False. This will call the model one more time and it is recommended to use when model is strong and you have very large number of schemas.question (str)
: The natural language query to be converted to SQLadditional_knowledge (Optional[str])
: Extra context or information to help with query generationfewshot_dict (Optional[dict])
: Dictionary of example question-SQL pairs for few-shot learningtemperature (Optional[float])
: Controls randomness in SQL generation (0.0 = deterministic, 1.0 = most random)max_new_tokens (Optional[int])
: Maximum number of tokens to generate in the SQL queryrender_results_using (Optional[Literal["json", "dataframe"]])
: Output format for query resultsprompt_template (Optional[str])
: Custom template for the main SQL generation prompterror_handling_prompt_template (Optional[str])
: Custom template for error correction prompts/analyse
. Hereβs how to initialize the Analyser Worker:
generator (Text2SQLGeneratorBase)
: Model used for generating analysis from the data.question (str)
: The natural language question about what to analyze in the datainput_dataframe (pd.DataFrame)
: The dataframe to analyzedo_chunkwise_analysis (Optional[bool])
: Whether to break analysis into chunks for large datasets. Defaults to Falsechunk_size (Optional[int])
: Number of rows per chunk when doing chunkwise analysis. Defaults to 20max_chunks (Optional[int])
: Maximum number of chunks to analyze. Defaults to 20temperature (Optional[float])
: Controls randomness in analysis generation (0.0 = deterministic). Defaults to 0.19max_new_tokens (Optional[int])
: Maximum number of tokens to generate in the analysis. Defaults to 600analysis_prompt_template (Optional[str])
: Custom template for the main analysis promptanalysis_merger_template (Optional[str])
: Custom template for merging chunk analysesverbose (Optional[bool])
: Whether to print detailed progress. Defaults to Falsedo_chunkwise_analysis=True
. This process:
chunk_size
/plot
. However, it can be used independently without this prefix. Hereβs how to initialize the Chart Plot Worker in the BaseLine agent:
generator (Text2SQLGeneratorBase)
: Model for generating plot configurations from textplot_tool (BasePlotTool)
: Tool for creating and rendering plots (e.g., PlotlyTool)question (str)
: Natural language description of the desired visualizationinput_dataframe (pd.DataFrame)
: Data to be plottedtemperature (Optional[float])
: Controls randomness in configuration generation (default: 0.1)max_new_tokens (Optional[int])
: Maximum tokens for configuration generation (default: 100)plot_image (Optional[bool])
: Whether to generate base64 image of plot (default: True)prompt_template (Optional[str])
: Custom template for plot configuration generation