LangChain is a framework for developing applications powered by large language models (LLMs). It provides open-source building blocks for development, LangSmith for monitoring and optimizing production chains, and LangServe for turning chains into deployable APIs.

Installation and setup

Start by installing langchain and the premai SDK with the following command:

pip install premai langchain

Before proceeding further, please make sure that you have made an account on PremAI and already created a project. If not, please refer to the quick start guide to get started with the PremAI platform. Create your first project and grab your API key.

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_community.chat_models import ChatPremAI

Setup PremAI client in LangChain

Once the required modules are imported, let’s set up our client. For now, let’s assume that our project_id is 8. Make sure you use your own project ID; otherwise, the client will throw an error.

To use LangChain with Prem, you do not need to pass a model name or set any parameters on the chat client. By default, it uses the model name and parameters configured in the LaunchPad.

If you change the model or any other parameter, such as temperature or max_tokens, while setting up the client, those values override the default configuration from the LaunchPad.

import os
import getpass

if "PREMAI_API_KEY" not in os.environ:
    os.environ["PREMAI_API_KEY"] = getpass.getpass("PremAI API Key:")

chat = ChatPremAI(project_id=8)

Chat Completions

ChatPremAI supports two methods: invoke (which is the same as generate) and stream.

The first returns a static result, whereas the second streams tokens one by one. Here’s how you can generate chat-like completions.

human_message = HumanMessage(content="Who are you?")
chat.invoke([human_message])


You can provide a system prompt like this:

system_message = SystemMessage(content="You are a friendly assistant.")
human_message = HumanMessage(content="Who are you?")

chat.invoke([system_message, human_message])

You can also change generation parameters while calling the model. Here’s how you can do that:

chat.invoke(
    [system_message, human_message],
    temperature = 0.7, max_tokens = 20, top_p = 0.95
)

If you place a system prompt here, it will override the system prompt that was fixed while deploying the application from the platform.

You can find all the optional parameters here. Any parameters other than these supported parameters will be automatically removed before calling the model.
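To make the override and filtering rules above concrete, here is a minimal conceptual sketch. It does not call PremAI, and it is not how the library is implemented internally; `resolve_params`, `SUPPORTED_PARAMS`, and the sample values are hypothetical names and numbers chosen only for illustration. It assumes three layers of settings (LaunchPad defaults, client arguments, call-time arguments) where later layers win and unknown keys are dropped.

```python
# Conceptual sketch only: the real resolution happens inside the integration
# and on the Prem platform. SUPPORTED_PARAMS is a hypothetical allow-list.
SUPPORTED_PARAMS = {"temperature", "max_tokens", "top_p", "system_prompt"}


def resolve_params(launchpad_defaults, client_params, call_params,
                   supported=SUPPORTED_PARAMS):
    """Merge three layers of settings; later layers win, unknown keys are dropped."""
    merged = {**launchpad_defaults, **client_params, **call_params}
    return {key: value for key, value in merged.items() if key in supported}


# Hypothetical values standing in for each layer
launchpad_defaults = {"temperature": 0.2, "max_tokens": 256}
client_params = {"temperature": 0.5}
call_params = {"temperature": 0.7, "max_tokens": 20, "top_p": 0.95, "foo": 1}

effective = resolve_params(launchpad_defaults, client_params, call_params)
print(effective)
```

In this sketch the call-time temperature and max_tokens replace the earlier values, and the unsupported key foo is silently removed.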

Native RAG Support with Prem Repositories

Prem Repositories allow users to upload documents (.txt, .pdf, etc.) and connect those repositories to the LLMs. You can think of Prem Repositories as native RAG, where each repository can be considered a vector database. You can connect multiple repositories. You can learn more about repositories here.

Repositories are also supported in the LangChain PremAI integration. Here is how you can use them.

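query = "what is the diameter of individual Galaxy"
repository_ids = [1991, ]
# similarity_threshold and limit are illustrative assumptions; tune them for your data
repositories = dict(ids=repository_ids, similarity_threshold=0.3, limit=3)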

First, we define our repositories using some repository IDs. Make sure that the IDs are valid repository IDs. You can learn more about how to get the repository ID here.

Please note: As with model_name, when you pass the repositories argument to invoke, you potentially override the repositories connected in the LaunchPad.
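Because invalid IDs only fail once the request reaches the platform, it can help to sanity-check the repositories dictionary locally first. The following is a small sketch under assumptions: `build_repositories` is a hypothetical helper, and the key names ids, similarity_threshold, and limit are assumed for illustration rather than confirmed by this guide, so check them against the Prem documentation. It cannot verify that an ID actually exists in your account.

```python
def build_repositories(ids, similarity_threshold=0.3, limit=3):
    """Return a repositories dict after basic local sanity checks.

    The key names used here are assumptions for this sketch.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("at least one repository id is required")
    for repo_id in ids:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(repo_id, bool) or not isinstance(repo_id, int) or repo_id <= 0:
            raise ValueError(f"invalid repository id: {repo_id!r}")
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return dict(ids=ids, similarity_threshold=similarity_threshold, limit=limit)


repositories = build_repositories([1991])
print(repositories)
```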

Now, we connect the repository with our chat object to invoke RAG based generations.

import json

response = chat.invoke(query, max_tokens=100, repositories=repositories)

print(response.content)
print(json.dumps(response.response_metadata, indent=4))

This is what the output looks like.

The diameters of individual galaxies range from 80,000-150,000 light-years.
{
    "document_chunks": [
        {
            "repository_id": 1991,
            "document_id": 1307,
            "chunk_id": 173926,
            "document_name": "Kegy 202 Chapter 2",
            "similarity_score": 0.586126983165741,
            "content": "n thousands\n                                                                                                                                               of           light-years. The diameters of individual\n                                                                                                                                               galaxies range from 80,000-150,000 light\n                                                                                                                       "
        },
        {
            "repository_id": 1991,
            "document_id": 1307,
            "chunk_id": 173925,
            "document_name": "Kegy 202 Chapter 2",
            "similarity_score": 0.4815782308578491,
            "content": "                                                for development of galaxies. A galaxy contains\n                                                                                                                                               a large number of stars. Galaxies spread over\n                                                                                                                                               vast distances that are measured in thousands\n                                       "
        },
        {
            "repository_id": 1991,
            "document_id": 1307,
            "chunk_id": 173916,
            "document_name": "Kegy 202 Chapter 2",
            "similarity_score": 0.38112708926200867,
            "content": " was separated from the               from each other as the balloon expands.\n  solar surface. As the passing star moved away,             Similarly, the distance between the galaxies is\n  the material separated from the solar surface\n  continued to revolve around the sun and it\n  slowly condensed into planets. Sir James Jeans\n  and later Sir Harold Jeffrey supported thisnot to be republishedalso found to be increasing and thereby, the\n                                                             universe is"
        }
    ]
}

So, this also means that you do not need to build your own RAG pipeline when using the Prem Platform. Prem uses its own RAG technology to deliver best-in-class performance for retrieval-augmented generation.

Ideally, you do not need to pass repository IDs here to get retrieval-augmented generations. You get the same result if you have connected the repositories in the Prem platform.
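The document_chunks entries returned in the response metadata can be used to show which sources backed an answer. The sketch below works on a trimmed stand-in for response.response_metadata (the content field is omitted for brevity) so that it runs without calling the API; `summarize_sources` and its min_score parameter are hypothetical names introduced only for this example.

```python
# Trimmed stand-in for response.response_metadata, shaped like the output above.
metadata = {
    "document_chunks": [
        {"repository_id": 1991, "document_id": 1307, "chunk_id": 173926,
         "document_name": "Kegy 202 Chapter 2", "similarity_score": 0.586126983165741},
        {"repository_id": 1991, "document_id": 1307, "chunk_id": 173925,
         "document_name": "Kegy 202 Chapter 2", "similarity_score": 0.4815782308578491},
        {"repository_id": 1991, "document_id": 1307, "chunk_id": 173916,
         "document_name": "Kegy 202 Chapter 2", "similarity_score": 0.38112708926200867},
    ]
}


def summarize_sources(metadata, min_score=0.0):
    """Return (document_name, chunk_id, rounded score) tuples, best match first."""
    chunks = metadata.get("document_chunks", [])
    kept = [c for c in chunks if c["similarity_score"] >= min_score]
    kept.sort(key=lambda c: c["similarity_score"], reverse=True)
    return [
        (c["document_name"], c["chunk_id"], round(c["similarity_score"], 3))
        for c in kept
    ]


for name, chunk_id, score in summarize_sources(metadata, min_score=0.4):
    print(f"{name} (chunk {chunk_id}): {score}")
```

With a min_score of 0.4, only the two strongest chunks are listed; the third chunk from the sample falls below the cutoff.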


Streaming

In this section, let’s see how we can stream tokens using LangChain and PremAI. Here’s how you do it.

import sys

for chunk in chat.stream("hello how are you"):
    sys.stdout.write(chunk.content)
    sys.stdout.flush()

Similar to above, if you want to override the system-prompt and the generation parameters, you need to add the following:

import sys

for chunk in chat.stream(
    "hello how are you",
    system_prompt = "You are a helpful assistant", temperature = 0.7, max_tokens = 20
):
    sys.stdout.write(chunk.content)
    sys.stdout.flush()

This will stream tokens one after the other.
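If you also need the complete reply after streaming finishes, you can accumulate the chunk contents as they arrive. The following is a minimal sketch that runs offline: `FakeChunk` and `fake_stream` are hypothetical stand-ins for the chunks yielded by the chat client's stream method, and `collect_stream` only assumes that each chunk exposes a content attribute.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed message chunk."""
    content: str


def fake_stream(text: str) -> Iterator[FakeChunk]:
    """Yield one chunk per word, imitating token-by-token streaming."""
    for token in text.split(" "):
        yield FakeChunk(content=token + " ")


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the content of every chunk into the full reply."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.content)
    return "".join(parts)


full_text = collect_stream(fake_stream("I am doing well"))
print(full_text.strip())
```

In real use you would pass the iterator returned by the stream call in place of fake_stream.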

Please note: As of now, RAG with streaming is not supported. However, we still support it with our API. You can learn more about that here.


Prem Embeddings

In this section, we discuss how to access different embedding models using PremAIEmbeddings with LangChain. Let’s start by importing our modules and setting our API key.

import os
import getpass
from langchain_community.embeddings import PremAIEmbeddings

if os.environ.get("PREMAI_API_KEY") is None:
    os.environ["PREMAI_API_KEY"] = getpass.getpass("PremAI API Key:")

We support many state-of-the-art embedding models. You can view our list of supported LLMs and embedding models here. For this example, let’s use the text-embedding-3-large model.

model = "text-embedding-3-large"
embedder = PremAIEmbeddings(project_id=8, model=model)

query = "Hello, this is a test query"
query_result = embedder.embed_query(query)

# Let's print the first five elements of the query embedding vector
print(query_result[:5])


Setting the model argument is mandatory for PremAIEmbeddings, unlike chat.

Finally, let’s embed some sample documents.

documents = [
    "This is document1",
    "This is document2",
    "This is document3"
]

doc_result = embedder.embed_documents(documents)

print(f"Dimension of embeddings: {len(query_result)}")

# Similar to the previous result, let's print the first five elements
# of the first document vector
print(doc_result[0][:5])

Dimension of embeddings: 3072



[-0.02129288576543331, 0.0008162345038726926, -0.004556538071483374, 0.02918623760342598, -0.02547479420900345]
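A common next step with these vectors is to rank documents by how similar they are to the query. The sketch below uses plain Python and toy 3-dimensional vectors in place of the real query and document embeddings, so it runs without an API key; `cosine_similarity` and `rank_documents` are hypothetical helpers written for this example, not part of LangChain or PremAI.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the query embedding and the document embeddings
query_vec = [1.0, 0.0, 0.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_documents(query_vec, doc_vecs)
print([index for index, _ in ranking])
```

With real embeddings, you would pass the query vector and the list of document vectors returned by the embedder instead of the toy values.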