Installation and setup
We start by installing `llama-index` and `premai-sdk`. You can type the following command to install:
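A typical pip invocation, using the package names referenced above (the exact distribution names on PyPI may differ):

```bash
pip install llama-index premai-sdk
```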
Setup PremAI client with LlamaIndex
Once we have imported our required modules, let's set up our client. For now, let's assume that our `project_id` is `8`. Make sure you use your own project ID, otherwise it will throw an error.
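A minimal setup sketch, assuming the `PremAI` class from the llama-index PremAI integration and an API key available as the `PREMAI_API_KEY` environment variable:

```python
import os
import getpass

from llama_index.llms.premai import PremAI

# Prompt for the API key if it is not already in the environment
if os.environ.get("PREMAI_API_KEY") is None:
    os.environ["PREMAI_API_KEY"] = getpass.getpass("PremAI API Key:")

# project_id=8 is a placeholder; use your own project ID
prem_chat = PremAI(project_id=8)
```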
In order to use llama-index with PremAI, you do not need to pass any model name or set any parameters with our chat client. By default, it will use the model name and parameters set in the LaunchPad.

If you change the `model` or any other parameters like `temperature` or `max_tokens` while setting up the client, it will override the existing default configuration that was used in the LaunchPad.

Chat Completions
Now you are all set. We can start interacting with our application. Let's start by building a simple chat request and response using llama-index's `ChatMessage`, like this:
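A minimal sketch, reusing the `prem_chat` client from the setup above (the message contents are illustrative):

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="user", content="What is your name?"),
    ChatMessage(role="user", content="Write an essay about your school in 500 words"),
]

response = prem_chat.chat(messages)
print(response)
```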
In both of these scenarios, you are going to override the system prompt that was fixed while deploying the application from the platform. Specifically, if you override the system prompt while instantiating the `PremAI` class, then a system message in `ChatMessage` won't have any effect. So if you want to override the system prompt for experimental cases, either provide it while instantiating the client or write it in a `ChatMessage` with the role `system`; either way, it will override the system prompt that was fixed while deploying the application from the platform.

You can find all the optional parameters here. Any parameters other than these supported parameters will be automatically removed before calling the model.
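For instance, a sketch of overriding via a system-role message (the prompt text is illustrative):

```python
messages = [
    ChatMessage(role="system", content="You are a helpful assistant that answers briefly"),
    ChatMessage(role="user", content="What is your name?"),
]

response = prem_chat.chat(messages)
print(response)
```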
Native RAG Support with Prem Repositories
Prem Repositories allow users to upload documents (.txt, .pdf, etc.) and connect those repositories to the LLMs. You can think of Prem repositories as native RAG, where each repository can be considered a vector database. You can connect multiple repositories. You can learn more about repositories here. Repositories are also supported in langchain premai. Here is how you can do it.
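First, a sketch of defining the query and which repositories to search; the repository ID 1991 is a placeholder, and `similarity_threshold` and `limit` control retrieval:

```python
query = "What is the diameter of an individual galaxy?"

# Replace 1991 with the ID of your own repository
repositories = dict(
    ids=[1991],
    similarity_threshold=0.3,
    limit=3,
)
```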
Please note: similar to `model_name`, when you invoke the argument `repositories`, you are potentially overriding the repositories connected in the launchpad.

Ideally, you do not need to connect repository IDs here to get Retrieval Augmented Generations; you can still get the same result if you have connected the repositories in the Prem platform.

Now, we connect the repository with our chat object to invoke RAG-based generations:
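A sketch of the call, assuming the chat method forwards `repositories` to the model as a keyword argument (mirroring the langchain premai integration mentioned above):

```python
messages = [ChatMessage(role="user", content=query)]

# Retrieval runs against the documents in the connected repositories
response = prem_chat.chat(messages, repositories=repositories)
print(response)
```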
Streaming
In this section, let's see how we can stream tokens using llama-index and PremAI. It is very similar to the methods above. Here's how you do it.
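A sketch of streaming chat, reusing `messages` from above; each streamed chunk exposes the newly generated text in its `delta` attribute:

```python
streamed_response = prem_chat.stream_chat(messages)

# Print tokens as they arrive
for chunk in streamed_response:
    print(chunk.delta, end="", flush=True)
```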
Similar to the `complete` method, we have a `stream_complete` method which does streaming of tokens for completion.
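A sketch of the completion variant (the prompt is illustrative):

```python
streamed_response = prem_chat.stream_complete("Hello, how are you?")

for chunk in streamed_response:
    print(chunk.delta, end="", flush=True)
```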
Embeddings
In this section, we are going to discuss how we can get access to different embedding models using `PremAIEmbeddings` with llama-index. We will use the `text-embedding-3-large` model for this example.
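A setup sketch, assuming `PremAIEmbeddings` is importable from the llama-index PremAI embeddings integration:

```python
from llama_index.embeddings.premai import PremAIEmbeddings

model_name = "text-embedding-3-large"
# project_id=8 is a placeholder; use your own project ID
prem_embedding = PremAIEmbeddings(project_id=8, model_name=model_name)
```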
Setting the `model_name` argument is mandatory for `PremAIEmbeddings`, unlike the chat client.

Calling the Embedding Model
Now we are all set. Let's start using our embedding model with a single query, followed by multiple queries (which is also called a document).
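A sketch using llama-index's standard embedding interface; the query text and the batch call are illustrative, and the prints produce output like the lines below:

```python
query = "Hello, this is a test query"
query_result = prem_embedding.get_text_embedding(query)

print(f"Dimension of embeddings: {len(query_result)}")
print(query_result[:5])

# Multiple texts (a "document") can be embedded in one batch call
texts = ["This is the first document", "This is the second document"]
doc_results = prem_embedding.get_text_embedding_batch(texts)
```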
Dimension of embeddings: 3072
[-0.02129288576543331, 0.0008162345038726926, -0.004556538071483374, 0.02918623760342598, -0.02547479420900345]