URL Content Summarizer
Welcome to the Prem AI cookbook section. In this recipe, we will build a simple URL summarizer tool step by step using LangChain and Prem AI. The Prem SDK can be used with LangChain for extended development, and this example shows one of the ways to do that. To give a nice visualization, we use Streamlit, and here is how the final app would look:
So without further ado, let’s get started. You can find the full code here.
Objective
This recipe aims to show developers and users how to get started with Prem’s Generative AI Platform and build different use cases around it. Through this tutorial, we also want to show how you can easily integrate Prem with existing Open Source LLM orchestration tools like LangChain. The overall tutorial can be completed in four simple steps:
- Setting up all the required packages.
- Introducing LangChain Prem AI and how to get started with it.
- Understanding the MapReduce summarization technique and implementing the summarization pipeline.
- Writing a simple frontend using Streamlit.
Setting up the project
Let’s start by creating a virtual environment and installing dependencies.
Before getting started, make sure you have an account on the Prem AI Platform, a valid project ID, and an API key. As a last requirement, you also need at least one repository ID.
Next, we need to install some dependencies. You can check all the dependencies in this requirements file and install them from there.
Now, create a folder named `.streamlit`. Inside that folder, create a file called `secrets.toml`, and add your `PREMAI_API_KEY` and `PROJECT_ID` as shown here:
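For reference, `secrets.toml` holds the two values as plain TOML keys (the placeholder values below are, of course, yours to replace):

```toml
PREMAI_API_KEY = "your-premai-api-key"
PROJECT_ID = "your-project-id"
```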
So now our folder structure looks something like this:
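At minimum, that gives us a layout like this:

```
.
├── .streamlit/
│   └── secrets.toml
└── app.py
```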
Inside `app.py`, we import our required libraries as shown below:
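A sketch of the imports this app needs (assuming the `langchain`, `langchain-community`, and `streamlit` packages are installed; the exact import list may differ from the original recipe):

```python
import streamlit as st
from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.chat_models import ChatPremAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import PromptTemplate

# Credentials are read from .streamlit/secrets.toml
premai_api_key = st.secrets["PREMAI_API_KEY"]
project_id = st.secrets["PROJECT_ID"]
```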
Please make sure you have a valid project_id. You can learn more about how to find your project_id here.
Understanding Map Reduce Summarization
We are going to implement the Map Reduce Summarization chain from LangChain. First, it is essential to understand how this chain operates on a high level. After this, we will implement it using Prem AI and LangChain.
The actual Map Reduce method
The original concept comes from the famous Map-Reduce programming model, popularized by Google and implemented in the Apache Hadoop framework. Primarily, Map-Reduce is used to simplify the processing of vast datasets across many machines in a cluster. The model consists of two main steps:
Map: This step converts an input dataset into a set of key-value pairs. Each input element is processed independently, producing intermediate key-value pairs.
Reduce: This step takes the intermediate key-value pairs produced by the Map step, groups them by key, and processes each group to make the final output.
You can dive deeper into the original workings here.
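To make the model concrete, here is a toy word-count example in plain Python, the classic illustration of Map-Reduce (this is only a sketch of the idea, not part of the summarizer itself):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: process each document independently, emitting (word, 1) pairs
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))
    return pairs

def reduce_phase(pairs):
    # Reduce: group the intermediate pairs by key and sum the counts
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the quick fox", "the lazy dog", "the fox"]
result = reduce_phase(map_phase(docs))
print(result["the"])  # → 3
```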
Map Reduce method in Summarization
The MapReduce summarization method is used for lengthy documents that exceed the token limit of the language model. This method reduces the token count by summarizing the document in the following steps:
Map step: The map step divides the documents into smaller chunks and then applies the summarization pipeline (LLMChain) to each chunk.
Reduce Step: Once all the chunks (parts of documents) are summarized, we combine all those summarized chunks and then call the LLM once again to summarize them, which gives us the final summarization.
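These two steps can be illustrated with a self-contained sketch, where a stub function stands in for the LLM call (the chunking strategy and the stub are our simplifications, not the actual chain):

```python
def summarize(text: str) -> str:
    # Stub standing in for an LLM call: keep only the first sentence.
    first = text.split(". ")[0].rstrip(".")
    return first + "."

def map_reduce_summarize(document: str, sentences_per_chunk: int = 2) -> str:
    sentences = [s for s in document.split(". ") if s]
    # Map step: split the document into chunks and summarize each one
    chunks = [
        ". ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: combine the partial summaries and summarize once more
    combined = " ".join(partial_summaries)
    return summarize(combined)

doc = "Prem is a platform. It hosts LLMs. LangChain is a framework. It builds chains."
print(map_reduce_summarize(doc))  # → Prem is a platform.
```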
Implement Map Reduce Summarization
Now let’s dive into coding our summarization chain.
Define Prompt Templates
We start by defining our prompt templates. We need two. The first, `map_template`, instructs the LLM to take all the documents (in this case, the chunks of the documents extracted from the URL), identify the central theme, extract useful information, etc. In the second template, `reduce_template`, we ask it to summarize the document (a combined summary of the chunks) into a single summary that provides valuable insights to the reader.
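A minimal sketch of the two templates as plain format strings (the exact wording here is our assumption, not necessarily the original recipe's):

```python
map_template = """The following is a set of documents:
{docs}
Based on this list of documents, please identify the main themes
and extract the most useful information.
Helpful Answer:"""

reduce_template = """The following is a set of summaries:
{docs}
Take these and distill them into a final, consolidated summary
that provides valuable insights to the reader.
Helpful Answer:"""

print(map_template.format(docs="Example chunk text."))
```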
Writing the summarization pipeline
Now, let’s write our summarization pipeline. This can be done with the following steps.
- First, load the document from the given URL with langchain's `WebBaseLoader`. We also load the `CharacterTextSplitter` to split the document into chunks.
- Initialize the `map_template` and `reduce_template` from the above templates.
- Initialize the `map_chain` and `reduce_chain` from langchain's `LLMChain`, each of which uses an LLM (in our case, a ChatPremAI client) and the corresponding template.
- Define a `combined_document_chain`, which combines all the summarized chunks (stacking each of them one after the other) using the `StuffDocumentsChain`.
- Then, a `reduce_document_chain` takes those combined documents and runs the reduce step (i.e., another summarization) on those stuffed documents.
- Finally, a `map_reduce_chain` combines all the above chains into a single chain.
- Run the chain and return the results.
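Putting the steps above together, the pipeline might be sketched as follows. This is a sketch under assumptions: it assumes `langchain` and `langchain-community` are installed, and the function name, variable names, and prompt wording are ours, not necessarily identical to the recipe's code:

```python
from langchain.chains import (
    LLMChain,
    MapReduceDocumentsChain,
    ReduceDocumentsChain,
    StuffDocumentsChain,
)
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.chat_models import ChatPremAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import PromptTemplate

map_template = "Identify the main themes in these documents:\n{docs}\nHelpful Answer:"
reduce_template = "Distill these summaries into one final summary:\n{docs}\nHelpful Answer:"

def summarize_url(url: str, llm: ChatPremAI) -> str:
    # Load the page and split it into chunks
    docs = WebBaseLoader(url).load()
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    split_docs = splitter.split_documents(docs)

    # One LLMChain per template
    map_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(map_template))
    reduce_chain = LLMChain(llm=llm, prompt=PromptTemplate.from_template(reduce_template))

    # Stack the per-chunk summaries one after the other, then reduce them
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="docs"
    )
    reduce_documents_chain = ReduceDocumentsChain(
        combine_documents_chain=combine_documents_chain
    )

    # Map over every chunk, then reduce all the partial summaries
    map_reduce_chain = MapReduceDocumentsChain(
        llm_chain=map_chain,
        reduce_documents_chain=reduce_documents_chain,
        document_variable_name="docs",
    )
    return map_reduce_chain.run(split_docs)
```

The `llm` argument would be a configured client, e.g. `ChatPremAI(project_id=...)` using the project ID from your secrets.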
Phew, that was a lot. Congratulations if you made it this far. Now we will code our frontend interface using Streamlit.
Writing the frontend using Streamlit
We are only going to focus on the main summarizing component of the application. In the actual code, some additional styling is applied to enhance the appearance of the app.
Define the summarization component
Let’s write a simple helper function that uses the `summarize_url` function and creates an expandable, card-like container containing the summary of the URL.
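A minimal sketch of such a helper, assuming a `summarize_url(url, llm)` function is in scope (the helper's name and signature here are our assumptions):

```python
import streamlit as st

def render_summary_card(url: str, llm) -> str:
    # Summarize the URL, then show the result in an expandable card
    summary = summarize_url(url, llm)
    with st.expander(f"Summary of {url}"):
        st.write(summary)
    return summary
```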
Finally, let’s use the function in the main running code. We create a form with a text area into which we paste all the URLs. Our code first validates the URLs. For each successful run, it creates a card that, when expanded, contains the summary. It also returns a JSON object indicating which links are invalid or could not be summarized (if any).
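A sketch of that main loop, assuming a card-rendering helper (here called `render_summary_card`) and a configured ChatPremAI client; names and error handling are our assumptions:

```python
import streamlit as st
from langchain_community.chat_models import ChatPremAI

llm = ChatPremAI(project_id=st.secrets["PROJECT_ID"])

with st.form("summarize_form"):
    raw_urls = st.text_area("Paste one URL per line")
    submitted = st.form_submit_button("Summarize")

if submitted:
    summaries, failed_urls = [], []
    for url in (u.strip() for u in raw_urls.splitlines()):
        if not url:
            continue
        try:
            summaries.append(render_summary_card(url, llm))
        except Exception:
            failed_urls.append(url)
    if failed_urls:
        # Report the links that were invalid or could not be summarized
        st.json({"failed": failed_urls})
    _ = summaries  # the list of summaries is intentionally unused here
```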
We intentionally used `_` because this returned list of summaries is not used here. However, we can extend this by uploading all the summaries and documents (collected from the passed URLs) to Prem Repositories. We will show that in a following tutorial.
Congratulations, you have created your first application using Prem AI. To run this application, you just need to run the following command:
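Assuming the entry point is `app.py` (as in the folder structure above), the standard Streamlit run command is:

```shell
streamlit run app.py
```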
You can check out more tutorials in our cookbook, as well as their full source code.
Was this page helpful?