Creating a Web-Savvy Chatbot with LangChain and Streamlit
Written on
Chapter 1: Introduction to the Chatbot Journey
Greetings, tech aficionados and inquisitive souls!
Today, I'm excited to unveil the story behind my latest personal venture—a chatbot designed to navigate the web, gather information, and engage in friendly conversation. Why embark on such an adventure? Well, the internet is a treasure trove of knowledge waiting to be explored! 🌐
The vastness of the web is both daunting and exhilarating, and sometimes we require a bit of assistance to uncover its gems. This is precisely where my chatbot shines, harnessing the capabilities of LangChain paired with a sleek Streamlit interface.
The Motivation Behind the Project
The driving force behind this project was sheer curiosity! I envisioned a way to chat about online content as if conversing with a close friend. My aim was to create an interactive and enjoyable learning experience drawn from the web.
What Did I Build?
The result is a chatbot that serves as your personal internet explorer. By utilizing Streamlit, it delves into sites like Wikipedia, retrieving information so we can engage in discussions on a variety of topics in real-time.
How Does It Operate?
To get started, you simply input a URL. The chatbot then reads the webpage, converting the text into a format it can comprehend. When you pose questions, it sifts through the relevant information and responds with insightful answers, transforming every web exploration into a lively conversation.
This endeavor was not just about crafting a nifty tool; it aimed to revolutionize our interaction with and understanding of the internet, one chat at a time.
#### Key Topics Covered
- Python Programming
- Web Scraping with BeautifulSoup
- Natural Language Processing (NLP)
- Machine Learning and Text Vectorization
- Streamlit for Web App Development
- API Integration Skills
- Understanding Vector Databases (e.g., Chroma)
- Managing Environmental Variables
Exploring the Code
Environment Setup
To kick things off, let's install the necessary libraries:
pip install streamlit langchain langchain-openai beautifulsoup4 python-dotenv chromadb
Next, we import the essential Python libraries. Streamlit creates the app interface, LangChain handles language tasks, BeautifulSoup aids in web scraping, and python-dotenv manages environment variables.
Functions to Fetch and Process Data
def get_vectorstore_from_url(url):
# Function to process the URL and return vector store
return vector_store
The get_vectorstore_from_url function accepts a URL, retrieves the webpage's text, splits it into manageable chunks, and converts those chunks into vector embeddings for semantic search.
def get_context_retriever_chain(vector_store):
# Sets up a retriever chain for relevant info retrieval
return retriever_chain
This function establishes a retriever chain responsible for fetching pertinent information based on the conversation context.
def get_conversational_rag_chain(retriever_chain):
# Constructs a full retrieval-augmented generation (RAG) chain
return create_retrieval_chain(retriever_chain, stuff_documents_chain)
The get_conversational_rag_chain function builds on the retriever chain to create a comprehensive RAG chain, orchestrating the generation of responses based on retrieved documents and conversational history.
def get_response(user_input):
# Generates a response to the user's input
return response['answer']
The get_response function coordinates the entire process of crafting a response to user inquiries, engaging the conversation RAG chain with the current chat history and user input.
Streamlit App Configuration
st.set_page_config(page_title="Chat with websites", page_icon="🤖")
st.title("Chat with websites")
This sets up the Streamlit app with a title and an icon.
User Interface and State Management
with st.sidebar:
website_url = st.text_input("Website URL")
A sidebar is available for users to input a website URL, which the chatbot will utilize to gather information.
if "vector_store" not in st.session_state:
st.session_state.vector_store = get_vectorstore_from_url(website_url)
This checks if the vector store exists in the session state; if not, it initializes it with the website data.
Main Loop for Interaction
user_query = st.chat_input("Type your message here...")
if user_query:
response = get_response(user_query)
The main loop captures user input and retrieves a response from the chatbot, which is displayed on the interface.
Displaying the Conversation History
for message in st.session_state.chat_history:
if isinstance(message, AIMessage):
with st.chat_message("AI"):
st.write(message.content)
The chat history is rendered in the Streamlit app, showcasing messages from both the AI and the user in a conversational format.
Detailed Explanation of Vectorization
Vectorization transforms website text into numerical formats, or "embeddings," that machines can interpret.
def get_vectorstore_from_url(url):
loader = WebBaseLoader(url) # Load the website text
document = loader.load() # Store the text in 'document'
text_splitter = RecursiveCharacterTextSplitter() # Initialize the text splitter
document_chunks = text_splitter.split_documents(document) # Split text into chunks
vector_store = Chroma.from_documents(document_chunks, OpenAIEmbeddings()) # Create embeddings
return vector_store
Streamlit Interface Overview
Streamlit provides a user-friendly web interface. It allows for the creation of input fields, buttons, and a chat display format.
st.set_page_config(page_title="Chat with websites", page_icon="🤖") # Set up the page
st.title("Chat with websites") # Display a title
with st.sidebar: # Set up the sidebar for input
website_url = st.text_input("Website URL") # Input field for the URL
if website_url: # If a URL is provided
if "vector_store" not in st.session_state:
st.session_state.vector_store = get_vectorstore_from_url(website_url) # Get the vector store
Logic Flow for Conversational Intelligence
The logic flow governs the chatbot's memory and state, retrieves relevant information based on user input, and generates appropriate responses.
if user_query:
response = get_response(user_query) # Get a response from the chatbot logic
st.session_state.chat_history.append(HumanMessage(content=user_query)) # Append user query to history
st.session_state.chat_history.append(AIMessage(content=response)) # Append bot response to history
With this setup, we can extract website content, process it through vectorization, interact with users via the Streamlit interface, and maintain an ongoing conversation that builds on previous exchanges.
Video Demonstrations
Building Chatbots to Chat with Your Data | Retrieval QA Chain & Streamlit UI | Part 3
This video guides you through the process of creating chatbots that can interact with your data using a Retrieval Question-Answering chain and a Streamlit user interface.
RAG with LangChain & Streamlit: LLM Chatbot Tutorial for Beginners
In this tutorial, you’ll learn how to build a LangChain-based chatbot with Streamlit, perfect for beginners wanting to explore LLM capabilities.
This framework serves as a foundation for designing your own LLM. If you’ve undertaken a similar project, please share your experiences in the comments. Happy learning! 😇😇