Question Answering with YouTube Videos Using RAG in LangChain


In previous articles, I explained how to use natural language to interact with PDF documents and SQL databases, using the Python LangChain module and OpenAI API.

In this article, you will learn how to use LangChain and the OpenAI API to create a question-answering application that allows you to retrieve information from YouTube videos. So, let's begin without further ado.

Installing and Importing Required Libraries

Before diving into the code, let's set up our environment with the necessary libraries.

We will use the LangChain module to access vector databases and execute queries on large language models to retrieve information from YouTube videos. We will also employ the YouTube Transcript API for fetching video transcripts, the Pytube library for retrieving video metadata, and the FAISS vector index for efficient similarity search in large datasets.

The following script installs these modules and libraries.


!pip install -qU langchain
!pip install -qU langchain-community
!pip install -qU langchain-openai
!pip install -qU youtube-transcript-api
!pip install -qU pytube
!pip install -qU faiss-cpu

The script below imports the required libraries into our Python application.


from langchain_community.document_loaders import YoutubeLoader
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.documents import Document
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
import os

Creating Text Documents from YouTube Videos

The first step involves converting YouTube video content into text documents. You can use the from_youtube_url() method of the LangChain YoutubeLoader class, which extracts the video transcript and additional video information. You pass a single YouTube video URL to this method; to process multiple videos, call it once per URL.

The following script converts the YouTube videos into text documents and prints the transcript of the first video in the list.


urls = [
    "https://www.youtube.com/watch?v=C5BkxbbLbIY",
    "https://www.youtube.com/watch?v=7qbJvucsU-Y",
    "https://www.youtube.com/watch?v=DNNMS7l6A-g",
    "https://www.youtube.com/watch?v=1rh-P2N0DR4",
    "https://www.youtube.com/watch?v=AW2kppo3rtI",
    "https://www.youtube.com/watch?v=lznP52OEpQk",
    "https://www.youtube.com/watch?v=ul14Qs24-s8",
    "https://www.youtube.com/watch?v=sB3N0ZqjPyM",
    "https://www.youtube.com/watch?v=JG1zkvwrn3E"
]
docs = []
for url in urls:
    docs.extend(YoutubeLoader.from_youtube_url(url, add_video_info=True).load())


print("=====================================")
print(f"Total number of videos: {len(docs)}")
print("=====================================")
print("Contents of video 1")
print("=====================================")
print(docs[0])

Output:

image1.png


The transcript of each video is stored in a LangChain document format, as you can see from the following script.


type(docs[0])

Output:


langchain_core.documents.base.Document
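
For instance, you can inspect the transcript text and video metadata of an individual document as follows. This is a minimal sketch; the number of characters printed is arbitrary.


print(docs[0].page_content[:500])  # first 500 characters of the transcript
print(docs[0].metadata)            # video title, author, length, etc.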

Once you have text in the LangChain document format, you can split it using any LangChain splitter, create text embeddings, and use it for retrieval augmented generation like any other text document. Let's see these steps.

Splitting and Embedding YouTube Video Documents

To handle large documents and prepare them for retrieval, we split them into smaller chunks using the LangChain RecursiveCharacterTextSplitter. This facilitates more efficient processing and embedding.

The embeddings are generated using OpenAIEmbeddings, which transform the text chunks into numerical vectors. These vectors are stored in a FAISS vector index, enabling fast and efficient similarity searches.


openai_key = os.environ.get('OPENAI_KEY2')

# split each transcript into overlapping chunks for embedding
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)

documents = text_splitter.split_documents(docs)

# embed the chunks and store them in a FAISS vector index
embeddings = OpenAIEmbeddings(openai_api_key = openai_key)
vector = FAISS.from_documents(documents, embeddings)
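
To quickly verify the index, you can run a similarity search directly against it. The following is a minimal sketch; the query string is only an example.


# retrieve the three chunks most similar to the query
matched_docs = vector.similarity_search("things to do in Paris", k=3)
for doc in matched_docs:
    print(doc.page_content[:200])
    print("---")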

Question Answering with YouTube Videos

The rest of the process is straightforward. We create an object of the ChatOpenAI LLM and define a retriever that will retrieve the documents that match our query from the FAISS vector index.


llm = ChatOpenAI(
    openai_api_key = openai_key ,
    model = 'gpt-4',
    temperature = 0.5
)

retriever = vector.as_retriever()
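
By default, the retriever returns the top matching chunks for each query. If you want to control how many chunks are passed to the LLM, you can pass search_kwargs to the as_retriever() method. The value of k below is only an example.


# return the four most similar chunks for each query
retriever = vector.as_retriever(search_kwargs={"k": 4})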

Next, we will create two chains. The first chain, history_retriever_chain, will take the conversation history into account when retrieving documents. The input to this chain will be the chat history, in the form of a list of messages, and the user prompt. The output will be the context relevant to the user prompt and chat history. The following script defines the history_retriever_chain.


prompt = ChatPromptTemplate.from_messages([
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    ("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
])

history_retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
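
You can invoke this chain on its own to see which documents it retrieves for a given prompt and chat history. The following is a minimal sketch with an example question.


retrieved_docs = history_retriever_chain.invoke({
    "chat_history": [],
    "input": "What are the top 3 things to do?"
})
print(f"Number of retrieved chunks: {len(retrieved_docs)}")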

Next, we will create a document_chain. This chain will receive the user prompt and chat history as input, as well as the output context from the history_retriever_chain.


prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's questions based on the below context:\n\n{context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}")
])
document_chain = create_stuff_documents_chain(llm, prompt)
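
You can also test the document_chain in isolation by passing it a list of documents as context, along with a prompt and an empty chat history. The following is a minimal sketch with a made-up context document.


test_response = document_chain.invoke({
    "context": [Document(page_content="The Eiffel Tower stays open until midnight in summer.")],
    "chat_history": [],
    "input": "Until what time is the Eiffel Tower open?"
})
print(test_response)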

Finally, we will create our retrieval_chain that combines the history_retriever_chain and document_chain to retrieve the final response from the LLM.

In the following script, we ask the LLM about the top three things to do. Notice that we did not mention Paris in the prompt. The LLM is intelligent enough to infer from the context, which contains the YouTube video transcripts, that we are asking about the top three things to do in Paris.


retrieval_chain = create_retrieval_chain(history_retriever_chain, document_chain)

result = retrieval_chain.invoke({
    "chat_history": [],
    "input": "What are the top 3 things to do?"
})

print(result['answer'])

Output:


The top three things to do based on the context are:

1. Visiting the top of the Ark (Arc de Triomphe) for a view of the city.
2. Having a picnic near the Eiffel Tower with food from a local market.
3. Renting bikes and riding around the city with no specific destination.

A Command Line YouTube Question Answering Chatbot

Now that we know how to get answers to our questions related to YouTube videos, we can create a simple command-line application for question-answering. To do so, we define an empty chat_history list and the generate_response_with_memory() method that accepts a user prompt as a parameter.

Inside the method, we generate a response for the user prompt and then append both the input prompt and the response to the chat_history list. Thanks to the chat_history list, the command-line application will be able to answer follow-up questions.

chat_history = []

def generate_response_with_memory(query):
    result = retrieval_chain.invoke({
    "chat_history": chat_history,
    "input": query
    })

    response = result['answer']
    chat_history.extend([HumanMessage(content = query),
                       AIMessage(content = response)])

    return response
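
Since the chat history is preserved between calls, you can ask a follow-up question that refers back to the previous answer. The queries below are only examples.


print(generate_response_with_memory("What are the top 3 things to do?"))
print(generate_response_with_memory("Tell me more about the first one."))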

The following script executes a while loop. Inside the loop, the user asks a question that is passed on to the generate_response_with_memory() method. The method's response is displayed on the console. The process continues until the user types bye.


print("=======================================================================")
print("A Chatbot that tells you things to do in Paris. Enter your query")
print("=======================================================================")

query = ""
while query != "bye":
    query = input("\033[1m User >>:\033[0m")

    if query == "bye":
        chat_history = []
        print("\033[1m Chatbot>>:\033[0m Thank you for your messages, have a good day!")
        break
    response = generate_response_with_memory(query)
    print(f"\033[1m Chatbot>>:\033[0m {response}")

Output:

image2.png

Conclusion

YouTube videos are a valuable source of information. However, not everyone has time to watch complete videos, and a viewer might only be interested in certain parts of a video. The YouTube question-answering chatbot allows you to chat with any YouTube video. I encourage you to test this application and let me know your feedback.
