Generative AI Series

Retrieval Augmented Generation(RAG) — Chatbot for Youtube with LlamaIndex

Implement the RAG technique using LangChain and LlamaIndex for a conversational chatbot on YouTube content.

A B Vijay Kumar
4 min read · Feb 12, 2024


This blog is part of an ongoing series on Generative AI and is a continuation of the previous blog, which talks about the RAG pattern and how RAG is used to augment prompts and enhance the content and context of an LLM with specific data.

In this blog, we will build a Q&A chatbot for content fetched from YouTube. We will be using the LlamaIndex reader YoutubeTranscriptReader.

Please go through the following blogs first:

  1. Prompt Engineering: Retrieval Augmented Generation(RAG)
  2. Retrieval Augmented Generation(RAG) — Chatbot for documents with LlamaIndex
  3. Retrieval Augmented Generation(RAG) — Chatbot for Database (Text2SQL)
  4. Retrieval Augmented Generation(RAG) — Chatbot for Wikipedia with LlamaIndex

Let’s walk through the code. I won’t be walking through all of it, as it is very similar to what I published and explained in my previous blogs. I will just call out the specific changes made.

The following screenshot shows the requirements.txt. We will be using llama-hub for the YoutubeTranscriptReader, along with youtube_transcript_api.

The implementation is very simple: we use youtube_transcript_api to extract the transcript from a YouTube video, use the transcript to create the index, and use that index for RAG.

We will be using YoutubeTranscriptReader to extract the YouTube transcript. We will be using is_youtube_video to check whether the URL provided is a valid YouTube URL.
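To give a feel for what such a URL check involves, here is a minimal sketch that uses only the Python standard library. It is not the llama-hub implementation of is_youtube_video. The function names extract_video_id and looks_like_youtube_video are hypothetical, and the accepted URL shapes (watch, youtu.be, embed, shorts) and the 11-character video ID are assumptions based on commonly seen links.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters drawn from letters, digits, "-" and "_".
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID if `url` looks like a YouTube video link, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None

    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Regular links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/embed/", "/shorts/")):
            candidate = parsed.path.split("/")[2]

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None


def looks_like_youtube_video(url: str) -> bool:
    """Hypothetical stand-in for a YouTube URL check."""
    return extract_video_id(url) is not None
```

A check like this would typically run on the text entered in the sidebar before any transcript is requested, so that an invalid link can be reported to the user early.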

We will build a simple interface using Streamlit. We will have an input text field to get the YouTube URL in the sidebar, and then we will reuse the chat interface we built in previous blogs. The following code shows the sidebar and the main window.

The following code is used to clean the vectorDB and index when we load a new YouTube video.
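As a rough illustration of this cleanup step, the sketch below deletes a persisted index directory so that the next video starts from an empty store. The function name clean_index_storage and the default directory ./storage are assumptions made for this example; the actual paths used in the application may differ.

```python
import shutil
from pathlib import Path


def clean_index_storage(storage_dir: str = "./storage") -> bool:
    """Delete the persisted index directory, if present.

    Returns True when something was removed and False when there was
    nothing to clean. The directory name is an assumption for this sketch.
    """
    path = Path(storage_dir)
    if not path.is_dir():
        return False
    # Remove the directory together with every index/vector file inside it.
    shutil.rmtree(path)
    return True
```

Returning a boolean makes it easy to show a short status message in the UI only when an old index was actually discarded.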

The following function, loadYoutubeURL(), is used to load a new URL. In this function, we clean the current index files, and then use YoutubeTranscriptReader to grab the YouTube transcript and create the new index and vectors.
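The flow described above can be sketched roughly as follows. This is not the original loadYoutubeURL() code. It assumes the llama-hub import path llama_hub.youtube_transcript, the pre-0.10 llama_index import path, and a hypothetical ./storage directory; newer LlamaIndex releases use different import paths. The library imports are done lazily inside the default helpers, and both helpers can be swapped out, so the orchestration can be tried without the libraries installed.

```python
import shutil
from typing import Any, Callable, List


def _read_transcript(url: str) -> List[Any]:
    # Assumption: llama-hub's reader is installed and exposes this import path.
    from llama_hub.youtube_transcript import YoutubeTranscriptReader

    return YoutubeTranscriptReader().load_data(ytlinks=[url])


def _build_index(documents: List[Any], storage_dir: str) -> Any:
    # Assumption: pre-0.10 llama-index import path.
    from llama_index import VectorStoreIndex

    index = VectorStoreIndex.from_documents(documents)
    # Persist the index so it can be reloaded without re-embedding.
    index.storage_context.persist(persist_dir=storage_dir)
    return index


def load_youtube_url(
    url: str,
    storage_dir: str = "./storage",  # hypothetical location
    read_transcript: Callable[[str], List[Any]] = _read_transcript,
    build_index: Callable[[List[Any], str], Any] = _build_index,
) -> Any:
    """Clean the old index, fetch the transcript, and build a new index."""
    # Step 1: drop whatever was indexed for the previous video.
    shutil.rmtree(storage_dir, ignore_errors=True)
    # Step 2: fetch the transcript as a list of documents.
    documents = read_transcript(url)
    if not documents:
        raise ValueError(f"No transcript found for {url}")
    # Step 3: embed and index the transcript.
    return build_index(documents, storage_dir)
```

Keeping the reader and the index builder as parameters is a design choice made only for this sketch; it keeps the three steps (clean, read, index) visible in one place.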

I have already explained the following code in previous blogs, so please refer to them. This is the chat interface, where the user asks a question and we render the answer from the RAG-based query.

If you have read my previous blogs, this is straightforward. Please read my other blogs on LlamaIndex.

Let's run the code:

streamlit run

The following screenshot shows the application running, along with some of the responses. I have also pasted screenshots of the YouTube video that covers these topics.

The following screenshot shows the various responses that the LLM explored based on similarity.
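To make "based on similarity" a little more concrete, here is a toy sketch of similarity-based ranking. It is not how LlamaIndex computes similarity: a real setup compares embedding vectors produced by a model, whereas this example uses plain word counts as a stand-in. The function name top_k_chunks and the sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _vectorize(text: str) -> Counter:
    # Toy "embedding": lower-cased word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the question, best first."""
    query_vector = _vectorize(question)
    scored = [(_cosine(query_vector, _vectorize(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    transcript_chunks = [
        "Embeddings map text to vectors",
        "The weather was sunny during the talk",
        "Vectors are compared with cosine similarity",
    ]
    for score, chunk in top_k_chunks("how are vectors compared", transcript_chunks):
        print(f"{score:.2f}  {chunk}")
```

The highest-scoring chunks are the ones that would be passed to the LLM as context for the answer.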

There you go. With this, I am ending this LlamaIndex series on how to extract content from various sources and apply RAG patterns. Many other readers/connectors can be used to extract content from APIs, chats, etc. You can read about the various connectors here.

I will be exploring other topics in LlamaIndex in my future blogs.

I hope this blog was useful. Please leave your feedback, comments, and suggestions for improvement.

You can find the code in my GitHub here.



A B Vijay Kumar

IBM Fellow, Master Inventor, Mobile, RPi & Cloud Architect & Full-Stack Programmer