In this series of blog posts, the author explains how they built an AI assistant that summarizes YouTube videos. The first post focuses on capturing a video's transcript with OpenAI’s Whisper, an open-source speech-to-text model. The author describes the motivation for building the assistant and the limitations of existing online services, then outlines the solution blueprint: getting the transcript from YouTube, transcribing the audio with Whisper, and providing a user interface. Each step comes with code snippets and a discussion of alternative implementations. The post closes by noting that the Hugging Face Hub APIs can be used for cloud-based inference instead of running the model locally.
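As a rough illustration of the transcript step in that blueprint, the flow might be sketched as follows. This is not the author's code. It assumes the `openai-whisper` Python package for local transcription, and it assumes that Whisper serves as a fallback when YouTube has no transcript to offer, which the summary does not state. The function names `get_transcript` and `transcribe_with_whisper`, and the `fetch_existing` and `download_audio` parameters, are hypothetical placeholders for whichever retrieval and download tools are actually used.

```python
from typing import Callable, Optional


def transcribe_with_whisper(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file with the open-source Whisper model."""
    # Imported lazily so the rest of this sketch works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]


def get_transcript(
    video_url: str,
    fetch_existing: Callable[[str], Optional[str]],
    download_audio: Callable[[str], str],
    transcribe: Callable[[str], str] = transcribe_with_whisper,
) -> str:
    """Return a transcript for the video (hypothetical orchestration).

    fetch_existing: returns a transcript already available from YouTube, or None.
    download_audio: saves the video's audio locally and returns the file path.
    transcribe:     turns an audio file into text (Whisper by default).
    """
    # Assumption: prefer a transcript YouTube already provides.
    existing = fetch_existing(video_url)
    if existing:
        return existing.strip()

    # Otherwise download the audio and run speech-to-text on it.
    audio_path = download_audio(video_url)
    return transcribe(audio_path).strip()
```

Passing the steps in as callables keeps each one swappable, in line with the alternatives the post discusses: a cloud-hosted inference call could replace the local Whisper function without changing the rest of the flow.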