A curated library of book-formatted transcripts from YouTube self-development content. The site turns long-form video essays into readable, chapter-structured markdown books with cover images, tags, and reading time estimates. No client-side JavaScript. No API routes. Just static HTML served fast.
What Problem This Solves
YouTube is the largest free university on the internet, but it has a format problem. A 45-minute video essay on discipline or stoicism cannot be searched, bookmarked, or read on a slow connection. You cannot skim it. You cannot quote it without transcribing manually. You cannot read it offline.
I wanted a personal library of the ideas that mattered to me, in a format I could actually use: text, structured, searchable, and fast. Existing solutions either required manual copy-pasting or locked content behind paywalls. I wanted something I owned, something I could add to with a single command, and something that respected the reader’s time and connection speed.
How It Works
The pipeline is simple: a YouTube URL goes in, a markdown book comes out.
User input → transcript extraction → text cleaning → markdown formatting → frontmatter injection → static build → deployed site
The youtube-transcript npm package fetches caption data. A custom Node.js script cleans timestamps, fixes punctuation spacing, and splits text into paragraphs. It then wraps everything in markdown with YAML frontmatter: title, source URL, channel, author, date, tags, excerpt, and cover image path. The resulting file drops into src/content/books/ and Astro’s content collection picks it up automatically.
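A minimal sketch of the cleaning and frontmatter steps, assuming caption segments arrive as objects with a text field. The type names, function names, and frontmatter keys here are illustrative and are not taken from the actual script; only a subset of the frontmatter fields is shown.

```typescript
// Hypothetical shapes: the real script's types are not shown in this write-up.
interface CaptionSegment {
  text: string;
}

interface BookMeta {
  title: string;
  source: string;
  channel: string;
  date: string; // e.g. "2024-03-05"
  tags: string[];
}

// Strip inline timestamps such as [00:12] or 1:02:33, then fix punctuation spacing.
// Limitation: a clock time spoken in the video ("at 5:30") would also be removed.
function cleanText(raw: string): string {
  return raw
    .replace(/\[?\b\d{1,2}:\d{2}(?::\d{2})?\b\]?/g, " ")
    .replace(/\s+([,.!?;:])/g, "$1")
    .replace(/\s+/g, " ")
    .trim();
}

// Group sentences into paragraphs of a fixed size (the size is an assumption).
function toParagraphs(text: string, sentencesPerParagraph = 4): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const paragraphs: string[] = [];
  for (let i = 0; i < sentences.length; i += sentencesPerParagraph) {
    paragraphs.push(sentences.slice(i, i + sentencesPerParagraph).join(" "));
  }
  return paragraphs;
}

// Wrap the cleaned body in YAML frontmatter. JSON.stringify is used for quoting
// because JSON strings and arrays are also valid YAML flow values.
function toMarkdownBook(meta: BookMeta, segments: CaptionSegment[]): string {
  const body = toParagraphs(cleanText(segments.map((s) => s.text).join(" "))).join("\n\n");
  const frontmatter = [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `source: ${JSON.stringify(meta.source)}`,
    `channel: ${JSON.stringify(meta.channel)}`,
    `date: ${meta.date}`,
    `tags: ${JSON.stringify(meta.tags)}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${body}\n`;
}
```

Keeping the three steps as separate pure functions makes each one easy to check in isolation before the file is written to src/content/books/.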
The homepage renders a grid of book cards sorted by date. Each card shows a cover image, primary tag, title, channel name, and excerpt. Clicking through lands on a detail page with the full book content, reading time estimate, tag list, and a link back to the original video. Tag filters on the homepage link to /tags/{tag} routes.
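The reading time estimate can be sketched as a word count divided by a reading speed. The 200 words-per-minute figure and the function name are assumptions; the site's actual constant is not stated here.

```typescript
// Assumed reading speed; the real value used by the site is not documented.
const WORDS_PER_MINUTE = 200;

// Estimate reading time in whole minutes, never reporting less than one minute.
function readingTimeMinutes(markdownBody: string, wpm: number = WORDS_PER_MINUTE): number {
  const words = markdownBody.trim().split(/\s+/).filter(Boolean).length;
  return Math.max(1, Math.ceil(words / wpm));
}
```

Because this runs at build time, the number is baked into the HTML and costs nothing on the client.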
Two image systems run in parallel. Cover images live in src/assets/images/books/ and are imported through a centralized map in src/utils/images.ts, rendered with Astro’s <Image> component for automatic responsive sizing and format conversion. Inline chapter images live in public/images/books/ and are handled by a custom imageProcessor integration that builds a manifest and generates responsive WebP and JPEG variants at 400w, 800w, 1200w, and 1600w. A rehype plugin swaps standard <img> tags for <picture> elements with srcset and lazy loading.
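As a rough illustration of the variant step, the sketch below plans which files to generate for one source image and builds the matching srcset string. The file naming scheme (name-800w.webp) and the rule of skipping widths larger than the original are assumptions, not details confirmed by the integration itself.

```typescript
const WIDTHS = [400, 800, 1200, 1600];
const FORMATS = ["webp", "jpg"] as const;

interface Variant {
  path: string;
  width: number;
  format: "webp" | "jpg";
}

// Plan the output files for one source image.
// Assumption: never upscale; if the original is narrower than 400px, keep its own width.
function planVariants(src: string, originalWidth: number): Variant[] {
  const base = src.replace(/\.[a-z0-9]+$/i, "");
  const widths = WIDTHS.filter((w) => w <= originalWidth);
  if (widths.length === 0) widths.push(originalWidth);
  return widths.flatMap((width) =>
    FORMATS.map((format) => ({ path: `${base}-${width}w.${format}`, width, format }))
  );
}

// Build the srcset attribute value for one format from the planned variants.
function toSrcset(variants: Variant[], format: "webp" | "jpg"): string {
  return variants
    .filter((v) => v.format === format)
    .map((v) => `${v.path} ${v.width}w`)
    .join(", ");
}
```

Planning the variants as data first means the same list can drive both the image encoder and the markup, so the two cannot drift apart.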
Why I Chose Astro Over Next.js
I considered Next.js because I know it well, but it felt like the wrong tool. Next.js is built around hydration, API routes, and server components. This site has no interactivity. No forms. No auth. No data fetching at runtime. Astro’s content-first static site generation produces plain HTML files with zero client-side JavaScript. The build output is 48 pages of static HTML and processed images. Serving prebuilt files keeps time-to-first-byte low, and with no hydration step the pages are usable as soon as they render, with no runtime dependencies to break.
I also looked at Eleventy and Hugo. Eleventy would have worked, but I wanted TypeScript and Astro’s native content collections with Zod schema validation. Hugo is fast, but I prefer JavaScript/TypeScript environments for extensibility. Astro hit the sweet spot: content collections, image processing, and TypeScript, with a build process I understood.
Why a Custom Image Pipeline Instead of Astro’s Built-In Service
Astro’s built-in image service handles src/assets/ well, but the chapter images live in public/ and are referenced in markdown by path strings, not imports. Astro does not optimize files served from public/, so those images would ship unprocessed. I wrote a custom integration that:
- Scans public/images/books/ at build time
- Generates responsive WebP and JPEG variants for each image
- Writes a manifest mapping original paths to dimensions
- Injects a rehype plugin that transforms <img> tags into responsive <picture> elements
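The last step in that list can be sketched as a tree transform. To stay self-contained, the example uses simplified hast-like node types and a hand-written recursive walk instead of importing from unified or unist-util-visit. The manifest shape, the plugin name, and the variant naming scheme are all assumptions.

```typescript
// Simplified hast-like node shapes so the sketch runs without installing rehype.
interface ElementNode {
  type: "element";
  tagName: string;
  properties: Record<string, string | number>;
  children: TreeNode[];
}
interface TextNode { type: "text"; value: string; }
interface RootNode { type: "root"; children: TreeNode[]; }
type TreeNode = ElementNode | TextNode;

// Assumed manifest shape: original path -> intrinsic dimensions.
type Manifest = Record<string, { width: number; height: number }>;

const WIDTHS = [400, 800, 1200, 1600];

// Hypothetical variant naming: /images/books/a.jpg -> /images/books/a-800w.webp
const srcset = (src: string, ext: string): string =>
  WIDTHS.map((w) => `${src.replace(/\.[a-z0-9]+$/i, "")}-${w}w.${ext} ${w}w`).join(", ");

// Follows the rehype convention: a plugin is a function that returns a tree transformer.
function rehypeResponsivePictures(manifest: Manifest) {
  const toPicture = (img: ElementNode, src: string): ElementNode => {
    const { width, height } = manifest[src];
    return {
      type: "element",
      tagName: "picture",
      properties: {},
      children: [
        { type: "element", tagName: "source", properties: { type: "image/webp", srcSet: srcset(src, "webp") }, children: [] },
        { type: "element", tagName: "img", properties: { ...img.properties, srcSet: srcset(src, "jpg"), width, height, loading: "lazy" }, children: [] },
      ],
    };
  };

  // Replace known <img> nodes in place; leave images missing from the manifest untouched.
  const visit = (parent: RootNode | ElementNode): void => {
    parent.children = parent.children.map((child) => {
      if (child.type !== "element") return child;
      const src = String(child.properties.src ?? "");
      if (child.tagName === "img" && Object.prototype.hasOwnProperty.call(manifest, src)) {
        return toPicture(child, src);
      }
      visit(child);
      return child;
    });
  };

  return (tree: RootNode): void => visit(tree);
}
```

Writing width and height from the manifest onto the <img> lets the browser reserve space before the file loads, which avoids layout shift on long chapters.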
The trade-off is build-time complexity. The build takes ~4.5 seconds for 12 books and ~100 images. I accepted this because it means zero runtime image processing, and the generated files can be served with long-lived cache headers configured in vercel.json.
Why Tailwind CSS v4
Tailwind v4’s new Vite plugin integration meant no PostCSS config file and faster builds. The dark theme uses stone-950 for the background and amber-400 for accents. Custom fonts (Cormorant Garamond for headings, Inter for body) are self-hosted as subsetted WOFF2 files with unicode-range declarations, so browsers only download the glyphs they need.
What I Would Do Differently
The tag filter links on the homepage currently 404 because I never built the /tags/{tag} route pages. I planned to add them but deprioritized them in favor of adding more books. In hindsight, I should have stubbed the route with a simple filter page before shipping the links.
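A stub for that route mostly comes down to grouping books by tag. The sketch below shows the grouping as a plain function whose return value has the { params, props } shape that Astro's getStaticPaths() expects; the BookEntry type and the function name are assumptions, and loading the entries from the content collection is left out.

```typescript
// Hypothetical entry shape, loosely modeled on a content collection entry.
interface BookEntry {
  slug: string;
  data: { title: string; tags: string[] };
}

interface TagPath {
  params: { tag: string };
  props: { books: BookEntry[] };
}

// Build one static path per distinct tag, each carrying the books that use it.
function buildTagPaths(books: BookEntry[]): TagPath[] {
  const byTag = new Map<string, BookEntry[]>();
  for (const book of books) {
    for (const tag of book.data.tags) {
      const tagged = byTag.get(tag) ?? [];
      tagged.push(book);
      byTag.set(tag, tagged);
    }
  }
  return Array.from(byTag, ([tag, tagged]) => ({ params: { tag }, props: { books: tagged } }));
}
```

In a dynamic route file such as src/pages/tags/[tag].astro (path assumed), getStaticPaths() could return this array directly, so every tag that appears in frontmatter gets a page and the homepage links stop 404ing.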
The content collection schema uses z.coerce.date() for the date field, which has caused subtle issues with string dates in frontmatter. I should have used z.string() and parsed explicitly.
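One way explicit parsing could look, assuming frontmatter dates reach the schema as quoted YYYY-MM-DD strings. z.coerce.date() passes its input to new Date(), which accepts many loosely formatted strings and treats date-only and date-time strings differently with respect to time zones. The parseIsoDate helper below is hypothetical and deliberately strict.

```typescript
// Parse a strict YYYY-MM-DD string into a Date at UTC midnight.
// Throws on any other format and on impossible dates.
function parseIsoDate(value: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got "${value}"`);
  }
  const [year, month, day] = match.slice(1).map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Date.UTC silently rolls over invalid days (2024-02-31 becomes March 2),
  // so compare the parts back against the input to reject them.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    throw new Error(`Invalid calendar date: "${value}"`);
  }
  return date;
}
```

Pinning the result to UTC midnight keeps sort order and displayed dates stable regardless of the time zone of the machine running the build.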
The youtube-transcript package occasionally fails on videos without captions or with auto-generated captions disabled. I have no fallback for these cases. A future improvement would be to integrate a speech-to-text pipeline for videos without available transcripts.
What It Achieved
- 12 books published, ~4,800 lines of content
- 48 static pages built in 4.54 seconds
- Zero client-side JavaScript
- All images served as responsive WebP (with JPEG fallbacks for chapter images) and 1-year cache headers
- Single-command book addition via node scripts/youtube-transcript.js <url> book
What I Learned
Writing a custom Astro integration taught me how the build hooks work. The astro:config:setup hook runs before the build starts, which is the right place to generate assets and inject rehype plugins. I also learned that Sharp’s Node.js API is more reliable than its CLI for batch image processing. The CLI frequently fails with argument parsing errors; the programmatic API has never let me down.
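The hook wiring described above can be sketched roughly as follows. The local interfaces are minimal stand-ins for Astro's own types so the example runs on its own; a real integration would import AstroIntegration from the astro package. The generateVariants callback and the integration name are placeholders for work not shown here.

```typescript
// Minimal stand-ins for Astro's types; only the parts this sketch touches.
type RehypePlugin = () => (tree: unknown) => void;

interface ConfigUpdate {
  markdown: { rehypePlugins: RehypePlugin[] };
}

interface SetupOptions {
  updateConfig: (config: ConfigUpdate) => void;
}

interface IntegrationSketch {
  name: string;
  hooks: { "astro:config:setup": (options: SetupOptions) => void };
}

// generateVariants is a placeholder for the scan, resize, and manifest-writing work.
function imageProcessor(generateVariants: () => void, plugin: RehypePlugin): IntegrationSketch {
  return {
    name: "image-processor",
    hooks: {
      "astro:config:setup": ({ updateConfig }) => {
        // Generate assets first so the manifest exists before markdown is rendered.
        generateVariants();
        // Register the rehype plugin through Astro's markdown config.
        updateConfig({ markdown: { rehypePlugins: [plugin] } });
      },
    },
  };
}
```

The returned object would then be listed under integrations in astro.config, which is how Astro discovers the hook.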
Mocking the image manifest in tests was harder than expected because the integration runs at build time, not test time. I ended up writing a separate test script that builds a temporary Astro project and asserts on the output HTML. It is slow but accurate.
Next Steps
- Build the /tags/{tag} filter pages
- Add a search index via Pagefind or a lightweight client-side search
- Add a light/dark theme toggle (the theme is currently hardcoded to dark)
- Explore a speech-to-text fallback for uncaptioned videos