How to bulk-transcribe a 400-episode podcast back catalog

May 7, 2026 · By Podscribie

People keep asking the same thing on r/podcasts and r/ObsidianMD: “I want to transcribe a podcast back catalog — 200, 400, sometimes 800 episodes — and feed the whole thing into Claude or NotebookLM. What's the actual playbook?” Here's a realistic one, including what it costs, how long it takes, and where the failure modes are.


Step 1 — Scope the job honestly

Before you start, open the show in any podcast player and write down three numbers.

Episode count — RSS feed length, easily 500+ for a long-running show.
Median episode length — eyeball it; The Daily is 25 min, Lex Fridman is 180 min, Acquired is 240+.
Total audio hours = episodes × median length in minutes / 60. This is the one number that determines your cost and your time-to-finish.

A 400-episode show averaging 90 minutes is 600 hours of audio. That number drives every decision below.
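As a quick sanity check, the formula above fits in a few lines of Python. This is only a sketch; the function name is made up for illustration, and the inputs are the 400-episode, 90-minute example from the text.

```python
def total_audio_hours(episode_count: int, median_minutes: float) -> float:
    """Total audio hours = episodes x median length in minutes / 60."""
    return episode_count * median_minutes / 60


# The example show from the text: 400 episodes at roughly 90 minutes each.
print(total_audio_hours(400, 90))  # 600.0
```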

Step 2 — Pick a transcription engine

Three real options, each with a different cost shape:

Whisper, locally. Free at the wallet, expensive in clock time. On an M2 Pro running whisper.cpp with the medium model you get roughly 3× realtime, so 600 hours of audio is about 200 hours of compute. No speaker labels unless you bolt on pyannote. Quality is good, but on music-heavy intros it is not as good as Apple's transcripts.
Per-minute API (Deepgram, AssemblyAI, OpenAI). Roughly $0.0043-$0.0065 per minute → $150-$240 for 600 hours, with speaker diarization included on the first two. Quality is excellent. You write the orchestration: download MP3, upload, poll, store, retry on transient failures.
Hosted bulk tool. What we built. Paste the show URL, hit Select all, walk away. Cost shape is subscription rather than per-minute and the orchestration is already there. We use Deepgram Nova-3 underneath.
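To see where the Whisper and per-minute figures come from, here is the arithmetic as a small sketch. The function names are illustrative; the 3× realtime factor and the per-minute prices are the rough numbers quoted above, not measured values.

```python
AUDIO_HOURS = 600  # the 400-episode, 90-minute example show


def local_compute_hours(audio_hours: float, realtime_factor: float = 3.0) -> float:
    """Hours of machine time if transcription runs at `realtime_factor` x realtime."""
    return audio_hours / realtime_factor


def api_cost_usd(audio_hours: float, price_per_minute: float) -> float:
    """Cost of a per-minute API for the given amount of audio."""
    return audio_hours * 60 * price_per_minute


print(local_compute_hours(AUDIO_HOURS))             # 200.0
print(round(api_cost_usd(AUDIO_HOURS, 0.0043), 2))  # 154.8
print(round(api_cost_usd(AUDIO_HOURS, 0.0065), 2))  # 234.0
```

Two hundred hours of compute is more than eight days of a laptop running flat out, which is the real price of the "free" option.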

For one show, Whisper-locally is romantic but slow. For an ongoing workflow, the per-minute API or a hosted tool is strictly faster and the quality difference matters more than the price difference.
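If you go the per-minute API route, the "retry on transient failures" part of the orchestration is the piece most worth getting right, because a 400-episode run will hit a few timeouts. Below is a minimal sketch of retry with exponential backoff. `TransientError` and `with_retries` are hypothetical names, not part of any vendor SDK; in practice you would raise `TransientError` from your own wrapper when a request times out or the API returns a rate-limit or server error.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def with_retries(
    task: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on TransientError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Wrapping each stage separately (download, upload, poll) means a failed poll does not force a re-upload of an MP3 that already went through.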

Step 3 — Get speaker labels right

If you skip this step you'll regret it. Without speaker labels, a 90-minute interview reads as one long monologue with random pronoun shifts, and Claude/ChatGPT will silently attribute the host's questions to the guest.

Three things to insist on:

Diarization on by default. Deepgram and AssemblyAI both support this; OpenAI gpt-4o-transcribe doesn't natively.
Real names, not “Speaker 0”. Pull host + guest names from the episode title or RSS metadata and rename labels in post.
Verify the first 30 seconds of each episode. Mis-attribution almost always happens in the cold-open. If the first 30 seconds are right, the rest usually follows.
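The renaming and the 30-second check can be sketched as below. The utterance shape (a dict with `speaker`, `start` in seconds, and `text`) is an assumption for illustration, not the exact schema of any provider, and the names map is something you would build yourself from the episode title or RSS metadata.

```python
def rename_speakers(utterances: list[dict], names: dict[int, str]) -> list[dict]:
    """Return copies of the utterances with numeric speaker IDs replaced by names.

    IDs missing from `names` fall back to a generic "Speaker N" label.
    """
    return [
        {**u, "speaker": names.get(u["speaker"], f"Speaker {u['speaker']}")}
        for u in utterances
    ]


def cold_open(utterances: list[dict], seconds: float = 30.0) -> list[dict]:
    """Utterances that start within the first `seconds` of the episode."""
    return [u for u in utterances if u["start"] < seconds]


# Hypothetical diarized output for one episode.
utterances = [
    {"speaker": 0, "start": 0.0, "text": "Welcome back to the show."},
    {"speaker": 1, "start": 12.4, "text": "Thanks for having me."},
    {"speaker": 0, "start": 41.0, "text": "Let's start with the basics."},
]

labeled = rename_speakers(utterances, {0: "Host", 1: "Guest"})
for u in cold_open(labeled):
    print(f"{u['speaker']}: {u['text']}")
```

Printing only the cold open keeps the manual check to a few lines per episode, which is what makes it feasible across hundreds of episodes.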

Step 4 — Store as Markdown + JSON, not just text

The format you save in determines what you can do with the back catalog later. The combination that ages well:

One Markdown file per episode with YAML frontmatter (title, guest, date, duration, source URL). This drops cleanly into Obsidian or NotebookLM, and Claude Projects accepts it as a file.
One manifest.json with episode-level metadata across the whole show. Lets you do “all episodes between 2023-01-01 and 2024-06-30” without re-parsing every Markdown file.
Inline timestamps like [00:14:32] at every speaker change. You will want them the first time you ask Claude “where did he say X?” and need to verify.
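A rough sketch of what writing one episode in this shape could look like. The field name `source_url`, the utterance dict shape, and the bold speaker formatting are illustrative choices, not a fixed spec. Frontmatter values are written with `json.dumps`, which produces quoted scalars that YAML parsers also accept.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format seconds as [HH:MM:SS]."""
    total = int(seconds)
    return f"[{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}]"


def render_episode(meta: dict, utterances: list[dict]) -> str:
    """Render one episode as Markdown with YAML frontmatter.

    A timestamp and speaker label are emitted only when the speaker changes.
    """
    lines = ["---"]
    for key in ("title", "guest", "date", "duration", "source_url"):
        lines.append(f"{key}: {json.dumps(meta[key])}")
    lines += ["---", ""]

    previous = None
    for u in utterances:
        if u["speaker"] != previous:
            stamp = format_timestamp(u["start"])
            lines.append(f"{stamp} **{u['speaker']}:** {u['text']}")
            previous = u["speaker"]
        else:
            lines.append(u["text"])
        lines.append("")
    return "\n".join(lines)


print(format_timestamp(872))  # [00:14:32]
```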

Avoid plain .txt dumps and avoid SRT/VTT: the first loses speaker labels, and the second is built for video captions, so its per-cue timing lines and caption-sized fragments clutter the text an LLM has to read.
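The date-range query is where the manifest earns its keep. A minimal sketch, assuming a manifest with an `episodes` list whose entries carry an ISO `date` and a `file` path (both field names are made up for this example):

```python
from datetime import date


def episodes_between(manifest: dict, start: str, end: str) -> list[dict]:
    """Episodes whose ISO date falls within [start, end], inclusive."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [
        e for e in manifest["episodes"]
        if lo <= date.fromisoformat(e["date"]) <= hi
    ]


# Hypothetical manifest; in practice you would load it with json.load().
manifest = {
    "show": "Example Show",
    "episodes": [
        {"title": "Ep 1", "date": "2022-11-15", "file": "ep-001.md"},
        {"title": "Ep 2", "date": "2023-03-02", "file": "ep-002.md"},
        {"title": "Ep 3", "date": "2024-07-01", "file": "ep-003.md"},
    ],
}

for episode in episodes_between(manifest, "2023-01-01", "2024-06-30"):
    print(episode["file"])
```

The result is a list of file paths, so the next step (uploading just those files to a Project or notebook) never has to open the other transcripts.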

Step 5 — Feed it into Claude / ChatGPT / NotebookLM

Long-context models are extremely good at some kinds of bulk-transcript queries and surprisingly bad at others. What works:

Claude Projects — drop 10-30 episodes (≈ 300k-800k tokens) into a Project and ask comparative questions across them. Sonnet 4 holds the whole context.
NotebookLM — paste up to 50 sources; excellent at “summarize the evolution of X across these episodes.” See our NotebookLM walkthrough.
ChatGPT with file uploads — best for one episode at a time, weak across many. See the ChatGPT pattern.

What does not work: dumping all 400 episodes (millions of tokens) into a single context window and expecting precise recall. You need either Projects-style grouping by ~10-30 episodes, or a real RAG pipeline.
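Grouping by episode count is a proxy for grouping by token count, so it can help to batch against an explicit budget instead. The sketch below packs episodes in order until the next one would overflow the budget. The four-characters-per-token estimate is a rough rule of thumb for English text, not a real tokenizer, and the function names are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return len(text) // 4


def group_by_budget(episodes: list[tuple[str, int]], budget: int) -> list[list[str]]:
    """Pack (name, token_count) pairs, in order, into groups under `budget` tokens.

    An episode larger than the budget still gets a group of its own.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, tokens in episodes:
        if current and used + tokens > budget:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        groups.append(current)
    return groups


# Toy numbers to show the packing behaviour.
episodes = [("ep-001", 60), ("ep-002", 50), ("ep-003", 40), ("ep-004", 100)]
print(group_by_budget(episodes, budget=100))
# [['ep-001'], ['ep-002', 'ep-003'], ['ep-004']]
```

Keeping episodes in chronological order within each group preserves the ability to ask "how did this change over time" questions inside a single Project.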

The shortcut

The above is the playbook if you're building this yourself. If you'd rather skip steps 2-4 entirely, drop a show URL into Podscribie and hit Select all. We do the orchestration, the diarization, the Markdown + JSON formatting, the speaker labeling, and give you the ZIP. A 400-episode back-catalog finishes in roughly 30-90 minutes of wall-clock time.

Two free transcripts a day let you test it on the show you care about before committing. If it doesn't fit your workflow, the playbook above is yours to keep.
