How to bulk-transcribe a 400-episode podcast back catalog
By Podscribie

Step 1 — Scope the job honestly
Before you start: open the show in any podcast player and count three numbers.
- Episode count — RSS feed length, easily 500+ for a long-running show.
- Median episode length — eyeball it; The Daily is 25 min, Lex Fridman is 180 min, Acquired is 240+.
- Total audio hours = episodes × median / 60. This is the only number that determines your cost and your time-to-finish.
A 400-episode show averaging 90 minutes is 600 hours of audio. That number drives every decision below.
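The scoping formula is small enough to keep in a scratch script. A minimal sketch; the function name is ours, and the inputs are the example figures from this step:

```python
def total_audio_hours(episodes: int, median_minutes: float) -> float:
    """Total audio hours = episodes x median episode length (minutes) / 60."""
    return episodes * median_minutes / 60


# The worked example from this step: 400 episodes at a 90-minute median.
print(total_audio_hours(400, 90))  # 600.0
```

Swap in your own episode count and median length before you price anything.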
Step 2 — Pick a transcription engine
Three real options, each with a different cost shape:
- Whisper, locally. Free at the wallet, expensive in clock time. On an M2 Pro running whisper.cpp with the medium model you get roughly 3× realtime → 600 hours of audio is 200 hours of compute. No speaker labels unless you bolt on pyannote. Quality is good but not Apple-good for music-heavy intros.
- Per-minute API (Deepgram, AssemblyAI, OpenAI). Roughly $0.0043-$0.0065 per minute → $150-$240 for 600 hours, with speaker diarization included on the first two. Quality is excellent. You write the orchestration: download MP3, upload, poll, store, retry on transient failures.
- Hosted bulk tool. What we built. Paste the show URL, hit Select all, walk away. Cost shape is subscription rather than per-minute and the orchestration is already there. We use Deepgram Nova-3 underneath.
For one show, Whisper-locally is romantic but slow. For an ongoing workflow, the per-minute API or a hosted tool is strictly faster and the quality difference matters more than the price difference.
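To compare the first two cost shapes for your own show, the arithmetic above can be sketched as follows. The 3× realtime factor and the per-minute prices are the rough figures quoted in this step, not measured or guaranteed rates, and the function names are ours:

```python
def local_compute_hours(audio_hours: float, realtime_factor: float = 3.0) -> float:
    """Clock time for local transcription at N x realtime (3x is the rough figure above)."""
    return audio_hours / realtime_factor


def api_cost_usd(audio_hours: float, price_per_minute: float) -> float:
    """Per-minute API cost for a given number of audio hours."""
    return audio_hours * 60 * price_per_minute


audio_hours = 600  # the 400-episode, 90-minute example from Step 1

print(f"Local Whisper: about {local_compute_hours(audio_hours):.0f} hours of compute")
low = api_cost_usd(audio_hours, 0.0043)
high = api_cost_usd(audio_hours, 0.0065)
print(f"Per-minute API: about ${low:.2f} to ${high:.2f}")
```

Check current pricing pages before trusting the dollar figures; only the shape of the calculation is meant to carry over.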
Step 3 — Get speaker labels right
If you skip this step you'll regret it. Without speaker labels, a 90-minute interview reads as one long monologue with random pronoun shifts, and Claude/ChatGPT will silently attribute the host's questions to the guest.
Three things to insist on:
- Diarization on by default. Deepgram and AssemblyAI both support this; OpenAI gpt-4o-transcribe doesn't natively.
- Real names, not “Speaker 0”. Pull host + guest names from the episode title or RSS metadata and rename labels in post.
- Verify the first 30 seconds of each episode. Mis-attribution almost always happens in the cold-open. If the first 30 seconds are right, the rest usually follows.
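Renaming labels and pulling the cold open for a spot check can be sketched in a few lines. This assumes each transcript segment is a dict with a numeric `speaker`, a `start` time in seconds, and `text`; that shape is our assumption for illustration, not any particular API's response format, and the names in the example are placeholders:

```python
from typing import Dict, List


def label_speakers(segments: List[dict], names: Dict[int, str]) -> List[dict]:
    """Replace numeric speaker ids with real names; unknown ids stay as 'Speaker N'."""
    labeled = []
    for seg in segments:
        name = names.get(seg["speaker"], f"Speaker {seg['speaker']}")
        labeled.append({"speaker": name, "start": seg["start"], "text": seg["text"]})
    return labeled


def cold_open(labeled: List[dict], seconds: float = 30.0) -> List[dict]:
    """Return the segments that start within the first `seconds` for a manual check."""
    return [seg for seg in labeled if seg["start"] < seconds]


# Hypothetical sample data; in practice the names come from the title or RSS metadata.
segments = [
    {"speaker": 0, "start": 0.0, "text": "Welcome back to the show."},
    {"speaker": 1, "start": 6.5, "text": "Thanks for having me."},
    {"speaker": 0, "start": 41.0, "text": "Let's start with the early days."},
]
labeled = label_speakers(segments, {0: "Host Name", 1: "Guest Name"})
for seg in cold_open(labeled):
    print(f"{seg['speaker']}: {seg['text']}")
```

If the host and guest are swapped in the cold open, swap the two entries in the names mapping and re-run rather than editing the transcript by hand.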
Step 4 — Store as Markdown + JSON, not just text
The format you save in determines what you can do with the back catalog later. The combination that ages well:
- One Markdown file per episode with YAML frontmatter (title, guest, date, duration, source URL). This drops cleanly into Obsidian or NotebookLM, and Claude Projects accepts it as a file.
- One manifest.json with episode-level metadata across the whole show. Lets you do “all episodes between 2023-01-01 and 2024-06-30” without re-parsing every Markdown file.
- Inline timestamps like [00:14:32] at every speaker change. You will want them the first time you ask Claude “where did he say X?” and need to verify.
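Here is a minimal sketch of the per-episode Markdown file described above: YAML frontmatter, then one paragraph per speaker turn with a timestamp at each speaker change. The segment shape and the `source_url` key are our assumptions, and quoting values with `json.dumps` is a simplification that stands in for a real YAML library:

```python
import json
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format seconds as [HH:MM:SS]."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


def render_episode(meta: dict, segments: List[dict]) -> str:
    """Render frontmatter plus the transcript, timestamped at every speaker change."""
    lines = ["---"]
    for key in ("title", "guest", "date", "duration", "source_url"):
        # JSON-style double-quoted strings are also valid YAML scalars.
        lines.append(f"{key}: {json.dumps(meta[key])}")
    lines += ["---", ""]
    previous = None
    for seg in segments:
        if seg["speaker"] != previous:
            stamp = format_timestamp(seg["start"])
            lines.append(f"{stamp} {seg['speaker']}: {seg['text']}")
            previous = seg["speaker"]
        else:
            # Same speaker is still talking: extend the current paragraph.
            lines[-1] += " " + seg["text"]
    return "\n".join(lines) + "\n"


print(format_timestamp(872))  # [00:14:32]
```

Write the returned string to one file per episode; a date-prefixed filename keeps the folder sorted chronologically.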
Avoid plain .txt dumps and avoid SRT/VTT — the first loses speaker labels, the second is built for video and fights every LLM tokenizer.
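The manifest is what makes date-range questions cheap. A sketch of the “all episodes between two dates” query, assuming a manifest.json with a top-level `episodes` list whose entries carry an ISO `date` and a `file` path (a hypothetical layout; use whatever keys your manifest actually has):

```python
import json
from datetime import date
from typing import List


def episodes_between(manifest: dict, start: str, end: str) -> List[dict]:
    """Return manifest entries whose ISO date falls within [start, end], inclusive."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [
        ep for ep in manifest["episodes"]
        if lo <= date.fromisoformat(ep["date"]) <= hi
    ]


# Inline sample standing in for json.load(open("manifest.json")).
manifest = json.loads("""
{"episodes": [
  {"title": "Episode A", "date": "2022-11-15", "file": "2022-11-15-episode-a.md"},
  {"title": "Episode B", "date": "2023-03-01", "file": "2023-03-01-episode-b.md"},
  {"title": "Episode C", "date": "2024-07-04", "file": "2024-07-04-episode-c.md"}
]}
""")

for ep in episodes_between(manifest, "2023-01-01", "2024-06-30"):
    print(ep["file"])
```

The query reads only the manifest, so it stays fast no matter how long the transcripts are.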
Step 5 — Feed it into Claude / ChatGPT / NotebookLM
Long-context models are extremely good at some kinds of bulk-transcript queries and surprisingly bad at others. What works:
- Claude Projects: drop 10-30 episodes (≈ 300k-800k tokens) into a Project and ask comparative questions across them. Sonnet 4 holds the whole context.
- NotebookLM: paste up to 50 sources; excellent at “summarize the evolution of X across these episodes.” See our NotebookLM walkthrough.
- ChatGPT with file uploads: best for one episode at a time, weak across many. See the ChatGPT pattern.
What does not work: dumping all 400 episodes (millions of tokens) into a single context window and expecting precise recall. You need either Projects-style grouping by ~10-30 episodes, or a real RAG pipeline.
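Grouping by ~10-30 episodes can be automated with a simple greedy pass. A sketch under two assumptions of ours: token counts are estimated at roughly four characters per token (a crude heuristic, not a real tokenizer), and the caps default to the upper ends of the ranges quoted above:

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token for English prose."""
    return len(text) // 4


def group_episodes(token_counts: List[int],
                   max_episodes: int = 30,
                   max_tokens: int = 800_000) -> List[List[int]]:
    """Greedily pack episodes, in order, into groups under both caps.

    Returns lists of episode indices. An episode larger than max_tokens
    still gets a group of its own rather than being dropped.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for index, tokens in enumerate(token_counts):
        too_many = len(current) >= max_episodes
        too_big = current_tokens + tokens > max_tokens
        if current and (too_many or too_big):
            groups.append(current)
            current, current_tokens = [], 0
        current.append(index)
        current_tokens += tokens
    if current:
        groups.append(current)
    return groups


# Example: 400 episodes at an assumed 27,000 tokens each.
groups = group_episodes([27_000] * 400)
print(len(groups), "groups; first group has", len(groups[0]), "episodes")
```

Keeping episodes in chronological order within each group makes “how did X evolve” questions easier to answer per Project.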
The shortcut
The above is the playbook if you're building this yourself. If you'd rather skip steps 2-4 entirely, drop a show URL into Podscribie and hit Select all. We do the orchestration, the diarization, the Markdown + JSON formatting, the speaker labeling, and give you the ZIP. A 400-episode back catalog finishes in roughly 30-90 minutes of wall-clock time.
Two free transcripts a day let you test it on the show you care about before committing. If it doesn't fit your workflow, the playbook above is yours to keep.