Zoom Recording Archival
Table of Contents
Zoom Recording Archival describes approaches for preserving Zoom meeting recordings – both cloud-hosted and locally saved – into git-annex repositories. Unlike the other tools in this section, there is no single turnkey tool for Zoom archival. Instead, this page documents the building blocks and patterns for constructing a Zoom archival pipeline.
The Problem #
Zoom is ubiquitous in research: lab meetings, seminars, thesis defenses, collaborator calls, and conference sessions are routinely recorded. But these recordings are fragile:
- Cloud recordings are deleted after a configurable retention period (often 30-120 days) or when storage quotas are exceeded
- Local recordings live on individual laptops and are rarely backed up systematically
- Institutional Zoom accounts may be deprovisioned when personnel leave
- Zoom’s built-in sharing is not designed for long-term preservation
For research groups, losing these recordings means losing institutional knowledge – context, decisions, and discussions that are never written down anywhere else.
Cloud Recordings via the Zoom API #
Zoom provides a REST API that includes endpoints for listing and downloading cloud recordings. This is the most reliable path for automated archival.
API Approach #
# Conceptual workflow -- not a complete implementation
import requests
# 1. Authenticate via Server-to-Server OAuth
access_token = get_zoom_access_token(account_id, client_id, client_secret)
# 2. List recordings for a user or date range
recordings = requests.get(
"https://api.zoom.us/v2/users/{userId}/recordings",
headers={"Authorization": f"Bearer {access_token}"},
params={"from": "2025-01-01", "to": "2025-12-31"}
).json()
# 3. Download each recording file
for meeting in recordings["meetings"]:
for file in meeting["recording_files"]:
download_url = file["download_url"] + f"?access_token={access_token}"
# Download and store in git-annex
Key API Endpoints #
| Endpoint | Purpose |
|---|---|
GET /users/{userId}/recordings | List cloud recordings for a user |
GET /meetings/{meetingId}/recordings | Get recordings for a specific meeting |
GET /accounts/{accountId}/recordings | List recordings across an account (admin) |
| Download URL from recording object | Download the actual recording files |
Authentication #
Zoom’s API uses Server-to-Server OAuth apps (recommended) or JWT (deprecated). Setting up a Server-to-Server OAuth app requires:
- Creating an app in the Zoom App Marketplace
- Configuring scopes:
recording:read:admin(orrecording:readfor user-level) - Obtaining
account_id,client_id, andclient_secret
Integration with git-annex #
A practical archival script would:
#!/bin/bash
# zoom-archive.sh - Conceptual archival workflow
ARCHIVE_DIR="$1"
cd "$ARCHIVE_DIR" || exit 1
# Run the Python download script
python zoom_download.py --output ./recordings/
# Each meeting gets a directory:
# recordings/2025-03-15_Lab-Meeting/
# video.mp4 -> git-annex
# audio_only.m4a -> git-annex
# transcript.vtt -> git (text, searchable)
# chat.txt -> git (text)
# meeting_info.json -> git (metadata)
# Add to git-annex
git annex add --include='*.mp4' --include='*.m4a'
git add --all # transcripts, chat logs, metadata go to git
git commit -m "Zoom archive update $(date -I)"
Local Recording Archival #
For locally saved Zoom recordings (stored on the host machine), the approach is simpler but requires discipline:
Default Local Recording Location #
Zoom saves local recordings to:
- macOS:
~/Documents/Zoom/ - Windows:
%USERPROFILE%\Documents\Zoom\ - Linux:
~/Documents/Zoom/
Each recording creates a directory with:
video.mp4– the recording videoaudio_only.m4a– audio-only trackchat.txt– in-meeting chat log (if any)
Scripted Import #
#!/bin/bash
# import-local-zoom.sh - Import local Zoom recordings to git-annex
ZOOM_DIR="$HOME/Documents/Zoom"
ARCHIVE_DIR="$1"
cd "$ARCHIVE_DIR" || exit 1
# Copy new recordings
for meeting_dir in "$ZOOM_DIR"/*/; do
meeting_name=$(basename "$meeting_dir")
if [ ! -d "recordings/$meeting_name" ]; then
cp -r "$meeting_dir" "recordings/$meeting_name"
fi
done
# Add to git-annex
git annex add recordings/ --include='*.mp4' --include='*.m4a'
git add --all
git commit -m "Import local Zoom recordings $(date -I)"
Zoom’s Built-in Transcripts #
Zoom can generate transcripts for cloud recordings automatically. These are valuable for AI readiness:
- Audio transcript (.vtt format) – time-stamped text of spoken content
- Meeting summary (if AI Companion is enabled) – structured summary of the meeting
When downloading via the API, transcript files are included as separate recording file entries with file_type: "TRANSCRIPT". These should be stored in git (not annex) since they are small text files that benefit from version tracking and full-text search.
Enhancing Transcripts #
Zoom’s auto-generated transcripts can be improved with:
- Whisper (OpenAI) – run on the downloaded audio for higher-quality transcription
- Speaker diarization – identify who said what (Zoom transcripts sometimes include this; Whisper-based tools like
whisperxcan add it) - Manual correction – for critical meetings, reviewing and correcting the auto-generated transcript
AI Readiness #
Zoom recordings are ai-manual out of the box, but Zoom’s transcript features significantly improve this:
| Component | AI Ready? | Notes |
|---|---|---|
| Transcripts (.vtt) | Yes | Time-stamped text, directly usable |
| Chat logs | Yes | Plain text, immediately usable |
| Meeting metadata | Yes | JSON from API, structured |
| Video files | No | Require transcription |
| Audio files | No | Require transcription (Whisper) |
| AI Companion summaries | Yes | Structured meeting summaries |
If Zoom’s transcription is enabled for cloud recordings, a substantial portion of the meeting content becomes text-searchable and AI-accessible immediately upon archival.
Tooling Status #
The maturity level for Zoom archival is concept. There is no single, maintained, open-source tool that provides a complete Zoom-to-git-annex pipeline. The pieces exist:
- Zoom’s API is well-documented and stable
- git-annex handles the storage layer well
- Python libraries for Zoom API interaction exist (though none are specifically designed for archival)
What is missing is a purpose-built tool analogous to annextube for YouTube. This represents an opportunity for the con/serve ecosystem: a con/zoomtube or similar tool that handles authentication, incremental download, transcript extraction, and DataLad integration in a single package.
Existing Partial Solutions #
- zoom-recording-downloader – Python script for bulk downloading cloud recordings via Zoom API
- zoomdl – Go-based Zoom recording downloader (archived)
- Various institutional scripts shared on GitHub – search for “zoom recording backup” or “zoom archive script”
None of these integrate with git-annex or DataLad, but they demonstrate the API patterns needed.