Media

Research media – conference talks, lab meeting recordings, tutorial videos, podcast episodes, and image datasets – is some of the hardest content to preserve. Files are large, hosting platforms impose retention limits, and binary formats resist version control.

git-annex was designed for this problem: it keeps large file content out of git history and tracks it through content-addressed keys, so the git repository itself stays small.
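To make "content-addressed" concrete, here is a minimal sketch of how a key can be derived from a file's bytes. The key shape mirrors git-annex's SHA256E backend (backend name, size, hash, extension), but the function name `sha256e_style_key` is hypothetical and the extension handling is simplified. It is an illustration of the idea, not git-annex's own implementation.

```python
import hashlib
import os


def sha256e_style_key(path, chunk_size=1 << 20):
    """Return a key shaped like a git-annex SHA256E key for the file at `path`.

    Assumption: only the last extension is kept; git-annex applies its own
    rules for which extensions are preserved.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so multi-gigabyte recordings never load fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    ext = os.path.splitext(path)[1]
    return f"SHA256E-s{size}--{digest.hexdigest()}{ext}"
```

Because the key depends only on the content (plus size and extension), two copies of the same recording map to the same key, which is what lets the repository track a file without storing its bytes in git history.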

This section catalogs tools for downloading, organizing, and archiving media artifacts into git-annex/DataLad repositories.

Platforms and Formats

YouTube – The dominant platform for research talks, tutorials, and conference recordings. annextube provides DataLad-native YouTube archival; yt-dlp offers a more general-purpose approach.

Zoom – Ubiquitous for lab meetings and virtual conferences. Recordings often have expiration dates, making proactive archival essential. See Zoom Archival.

Podcasts and Audio – Research podcasts, interview recordings, and audio datasets. yt-dlp handles most podcast feeds; specialized tools exist for specific use cases.

Image Galleries – Figures, microscopy images, and photo documentation. gallery-dl archives images from numerous hosting platforms.
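For the video and audio platforms above, a download step driven from Python might look like the following sketch. It assumes the third-party `yt_dlp` package is installed; the helper names `build_ytdlp_options` and `archive`, the output template, and the archive file name `downloaded.txt` are illustrative choices rather than anything prescribed by the tools described here.

```python
def build_ytdlp_options(target_dir, archive_file="downloaded.txt"):
    """Build a yt-dlp options dict aimed at archival rather than casual viewing."""
    return {
        # Stable file names: upload date plus video id, so re-runs do not rename files.
        "outtmpl": f"{target_dir}/%(upload_date)s-%(id)s.%(ext)s",
        # Write a .info.json sidecar (title, description, chapters, ...) per item.
        "writeinfojson": True,
        # Save subtitles when the platform provides them.
        "writesubtitles": True,
        # Record finished ids so later runs skip items already downloaded.
        "download_archive": f"{target_dir}/{archive_file}",
        # Keep going when a single item in a playlist fails.
        "ignoreerrors": True,
    }


def archive(urls, target_dir):
    """Download `urls` into `target_dir`; requires network access and yt_dlp."""
    import yt_dlp  # third-party; imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_ytdlp_options(target_dir)) as ydl:
        return ydl.download(list(urls))
```

The downloaded files and sidecars can then be added to a git-annex or DataLad repository in a separate step. Keeping option construction apart from the download call makes the configuration easy to inspect and test offline.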

AI Readiness

Media files are inherently ai-manual – binary content that requires transcription or captioning before an LLM can work with it. However, many tools also capture structured metadata (titles, descriptions, timestamps, chapter markers) that is ai-ready on its own. A practical archival strategy preserves both the media files (in git-annex) and their metadata (in git) so that AI workflows can operate on the metadata while the full media remains available for human review.
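The split between metadata in git and media in git-annex can be sketched as a simple routing rule. The suffix list and the helper names `storage_for` and `partition` are assumptions for illustration; in a real repository this decision is usually expressed through git-annex's `annex.largefiles` setting rather than in Python.

```python
from pathlib import PurePosixPath

# Assumption: these text formats are small and diff-friendly enough to live in git.
TEXT_METADATA_SUFFIXES = {".json", ".vtt", ".srt", ".txt", ".md"}


def storage_for(path):
    """Return "git" for text metadata sidecars and "annex" for everything else."""
    suffix = PurePosixPath(path).suffix.lower()
    return "git" if suffix in TEXT_METADATA_SUFFIXES else "annex"


def partition(paths):
    """Group paths by where their content should be stored, preserving input order."""
    plan = {"git": [], "annex": []}
    for path in paths:
        plan[storage_for(path)].append(path)
    return plan
```

With a layout like this, an AI workflow can read titles, descriptions, and subtitle files straight from a plain git checkout, while the video itself is fetched from the annex only when someone needs to watch it.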

con/annextube

12 February 2026·5 mins

ai-partial Media native-datalad Youtube CON Youtube Video Datalad Git-Annex Metadata Archival Yt-Dlp

Flagship tool for archiving YouTube channels and playlists into git-annex repositories with full metadata preservation. Built on yt-dlp with native DataLad integration for incremental, content-addressed video archival.

gallery-dl

12 February 2026·4 mins

ai-manual Media git-annex Images Images Galleries Download Metadata

Command-line tool for downloading image galleries from numerous hosting sites. Extracts images and metadata in structured formats suitable for git-annex archival.

yt-dlp

12 February 2026·4 mins

ai-partial Media git-annex Youtube Youtube Video Audio Download Metadata

The Swiss army knife of video downloading. Foundation for many archival workflows, usable standalone with git-annex import or as the engine behind annextube’s DataLad-native integration.

Zoom Recording Archival

12 February 2026·5 mins

ai-manual Media git-annex Zoom Zoom Video Meetings Recordings Api

Concept-level guide to archiving Zoom meeting recordings using the Zoom API for cloud recordings and filesystem integration for local recordings, with git-annex storage and transcript extraction for AI readiness.
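As a rough illustration of the cloud-recording half of that workflow, the sketch below turns an already-fetched recordings listing into a flat list of download tasks. It assumes the parsed JSON has a `meetings` list whose entries carry `topic`, `start_time`, and a `recording_files` list with `file_type` and `download_url` fields. The function name `list_recording_downloads` is hypothetical, and authentication and the HTTP request itself are deliberately left out.

```python
def list_recording_downloads(response, wanted_types=("MP4", "M4A", "TRANSCRIPT")):
    """Flatten a parsed recordings listing into one dict per file worth archiving."""
    downloads = []
    for meeting in response.get("meetings", []):
        for rec in meeting.get("recording_files", []):
            # Skip file types outside the archival scope (e.g. chat logs).
            if rec.get("file_type") not in wanted_types:
                continue
            url = rec.get("download_url")
            if not url:
                continue  # nothing to fetch for this entry
            downloads.append(
                {
                    "topic": meeting.get("topic", ""),
                    "start_time": meeting.get("start_time", ""),
                    "file_type": rec["file_type"],
                    "url": url,
                }
            )
    return downloads
```

Including `TRANSCRIPT` files by default reflects the AI-readiness goal: the transcript is the part an LLM can use directly, while the video and audio go to git-annex.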