A cloud-native, headless browser-based web crawler that creates high-fidelity WARC archives, capturing JavaScript-rendered content that traditional crawlers miss.
Command-line tool for downloading image galleries from numerous hosting sites. Extracts images and metadata in structured formats suitable for git-annex archival.
Concept-level guide to archiving Zoom meeting recordings using the Zoom API for cloud recordings and filesystem integration for local recordings, with git-annex storage and transcript extraction for AI readiness.