A self-hosted internet archiving tool that takes URLs and saves them in multiple formats – HTML, PDF, screenshot, WARC, media files – for long-term preservation.
Flagship tool for archiving YouTube channels and playlists into git-annex repositories with full metadata preservation. Built on yt-dlp with native DataLad integration for incremental, content-addressed video archival.
A single-file Python file server supporting HTTP, WebDAV, SFTP, FTP, and TFTP with resumable uploads, content-based deduplication, media indexing, and a full-featured web interface. Useful as an ingestion front-end and as a quick file-sharing tool within research infrastructure.
Integrates Docker and Singularity/Apptainer container images into DataLad datasets, enabling reproducible computational workflows where both data and execution environments are version-controlled.
A fork of Forgejo that adds native git-annex protocol support, enabling self-hosted web browsing, cloning, and collaboration on DataLad datasets. Foundation of the DataLad Hub service.
A fork of Gogs with git-annex support for versioning large research data files, operated by G-Node (German Neuroinformatics Node) at LMU Munich. Funded by the same NSF+BMBF CRCNS program as DataLad.
A mature website mirroring tool that creates offline-browsable copies of entire websites, preserving directory structure and link integrity.
Python tool for exporting Matrix room messages to structured YAML files, with support for media downloads, end-to-end encrypted rooms, and SSO authentication.
Universal cloud storage adapter supporting 70+ providers. Serves a dual role in con/serve: ingestion (pulling files from cloud storage) and distribution (pushing archives to the cloud as a git-annex special remote), bridging the git-annex vault and the cloud storage ecosystem.
A browser extension (and CLI tool) that saves a complete web page – including CSS, images, fonts, and iframes – into a single, self-contained HTML file.
The Swiss army knife of video downloading. Foundation for many archival workflows, usable standalone with git-annex import or as the engine behind annextube’s DataLad-native integration.
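As a rough illustration of the standalone use mentioned in the last entry, the sketch below builds a yt-dlp command line for an incremental, metadata-preserving download. It is a minimal sketch, not part of any tool listed here: the helper name `build_ytdlp_command`, the output template, the placeholder URL, and the file paths are all assumptions chosen for illustration.

```python
from pathlib import Path


def build_ytdlp_command(url: str, dest: Path, archive_file: Path) -> list[str]:
    """Return a yt-dlp argv list (hypothetical helper, for illustration only)."""
    return [
        "yt-dlp",
        "--write-info-json",   # save video metadata as a .info.json sidecar
        "--write-thumbnail",   # save the thumbnail next to the video
        "--download-archive", str(archive_file),  # record finished IDs so reruns skip them
        "-o", str(dest / "%(upload_date)s-%(id)s.%(ext)s"),  # assumed naming scheme
        url,
    ]


# Placeholder URL and paths; nothing is downloaded here, the command is only built.
cmd = build_ytdlp_command(
    "https://example.com/playlist",
    Path("videos"),
    Path("videos/.downloaded.txt"),
)
print(" ".join(cmd))
```

The list could be passed to `subprocess.run(cmd, check=True)` inside a git-annex repository, after which the downloaded files might be placed under annex control, for example with `git annex add`. Building the command as a list rather than a shell string avoids quoting problems with the `%(...)s` template.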