DataLad Hub
Overview
DataLad Hub is a public deployment of Forgejo-Aneksajo – the Forgejo fork with native git-annex support. It provides a web interface for publishing, browsing, cloning, and collaborating on DataLad datasets without the need to deploy and maintain your own infrastructure.
DataLad Hub is not a separate tool or codebase – it is an instance of Forgejo-Aneksajo, much like github.com is an instance of GitHub’s proprietary forge. The underlying technology is described on the Forgejo-Aneksajo page.
Key Features
- Dataset hosting – push DataLad datasets including both git metadata and git-annex content
- Web browsing – explore dataset contents, file trees, and metadata through the Forgejo web interface
- Standard git workflows – clone, fork, pull request, and collaborate using familiar git patterns
- git-annex support – full git-annex protocol support means `datalad push` and `datalad get` work seamlessly
- Organizations and teams – group datasets by lab, project, or collaboration with appropriate access controls
git-annex / DataLad Integration
Integration level: native-datalad.
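# Create a sibling on DataLad Hub.
# <org-name>/<repo-name> is a placeholder for the repository to create;
# --api takes the instance URL without an api/<version> suffix.
datalad create-sibling-gogs --name hub \
  --api https://hub.datalad.org \
  --credential datalad-hub-token \
  <org-name>/<repo-name>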
# Push dataset (git refs + annexed content)
datalad push --to hub
Because the backend is Forgejo-Aneksajo, both git and git-annex content are handled by the same server. There is no need to configure separate special remotes for annexed content.
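The consumer side of the same workflow can be sketched as follows. This is a minimal sketch rather than an official recipe: the helper names `hub_clone_url` and `hub_fetch` are hypothetical, the `<org-name>/<repo-name>` path is a placeholder, and the sketch assumes the dataset is publicly readable over HTTPS at `https://hub.datalad.org/<org-name>/<repo-name>`.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the URL layout are assumptions
# made for illustration, not part of DataLad or DataLad Hub.

HUB_BASE="https://hub.datalad.org"

# Build the HTTPS clone URL for an "<org-name>/<repo-name>" path.
hub_clone_url() {
    printf '%s/%s\n' "$HUB_BASE" "$1"
}

# Clone a dataset, then retrieve the annexed content of one path inside it.
# Usage: hub_fetch <org-name>/<repo-name> <target-dir> <path-in-dataset>
hub_fetch() {
    datalad clone "$(hub_clone_url "$1")" "$2" &&
        (cd "$2" && datalad get "$3")
}
```

Because git refs and annexed content come from the same server, the `datalad get` step in this sketch needs no additional remote configuration after the clone.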
Others to Consider
datalad-registry (live at registry.datalad.org) – a service for auto-discovery and metadata extraction of DataLad datasets. Rather than hosting datasets, it indexes datasets discovered on GitHub and other hosts and extracts their metadata to make them searchable. It could be a useful complement to DataLad Hub for making archived datasets discoverable.
AI Readiness
Level: ai-partial.
The API provides programmatic access to repository metadata, file listings, and issue discussions – all structured and AI-consumable. The actual dataset content depends on the specific datasets hosted and may require domain-specific processing.
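As a rough illustration of that programmatic access, the sketch below builds request URLs for repository metadata, the root file listing, and issues. It assumes DataLad Hub exposes the standard Forgejo REST API under `/api/v1` and that the repository is publicly readable; the helper names `hub_api_url` and `hub_api_get` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assumes the standard Forgejo REST API layout
# under /api/v1 and a publicly readable repository.

HUB_API="https://hub.datalad.org/api/v1"

# Print the API URL for one resource of a repository.
# Usage: hub_api_url <owner> <repo> [metadata|files|issues]
hub_api_url() {
    case "${3:-metadata}" in
        metadata) printf '%s/repos/%s/%s\n' "$HUB_API" "$1" "$2" ;;
        files)    printf '%s/repos/%s/%s/contents\n' "$HUB_API" "$1" "$2" ;;
        issues)   printf '%s/repos/%s/%s/issues\n' "$HUB_API" "$1" "$2" ;;
        *)        echo "unknown resource: $3" >&2; return 1 ;;
    esac
}

# Fetch the resource as JSON (requires curl and network access).
hub_api_get() {
    url=$(hub_api_url "$@") || return 1
    curl --fail --silent --show-error -H "Accept: application/json" "$url"
}
```

For example, `hub_api_get <owner> <repo> issues` would return the issue list as JSON, which can then be passed to whatever downstream processing the dataset requires.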
Relationship to Lab-in-a-Box
DataLad Hub can be thought of as a managed Forgejo-Aneksajo deployment focused on dataset hosting. Research groups that want the same capabilities on their own hardware can deploy Lab-in-a-Box instead.
| | DataLad Hub | Lab-in-a-Box |
|---|---|---|
| Hosting | Managed | Self-hosted |
| Setup effort | Create account | Deploy server |
| Customization | Limited | Full control |
| Data location | Hub servers | Your servers |
| Additional services | Dataset hosting only | Forgejo + HedgeDoc + more |
See Also
- Forgejo-Aneksajo – the technology powering DataLad Hub
- Lab-in-a-Box – self-hosted alternative