Infrastructure
The tools in the Tools section pull artifacts into git-annex/DataLad repositories. But a repository on a single workstation is not infrastructure – it is a single point of failure with an audience of one.
This section catalogs the services and deployment systems that turn a collection of local repositories into a resilient, collaborative, self-hosted research platform.
Core Services
Forgejo-Aneksajo – A Forgejo fork with native git-annex support. It serves as the self-hosted forge for browsing, cloning, and collaborating on DataLad datasets through a web interface. It is the foundation of DataLad Hub.
HedgeDoc – Collaborative real-time markdown editing for documentation, meeting notes, and lab notebooks. Documents are exported and committed to git for preservation.
DataLad Hub – A hosted service built on Forgejo-Aneksajo for publishing and sharing DataLad datasets.
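As a rough illustration of the HedgeDoc export-and-commit step described above, the sketch below fetches one note as markdown and commits it to a local git repository. It assumes a HedgeDoc 1.x instance that serves raw markdown at `<base>/<note-id>/download` and a note that is readable without login. The `notes/` directory and all function names are hypothetical, and the actual export workflow may differ.

```python
import subprocess
import urllib.request
from pathlib import Path


def download_url(base_url: str, note_id: str) -> str:
    """Build the raw-markdown URL for a note (assumed HedgeDoc 1.x layout)."""
    return f"{base_url.rstrip('/')}/{note_id}/download"


def export_note(base_url: str, note_id: str, repo: Path) -> Path:
    """Fetch one note and write it to notes/<note_id>.md inside the repo."""
    target = repo / "notes" / f"{note_id}.md"  # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(download_url(base_url, note_id)) as resp:
        target.write_bytes(resp.read())
    return target


def commit_note(repo: Path, path: Path, message: str) -> bool:
    """Stage and commit one file; return False if it has not changed."""
    rel = str(path.relative_to(repo))
    git = ["git", "-C", str(repo)]
    subprocess.run(git + ["add", "--", rel], check=True)
    # `git diff --cached --quiet` exits with 0 when nothing is staged.
    staged = subprocess.run(git + ["diff", "--cached", "--quiet", "--", rel])
    if staged.returncode == 0:
        return False
    subprocess.run(git + ["commit", "-m", message, "--", rel], check=True)
    return True
```

Running `export_note(...)` followed by `commit_note(...)` on a schedule would leave one commit per changed note, so the git history doubles as the revision history of the document.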
Deployment
pyinfra – Python-based infrastructure automation used to deploy and configure all the services above.
Lab-in-a-Box – A pyinfra-based deployment that bundles Forgejo-Aneksajo, HedgeDoc, and other services into a single reproducible “lab infrastructure” stack. One command, one box, everything a research group needs.
Design Principles
The infrastructure stack follows the same principles as the data it manages:
- Configuration as code – all deployment logic lives in git, versioned and auditable.
- Self-hosted – no dependency on third-party SaaS for core operations.
- Composable – services can be deployed individually or as a bundle.
- Git-native – the forge, the datasets, and the deployment configs all live in git repositories.
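The "composable" and "configuration as code" principles can be sketched in a few lines of Python. This is a minimal illustration, not the Lab-in-a-Box implementation: the `Service` class, the `SERVICES` registry, the listed prerequisite packages, and both function names are assumptions made for the example. Only `apt.packages` is a real pyinfra operation, and it is imported lazily because it only works inside a deploy file executed by the pyinfra CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One independently deployable unit (all fields are illustrative)."""
    name: str
    packages: tuple[str, ...]  # hypothetical OS-level prerequisites


# Hypothetical registry; the real Lab-in-a-Box layout may differ.
SERVICES = {
    "forgejo-aneksajo": Service("forgejo-aneksajo", ("git", "git-annex")),
    "hedgedoc": Service("hedgedoc", ("git", "nodejs")),
}


def package_plan(selected=None):
    """Return the sorted, de-duplicated packages for the chosen services.

    Passing None selects every registered service, i.e. the full bundle.
    """
    names = list(SERVICES) if selected is None else list(selected)
    unknown = [n for n in names if n not in SERVICES]
    if unknown:
        raise KeyError(f"unknown service(s): {unknown}")
    return sorted({pkg for n in names for pkg in SERVICES[n].packages})


def apply_plan(packages):
    """Install the packages on the target host via pyinfra.

    Must be called from a deploy file run by the pyinfra CLI, which is
    why the import is deferred until the function is called.
    """
    from pyinfra.operations import apt

    apt.packages(
        name="Install service prerequisites",
        packages=packages,
        _sudo=True,
    )
```

With this shape, `package_plan(["hedgedoc"])` describes a single-service deployment and `package_plan()` describes the whole bundle, while the registry itself stays a plain file that can be versioned and reviewed in git.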