
Infrastructure

The tools in the Tools section pull artifacts into git-annex/DataLad repositories. But a repository on a single workstation is not an infrastructure – it is a single point of failure with an audience of one.

This section catalogs the services and deployment systems that turn a collection of local repositories into a resilient, collaborative, self-hosted research platform.

Core Services

Forgejo-Aneksajo – A Forgejo fork with native git-annex support. It serves as the self-hosted forge for browsing, cloning, and collaborating on DataLad datasets through a web interface. It is the foundation of DataLad Hub.

HedgeDoc – Collaborative real-time Markdown editing for documentation, meeting notes, and lab notebooks. Documents are exported and committed to git for preservation.

DataLad Hub – A hosted service built on Forgejo-Aneksajo for publishing and sharing DataLad datasets.
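Here is a minimal Python sketch of the HedgeDoc step described above, where a note is exported and then committed to git. It is an illustration under stated assumptions, not the stack's actual export mechanism. It assumes a HedgeDoc 1.x instance that serves a note's raw Markdown at `GET /<note-id>/download`, notes that can be read without logging in, and a local git working tree with `user.name`/`user.email` configured. The function names and the `notes/` directory layout are hypothetical.

```python
"""Sketch: export a HedgeDoc note as Markdown and commit it to git.

Assumptions (hypothetical, not from the original text): HedgeDoc 1.x's
/<note-id>/download endpoint, publicly readable notes, and a local git
repository with committer identity configured.
"""
import subprocess
import urllib.parse
import urllib.request
from pathlib import Path


def download_url(base_url: str, note_id: str) -> str:
    # Assumed HedgeDoc 1.x endpoint returning a note's raw Markdown.
    note = urllib.parse.quote(note_id, safe="")
    return f"{base_url.rstrip('/')}/{note}/download"


def note_path(repo: Path, note_id: str) -> Path:
    # Hypothetical layout: one Markdown file per note under notes/.
    return repo / "notes" / f"{note_id}.md"


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    # Plain HTTP GET; private notes would additionally need authentication.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def export_note(base_url: str, note_id: str, repo: Path) -> bool:
    """Write the note into the repo and commit it; return False if unchanged."""
    text = fetch_markdown(download_url(base_url, note_id))
    path = note_path(repo, note_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    rel = str(path.relative_to(repo))

    git = ["git", "-C", str(repo)]
    subprocess.run(git + ["add", "--", rel], check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged for this path.
    staged = subprocess.run(git + ["diff", "--cached", "--quiet", "--", rel])
    if staged.returncode == 0:
        return False
    subprocess.run(
        git + ["commit", "-m", f"Export HedgeDoc note {note_id}", "--", rel],
        check=True,
    )
    return True
```

A scheduled job could call, for example, `export_note("https://notes.example.org", "abc123", Path("lab-notebook"))`, where the URL and note ID are placeholders. Because re-exporting an unchanged note produces no commit, the job can run repeatedly without cluttering history. If the target repository is a DataLad dataset, `datalad save` could take the place of the `git add`/`git commit` pair.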

Deployment

pyinfra – Python-based infrastructure automation used to deploy and configure all the services above.

Lab-in-a-Box – A pyinfra-based deployment that bundles Forgejo-Aneksajo, HedgeDoc, and other services into a single reproducible “lab infrastructure” stack. One command, one box, everything a research group needs.

Design Principles

The infrastructure stack follows the same principles as the data it manages:

  • Configuration as code – all deployment logic lives in git, versioned and auditable.
  • Self-hosted – no dependency on third-party SaaS for core operations.
  • Composable – services can be deployed individually or as a bundle.
  • Git-native – the forge, the datasets, and the deployment configs all live in git repositories.