Infrastructure
The tools in the Tools section pull artifacts into git-annex/DataLad repositories. But a repository on a single workstation is not an infrastructure – it is a single point of failure with an audience of one.
This section catalogs the services and deployment systems that turn a collection of local repositories into a resilient, collaborative, self-hosted research platform.
Core Services
Forgejo-Aneksajo – A Forgejo fork with native git-annex support. It serves as the self-hosted forge for browsing, cloning, and collaborating on DataLad datasets through a web interface. It is the foundation of DataLad Hub.
HedgeDoc – Collaborative real-time markdown editing for documentation, meeting notes, and lab notebooks. Documents are exported and committed to git for preservation.
DataLad Hub – A hosted service built on Forgejo-Aneksajo for publishing and sharing DataLad datasets.
Deployment
pyinfra – Python-based infrastructure automation used to deploy and configure all the services above.
Lab-in-a-Box – A pyinfra-based deployment that bundles Forgejo-Aneksajo, HedgeDoc, and other services into a single reproducible “lab infrastructure” stack. One command, one box, everything a research group needs.
Annotation
Annotation Garden – Open infrastructure for collaborative annotation of neuroscience stimuli. Uses git branches as stackable annotation layers, BIDS/HED standards for interoperability, and AI-accelerated annotation generation with human refinement. Particularly relevant for ReproStim-captured audio/video stimuli, and generalizable to any experiment with media needing annotation.
Visualization & Browsing
PhotoPrism – AI-powered photo management with face recognition, automatic categorization, and map views. The heaviest option, best for large collections where AI-assisted organization adds value.
Photoview – Lightweight photo gallery that reads directly from the filesystem. Already deployed as a service in Lab-in-a-Box. Best for well-organized collections that need simple web browsing.
copyparty – Primarily a file server, but its built-in image gallery and grid-view thumbnails make it a zero-setup option for quick photo album browsing over git-annex working trees.
These tools embody the principle of separating data from visualization: photos live in git-annex, and any of these frontends can be attached (or replaced) without touching the archived data.
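To make the separation concrete, here is a minimal sketch of how a read-only frontend might scan a git-annex working tree. It relies on one fact about git-annex: a locked annexed file is a symlink into `.git/annex/objects`, and the link dangles when the content has not been fetched. The function name `scan_gallery` and the suffix list are assumptions for illustration, not part of any tool listed above.

```python
import os
from pathlib import Path

# Assumed set of image extensions; adjust for the collection at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def scan_gallery(root):
    """Split image files under `root` into (present, missing) relative paths.

    The scan only reads the tree; it never writes to the repository.
    """
    root = Path(root)
    present, missing = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into repository internals.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            rel = path.relative_to(root).as_posix()
            # Path.exists() follows symlinks, so a dangling annex link
            # (content not fetched) is reported as missing.
            (present if path.exists() else missing).append(rel)
    return sorted(present), sorted(missing)
```

Because the scan never modifies the tree, swapping one gallery frontend for another leaves the annexed content and its history untouched. Files reported as missing could be fetched with `git annex get` before browsing.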
Design Principles
The infrastructure stack follows the same principles as the data it manages:
- Configuration as code – all deployment logic lives in git, versioned and auditable.
- Self-hosted – no dependency on third-party SaaS for core operations.
- Composable – services can be deployed individually or as a bundle.
- Git-native – the forge, the datasets, and the deployment configs all live in git repositories.
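The composability principle can be illustrated with a small, self-contained sketch: a catalog of services with declared dependencies, plus a bundle that expands to several services. This is not how Lab-in-a-Box or pyinfra is implemented. The `reverse-proxy` service, the dependency edges, and the `lab` bundle name are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    requires: tuple = ()


# Hypothetical catalog: service names come from this page, but the
# "reverse-proxy" entry and all dependency edges are invented.
CATALOG = {
    "reverse-proxy": Service("reverse-proxy"),
    "forgejo-aneksajo": Service("forgejo-aneksajo", requires=("reverse-proxy",)),
    "hedgedoc": Service("hedgedoc", requires=("reverse-proxy",)),
    "photoview": Service("photoview", requires=("reverse-proxy",)),
}

# Hypothetical bundle name that expands to several services.
BUNDLES = {"lab": ("forgejo-aneksajo", "hedgedoc", "photoview")}


def deployment_order(selection):
    """Return the services to deploy, dependencies first, each exactly once."""
    ordered = []
    visiting = set()

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        if name not in CATALOG:
            raise KeyError(f"unknown service {name!r}")
        visiting.add(name)
        for dep in CATALOG[name].requires:
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for item in selection:
        # A bundle expands to its members; anything else is a single service.
        for member in BUNDLES.get(item, (item,)):
            visit(member)
    return ordered
```

Under these assumptions, `deployment_order(["hedgedoc"])` selects a single service along with what it needs, while `deployment_order(["lab"])` expands the whole bundle. Keeping such a catalog in a git repository would also satisfy the configuration-as-code principle.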