
Infrastructure

The tools in the Tools section pull artifacts into git-annex/DataLad repositories. But a repository on a single workstation is not infrastructure – it is a single point of failure with an audience of one.

This section catalogs the services and deployment systems that turn a collection of local repositories into a resilient, collaborative, self-hosted research platform.

Core Services

Forgejo-Aneksajo – A Forgejo fork with native git-annex support. It serves as the self-hosted forge for browsing, cloning, and collaborating on DataLad datasets through a web interface. It is the foundation of DataLad Hub.

HedgeDoc – Collaborative real-time markdown editing for documentation, meeting notes, and lab notebooks. Documents are exported and committed to git for preservation.
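As a rough illustration of the export-and-commit step, the sketch below downloads one note as markdown and commits it to a local git repository. It is not the workflow used by this stack: the `notes/` directory, the function names, and the example host are hypothetical, and the `/<note-id>/download` route is assumed to behave as it does in HedgeDoc 1.x.

```python
import subprocess
import urllib.request
from pathlib import Path


def download_url(base_url: str, note_id: str) -> str:
    """Build the markdown download URL (assumes the HedgeDoc 1.x route)."""
    return f"{base_url.rstrip('/')}/{note_id}/download"


def export_note(base_url: str, note_id: str, repo: Path) -> Path:
    """Fetch one note and commit it under notes/ in an existing git repo."""
    target = repo / "notes" / f"{note_id}.md"  # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)

    with urllib.request.urlopen(download_url(base_url, note_id)) as resp:
        target.write_bytes(resp.read())

    git = ["git", "-C", str(repo)]
    subprocess.run(git + ["add", str(target.relative_to(repo))], check=True)

    # `git diff --cached --quiet` exits non-zero only if something is staged,
    # so an unchanged note does not produce a failing empty commit.
    staged = subprocess.run(git + ["diff", "--cached", "--quiet"])
    if staged.returncode != 0:
        subprocess.run(
            git + ["commit", "-m", f"Export HedgeDoc note {note_id}"],
            check=True,
        )
    return target
```

A call such as `export_note("https://notes.example.org", "abc123", Path("lab-notes"))` (all three values are placeholders) could be run from a scheduled job so that each note's history accumulates in git.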

DataLad Hub – A hosted service built on Forgejo-Aneksajo for publishing and sharing DataLad datasets.

Deployment

pyinfra – Python-based infrastructure automation used to deploy and configure all the services above.

Lab-in-a-Box – A pyinfra-based deployment that bundles Forgejo-Aneksajo, HedgeDoc, and other services into a single reproducible “lab infrastructure” stack. One command, one box, everything a research group needs.

Annotation

Annotation Garden – Open infrastructure for collaborative annotation of neuroscience stimuli. Uses git branches as stackable annotation layers, BIDS/HED standards for interoperability, and AI-accelerated annotation generation with human refinement. Particularly relevant for ReproStim-captured audio/video stimuli, and generalizable to any experiment with media needing annotation.

Visualization & Browsing

PhotoPrism – AI-powered photo management with face recognition, automatic categorization, and map views. The heaviest option, best for large collections where AI-assisted organization adds value.

Photoview – Lightweight photo gallery that reads directly from the filesystem. Already deployed as a service in Lab-in-a-Box. Best for well-organized collections that need simple web browsing.

copyparty – While primarily a file server, copyparty’s built-in image gallery and grid-view thumbnails make it a zero-setup option for quick photo album browsing over git-annex working trees.

These tools embody the data-visualization separation principle: photos live in git-annex, and any of these frontends can be attached (or replaced) without touching the archived data.
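One way to picture the separation is to give every frontend a read-only view of the same archive. The sketch below is illustrative only: the `Mount` type, the `frontend_mounts` helper, and the paths `/srv/photos` and `/photos` are assumptions introduced here, not part of any of these tools.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Mount:
    source: str             # git-annex working tree on the host (hypothetical path)
    target: str             # path the frontend sees inside its own environment
    read_only: bool = True  # frontends browse the archive, they never modify it


def frontend_mounts(archive: str, frontends: List[str]) -> Dict[str, Mount]:
    """Attach each frontend to the same archive through a read-only mount."""
    return {name: Mount(source=archive, target="/photos") for name in frontends}


if __name__ == "__main__":
    mounts = frontend_mounts("/srv/photos", ["photoprism", "photoview", "copyparty"])
    for name, mount in mounts.items():
        mode = "ro" if mount.read_only else "rw"
        print(f"{name}: {mount.source} -> {mount.target} ({mode})")
```

Because every entry points at the same `source`, swapping a frontend amounts to adding or removing a key; the archive itself is never rewritten. In a locked git-annex working tree the files are symlinks into `.git/annex/objects`, so a real mount would likely need to expose the repository root rather than a subdirectory.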

Design Principles

The infrastructure stack follows the same principles as the data it manages:

  • Configuration as code – all deployment logic lives in git, versioned and auditable.
  • Self-hosted – no dependency on third-party SaaS for core operations.
  • Composable – services can be deployed individually or as a bundle.
  • Git-native – the forge, the datasets, and the deployment configs all live in git repositories.
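The "composable" principle can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the layout of Lab-in-a-Box: the `SERVICES` registry and the `select_services` helper are hypothetical, and only the service names come from this page.

```python
from typing import Iterable, List, Optional

# Hypothetical registry kept in git alongside the deployment code.
# Names are taken from this page; the descriptions are for illustration.
SERVICES = {
    "forgejo-aneksajo": "self-hosted forge with git-annex support",
    "hedgedoc": "collaborative markdown editing",
    "photoview": "lightweight photo gallery",
}


def select_services(requested: Optional[Iterable[str]] = None) -> List[str]:
    """Return the services to deploy: the full bundle by default, or a subset."""
    if requested is None:
        return sorted(SERVICES)
    requested = list(requested)
    unknown = [name for name in requested if name not in SERVICES]
    if unknown:
        raise ValueError(f"unknown services: {', '.join(unknown)}")
    return requested


if __name__ == "__main__":
    print(select_services())              # the whole bundle
    print(select_services(["hedgedoc"]))  # a single service
```

Keeping the registry as versioned data means the choice between a single service and the full bundle is itself a reviewable change in git.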

Annotation Garden

An initiative providing open infrastructure for collaborative annotation of stimuli used in neuroscience research. It uses git branches as stackable annotation layers, BIDS/HED standards for interoperability, and AI-accelerated annotation generation with human refinement. It is directly relevant to ReproStim-captured audio/video stimuli and generalizes to any experiment whose media need temporal annotation.

PhotoPrism

A self-hosted, AI-powered photo management application with face recognition, automatic categorization, map views, and album management. In the con/serve architecture, PhotoPrism serves as a rich visualization frontend over git-annex-tracked photo archives. It applies the data-visualization separation principle: photos live in git-annex, while PhotoPrism provides the browsable UI.