Concepts
The Tools section catalogs individual tools. The Infrastructure section describes the services that host them. This section describes the patterns and architectural concepts that tie everything together.
These are not tool-specific – they apply across artifact types and describe how the con/serve ecosystem works as a whole.
Topics
Ingestion Patterns – Common strategies for pulling data into git-annex repositories: direct download, API extraction, crawling, mounting, and bridging.
Conservation to External Resources – How to publish and back up from your git-annex vault to cloud storage, domain archives, and institutional repositories.
Vault Organization – Survey of directory organization approaches – PARA, Johnny Decimal, BagIt, OCFL, RO-Crate, BIDS, hive partitioning – and the principles that should guide the layout of a heterogeneous DataLad superdataset vault.
Data-Visualization Separation – The MVC principle applied to archived data: keep collected data in standard formats (TSV, JSON, Parquet), build hierarchical summaries for navigation, and let use-case-appropriate viewers (VisiData, Datasette, custom HTML) attach freely.
Automation and Pipelines – Triggering ingestion on external events, multi-step data transformation (ETL), human-and-AI-in-the-loop curation, branch-based workflow orchestration (BIDS-flux), observability dashboards, and idempotent processing over git/git-annex/DataLad.
Experience Ledger – Compressing operational experiences into reusable knowledge: extracting failure patterns, resource baselines, and operational heuristics from execution logs (con/duct, con/tinuous, ReproMan) into a condensed, queryable knowledge base – a Frozen Frontier over raw operational data.
Domain Extensions – How the generic con/serve platform extends to domain-specific workflows like neuroimaging, genomics, or digital humanities, adding specialized formats, conversion pipelines, and publishing targets.
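The hive-partitioning layout mentioned under Vault Organization is easy to illustrate in a few lines. The sketch below builds a `key=value` directory path of the kind hive-partitioned datasets use; the dataset name and partition keys are hypothetical, not a prescribed con/serve layout.

```python
from pathlib import PurePosixPath

def hive_path(root: str, dataset: str, **partitions: str) -> PurePosixPath:
    """Build a hive-style partitioned path from key=value segments.

    Illustrative only: the names below are made up for the example.
    PurePosixPath keeps the separators deterministic across platforms.
    """
    segments = [f"{key}={value}" for key, value in partitions.items()]
    return PurePosixPath(root, dataset, *segments)

path = hive_path("vault", "web-crawls", year="2024", site="example.org")
print(path)  # vault/web-crawls/year=2024/site=example.org
```

Because keyword arguments preserve insertion order, the caller controls the partition hierarchy (here year before site), which is exactly the design decision a vault layout has to make up front.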
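The data–visualization separation can be sketched as a two-layer arrangement: collected data stays in a plain standard format (TSV here), and a small derived summary is generated as a sidecar for navigation. The column names and summary fields below are hypothetical; any viewer can attach to either layer.

```python
import csv, io, json

# Hypothetical sample: collected data stays in plain TSV, untouched.
TSV = "year\tpages\n2023\t120\n2024\t310\n"

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Derived summary sidecar for navigation; regenerate it from the data
# at any time. Viewers (VisiData, Datasette, custom HTML) read either
# layer without the data layer knowing about them.
summary = {
    "n_rows": len(rows),
    "columns": list(rows[0].keys()),
    "pages_total": sum(int(r["pages"]) for r in rows),
}
print(json.dumps(summary))
```

The key property is directionality: summaries are derived from data, never the reverse, so deleting every summary loses nothing.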
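Idempotent processing, as named under Automation and Pipelines, can be sketched by keying outputs on a content hash so that re-running a pipeline over unchanged inputs is a no-op. This is a minimal standalone sketch; a real pipeline over git-annex would key on the annex content key rather than rehashing, and `process_once` is a hypothetical name.

```python
import hashlib, json, tempfile
from pathlib import Path

def process_once(src: Path, out_dir: Path, transform) -> Path:
    """Apply `transform` to `src` unless this exact content was already
    processed: the output filename is the SHA-256 of the input, so a
    rerun finds the existing file and skips the work."""
    key = hashlib.sha256(src.read_bytes()).hexdigest()
    out = out_dir / f"{key}.json"
    if not out.exists():
        out.write_text(json.dumps(transform(src.read_text())))
    return out

# Demo: the second run resolves to the same cached output.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "in.txt"
    src.write_text("hello")
    first = process_once(src, tmp, lambda text: {"upper": text.upper()})
    second = process_once(src, tmp, lambda text: {"upper": text.upper()})
    print(first == second)  # True
```

Keying on content rather than filename also means a renamed-but-identical input is recognized as already done, which is the behavior you want when crawls and downloads get retried.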
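The Experience Ledger idea of compressing execution logs into resource baselines can be sketched as a simple aggregation. The record shape below is hypothetical, loosely modeled on what con/duct-style logs could provide; the point is the compression step, not the exact fields.

```python
from statistics import median

def resource_baselines(records):
    """Condense raw execution records into a per-command memory
    baseline (the median of observed peaks) – a tiny example of
    turning an operational log into queryable knowledge."""
    by_cmd = {}
    for rec in records:
        by_cmd.setdefault(rec["cmd"], []).append(rec["peak_mem_mb"])
    return {cmd: median(peaks) for cmd, peaks in by_cmd.items()}

# Hypothetical log entries from repeated pipeline runs.
logs = [
    {"cmd": "convert", "peak_mem_mb": 512},
    {"cmd": "convert", "peak_mem_mb": 640},
    {"cmd": "convert", "peak_mem_mb": 600},
    {"cmd": "index", "peak_mem_mb": 128},
]
print(resource_baselines(logs))  # {'convert': 600, 'index': 128}
```

A real ledger would retain more than one statistic per command (failure patterns, run times, variance), but the structure is the same: many raw records in, a small queryable table out.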