Skip to main content
  1. Concepts/

Experience Ledger

·10 mins·

Every pipeline run produces logs. Most of them are never looked at again – until something goes wrong, and then someone wishes they had been paying attention all along.

An experience ledger is a structured, compressed record of what the vault’s automated operations have learned over time: which processing steps fail and why, which datasets need special handling, how resource requirements vary across data types, and what resolutions worked for past problems.

It is another application of the Frozen Frontier pattern: raw execution logs are the deep stratum, the ledger is the working surface – a condensed, queryable knowledge base with back-pointers to the original evidence.

The Raw Material #

The vault already produces execution records at multiple levels:

LayerToolWhat it capturesFormat
Provenancedatalad runCommand, inputs, outputs, environmentJSON in git commit metadata
Telemetrycon/ductCPU, memory (peak/avg RSS, VSZ), I/O, wall time, child processesJSON Lines (.duct/)
CI historycon/tinuousBuild logs, artifacts, success/failure status from GitHub Actions, Travis, AppveyorText logs + metadata in git-annex
Remote jobsReproManJob submission scripts, scheduler logs, resource allocationProvenance records in git history
AI sessionsClaude Code hooks, Entire.ioAgent reasoning, decisions, session transcriptsJSON / Markdown

Each layer produces useful data in isolation. The experience ledger connects them: a con/duct log showing OOM links to the datalad run commit that recorded the command, to the con/tinuous archive of the CI run that triggered it, and to the dataset version that was being processed.

Compressing Experience #

Raw logs are too voluminous to consult directly. The ledger compresses them into actionable patterns:

Failure Patterns #

A failure pattern captures:

  • What failed: processing step, tool, version
  • How it failed: exit code, failure mode (OOM, timeout, disk full, data error, configuration error)
  • On what: dataset identity, data characteristics (number of subjects, file sizes, modalities)
  • Evidence: pointer to the con/duct log, CI log, and git commit
  • Resolution: what fixed it (more memory, different parameters, data cleanup, upstream bug fix)
  • Recurrence: how often this pattern has appeared
failure_pattern:
  id: fp-001
  type: oom
  tool: fmriprep/24.1.1
  step: bold_hmc (head motion correction)
  trigger: datasets with >300 BOLD volumes per run
  peak_rss_gb: 42.7  # from con/duct
  allocated_gb: 32
  first_seen: 2025-08-14
  occurrences: 17
  affected_datasets:
    - ds003456 (sub-01, sub-07, sub-12)
    - ds004789 (sub-03)
  resolution: "Request 64GB node; upstream fix in fmriprep/25.0.0"
  evidence:
    - duct_log: .duct/fmriprep-ds003456-sub01-20250814.jsonl
    - ci_run: ci/github/push/2025/08/14/fmriprep-process/8834/
    - commit: a1b2c3d4

Resource Baselines #

Aggregate con/duct telemetry into per-tool, per-data-type baselines:

ToolData typeTypical peak RSSTypical wall timeKnown edge cases
fMRIPrepSingle-session functional8-12 GB4-8 hoursMulti-echo: 2x memory; >300 volumes: risk OOM at 32 GB
MRIQCStructural T1w2-4 GB15-30 minLarge FOV: 2x time
HeuDiConvDICOM session0.5-1 GB2-10 minNon-standard series descriptions: may need custom heuristic

These baselines guide resource requests for new runs and trigger alerts when a run deviates significantly from the expected profile.

Operational Heuristics #

Distilled rules of thumb from accumulated experience:

  • “fMRIPrep on datasets from scanner X consistently needs --skull-strip-t1w force
  • “slackdump incremental exports fail silently if the API token has expired; check freshness first”
  • “annextube re-runs on channels with deleted videos produce empty stubs; filter before committing”
  • “MRIQC group reports require raw BIDS data even after individual processing; keep inputs available”

These are the kind of knowledge that accumulates in a team’s collective memory and is lost when people move on. The ledger makes it explicit, version-controlled, and queryable.

Concrete Use Case: OpenNeuroDerivatives #

OpenNeuroDerivatives runs fMRIPrep and MRIQC across 784+ OpenNeuro datasets on the TACC Frontera supercomputer using BABS (which wraps execution in datalad run).

At this scale, failures are routine:

  • Subjects that run out of memory on 32 GB nodes
  • Datasets with non-standard BIDS structures that trip validation
  • Processing steps that timeout on unusually large acquisitions
  • Intermittent infrastructure failures (node crashes, filesystem hiccups)

Without an experience ledger, each failure is investigated from scratch. The operator reads the log, diagnoses the issue, applies a fix, and moves on – carrying the knowledge only in their head.

With a ledger, the pattern is recorded: the next time a similar dataset arrives, the system (or a pipeline-operator agent) can consult past experience and preemptively allocate more memory, apply the known workaround, or flag the dataset for manual review before wasting a compute allocation on a predictable failure.

Dataset Identity and the Ledger #

The experience ledger must track which dataset was processed, but dataset identity in a DataLad ecosystem is more nuanced than a single file path.

The same DataLad dataset (git/git-annex repository) can exist at multiple locations in different versions:

  • The canonical copy on Forgejo-aneksajo
  • A sibling on GitHub or GIN
  • A clone on a compute cluster
  • A published snapshot on OpenNeuro or DANDI
  • A local working copy on a researcher’s laptop

DataLad identifies datasets using multiple layers of identity:

Identity layerWhat it identifiesPersistence
Dataset UUIDThe dataset as a whole, across its entire historyPermanent – created once at datalad create
Git commit SHAA specific version (snapshot) of the datasetImmutable – content-addressed
Annex keyA specific file content, regardless of path or datasetImmutable – content-addressed (typically SHA256)
PID (DOI, Handle)A published version, citable and resolvablePermanent – assigned at publication

The DataLad concepts vocabulary formalizes these identity layers using LinkML:

  • Thing as the base class with a pid (persistent identifier) slot and additional context-specific identifiers
  • Dataset with version_of, revision_of, derived_from, alternate_of relations for tracking how versions relate
  • Distribution for modeling the same dataset at multiple locations
  • Checksum (subtypes: DOI, ORCID, ISSN) for integrity and identity verification

The experience ledger links execution records to datasets via their UUID and commit SHA – so “fMRIPrep on ds003456 at commit abc123” is an unambiguous reference regardless of where that dataset lives. When the same dataset is processed again at a later version, the ledger can compare outcomes across versions.

PROV-DM and the Execution Record #

The DataLad concepts vocabulary includes a PROV-DM interface (things-prov schema) that models:

  • Activities: things that occur over time and act upon entities (a pipeline run, an ingestion step, an AI curation session)
  • Entities: things that are used and generated by activities (datasets, files, configuration, the ledger itself)
  • Agents: things that bear responsibility for activities (human operators, AI assistants, automated pipelines)

Each activity connects to entities through qualified relationships: Generation (activity produced entity), Usage (activity consumed entity), Derivation (output entity derived from input entity), Revision (new version of an existing entity).

The experience ledger extends this provenance model with execution-specific attributes:

# Extending DataLad concepts for execution tracking
ExecutionActivity:
  is_a: ActivityMixin
  slots:
    - command           # the datalad-run-recorded command
    - exit_code
    - started_at
    - ended_at
    - resource_usage    # -> ResourceTelemetry (from con/duct)
    - compute_context   # -> ComputeResource (local, HPC, cloud)
    - failure_mode      # success | oom | timeout | disk_full | data_error | config_error
    - resolution        # what fixed it, if it failed

ResourceTelemetry:
  is_a: Thing
  slots:
    - peak_rss_mb
    - avg_cpu_percent
    - wall_time_seconds
    - disk_io_bytes
    - captured_by       # con/duct version, configuration

ComputeResource:
  is_a: Thing
  slots:
    - resource_type     # local | hpc_slurm | hpc_condor | cloud_aws
    - hostname
    - managed_by        # ReproMan resource or Forgejo Actions runner

This is not a formal proposal for extending the DataLad concepts schema – it is a sketch of how execution experiences could be modeled using the same vocabulary and patterns, so that the ledger’s metadata is interoperable with the broader DataLad metadata ecosystem.

The Ledger as Frozen Frontier #

The experience ledger is itself a Frozen Frontier:

StratumContentConsumers
Raw logscon/duct JSON Lines, CI build output, job scheduler logs, ReproMan provenance recordsForensic debugging, root cause analysis
Condensed ledgerFailure patterns, resource baselines, operational heuristics, resolution historyPipeline operators, AI agents, capacity planning
DashboardItems needing attention, overdue ingestions, health metricsLab managers, daily operations

Each level is a compressed context over the one below. The raw logs live in the vault (archived by con/tinuous, committed alongside data by con/duct). The ledger summarizes them. The dashboard summarizes the ledger. And because everything is in git-annex, you can always drill down from the dashboard through the ledger to the original log line that explains the anomaly.

Agents and the Ledger #

The pipeline-operator agent (defined in .claude/agents/pipeline-operator.md) is the primary consumer of the experience ledger. When investigating a failure, it should:

  1. Query the ledger for similar past failures (same tool, same failure mode, similar dataset characteristics)
  2. Check whether a known resolution exists
  3. If so, propose applying the known fix
  4. If not, investigate from scratch using the raw logs, then record the new pattern in the ledger

This is how operational knowledge accumulates: each incident enriches the ledger, and the agent’s effectiveness improves over time – not because the model changes, but because the knowledge base it consults grows.

The ingestion-curator agent also benefits: it can check the ledger for source-specific quirks before formalizing a new ingestion step (“the last three times we ingested from this API, rate limiting kicked in after 1000 requests; add backoff to the wrapper”).

Dataset Characteristics as Input to the Ledger #

Operational requirements – how much memory to request, how many subjects to parallelize, how long a job can run before timing out – depend on the dataset at hand. A dataset with 16 subjects, each having 3 short BOLD runs, makes very different demands than one with 200 subjects and hour-long multi-echo acquisitions.

To make informed decisions about resource allocation and job parameters, the vault needs summary tables of dataset characteristics extracted from the data itself. The OpenNeuroStudies project demonstrates this pattern with per-dataset summary files like sourcedata+subjects.tsv:

source_idsubject_idbold_numbold_sizebold_duration_totalbold_voxels_totalt1w_numt1w_sizedatatypes
ds000001sub-0131418713031800.040550415663237anat,func
ds000001sub-0231394658211800.040550415487612anat,func

These summaries distill the operationally relevant properties of each subject: number of acquisitions, file sizes, scan durations, voxel counts, modalities present. The extraction should be formalized and automated – every dataset entering the vault should get a summary table generated as a datalad run step, so the characteristics are available before any processing begins.

From Characteristics to Strategies #

When the experience ledger correlates dataset characteristics with execution outcomes (from con/duct telemetry), it becomes possible to establish informed strategies:

  • Resource limits: “BOLD runs with >300 volumes (identifiable from bold_voxels_total) need 64 GB nodes for fMRIPrep; smaller runs fit in 32 GB.”
  • Parallelization: “Subjects with >500 MB total BOLD size should be processed one at a time per node; smaller subjects can run 4-way parallel.”
  • Timeout thresholds: “fMRIPrep wall time scales roughly linearly with bold_duration_total; set timeout to 2x the predicted duration.”
  • Preemptive flagging: “Datasets with mixed datatypes (e.g., anat,func,dwi,fmap) have historically higher failure rates in HeuDiConv conversion; route to manual review.”

Without these summaries, resource requests are guesswork or worst-case over-provisioning. With them, the ledger can match a new dataset’s characteristics against past execution profiles and recommend parameters before the first job is submitted.

Generalizing Beyond Neuroimaging #

The same principle applies to any artifact type in the vault. For a Slack workspace: number of channels, messages per channel, attachment sizes, date range. For a YouTube channel: number of videos, total duration, caption availability, resolution. For a citations collection: number of references, proportion with full-text PDFs, average PDF size.

Each artifact type has its own operationally relevant dimensions. Formalizing the extraction of these summaries into standardized tables (TSV, Parquet, or similar) is a prerequisite for an experience ledger that can reason about resource requirements and failure risk.

Open Questions #

  • Schema formalization – should the experience ledger use a formal extension of the DataLad concepts vocabulary, or is a lighter-weight format (YAML records, git-annex metadata) sufficient?
  • Granularity – at what level do we compress? Per-subject, per-dataset, per-tool, per-failure-mode?
  • Automation boundary – which ledger entries can be extracted automatically from logs vs. which require human annotation (e.g., “this failure was caused by a known scanner firmware bug”)?
  • Cross-vault knowledge – can experience ledgers be shared across vault instances (e.g., our lab’s fMRIPrep experience is useful to another lab running the same pipeline)? What are the privacy implications?
  • Relationship to DataLad Catalog – the DataLad Catalog already provides metadata browsing; could the experience ledger be surfaced through a similar interface?

See Also #