
Neuroimaging Lab


The Goal #

A neuroimaging research lab wants a unified vault that captures everything their experiments produce – from raw scanner data through processed derivatives – alongside the communications, code, and scheduling artifacts that surround the science.

The lab runs MRI experiments, collects behavioral and stimulus data, communicates over Slack, tracks events in Google Calendar, and develops processing code on GitHub. Today these live in disconnected silos: DICOMs on a PACS server, event files on a lab workstation, Slack threads in Slack’s cloud, code on GitHub, and processed results scattered across lab members’ home directories.

Data Sources #

MRI Acquisitions (DICOMs) #

The primary data stream. The lab acquires structural and functional MRI data using the ReproIn naming convention, which encodes BIDS-compatible metadata directly in DICOM series descriptions. This means conversion to BIDS can be fully automated via HeuDiConv.
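The idea behind the convention can be illustrated with a toy parser. This is a sketch only, far simpler than the actual ReproIn heuristic shipped with HeuDiConv, and the example series name is hypothetical:

```python
# Illustrative parser for a ReproIn-style series description of the form
# <seqtype>[-<modality>][_<key>-<value>...]. The real heuristic handles
# many more cases (session inference, cancellations, dual echoes, ...).
def parse_series(description: str) -> dict:
    """Split e.g. 'func-bold_task-rest_run-01' into BIDS-ish entities."""
    fields = description.split("_")
    seqtype, _, modality = fields[0].partition("-")
    entities = {"seqtype": seqtype}
    if modality:
        entities["modality"] = modality
    for field in fields[1:]:
        key, _, value = field.partition("-")
        entities[key] = value
    return entities

print(parse_series("func-bold_task-rest_run-01"))
# {'seqtype': 'func', 'modality': 'bold', 'task': 'rest', 'run': '01'}
```

Because the entities live in the series description itself, the scanner operator's naming discipline is the only manual step; everything downstream is mechanical.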

| Source | Format | Volume per session | Frequency |
|---|---|---|---|
| Structural (T1w, T2w) | DICOM | 200-500 MB | Per subject |
| Functional (BOLD) | DICOM | 2-10 GB | Multiple runs per session |
| Field maps | DICOM | 50-200 MB | Per session |
| Diffusion (DWI) | DICOM | 1-5 GB | Optional |

Stimulus Capture (ReproStim) #

ReproStim captures the actual audio/video stimuli presented during scanning sessions – screen recordings with QR-code-embedded timing synchronization. This is critical for relating brain activity to specific stimulus events.

The captured media lands in git-annex as binary content and can later be annotated via Annotation Garden to produce BIDS-compliant events files with HED tags.

Behavioral Events (CurDes BIRCH) #

The CurDes BIRCH response box records participant button presses, response times, and event timing during MRI experiments. These event logs are the behavioral counterpart to the fMRI data – they document what the participant did and when.

Event files need to be converted to BIDS events format (*_events.tsv with onset, duration, and trial_type columns) and aligned with the timing of the functional imaging data.
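As a sketch of that conversion, the following assumes a hypothetical BIRCH-style log of (seconds, label) pairs measured relative to the scanner trigger; the real log format differs:

```python
import csv
import io

# Hypothetical BIRCH-style event log: (seconds, label) pairs. The actual
# log layout is an assumption made here for illustration.
log = [(0.0, "trigger"), (2.5, "stim_face"), (3.1, "button_1"), (8.5, "stim_house")]
stim_durations = {"stim_face": 2.0, "stim_house": 2.0}

def to_bids_events(log, durations):
    """Build rows for a BIDS *_events.tsv: onset, duration, trial_type."""
    t0 = next(t for t, label in log if label == "trigger")  # scanner start
    return [{"onset": round(t - t0, 3),
             "duration": durations.get(label, "n/a"),
             "trial_type": label}
            for t, label in log if label != "trigger"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["onset", "duration", "trial_type"],
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(to_bids_events(log, stim_durations))
print(buf.getvalue())
```

Onsets are re-zeroed to the trigger so they line up with the first fMRI volume; instantaneous button presses get duration "n/a" per the BIDS convention for unknown durations.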

Slack (Lab Communication) #

The lab uses Slack for internal communication: experiment coordination, data quality discussions, analysis troubleshooting, paper drafts, and general lab life.

slackdump archives these conversations into structured JSON with full threading, reactions, and file attachments.
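To give a sense of what working with such an archive looks like, here is a sketch that reconstructs threads from archived messages. The JSON shown mirrors Slack's API message objects (ts, thread_ts); slackdump's exact file layout may differ:

```python
import json
from collections import defaultdict

# Tiny slackdump-style export; the schema is an assumption modeled on
# Slack's API message objects, not slackdump's precise output.
archive = json.loads("""{
  "channel_id": "C01EXPT",
  "messages": [
    {"ts": "1700000000.000100", "user": "U1", "text": "Scanner down today"},
    {"ts": "1700000050.000200", "user": "U2", "text": "Rescheduling runs",
     "thread_ts": "1700000000.000100"}
  ]
}""")

# Group replies under their parent: a reply carries thread_ts pointing at
# the root message's ts; root messages key on their own ts.
threads = defaultdict(list)
for msg in archive["messages"]:
    threads[msg.get("thread_ts", msg["ts"])].append(msg["text"])

for root, texts in threads.items():
    print(root, "->", texts)
```

Having the full threading preserved in the vault means a "why did we drop run 3?" question years later resolves to the original discussion, not a memory.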

Key channels to archive:

| Channel | Content | Privacy |
|---|---|---|
| #experiments | Session scheduling, scanner issues, protocol changes | private |
| #analysis | Processing questions, pipeline debugging, results discussion | private |
| #papers | Manuscript drafts, reviewer responses, submission coordination | private |
| #general | Lab announcements, social coordination | private |

Google Calendar (Scheduling) #

Scanner time slots, lab meetings, deadlines, conference dates. Available via Google Takeout or CalDAV API export. Low volume but useful for correlating events (“when did we change the protocol?” maps to a calendar entry).
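A sketch of pulling a usable event out of such an export, assuming a minimal unfolded VEVENT (real .ics files add line folding, time zones, and recurrence rules):

```python
from datetime import datetime

# Minimal VEVENT from a hypothetical Google Calendar .ics export.
ics = """BEGIN:VEVENT
DTSTART:20240312T140000Z
SUMMARY:Protocol change - new fieldmap sequence
END:VEVENT"""

event = {}
for line in ics.splitlines():
    key, _, value = line.partition(":")
    if key == "DTSTART":
        event["start"] = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    elif key == "SUMMARY":
        event["summary"] = value

print(event["start"].date(), "-", event["summary"])
```

Even this much structure is enough to answer "when did we change the protocol?" by date-matching calendar entries against acquisition sessions.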

GitHub (Code and Project Management) #

The lab maintains repositories for:

  • Processing pipelines – scripts that orchestrate HeuDiConv, MRIQC, fMRIPrep
  • Analysis code – statistical analysis, figures, manuscripts
  • Stimulus code – PsychoPy/PsychToolbox experiment scripts
  • Lab wiki/docs – protocols, onboarding materials

The code is already in git, but associated artifacts – issues, pull request discussions, wiki pages, CI logs – are platform-hosted and at risk of loss. con/tinuous archives CI logs, git-bug bridges issues into git, and python-github-backup captures the full repository metadata.

The lab’s Forgejo-Aneksajo instance (deployed via Lab-in-a-Box) can mirror GitHub repositories and use GitHub as an OAuth2 authentication source, so lab members log in with their existing GitHub accounts. See the Software Project story for a deeper treatment of GitHub organization archival.

Processing Pipeline #

Once data enters the vault, the processing pipeline runs:

DICOMs (ReproIn convention)
    → HeuDiConv + ReproIn heuristic → BIDS dataset
        → BIDS validator → pass/fail gate
            → MRIQC → QC reports (visual review)
                → fMRIPrep → preprocessed derivatives

Each step is wrapped in datalad run (via datalad-container for containerized BIDS Apps) so the full processing provenance is recorded. con/duct captures resource telemetry (memory, CPU, wall time) for each step.
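The shape of one such provenance-capturing step can be sketched as command assembly. The container name bids-mriqc, paths, and app arguments are hypothetical; --input/--output and the {inputs}/{outputs} placeholders are standard datalad run features:

```python
import shlex

def containers_run_cmd(container, inputs, outputs, app_args):
    """Assemble a `datalad containers-run` invocation for one pipeline step.

    Declared --input/--output paths are fetched/unlocked by DataLad and
    recorded in the commit, which is what makes the step re-runnable.
    """
    cmd = ["datalad", "containers-run", "-n", container]
    for path in inputs:
        cmd += ["--input", path]
    for path in outputs:
        cmd += ["--output", path]
    # "--" separates datalad's own options from the BIDS App command.
    return cmd + ["--"] + app_args

cmd = containers_run_cmd(
    "bids-mriqc",  # hypothetical name registered via datalad containers-add
    inputs=["studies/study-taskswitch/sourcedata/bids-raw"],
    outputs=["studies/study-taskswitch/derivatives/mriqc"],
    app_args=["{inputs}", "{outputs}", "participant", "--participant-label", "01"],
)
print(shlex.join(cmd))
```

The same pattern, with con/duct wrapped around the execution, yields both the provenance commit and the resource telemetry for the step.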

BIDS Apps #

| App | Purpose | Container |
|---|---|---|
| MRIQC | Image quality metrics and visual reports | Singularity via datalad-container |
| fMRIPrep | Standardized fMRI preprocessing (motion correction, registration, confound estimation) | Singularity via datalad-container |

These are run as containerized BIDS Apps registered in the dataset via datalad containers-add, following the ReproNim/containers collection pattern.

Hypothetical Vault Organization #

TODO: AI-generated layout, to be curated.

A lab typically runs multiple studies concurrently. The vault groups data by study under studies/, with shared sourcedata at the top level and per-study BIDS datasets and derivatives below. The BIDS-converted data lives under sourcedata/bids-raw/ following the BIDS convention for raw data placement (see bids-specification#2191 for related discussions on directory naming). Preprocessing may happen per recording session (since sessions arrive incrementally from the scanner), with study-level aggregation happening later.

lab-vault/                               # DataLad superdataset
    ├── sourcedata/                      # Raw acquisitions (all studies)
    │   ├── dicoms/                      # Raw DICOMs (ReproIn naming)
    │   │   ├── {date}/{session}/        # Per-session, routed by ReproIn study name
    │   │   └── ...
    │   ├── reprostim/                   # Stimulus capture recordings
    │   │   └── {date}/{session}/
    │   ├── birch/                       # Behavioral event logs
    │   │   └── {date}/{session}/
    │   └── physio/                      # Physiological recordings (if any)
    ├── studies/                          # Per-study BIDS datasets
    │   ├── study-taskswitch/            # One study
    │   │   ├── sourcedata/bids-raw/    # BIDS-converted (aggregated from sourcedata)
    │   │   │   ├── dataset_description.json
    │   │   │   ├── participants.tsv
    │   │   │   └── sub-01/
    │   │   │       ├── ses-01/
    │   │   │       │   ├── anat/
    │   │   │       │   ├── func/
    │   │   │       │   └── fmap/
    │   │   │       └── ...
    │   │   └── derivatives/
    │   │       ├── mriqc/               # QC reports for this study
    │   │       └── fmriprep/            # Preprocessed data
    │   ├── study-language/              # Another study
    │   │   ├── sourcedata/bids-raw/
    │   │   └── derivatives/
    │   └── ...
    ├── code/                            # Processing scripts, heuristics
    │   ├── heudiconv-heuristic.py
    │   └── processing-pipeline.sh
    ├── communications/
    │   └── slack/                       # Archived Slack channels
    ├── calendar/                        # Exported Google Calendar events
    ├── docs/                            # Lab protocols, SOPs
    └── .datalad/

DICOMs arrive per session and land in sourcedata/dicoms/. The ReproIn study name in the DICOM headers routes converted data to the correct study under studies/. Derivatives can be produced per session as data arrives (fMRIPrep on a single session) and later aggregated into study-level summaries.
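The routing step can be sketched as follows, assuming ReproIn's PI^study form for the DICOM StudyDescription; the study names and separator handling here are illustrative:

```python
from pathlib import Path

def route_session(study_description: str, vault=Path("lab-vault")) -> Path:
    """Map a ReproIn-style StudyDescription ('<PI>^<study>', assumed form)
    to the target study dataset's BIDS raw directory."""
    _, _, study = study_description.partition("^")
    return vault / "studies" / f"study-{study.lower()}" / "sourcedata" / "bids-raw"

print(route_session("Haxby^TaskSwitch"))
# lab-vault/studies/study-taskswitch/sourcedata/bids-raw
```

Because the routing key travels inside the DICOM headers, a single inbox of sessions can fan out to multiple concurrent studies without per-session bookkeeping.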

Each study, each derivative, and the communications dataset are nested DataLad subdatasets, following YODA principles.

Distribution and Privacy #

| Content | Distribution | Rationale |
|---|---|---|
| BIDS sourcedata/bids-raw (defaced) | OpenNeuro / DANDI | Public sharing after defacing |
| Derivatives | OpenNeuro (as derivative dataset) | Public, no PII |
| Raw DICOMs | Private (encrypted backup only) | Contains facial features, PHI |
| Slack archives | Private (lab remote only) | Confidential communications |
| Calendar | Private | Lab scheduling details |
| Code | GitHub (public or private per repo) | Already public in most cases |
| ReproStim recordings | Private or restricted | May contain faces, voices |
| BIRCH event logs | Public (with BIDS dataset) | No PII in button presses |

Use git-annex wanted expressions with distribution-restrictions metadata to enforce these policies automatically. See Privacy and Access Control.
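A sketch of what enforcing that policy could look like, emitted as git-annex commands; the remote names (openneuro, lab-backup) and the choice of paths to tag are hypothetical:

```python
def policy_commands():
    """Emit git-annex commands implementing the distribution table above."""
    return [
        # Tag sensitive content so preferred-content expressions can match it.
        "git annex metadata --set distribution-restrictions=sensitive sourcedata/dicoms/",
        "git annex metadata --set distribution-restrictions=sensitive communications/slack/",
        # Public remotes refuse anything carrying a restriction tag...
        "git annex wanted openneuro 'not metadata=distribution-restrictions=*'",
        # ...while the lab's private backup keeps everything.
        "git annex wanted lab-backup anything",
    ]

print("\n".join(policy_commands()))
```

With wanted expressions in place, `git annex sync --content` (or `datalad push`) honors the policy automatically instead of relying on a human to remember which files stay private.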

Workflow Overview #

TODO: AI-generated layout, to be curated.

```mermaid
flowchart TD
    scanner[MRI Scanner] -->|DICOMs with ReproIn naming| dicoms[sourcedata/dicoms/]
    reprostim[ReproStim] -->|screen capture + QR timing| stim[sourcedata/reprostim/]
    birch[CurDes BIRCH] -->|event timing logs| events[sourcedata/birch/]
    dicoms -->|HeuDiConv + ReproIn routing| studies["studies/*/sourcedata/bids-raw/"]
    stim -->|timing extraction + annotation| studies
    events -->|convert to BIDS events.tsv| studies
    studies -->|BIDS validator| validate{valid?}
    validate -->|yes| mriqc[MRIQC -- QC reports]
    validate -->|no| fix[Fix issues]
    fix --> studies
    mriqc -->|visual review| qc_gate{QC pass?}
    qc_gate -->|yes| fmriprep[fMRIPrep -- preprocessing]
    qc_gate -->|flag| review[Human review]
    fmriprep --> derivatives["studies/*/derivatives/"]
    slack[Slack] -->|slackdump| comms[communications/slack/]
    gcal[Google Calendar] -->|export| calendar[calendar/]
    github[GitHub] -->|CI logs via con/tinuous| code[code/]
    subgraph vault[Lab Vault -- git-annex / DataLad]
        dicoms
        stim
        events
        studies
        mriqc
        derivatives
        comms
        calendar
        code
    end
    derivatives -->|datalad push| openneuro[OpenNeuro / DANDI]
    studies -->|defaced| openneuro
```

Relevant Tools #

| Component | Tool | Status |
|---|---|---|
| DICOM to BIDS conversion | HeuDiConv + ReproIn | Mature, production-ready |
| Stimulus capture | ReproStim | Active development |
| Stimulus annotation | Annotation Garden | Alpha |
| Quality control | MRIQC | Mature |
| Preprocessing | fMRIPrep | Mature |
| Container management | datalad-container | Stable |
| Resource telemetry | con/duct | Stable |
| Slack archival | slackdump | Working |
| CI log archival | con/tinuous | Stable |
| Issue archival | git-bug | Stable |
| Repository backup | python-github-backup | Stable |
| Self-hosted forge | Forgejo-Aneksajo | Beta |
| Deployment | Lab-in-a-Box | Alpha |

See Also #