Publications

Scope

Citation Discovery and Curation – Finding all works that cite, are cited by, or are related to a given set of publications. citations-collector automates this across CrossRef, OpenCitations, DataCite, and OpenAlex.
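Each service returns overlapping results, and the same DOI often arrives in different spellings. The sketch below shows the merge step only. It assumes each source has already returned a plain list of citing DOIs. `normalize_doi` and `merge_citations` are hypothetical helpers, not citations-collector's actual API, and the DOIs are made up.

```python
from collections import defaultdict


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_citations(results_by_source: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized citing DOI to the set of sources that reported it."""
    merged: dict[str, set[str]] = defaultdict(set)
    for source, dois in results_by_source.items():
        for doi in dois:
            merged[normalize_doi(doi)].add(source)
    return dict(merged)


# Hypothetical results for one cited paper, as if already fetched from each service.
results = {
    "crossref": ["10.1000/ABC.1", "10.1000/xyz.2"],
    "opencitations": ["doi:10.1000/abc.1"],
    "openalex": ["https://doi.org/10.1000/xyz.2", "10.1000/new.3"],
}
merged = merge_citations(results)
for doi in sorted(merged):
    print(doi, sorted(merged[doi]))
```

Keeping the set of reporting sources per DOI preserves a simple form of provenance. A citation found by only one aggregator can then be flagged for manual review.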

PDF Acquisition – Obtaining full-text PDFs through legal open-access channels (Unpaywall, publisher OA repositories, preprint servers) and archiving them with provenance metadata in git-annex.
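The following is a minimal sketch of the open-access lookup and archiving step. It assumes an Unpaywall v2 response has already been fetched and parsed into a dict. The sketch reads the `is_oa` and `best_oa_location.url_for_pdf` fields and builds a `git annex addurl` command, which records the download URL as a source for the annexed file. The helper names, the sample record, and the `pdfs/` layout are hypothetical.

```python
import urllib.parse
from typing import Optional


def unpaywall_url(doi: str, email: str) -> str:
    """Build an Unpaywall v2 lookup URL; Unpaywall asks callers to identify themselves by email."""
    query = urllib.parse.urlencode({"email": email})
    return f"https://api.unpaywall.org/v2/{urllib.parse.quote(doi)}?{query}"


def best_pdf_url(record: dict) -> Optional[str]:
    """Return the PDF link from the best open-access location, or None if there is none."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf")


def annex_addurl_cmd(url: str, path: str) -> list[str]:
    """git-annex downloads the URL into `path` and remembers the URL as a content source."""
    return ["git", "annex", "addurl", f"--file={path}", url]


# Hypothetical, already-parsed Unpaywall response.
sample = {
    "doi": "10.1000/abc.1",
    "is_oa": True,
    "best_oa_location": {"url_for_pdf": "https://example.org/abc.pdf", "license": "cc-by"},
}
url = best_pdf_url(sample)
if url:
    print(" ".join(annex_addurl_cmd(url, "pdfs/10.1000_abc.1.pdf")))
```

Inside a git-annex repository, the command list could be passed to `subprocess.run(cmd, check=True)`. Building it separately keeps the selection logic testable without network access.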

Reference Management – Synchronizing curated collections with reference managers such as Zotero, supporting collaborative bibliography management, BetterBibTeX export, and integration with writing workflows.
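Synchronizing with Zotero means mapping collected records onto Zotero's item format. The sketch below targets a few standard `journalArticle` fields (`title`, `creators`, `publicationTitle`, `date`, `DOI`, `tags`). The input record shape and the `to_zotero_item` helper are hypothetical. Uploading the item, for example through the Zotero Web API, is not shown.

```python
import json


def to_zotero_item(record: dict) -> dict:
    """Map a hypothetical collected record onto Zotero journalArticle item fields."""
    creators = [
        {"creatorType": "author", "firstName": given, "lastName": family}
        for given, family in record.get("authors", [])
    ]
    return {
        "itemType": "journalArticle",
        "title": record["title"],
        "creators": creators,
        "publicationTitle": record.get("journal", ""),
        "date": str(record.get("year", "")),
        "DOI": record["doi"],
        # Zotero represents tags as a list of {"tag": ...} objects.
        "tags": [{"tag": t} for t in record.get("tags", [])],
    }


record = {
    "title": "An Example Study",
    "authors": [("Ada", "Lovelace")],
    "journal": "Journal of Examples",
    "year": 2024,
    "doi": "10.1000/abc.1",
    "tags": ["cites-our-tool"],
}
item = to_zotero_item(record)
print(json.dumps(item, indent=2))
```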

Why Version-Control References?

A reference collection is a living dataset. New citing works appear as the tracked papers accumulate citations. PDFs get updated when authors post corrections. Metadata improves as aggregators reconcile records.

Storing references and PDFs in a DataLad dataset makes every change tracked, reproducible, and attributable: the same principles that apply to code and data apply to the scholarly record itself.
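Version control works best when each reference is stored in a stable, line-oriented form, so that a metadata correction shows up as a small diff. The following is a minimal sketch under that assumption. It writes one key-sorted JSON file per DOI and builds a `datalad save -m MESSAGE PATH` command to record the change. The `references/` layout and both helpers are hypothetical, not the layout citations-collector uses.

```python
import json
import tempfile
from pathlib import Path


def write_record(dataset: Path, record: dict) -> Path:
    """Write one reference as indented, key-sorted JSON so diffs stay line-oriented."""
    safe_name = record["doi"].replace("/", "_") + ".json"
    path = dataset / "references" / safe_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path


def save_cmd(path: Path, message: str) -> list[str]:
    """Command that records the change, with a message, in the dataset's history."""
    return ["datalad", "save", "-m", message, str(path)]


# A temporary directory stands in for a real DataLad dataset here.
with tempfile.TemporaryDirectory() as tmp:
    path = write_record(Path(tmp), {"title": "An Example Study", "doi": "10.1000/abc.1"})
    text = path.read_text(encoding="utf-8")
    cmd = save_cmd(path, "Add reference 10.1000/abc.1")
    print(text)
    print(cmd)
```

Sorting keys makes the output independent of the order in which a source returned its fields. Re-fetching unchanged metadata therefore produces no diff, and a real correction produces a diff limited to the changed fields.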

citations-collector

12 February 2026·5 mins

ai-ready · Publications · native-datalad · LinkML · JSON · YAML · JSON-LD · CON · Citations · Scholarly · CrossRef · OpenCitations · DataCite · OpenAlex · Zotero · PDF · Provenance

Discovers citations across CrossRef, OpenCitations, DataCite, and OpenAlex; syncs with Zotero; acquires PDFs with git-annex provenance tracking; and stores everything in a DataLad dataset using a LinkML schema aligned with CiTO and FaBiO ontologies.
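To illustrate what "aligned with CiTO and FaBiO" can mean in practice, the sketch below expresses a single citation link as JSON-LD. It types the citing work with a FaBiO class and links it to the cited work with the CiTO `cites` property. The document shape and the `citation_jsonld` helper are assumptions for illustration only. They are not generated from citations-collector's LinkML schema.

```python
import json

# Prefixes for the SPAR ontologies, plus a DOI resolver prefix for work identifiers.
CONTEXT = {
    "cito": "http://purl.org/spar/cito/",
    "fabio": "http://purl.org/spar/fabio/",
    "doi": "https://doi.org/",
}


def citation_jsonld(citing_doi: str, cited_doi: str, citing_type: str = "fabio:JournalArticle") -> dict:
    """Describe one citation link: the citing work cito:cites the cited work."""
    return {
        "@context": CONTEXT,
        "@id": f"doi:{citing_doi}",
        "@type": citing_type,
        "cito:cites": {"@id": f"doi:{cited_doi}"},
    }


doc = citation_jsonld("10.1000/abc.1", "10.1000/xyz.2")
print(json.dumps(doc, indent=2))
```

CiTO also defines more specific properties than `cites`. A curated record could use one of those in place of the generic link when the citation's intent is known.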

Zotero Integration

12 February 2026·3 mins

ai-ready · Publications · external · Zotero · References · Bibliography · BibTeX · Export

Integration with the Zotero reference manager for synchronizing curated reference collections with DataLad datasets, including export of BibTeX, JSON, and structured metadata for git-tracked bibliography management.
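Committing BibTeX alongside the structured records keeps a writing-ready bibliography in the same history. The following is a minimal sketch of such an export. The record shape and the `citekey` scheme (family name plus year) are hypothetical. The sketch does not reproduce BetterBibTeX's key generation, and it does not escape special LaTeX characters.

```python
import re


def citekey(record: dict) -> str:
    """Hypothetical key: first author's family name (ASCII letters only) + year."""
    family = record["authors"][0][1] if record.get("authors") else "anon"
    return re.sub(r"[^A-Za-z]", "", family).lower() + str(record.get("year", ""))


def to_bibtex(record: dict) -> str:
    """Render one record as a BibTeX @article entry, skipping empty fields."""
    authors = " and ".join(f"{family}, {given}" for given, family in record.get("authors", []))
    fields = {
        "author": authors,
        "title": record["title"],
        "journal": record.get("journal", ""),
        "year": str(record.get("year", "")),
        "doi": record["doi"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items() if value)
    return f"@article{{{citekey(record)},\n{body}\n}}\n"


record = {
    "title": "An Example Study",
    "authors": [("Ada", "Lovelace")],
    "journal": "Journal of Examples",
    "year": 2024,
    "doi": "10.1000/abc.1",
}
bib = to_bibtex(record)
print(bib)
```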