Lars Vilhuber
2024-11-13
Cornell University
Implementing Increased Transparency and Reproducibility in Economics 2020 Video
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. “The FAIR Guiding Principles for scientific data management and stewardship.” Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Old method: send the journal a ZIP file
Source: Your laptop
Destination: random file on a journal website
Questions / what-ifs:
These are provenance questions.
These are FAIR questions
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
“Good data management is not a goal in itself, but rather is the key conduit leading to knowledge discovery and innovation, and to subsequent data and knowledge integration and reuse by the community after the data publication process.”
“FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.”
Industry-proposed data lifecycle:
… which might really be a line
National Academies of Sciences, Engineering, and Medicine, Life-Cycle Decisions for Biomedical Data: The Challenge of Forecasting Costs, Washington, DC: The National Academies Press, 2020. https://dx.doi.org/10.17226/25639
Once you have collected the data
Once you have registered your analysis plan
Let's consider the preservation part separately:
Preserve as you go
Publication typically involves making information about the data, as well as the data themselves, available to others.
To be Findable:
To be Accessible:
To be Interoperable:
To be Re-usable:
Interoperable: Structured metadata about the data
Accessible: Structured metadata about the deposit
Findable: persistent identifier, indexed
Re-usable: License permits it!
(this was actually hidden in the metadata)
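The four checks above can be sketched as a simple test on a deposit's metadata record. This is a minimal illustration only: the field names (`identifier`, `access_conditions`, `variables`, `license`), the mapping of fields to principles, and the placeholder DOI are assumptions made for this example, not the schema of any particular repository.

```python
# Hypothetical mapping from FAIR principle to the metadata fields we expect.
# The field names are illustrative assumptions, not a real repository schema.
REQUIRED_BY_PRINCIPLE = {
    "Findable": ["identifier"],
    "Accessible": ["access_conditions"],
    "Interoperable": ["variables"],
    "Re-usable": ["license"],
}


def missing_fair_fields(record: dict) -> dict:
    """Return, per FAIR principle, the expected fields that are absent or empty."""
    gaps = {}
    for principle, fields in REQUIRED_BY_PRINCIPLE.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps


# Placeholder record (not a real deposit); it deliberately omits the license.
example = {
    "identifier": "https://doi.org/10.1234/example",
    "access_conditions": "application required",
    "variables": ["id", "year"],
}

print(missing_fair_fields(example))  # {'Re-usable': ['license']}
```

A record with no explicit license is flagged under "Re-usable", which mirrors the point that the license is easy to overlook when it is buried in the metadata.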
IAB: Establishment History Panel (BHP) - Version 7518 v1 at https://doi.org/10.5164/IAB.BHP7518.de.en.v1
Access conditions involve an application process.
But information ABOUT the access process (=metadata) is available.
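One way a machine can obtain metadata without touching the restricted data is DOI content negotiation: ask the DOI resolver for a structured record instead of the landing page. The sketch below only builds the request and does not send it. The helper name `metadata_request` is hypothetical, and whether a given registration agency returns the CSL JSON media type for this particular DOI is an assumption.

```python
import urllib.request

# Media type commonly offered through DOI content negotiation (assumed to be
# supported for the DOI in question; not verified here).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request for machine-readable DOI metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": CSL_JSON},
    )


req = metadata_request("10.5164/IAB.BHP7518.de.en.v1")
print(req.full_url)
print(req.get_header("Accept"))

# To actually fetch the record (requires network access):
# with urllib.request.urlopen(req) as response:
#     record = response.read().decode("utf-8")
```

The request targets the metadata only, so it works even when the data themselves sit behind an application process.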
In decreasing order of “freely available”
The role of journals is to provide a permanent record of scientific knowledge.
Trusted Repositories
Journals and institutions have assessed a number of trusted repositories:
Here: Sandbox for Zenodo
In one of my day jobs:
We will NOT use the regular Zenodo; rather, we will test in the Sandbox.
Check your URL bar! There's no other indication that this is not the real Zenodo!
https://library.cfa.harvard.edu/data-archiving-and-sharing (Harvard Center for Astrophysics)
Let's go to Zenodo:
- Presenter: https://sandbox.zenodo.org/deposit/910136
- Viewers: https://sandbox.zenodo.org/
Since I have already defined the survey, and created some test data, I can … publish it!
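The same publishing step can be scripted against the Zenodo Sandbox REST API instead of the web form. The following is a hedged sketch, not the workflow used in this presentation: the helper name `build_deposit_request`, the example title and creator, and the `TOKEN` placeholder are assumptions, and the request is only constructed, never sent. Note that the base URL points at the Sandbox, not the real Zenodo.

```python
import json
import urllib.request

# Sandbox endpoint for creating a new deposition (not the production Zenodo).
SANDBOX_API = "https://sandbox.zenodo.org/api/deposit/depositions"


def build_deposit_request(title, description, creator, token):
    """Build (but do not send) a POST request that creates a draft deposit."""
    payload = {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": description,
            "creators": [{"name": creator}],
        }
    }
    return urllib.request.Request(
        SANDBOX_API,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; a personal Sandbox access token would replace "TOKEN".
req = build_deposit_request(
    "Test survey data", "Sandbox test deposit", "Doe, Jane", "TOKEN"
)
print(req.get_method(), req.full_url)

# Sending it (requires network access and a valid Sandbox token):
# with urllib.request.urlopen(req) as response:
#     draft = json.load(response)
```

Creating the deposit yields a draft; uploading files and publishing are separate calls, which keeps an accidental run from minting a permanent record.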
Goal: Robustness and automation - getting close to push-button reproducibility
Goal: Correctly document reproducible research