Part 6: Template README and Reproducible Practices

2023-09-26

Follow-up

Things you can do on your own:

Write your own README
How to run your software
Try out a cloud service
Versioning (using GitHub and creating releases)
Licensing
Depositing a replication package

Overview

Part 1:

Reproducible practices
The role of the template README

Summarizes the data sources
Summarizes necessary resources
Summarizes reproduction procedures
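A quick way to check whether a README covers these three roles is a keyword scan. The sketch below is illustrative only: the topic names come from the list above, but the keywords in `REQUIRED_TOPICS` are assumptions, not the official section titles of any template.

```python
# Hypothetical keywords for each of the three roles of the README.
# These are assumptions for illustration; adjust them to the README at hand.
REQUIRED_TOPICS = {
    "data sources": ["data availability", "data source"],
    "computational requirements": ["computational requirements"],
    "reproduction procedures": ["instructions"],
}


def missing_topics(readme_text):
    """Return the topics for which no keyword appears in the README text."""
    text = readme_text.lower()
    return [
        topic
        for topic, keywords in REQUIRED_TOPICS.items()
        if not any(keyword in text for keyword in keywords)
    ]


if __name__ == "__main__":
    sample = "Computational requirements\nInstructions to replicators\n"
    print("Missing:", missing_topics(sample))
```

A keyword match only shows that a topic is mentioned, not that the information is complete, so reading the README remains necessary.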

Template README

Published in December 2020

Overview

The README guides a reader through the available material and a route to replicating the results in the research paper, including the description of the origins of data and/or description of programs. It

provides a brief overview of the available material,
provides a brief guide as to how to proceed from beginning to end, and
then dives into the specifics.

Data and Code Availability Statement (DCAS)

It contains information about the sources of data used in the replication package, in addition to or instead of such detailed description in the manuscript.

Not just a data citation: it also describes any additional information necessary to obtain the data.

These may include

required registrations,
memberships,
application procedures,
monetary cost, or
other qualifications.

Computational Requirements

For simple replication packages, these may appear to be trivial (a laptop and some common software).

What if the requirement is expensive commercial software and a supercomputer cluster?

Computational Requirements

In order to assess the complexity of the task of replicating, authors should specify each of the following elements:

Software used, including version number as used. If the code is expected to run with a lower version number, that should be added.
Any additional packages, including their version number or similar, as used.
The computer hardware specification as used by the author, in terms of OS, CPU generation and quantity, memory and necessary disk space. If multiple computers were used, the specification for each should be identified.
The wall-clock time given the provided computer hardware, expressed in appropriate units (minutes, days).
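Several of these elements can be collected by a script rather than by hand. The following is a minimal sketch using only the Python standard library, assuming the analysis itself runs in Python; the function names `describe_environment` and `timed` are hypothetical, and memory and disk space are not covered.

```python
import os
import platform
import time
from importlib import metadata


def describe_environment(packages):
    """Return a dict describing the software and hardware of this machine."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
        "packages": versions,
    }


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # "pip" stands in for the packages an analysis actually uses.
    env = describe_environment(["pip"])
    for key, value in env.items():
        print(f"{key}: {value}")
    # sum(range(...)) stands in for the real analysis step.
    _, seconds = timed(sum, range(1_000_000))
    print(f"wall-clock time: {seconds:.3f} seconds")
```

The printed values can be pasted into the README. For other software (Stata, R, MATLAB), the equivalent information would have to come from that software's own version commands.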

Details of the README

Expectations

The README is strongly suggested, but sometimes ignored.

You should nevertheless treat every replication package as if it should contain the same information, easily accessible.

More details

Important: The information should describe ALL data used, regardless of whether they are provided as part of the replication archive or not, and regardless of size or scope.

For instance, if using GDP deflators, the source of the deflators (e.g. at the national statistical office) should also be listed here.

Rights and licenses

Can we OBTAIN the data when the authors say we cannot?
Journals may check whether authors are ALLOWED to provide the data when the data are included.
Can we obtain the data by following the authors' instructions?

Availability of data

Listing of data sets

Data sources translate into datasets. Ideally, the README lists them:

[Figure: list of datasets]
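When a README does not list its datasets, a replicator can build an inventory from the files actually provided. The sketch below walks a directory and reports each file's relative path, size, and SHA-256 checksum. The function name `list_datasets` and the file `deflators.csv` are hypothetical, and the checksum is an illustrative extra rather than something the README is stated to require.

```python
import hashlib
import tempfile
from pathlib import Path


def list_datasets(data_dir):
    """Return (relative path, size in bytes, SHA-256) for each file under data_dir."""
    root = Path(data_dir)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but large files would call for chunked hashing.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows.append((path.relative_to(root).as_posix(), path.stat().st_size, digest))
    return rows


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with one hypothetical file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "deflators.csv").write_text("year,deflator\n2020,1.00\n")
        for name, size, digest in list_datasets(tmp):
            print(f"{name}\t{size} bytes\tsha256:{digest[:12]}")
```

Such an inventory only covers data shipped in the package. Data that must be obtained elsewhere still has to be identified from the README or the manuscript.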

Computational requirements

What do you need to run the analysis?

Computers
Software
Time

Computational requirements

You will need to figure out if you can do it

Can you run on your laptop?
Do we need more resources?

Portions of the code were last run on a 12-node AWS R3 cluster, consuming 20,000 core-hours.
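A statement like this can be turned into a rough feasibility estimate. The sketch below divides core-hours by the number of available cores. It assumes perfect parallel scaling and ignores memory and storage limits, so the result is a lower bound at best; the machine sizes are hypothetical.

```python
def wall_clock_hours(core_hours, cores):
    """Estimate wall-clock hours, assuming the work scales perfectly across cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores


if __name__ == "__main__":
    CORE_HOURS = 20_000  # figure taken from the README statement above
    # Hypothetical machines a replicator might have access to.
    for label, cores in [("8-core laptop", 8), ("100-core cluster allocation", 100)]:
        hours = wall_clock_hours(CORE_HOURS, cores)
        print(f"{label}: {hours:.0f} hours, about {hours / 24:.1f} days")
```

Under these assumptions an 8-core laptop would need 2,500 hours, a little over 104 days, which answers the question "Can you run on your laptop?" for this package.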

The code

This part of the README should provide some details about the code. Ideally, it:

explains summarily what the code does
might explain in detail what the code does

Instructions

Warning

In many of the READMEs you will see, not everything is as clear as what we just outlined.
