2024-08-24

The Plan

Overview

Part 1:

  • Reproducible practices
  • The role of the template README
  • Summarizes the data sources
  • Summarizes necessary resources
  • Summarizes reproduction procedures

Template README

Published in December 2020

Overview

The README guides a reader through the available material and along a route to replicating the results in the research paper, including a description of the origins of the data and/or a description of the programs. It

  • provides a brief overview of the available material,
  • provides a brief guide on how to proceed from beginning to end, and
  • then dives into the specifics.

Data and Code Availability Statement (DCAS)

The DCAS contains information about the sources of data used in the replication package, in addition to or instead of such a detailed description in the manuscript.

  • It is not just a data citation.
  • It describes additional information needed to obtain the data.

These may include

  • required registrations,
  • memberships,
  • application procedures,
  • monetary cost, or
  • other qualifications.

Computational Requirements

For simple replication packages, these may appear trivial (a laptop and some common software).

What if the requirements are expensive commercial software and a supercomputer cluster?

Computational Requirements

In order to assess the complexity of the task of replicating, authors should specify each of the following elements:

  • Software used, including version number as used. If the code is expected to run with a lower version number, that should be added.
  • Any additional packages, including their version number or similar, as used.
  • The computer hardware specification as used by the author, in terms of OS, CPU generation and quantity, memory and necessary disk space. If multiple computers were used, the specification for each should be identified.
  • The wall-clock time given the provided computer hardware, expressed in appropriate units (minutes, days).
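As an illustration of how an author might collect these elements, here is a minimal Python sketch using only the standard library. The function names `describe_environment` and `timed` are hypothetical, the package name `pip` is only a placeholder, and memory size is not captured because the standard library has no portable call for it.

```python
import importlib.metadata
import os
import platform
import shutil
import time


def describe_environment(packages):
    """Collect software and hardware details worth reporting in a README."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),   # software version as used
        "os": platform.platform(),             # operating system
        "machine": platform.machine(),         # CPU architecture
        "cpu_count": os.cpu_count(),           # number of logical CPUs
        "disk_free_gb": shutil.disk_usage(".").free / 1e9,
        "packages": versions,                  # additional packages
    }


def timed(task):
    """Run task() and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = task()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # "pip" is only a placeholder package name for illustration.
    print(describe_environment(["pip"]))
    _, seconds = timed(lambda: sum(range(1_000_000)))
    print(f"wall-clock time: {seconds / 60:.2f} minutes")
```

The output of such a script can be pasted into the README; for other software (Stata, R, Matlab), the equivalent version commands would be needed.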

Details of the README

Expectations

The README is strongly suggested, but sometimes ignored.

You should nevertheless hold every replication package to the same standard: the same information should be present and easily accessible.

More details

Important: The information should describe ALL data used, regardless of whether they are provided as part of the replication archive or not, and regardless of size or scope.

For instance, if using GDP deflators, the source of the deflators (e.g. at the national statistical office) should also be listed here.

Rights and licenses

  • We attempt to check if we can OBTAIN data when authors say we cannot
  • We attempt to check if authors are ALLOWED to provide the data when the data are included
  • We may obtain data as per instructions by the authors
    • At its simplest, we check that the URL works, and that the landing page provides enough information to obtain the data
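The simplest check above, whether the URL works, can be partly automated. The following is a minimal sketch, assuming a hypothetical helper `url_is_reachable`. It only tests that the server answers without an error status. Some servers reject `HEAD` requests even when the page exists, and a person still has to read the landing page to judge whether it provides enough information to obtain the data.

```python
import urllib.error
import urllib.request


def url_is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with an HTTP status below 400.

    `opener` is injectable so the function can be tried without network access.
    """
    try:
        # HEAD asks for headers only, so no data are downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        # URLError also covers HTTP errors such as 404;
        # ValueError covers strings that are not URLs at all.
        return False
```

A `False` result is a prompt to look more closely, not proof that the data are unavailable.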

Availability of data

For AEA submissions, this information is also available, in a somewhat different form, as part of the “Data and Code Availability Form” (DCAF):

[Figure: extract from the DCAF]

Listing of data sets

Data sources translate into datasets. Ideally, the README lists them:

[Figure: list of datasets]
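When the README does not list the datasets, a replicator can build a first draft of the list from the package itself. This is a minimal sketch, assuming a hypothetical helper `list_datasets` and a hypothetical set of file extensions. It finds only files that are physically in the package, so data that must be obtained elsewhere still have to be identified from the README or the manuscript.

```python
from pathlib import Path

# Hypothetical set of extensions; adjust to the package at hand.
DATA_EXTENSIONS = {".csv", ".dta", ".rds", ".xlsx", ".parquet"}


def list_datasets(root, extensions=DATA_EXTENSIONS):
    """Return (relative path, size in bytes) for each data file under root."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return found


if __name__ == "__main__":
    for name, size in list_datasets("."):
        print(f"{name}\t{size} bytes")
```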

Computational requirements

To some extent, the crux of the matter: what do you need to run the analysis?

  • Computers
  • Software
  • Time

Computational requirements and your job

You will need to figure out if you can do it (we’ll get to that part).

  • You do not need to run on your laptop
  • You should not run on your laptop if it will take too long!

Portions of the code were last run on a 12-node AWS R3 cluster, consuming 20,000 core-hours.
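To judge whether a run "will take too long", a statement like the one above can be turned into a rough elapsed-time estimate. The sketch below assumes perfect use of every core, so it gives a lower bound. The 20,000 core-hours and 12 nodes come from the example. The 32 cores per node and the 8-core laptop are hypothetical figures chosen only for illustration.

```python
def wall_clock_hours(core_hours, nodes, cores_per_node):
    """Lower bound on elapsed hours if every core is busy the whole time."""
    total_cores = nodes * cores_per_node
    if total_cores <= 0:
        raise ValueError("need at least one core")
    return core_hours / total_cores


# 20,000 core-hours and 12 nodes come from the example above;
# 32 cores per node is a hypothetical figure, not stated in the README.
cluster = wall_clock_hours(20_000, nodes=12, cores_per_node=32)

# Hypothetical 8-core laptop, for comparison.
laptop = wall_clock_hours(20_000, nodes=1, cores_per_node=8)

print(f"cluster: about {cluster:.0f} hours; laptop: about {laptop / 24:.0f} days")
```

Even under these generous assumptions, the gap between hours on a cluster and months on a laptop shows why the README should state both the hardware and the run time.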

The code

This section should provide some details about the code. Ideally, it will

  • explain summarily what the code does, and
  • possibly explain in detail what the code does.

Instructions

Warning

In many of the READMEs you will see, not everything is as clear as what we just outlined.

You will need to find the information.

So what if the information is not there?

What if the info is not there?

After lunch, we will talk about the report you will prepare.

Data Citations