Appendix#

Keeping on top of provenance#

  • Licenses

  • Streamlining for reproducibility

Licenses#

Where does the file come from?

  • How can we describe this later to somebody?

    • Point-and-click steps take a long time to describe.

    • What rights do we have to the data?

Examples:

License that applies to the GeoDist data

Downloading via code#

Easiest:

Stata

global URL "https://www.cepii.fr/distance/dist_cepii.dta"
use "$URL", clear

Why not read directly from the URL every time?

  • Will it be there in two months? In six years?

  • What if the internet connection is down?

Easy:

Stata

global URL "https://www.cepii.fr/distance/dist_cepii.dta"
* Save a local copy; "dist_cepii.dta" is an example output path
copy "$URL" "dist_cepii.dta", replace

R

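rootdir <- getwd()
datadir <- file.path(rootdir, "data")
dir.create(datadir, showWarnings = FALSE)
download.file("https://www.cepii.fr/distance/dist_cepii.dta", file.path(datadir, "dist_cepii.dta"), mode = "wb")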
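
One way to address both worries above (the file disappearing, the connection being down) is to download once and reuse the local copy afterwards. The following is a minimal Python sketch of that idea, not part of the original materials; the function name `fetch_once` and the `.part` temporary suffix are illustrative assumptions.

```python
import urllib.request
from pathlib import Path


def fetch_once(url: str, dest: Path) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the local copy was reused.
    `fetch_once` is a hypothetical helper name, not an existing library call.
    """
    dest = Path(dest)
    if dest.exists():
        # A local copy is already archived: no network access needed.
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name first, so an interrupted transfer
    # never leaves a truncated file under the final name.
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, str(tmp))
    tmp.replace(dest)
    return True
```

For example, `fetch_once("https://www.cepii.fr/distance/dist_cepii.dta", Path("data") / "dist_cepii.dta")` would download the file on the first run and do nothing on later runs, so the analysis keeps working offline as long as the `data` directory is preserved.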

Creating a README#

  • Template README

  • Cite both the dataset and the working paper.

  • Add the data URL and the date and time accessed (can you think of a way to automate this?).

  • Add a link to the license (also: download and store a copy of the license).
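
One possible answer to the automation question above is to generate the provenance entry at download time. The sketch below is only an illustration under stated assumptions: the helper names `provenance_entry` and `append_to_readme`, the entry layout, and the added SHA-256 checksum are not from the original materials, and the license URL must be supplied by the user.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def provenance_entry(url: str, local_file: Path, license_url: str) -> str:
    """Return a README-ready text block describing where a file came from.

    The layout is an assumption; adapt it to the README template in use.
    """
    local_file = Path(local_file)
    # Checksum of the stored copy (an addition, not required by the text),
    # useful for checking later that the archived file is unchanged.
    digest = hashlib.sha256(local_file.read_bytes()).hexdigest()
    # Record the access time in UTC so it is unambiguous across time zones.
    accessed = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"- File: {local_file.name}\n"
        f"  - Source URL: {url}\n"
        f"  - Accessed: {accessed}\n"
        f"  - License: {license_url}\n"
        f"  - SHA-256: {digest}\n"
    )


def append_to_readme(entry: str, readme: Path = Path("README.md")) -> None:
    """Append the entry to the README, creating the file if it is missing."""
    with open(readme, "a", encoding="utf-8") as fh:
        fh.write(entry)
```

Calling `provenance_entry` right after the download step, and passing the result to `append_to_readme`, keeps the recorded URL and access time in sync with the file that was actually stored.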

Additional training resources#

Examples of replication packages#

This textbook’s source: labordynamicsinstitute/reproducibility-confidential

Licensed under a Creative Commons license (CC BY).