2023-09-26

The Plan

9:00

  • Konnichi wa こんにちは
  • Overview of reproducibility and replicability in economics, including replication packages (presentation)

10:00-10:50 … Let’s make a replication package (hands-on using a simple example, from directory structure to (almost) deposit at a trusted repository) 0

11:00-11:50 … Principles and structure of replication packages (including README files) 1 2 (for confidential data: 1b)

11:50-13:00 … Lunch time

13:00-13:50 … Foundations of reproducible programming practices (in any programming language) 3 4

14:00-14:50 … Continuation and open discussion about specific examples (if you have one, bring your own project!) 5

15:00-15:50 … Discussions on institutional support for replicability in the Japanese context (including Q&A)

15:50-16:00 … Closing remarks

Environments

What is an environment?

An environment is a software-specific restricted area where you can run code. It overlaps with, but is not identical to, the project directory: some environments are defined within the project but stored outside it.
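As a minimal sketch of that distinction in Python (one of the languages covered below), the following reports where the running interpreter's environment lives and whether that location falls inside a given project directory. The function name `describe_environment` and the returned keys are illustrative, not part of any library.

```python
import sys
from pathlib import Path


def describe_environment(project_dir):
    """Report where the active Python environment is stored relative to a project."""
    env_root = Path(sys.prefix).resolve()
    project = Path(project_dir).resolve()
    return {
        # Directory that holds the active environment's files
        "env_root": env_root,
        # In a venv-style virtual environment, sys.prefix differs from sys.base_prefix
        "is_virtual": sys.prefix != sys.base_prefix,
        # True only if the environment is stored inside the project directory
        "inside_project": project == env_root or project in env_root.parents,
    }


if __name__ == "__main__":
    for key, value in describe_environment(".").items():
        print(f"{key}: {value}")
```

An environment stored outside the project reports `inside_project` as False even when the file that defines it is kept in the project.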

Stata

Back in the Stata part, we created an environment by redefining Stata's adopath, the list of search paths:

// Remove those unnecessary places
adopath - OLDPLACE
adopath - PERSONAL
// Add a project specific one, by repurposing the PLUS directory
cap mkdir "${rootdir}/adofiles"
sysdir set PLUS "${rootdir}/adofiles"

R

Various methods exist.

  • The renv package is a popular one.
  • The basic search path is defined by .libPaths(), and can be defined independently of renv.
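  • install.packages() is discouraged because it installs whichever version is current at install time; remotes::install_version() is preferred because it installs a specified version.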

Python

Various methods exist:

  • If using Anaconda, conda is the preferred method of managing environments (see 1). conda usually uses environment.yml files to define environments.
  • If using native Python, venv is the preferred method of managing environments (see 2). venv usually uses requirements.txt files, in conjunction with pip, to define environments.
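For the native Python route, a minimal sketch using the standard-library `venv` module might look like the following. The directory name `.venv`, the helper name `create_project_env`, and the package pin used in the usage example are assumptions for illustration only.

```python
import venv
from pathlib import Path


def create_project_env(project_dir, requirements):
    """Create a virtual environment inside the project and record its definition."""
    project = Path(project_dir)
    env_dir = project / ".venv"  # assumed name; any directory name works
    # with_pip=False keeps this sketch fast and offline;
    # with_pip=True would also bootstrap pip into the environment.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    # requirements.txt lists only the packages the project asks for directly
    req_file = project / "requirements.txt"
    req_file.write_text("\n".join(requirements) + "\n")
    return env_dir, req_file


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        env_dir, req_file = create_project_env(tmp, ["pandas==2.1.1"])
        print(env_dir, req_file.read_text())
```

In an environment that includes pip, the recorded packages would then typically be installed with `python -m pip install -r requirements.txt`, run with the environment's own interpreter.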

Caution

In both R and Python, it is possible to overspecify the environment. Users should take care to specify only the packages used to CREATE an environment, not all the packages that were ultimately installed in the environment due to dependencies. The latter are relevant for documenting the environment, but usually contain platform- or OS-dependent elements that break cross-platform compatibility.
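The distinction above can be sketched in Python as a small helper that splits a full listing of installed packages into the ones requested directly and the ones pulled in as dependencies. The helper name `split_requirements` is hypothetical, the listing in the usage example is made up for illustration, and the parsing is a simplification that only handles `name==version` lines.

```python
def split_requirements(direct_names, frozen_lines):
    """Split pinned package lines into directly requested and dependency-only."""
    wanted = {name.lower() for name in direct_names}
    direct, transitive = [], []
    for line in frozen_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Simplification: assumes each remaining line has the form name==version
        name = line.split("==")[0].strip().lower()
        if name in wanted:
            direct.append(line)
        else:
            transitive.append(line)
    return direct, transitive


if __name__ == "__main__":
    # Illustrative listing, in the style of `pip freeze` output
    frozen = ["numpy==1.26.0", "pandas==2.1.1", "six==1.16.0"]
    to_create, to_document = split_requirements(["pandas"], frozen)
    print("use to create the environment:", to_create)
    print("keep as documentation only:", to_document)
```

Only the first list would go into the file used to create the environment; the second is worth keeping as a record of what was actually installed.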