| Time | April 11, 2026 (Saturday) |
|---|---|
| 8:00 | Breakfast |
| 9:00 | Introduction (with World Bank intro) |
| 10:00 | Reproducible Practices, Template README |
| 11:00 | Data provenance, Data Citations |
| 12:00 | Lunch Break |
| 13:00 | What you will be doing in the Lab |
| 14:00 | Command Line/Git/Markdown/Version control |
| 15:00 | A prototypical replication report |
| 16:00 | A walkthrough of the workflow |
| 17:00 | How to run Stata code |
| 18:00 | Restaurant |
Part 1:
While we are assessing reproducibility of others, our work must be reproducible as well.
… git. We will show you how.
… REPLICATION.md)
Here are a few generic guidelines for researchers. You will be on the lookout for these things!
Structure your project
Version your project (git)!
Track metadata
```
/inputs
/outputs
/code
/paper
```

or, with Spanish folder names:

```
/datos/         (data)
    /brutos     (raw)
    /limpiados  (clean)
    /finales    (final)
/codigo         (code)
/articulo       (article)
```
It doesn’t really matter, as long as it is logical. We will get to how this translates to confidential or big data in a moment!
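Whatever names you choose, the layout itself can be created from within Stata rather than by hand. The sketch below assumes the do-file is run from the project root and uses the English folder names that appear later in this section (`data`, `code`, `article`); the folder list is illustrative, not prescriptive.

```stata
* Minimal sketch: create the project layout from within Stata.
* Assumes this do-file is run from the project root.
global rootdir "`c(pwd)'"

* Stata's -mkdir- does not create missing parent folders, so each
* parent is listed before its subfolders. Folders that already exist
* are skipped, so the script can be re-run safely.
foreach d in data data/raw data/clean data/final code article {
    mata: st_local("exists", strofreal(direxists("$rootdir/`d'")))
    if !`exists' mkdir "$rootdir/`d'"
}
```

Checking with Mata's `direxists()` before calling `mkdir` (instead of wrapping `mkdir` in `capture`) means genuine failures, such as a permissions problem, still stop the script.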
It might be “Future You!”
The replicator is the first (?) reader of the instructions who will need to reproduce the analysis.
… $CONFDATA, $TABLES, $CODE)
Use programming-language-specific code as much as possible.
Avoid
or
Most languages have appropriate code:
R:
Stata:
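As one illustration in Stata: file operations can use Stata's own commands instead of passing OS-specific commands to `shell` (or its synonym `!`), which differ between Windows and Unix-like systems. The file names below are hypothetical placeholders.

```stata
* Minimal sketch: OS-neutral file handling with built-in Stata commands.
* notes.txt and notes_backup.txt are hypothetical file names.
global rootdir "`c(pwd)'"

* Write a small placeholder file so the example is self-contained.
file open fh using "$rootdir/notes.txt", write text replace
file write fh "placeholder" _n
file close fh

* Instead of an OS-specific call such as: shell cp notes.txt notes_backup.txt
copy "$rootdir/notes.txt" "$rootdir/notes_backup.txt", replace

* Instead of an OS-specific call such as: shell rm notes.txt
erase "$rootdir/notes.txt"
```

The same do-file then runs unchanged on Windows, macOS, and Linux.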
Use neutral pathnames (mostly forward slashes)

R: Use functions to combine paths (and/or use forward slashes) and packages to make code more portable.

```r
library(foreign)  # provides read.dta()
# Locate the project root: the nearest directory containing README.md
basepath <- rprojroot::find_root(rprojroot::has_file("README.md"))
data <- read.dta(file.path(basepath, "path", "data.dta"))
```

Stata: always use forward slashes, even on Windows.

```stata
global rootdir "/my/computer"
use "$rootdir/path/data.dta"
```
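Better still, set the root directory from the current working directory instead of hard-coding it:

```stata
global rootdir : pwd
use "$rootdir/path/data.dta"
```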
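Setting the root from the working directory only works if the do-file is actually started from the project root. A minimal sketch of an early check, assuming (as in the R example with `rprojroot`) that the root is the folder containing `README.md`; `check_root` is a hypothetical helper, not a built-in Stata command.

```stata
* Minimal sketch: stop early when a do-file is not run from the project
* root. Assumes the root is the folder that contains README.md.
* -check_root- is a hypothetical helper, not a built-in Stata command.
capture program drop check_root
program define check_root
    args root
    capture confirm file "`root'/README.md"
    if _rc {
        display as error "README.md not found in `root'; run from the project root"
        exit 601   // r(601): file not found
    }
    display as text "Project root: `root'"
end
```

Calling `check_root "$rootdir"` right after setting `$rootdir` turns a run from the wrong folder into an immediate, readable error instead of a confusing failure further down.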
TIER Protocol again
Simplified replication package structure
Data are big
Confidential data
Confidential data in enclaves
This may no longer work:
```
/data/
    /raw
    /clean
    /final
/code
/article
```
But this might:

```
/project123/
    /data/
        /raw
        /clean
        /final
    /code
    /article
/confidential (read-only)
    /taxes (read-only)
    /wages (read-only)
```
The file structure thus becomes more complex, but is fundamentally not so different:

```stata
global taxdata    "/confidential/taxes"
global salarydata "/confidential/wages"
global outputdata "/project/data/clean" // this is where you write the data you create in this project
global results    "/project/article"    // all tables for inclusion in your paper go here
global programs   "/project/code"       // all programs (which you might "include") are found here
```

Or even more robust:
```stata
global rootdir    "/project123"
global confbase   "/data/provided"
global project    "$rootdir/project"
global taxdata    "$confbase/taxes"
global salarydata "$confbase/wages"
global outputdata "$project/data/clean" // this is where you write the data you create in this project
global results    "$project/article"    // all tables for inclusion in your paper go here
global programs   "$project/code"       // all programs (which you might "include") are found here
```
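Because every location is derived from a handful of globals, a mistyped or unmounted path shows up as a missing directory. A minimal sketch of a check that could run right after these definitions; `check_paths` is a hypothetical helper, not a built-in Stata command, and it only tests that each directory exists (not that it is read-only).

```stata
* Minimal sketch: verify that every location named by a global macro
* exists, and stop otherwise.
* -check_paths- is a hypothetical helper, not a built-in Stata command.
capture program drop check_paths
program define check_paths
    // the arguments are a list of global macro names
    foreach g of local 0 {
        // look up the global's value and whether that directory exists
        mata: st_local("path", st_global("`g'"))
        mata: st_local("ok", strofreal(direxists(st_local("path"))))
        if !`ok' {
            display as error "`g' points to a missing directory: `path'"
            exit 601   // r(601): file not found
        }
        display as text "`g': `path'"
    }
end
```

For example, `check_paths taxdata salarydata` confirms that the confidential inputs are visible before any data are loaded. When moving between an enclave and another machine, typically only `$rootdir` and `$confbase` need to change.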