Based on an earlier presentation and tutorial at the Cornell Day of Data 2021.
2021-10-29
Part 1:
Part 2:
Part 3:
Part 4:
git
Structure your project
Version your project (git)!
Track metadata
/inputs /outputs /code /paper
/datos/
    /brutos
    /limpiados
    /finales
/codigo
/articulo
It doesn’t really matter, as long as it is logical. We will get to how this translates to confidential or big data in a moment!
It might be “Future You!”
Use programming-language-specific code as much as possible
Avoid
system("unzip C:\data\myfile.zip")
or
shell unzip "C:\data\myfile.zip"
Most languages have appropriate code:
R:
unzip(zipfile, files = NULL, list = FALSE, overwrite = TRUE, junkpaths = FALSE, exdir = ".", unzip = "internal", setTimes = FALSE)
Stata:
unzipfile "zipfile.zip" [, replace]
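A minimal R sketch of the same idea, assuming a hypothetical archive at data/myfile.zip relative to the project directory. The helper name extract_archive and both paths are illustrative, not part of the original tutorial. It relies only on R's built-in unzip(), so no external program or Windows-style path is involved.

```r
# Hypothetical helper: extract a zip archive with R's built-in unzip(),
# so the code does not depend on an external "unzip" program.
extract_archive <- function(zipfile, exdir) {
  if (!file.exists(zipfile)) {
    stop("Archive not found: ", zipfile)
  }
  dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
  # unzip() returns the paths of the extracted files (invisibly)
  utils::unzip(zipfile, exdir = exdir, overwrite = TRUE)
}

# Assumed layout: data/myfile.zip, extracted into data/raw
zipfile <- file.path("data", "myfile.zip")
if (file.exists(zipfile)) {
  extracted <- extract_archive(zipfile, exdir = file.path("data", "raw"))
  print(extracted)
}
```

Failing early with a clear message when the archive is missing tends to be easier to debug than a silent shell error.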
Use neutral pathnames (mostly forward slashes)
R: Use functions to combine paths (and/or use forward slashes), packages to make code more portable.
library(foreign)  # provides read.dta()
basepath <- rprojroot::find_root(rprojroot::has_file("README.md"))
data <- read.dta(file.path(basepath, "path", "data.dta"))
Stata: always use forward slashes, even on Windows
global data "/my/computer"
use "$data/path/data.dta"
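For R, a base-only sketch of neutral path handling, with no extra packages. The directory and file names are hypothetical; the point is that file.path() joins components with forward slashes on every platform, so the same line runs unchanged on Windows, macOS, and Linux.

```r
# Assumption: the script is run from the project root.
# This is the one line a replicator might need to edit.
basepath <- getwd()

# file.path() always joins with "/", which R accepts on Windows too
rawfile   <- file.path(basepath, "data", "raw", "data.dta")
cleanfile <- file.path(basepath, "data", "clean", "data.rds")

# Relative paths work the same way
relative <- file.path("data", "raw", "data.dta")
print(relative)
```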
This may no longer work:
/datos/
    /brutos
    /limpiados
    /finales
/codigo
/articulo
/proyecto/
    /datos/
        /brutos
        /limpiados
        /finales
    /codigo
    /articulo
/secretos (read-only)
    /impuestos (read-only)
    /salarios (read-only)
File structure thus becomes more complex, but fundamentally not so different:
global taxdata "/secretos/impuestos"
global salarydata "/secretos/salarios"
global outputdata "/proyecto/datos/limpiados" // this is where you would write the data you create in this project
global results "/proyecto/articulo" // All tables for inclusion in your paper go here
global programs "/proyecto/codigo" // All programs (which you might "include") are to be found here
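The same pattern can be sketched in R, assuming a hypothetical config.R that every other script sources. The variable names projectroot, secretroot, and paths are illustrative; the locations mirror the Stata globals above.

```r
# config.R (hypothetical file name): all project paths are defined in one place.
# Assumed roots; edit them for your machine.
projectroot <- "/proyecto"
secretroot  <- "/secretos"   # read-only, confidential data

paths <- list(
  taxdata    = file.path(secretroot, "impuestos"),
  salarydata = file.path(secretroot, "salarios"),
  outputdata = file.path(projectroot, "datos", "limpiados"),  # data created by this project
  results    = file.path(projectroot, "articulo"),            # tables for the paper
  programs   = file.path(projectroot, "codigo")               # programs to source
)

# Report early if a location is missing on this machine
# (for example, no access to the confidential data)
not_found <- names(paths)[!dir.exists(unlist(paths))]
if (length(not_found) > 0) {
  message("Not found on this machine: ", paste(not_found, collapse = ", "))
}
```

Other scripts would then start with source("config.R") and refer to paths$outputdata and the like, never to a hard-coded location.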
Follow the lessons learned here and create a basic project structure.
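One way to do this from R, as a rough sketch: the function name create_project and the directory names are assumptions, and any logical layout would do. The example builds the structure in a temporary directory so it can be tried without touching an existing project.

```r
# Hypothetical helper: create a basic project skeleton under `root`
create_project <- function(root) {
  dirs <- file.path(root, c("data/raw", "data/clean", "data/final", "code", "paper"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Try it in a throwaway location; replace with your own project path
root <- file.path(tempdir(), "myproject")
created <- create_project(root)

# List what was created, relative to the project root
print(list.dirs(root, recursive = TRUE, full.names = FALSE))
```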
Did that work?
Once you are done, changing at most one line should be enough to make it run!
Do you think your code will work on somebody else’s computer or in the cloud?