Lars Vilhuber
2019-10-02
Cornell University
The data is obtained from a Census Bureau website.
We will NOT use the regular Zenodo; rather, we will test in the Sandbox.
Check your URL bar! There's no other indication that this is not the real Zenodo!
https://library.cfa.harvard.edu/data-archiving-and-sharing (Harvard Center for Astrophysics)
Goal 2: Be able to curate the data and code necessary for reproducible analysis
We have archived unreliable data in a reliable location. ✔️
We can now use the following information to augment the replication:
# Zenodo DOI
zenodo.prefix <- "10.5281/zenodo"
# Specific DOI - resolves to a fixed version
zenodo.id <- "2649598"
# We will recover the rest from the Zenodo API
zenodo.api <- "https://zenodo.org/api/records/"
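As a small sketch of how these pieces fit together, the full DOI and the API request URL can be assembled from the three variables above. The names `zenodo.doi` and `zenodo.url` are illustrative assumptions and are not part of the original code.

```r
# Values repeated from above so that this block is self-contained
zenodo.prefix <- "10.5281/zenodo"
zenodo.id <- "2649598"
zenodo.api <- "https://zenodo.org/api/records/"

# Full DOI: prefix and record id joined by a dot (hypothetical variable name)
zenodo.doi <- paste(zenodo.prefix, zenodo.id, sep = ".")

# URL that is queried for the record metadata (hypothetical variable name)
zenodo.url <- paste0(zenodo.api, zenodo.id)

print(zenodo.doi)
print(zenodo.url)
```

The second value is the same string that is later passed to `download.file()` to fetch the metadata.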
We will parse the information that Zenodo gives us through an API:
# needs rjson, tidyr, dplyr
library(rjson)
library(tidyr)
library(dplyr)
We download the metadata from the API:
download.file(paste0(zenodo.api,zenodo.id),destfile=file.path(dataloc,"metadata.json"))
We read the JSON in:
latest <- fromJSON(file=file.path(dataloc,"metadata.json"))
We get the links to the actual CSV files (and the codebook):
file.list <- as.data.frame(latest$files) %>% select(starts_with("self")) %>% gather()
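For readers who prefer to see the structure explicitly, here is a minimal base R sketch of the same idea, run on a mock object rather than on the real API response. It assumes each file entry carries its download link under `links$self`. That layout, the name `latest.mock`, and the example URLs are all assumptions made for illustration.

```r
# Mock of the parsed metadata; the field names (files, key, links, self)
# are an assumption about the response layout, and the URLs are placeholders
latest.mock <- list(files = list(
  list(key = "data.csv",
       links = list(self = "https://example.org/files/data.csv")),
  list(key = "codebook.pdf",
       links = list(self = "https://example.org/files/codebook.pdf"))
))

# Pull the download link of each file entry into a character vector
file.urls <- vapply(latest.mock$files,
                    function(f) f$links$self,
                    character(1))

print(file.urls)
```

This is an alternative to the `select()` and `gather()` pipeline, not a replacement for it. It may be easier to debug if the response layout changes.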
We download all the CSV files by checking whether the filename contains "csv":
for ( value in file.list$value ) {
  print(value)
  if ( grepl("csv", value) ) {
    print("Downloading...")
    # Store each file under its own name in the data directory
    file.name <- basename(value)
    download.file(value, destfile=file.path(dataloc, file.name))
  }
}
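In the spirit of robustness, the loop above could be wrapped in a function that skips files already present locally, so that re-running the document does not download everything again. This is only a sketch: the helper name `download_missing_csv` and its `fetch` argument are hypothetical, and it assumes that each link ends in the file name.

```r
# Hypothetical helper: download only CSV files not yet present in destdir.
# `fetch` defaults to download.file and can be swapped out for testing.
download_missing_csv <- function(urls, destdir, fetch = download.file) {
  downloaded <- character(0)
  for (url in urls) {
    file.name <- basename(url)
    # Keep only names ending in .csv (stricter than matching "csv" anywhere)
    if (!grepl("\\.csv$", file.name, ignore.case = TRUE)) next
    target <- file.path(destdir, file.name)
    # Skip files that were downloaded in an earlier run
    if (file.exists(target)) next
    fetch(url, destfile = target)
    downloaded <- c(downloaded, target)
  }
  # Return the paths of newly downloaded files without printing them
  invisible(downloaded)
}
```

With the objects defined earlier, a call might look like `download_missing_csv(file.list$value, dataloc)`.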
You can now add this to your copy of the code:
Goal 3: Robustness and automation - getting close to push-button reproducibility
Goal 4: Correctly document reproducible research
And: not make your collaborators mad…
We could
We instead
zenodo
main
branchzenodo
Knit
You can compare the changes: https://gitlab.com/larsvilhuber/jobcreationblog/compare/master…zenodo?view=parallel
We could then proceed to incorporate (pull or merge) the changes into the main
repository:
Read more about it at
The final result
DOI = 10.5281/zenodo.2649598
Data, source document, dependencies
So far:
Still left:
Github Pages and Gitlab Pages are an easy way to publish project pages
Didn't we say those are not archives?
![release](images/Github_Releases_2.png)
![release](images/Github_Zenodo_4.png)
The final result
DOI = 10.5281/zenodo.2649598
DOI = 10.5281/zenodo.400356
Data, source document, dependencies
We've touched on
… because there can be a lot more