Various helpful scripts#
All of the following scripts are either made available in bash, in the $HOME/bin directory, when you run the bash setup, or are available in the tools/ folder in each repository.
Note
If you do not see a script referenced here, or the script does not behave as intended, git commit all changes and git push them, then run the Refresh repository script in Bitbucket.
Data Download and Synchronization Tools#
download_box_private.py#
Python script for downloading files from private Box folders using JWT authentication.
download_dv.py#
Python script for downloading complete datasets from Dataverse repositories as ZIP archives using DOI.
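As an illustration of the idea, not the actual contents of download_dv.py, the following minimal sketch builds the Dataverse Data Access API URL that returns a whole dataset as one ZIP archive and then fetches it. The function names, the example server, and the placeholder DOI are assumptions made for this example.

```python
import urllib.request
from urllib.parse import urlencode


def dataverse_zip_url(server: str, doi: str) -> str:
    """Build the URL that asks a Dataverse server for a dataset as one ZIP."""
    # Accept DOIs written with or without the "doi:" prefix.
    pid = doi if doi.startswith("doi:") else f"doi:{doi}"
    query = urlencode({"persistentId": pid})
    return f"{server.rstrip('/')}/api/access/dataset/:persistentId/?{query}"


def download_dataset_zip(server: str, doi: str, dest: str) -> None:
    """Download the ZIP archive to dest (hypothetical helper, needs network)."""
    urllib.request.urlretrieve(dataverse_zip_url(server, doi), dest)


if __name__ == "__main__":
    # Placeholder DOI for illustration only; it does not refer to a real dataset.
    print(dataverse_zip_url("https://dataverse.harvard.edu", "10.7910/DVN/EXAMPLE"))
```

Restricted or unpublished datasets would additionally need an API token, which this sketch does not handle.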
download_openicpsr-private.py#
Python script for downloading files from private (unpublished) openICPSR deposits with authentication.
download_osf.sh#
Bash script for downloading all files and directories from Open Science Framework (OSF) projects.
download_zenodo_draft.py#
Python script for downloading files from Zenodo draft deposits that require authentication.
download_zenodo_public.sh#
Bash script for downloading files from public Zenodo repositories using the zenodo_get tool.
list_box_files.py#
Lists files from a private Box folder using JWT authentication and outputs results to a text file.
Links: Source | Help
sync-codeocean.sh#
Synchronizes CodeOcean capsules with local repositories, maintaining both live Git clones and static copies.
File Format Conversion Tools#
convert_eps.sh#
Bash script that recursively converts EPS (Encapsulated PostScript) files to PNG format using ImageMagick.
Links: Source | Help
convert_graphs.do#
Stata script that converts GPH graph files to PDF and PNG formats.
Links: Source | Help
csv2md.py#
Python tool for converting arbitrary CSV files to Markdown format.
Links: Source | Help
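To make the conversion concrete, here is a minimal sketch of how a CSV file can be turned into a Markdown table. It is not the code of csv2md.py; the function name csv_to_markdown and the choice to treat the first row as the header are assumptions for this example.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Convert CSV text to a Markdown table, using the first row as header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fmt(row):
        # Escape pipes so cell content cannot break the table layout.
        cells = [cell.replace("|", "\\|") for cell in row]
        # Pad short rows and truncate long ones to the header width.
        cells += [""] * (width - len(cells))
        return "| " + " | ".join(cells[:width]) + " |"

    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown("variable,label\nage,Age in years\n"))
```

Using the csv module rather than splitting on commas keeps quoted fields that contain commas intact.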
matlab_convert_fig.m#
MATLAB script that converts .fig files to PNG format, processing all figure files in the current directory.
matlab_convert_mat2csv.m#
MATLAB script that converts .mat files to CSV format, extracting all variables as separate CSV files.
mk_tex_table.sh#
Converts standalone LaTeX table files to complete PDF documents with comprehensive formatting packages.
Tools to check for various things#
These are usually not used directly, but run by the Pipelines.
Stata_scan_code/#
Directory containing Stata code scanning tools and packages for analyzing Stata scripts and dependencies.
Links: Source | Help
scan_pkg.jl#
Julia package scanner that identifies and lists packages used in Julia files via using and import statements.
Links: Source | Help
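The scanning idea can be sketched in a few lines. The example below is written in Python for illustration and is not a port of scan_pkg.jl; the function name julia_packages and the decisions to keep only top-level package names and to skip relative modules (such as .Local) are assumptions.

```python
import re

# Matches lines that start with "using" or "import" and captures the rest.
_STATEMENT = re.compile(r"^\s*(?:using|import)\s+(.+)$", re.MULTILINE)


def julia_packages(source: str) -> list[str]:
    """Return the sorted top-level package names named in using/import lines."""
    found = set()
    for match in _STATEMENT.finditer(source):
        clause = match.group(1).split("#", 1)[0]  # drop a trailing comment
        clause = clause.split(":", 1)[0]          # drop "Pkg: name1, name2" selectors
        for item in clause.split(","):
            item = item.strip()
            if not item:
                continue
            name = item.split()[0]      # drop an "as Alias" suffix
            name = name.split(".")[0]   # keep the top-level package only
            if name:                    # empty for relative modules like ".Local"
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    print(julia_packages("using DataFrames, CSV\nimport Statistics: mean\n"))
```

A line-based regular expression like this does not handle statements that span several lines, so it should be read as an approximation of what a real scanner needs to do.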
check_r_deps.R#
R script that finds and outputs all R package dependencies as CSV from a project directory.
Links: Source | Help
check_rds_files.R#
R script for checking RDS files (serialized R data), designed to run automatically without manual changes.
Links: Source | Help
install.R#
R package installation utility with version control; provides a pkgTest() function to install and require packages.
Links: Source | Help
summarize_data.py#
Data summarization script that reads CSV metadata and calculates total bytes by directory level.
Links: Source | Help
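The aggregation described above can be illustrated with a short sketch. It is not the code of summarize_data.py: the column names path and bytes, the depth parameter, and the function name bytes_by_level are all assumptions, since the actual metadata layout is not documented here.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def bytes_by_level(csv_text: str, depth: int = 1,
                   path_col: str = "path", size_col: str = "bytes") -> dict:
    """Sum file sizes per directory, truncating each directory to depth levels."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Directory components of the file, e.g. ("data", "raw") for data/raw/a.csv.
        parts = PurePosixPath(row[path_col]).parent.parts
        # Files in the top-level directory are grouped under ".".
        key = "/".join(parts[:depth]) or "."
        totals[key] += int(row[size_col])
    return dict(totals)


if __name__ == "__main__":
    sample = "path,bytes\ndata/raw/a.csv,100\ndata/clean/c.csv,25\nREADME.md,5\n"
    print(bytes_by_level(sample, depth=1))
```

Raising depth gives a finer breakdown, which helps locate the subdirectories that account for most of a deposit's size.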
Ad-hoc Data Analysis and Comparison Tools#
compare_manifests.py#
Python script that compares two SHA256 manifest files to identify overlaps in filenames, checksums, and complete records.
Links: Source | Help
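The three kinds of overlap can be expressed as set intersections. The sketch below assumes manifests in the usual sha256sum layout (checksum, whitespace, filename) and is not the code of compare_manifests.py; the function names and the keys of the returned dictionary are hypothetical.

```python
def parse_manifest(text: str) -> set:
    """Parse 'checksum  filename' lines into a set of (checksum, name) pairs."""
    records = set()
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        checksum, name = parts
        # sha256sum marks binary-mode entries with a leading "*".
        records.add((checksum, name.lstrip("*")))
    return records


def compare_manifests(text_a: str, text_b: str) -> dict:
    """Report filenames, checksums, and full records present in both manifests."""
    a, b = parse_manifest(text_a), parse_manifest(text_b)
    return {
        "same_name": {n for _, n in a} & {n for _, n in b},
        "same_checksum": {c for c, _ in a} & {c for c, _ in b},
        "same_record": a & b,
    }


if __name__ == "__main__":
    first = "aaa  x.csv\nbbb  y.csv\n"
    second = "aaa  x.csv\nccc  y.csv\nbbb  z.csv\n"
    print(compare_manifests(first, second))
```

A filename that matches with a different checksum suggests a modified file, while a checksum that matches under a different name suggests a renamed or duplicated file.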
generate_png_diff.sh#
Generates visual diffs for modified PNG images by comparing them against their git repository versions.
summarize_diff_stats.py#
Parses and summarizes statistical differences from files, extracting numerical values and filenames.
Links: Source | Help
Pipeline and Workflow Tools#
pipeline-steps1-4.sh#
Combined pipeline script that handles multiple steps of the openICPSR download process.
Links: Source | Help
run_scanner.sh#
Runs the Stata code scanner on an ICPSR directory: reads the configuration and executes the scanning operations.
Links: Source | Help
sbatch-shell.sh#
SLURM batch job script template for running Stata jobs on HPC clusters with resource specifications.
Links: Source | Help
Configuration and Setup Tools#
linux-system-info.sh#
System information collector that displays OS details, processor info, and memory availability.
Links: Source | Help
update_tools.sh#
Tool updater that downloads the latest replication template files from GitHub and copies them to the template directory.
Links: Source | Help
Document Processing Tools#
prepare-revision.py (inactive)#
Processes Markdown files by replacing code block content in Appendix sections while maintaining headers.
Links: Source | Help
Configuration Files#
requirements-scanner.txt#
Python requirements file for scanner tools.
Links: Source
requirements.txt#
Python requirements file for general tools.
Links: Source
template.tex#
LaTeX template file for document generation.
Links: Source