Various helpful scripts

All of the following scripts are available either in the $HOME/bin directory, once you have run the bash setup, or in the tools/ folder of each repository.

Note

If you do not see a script referenced here, or a script does not behave as intended:

  1. Commit all changes (git commit) and push them (git push).

  2. Run the Refresh repository script in Bitbucket.

Data Download and Synchronization Tools

download_box_private.py

Python script for downloading files from private Box folders using JWT authentication.

Links: Source | Help

download_dv.py

Python script for downloading complete datasets from Dataverse repositories as ZIP archives using DOI.

Links: Source | Help
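
As a rough illustration of a DOI-based ZIP download, the sketch below builds a request URL for the Dataverse Data Access API, which can return a complete dataset as one archive. Whether download_dv.py uses this exact endpoint is an assumption; the function name and the example server are hypothetical.

```python
from urllib.parse import urlencode


def dataverse_zip_url(server: str, doi: str) -> str:
    """Build a Data Access API URL that requests a whole dataset as one ZIP."""
    # Accept the DOI with or without the "doi:" prefix.
    persistent_id = doi if doi.startswith("doi:") else f"doi:{doi}"
    # Keep ":" and "/" unescaped so the DOI stays readable in the URL.
    query = urlencode({"persistentId": persistent_id}, safe=":/")
    return f"{server.rstrip('/')}/api/access/dataset/:persistentId/?{query}"


if __name__ == "__main__":
    # Hypothetical server and DOI, for illustration only.
    print(dataverse_zip_url("https://dataverse.example.org", "10.1234/ABC/XYZ"))
```

The resulting URL could then be fetched with any HTTP client; unpublished or restricted datasets would additionally need an API token.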

download_openicpsr-private.py

Python script for downloading files from private (unpublished) openICPSR deposits with authentication.

Links: Source | Help

download_osf.sh

Bash script for downloading all files and directories from Open Science Framework (OSF) projects.

Links: Source | Help

download_zenodo_draft.py

Python script for downloading files from Zenodo draft deposits that require authentication.

Links: Source | Help

download_zenodo_public.sh

Bash script for downloading files from public Zenodo repositories using the zenodo_get tool.

Links: Source | Help

list_box_files.py

Lists files from a private Box folder using JWT authentication and outputs results to a text file.

Links: Source | Help

sync-codeocean.sh

Synchronizes CodeOcean capsules with local repositories, maintaining both live Git clones and static copies.

Links: Source | Help

File Format Conversion Tools

convert_eps.sh

Bash script that recursively converts EPS (Encapsulated PostScript) files to PNG format using ImageMagick.

Links: Source | Help

convert_graphs.do

Stata script that converts GPH graph files to PDF and PNG formats.

Links: Source | Help

csv2md.py

Python tool for converting arbitrary CSV files to Markdown format.

Links: Source | Help
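
To make the CSV-to-Markdown conversion concrete, here is a minimal sketch of one way to do it. It is not the code of csv2md.py itself; the function name and the choice to treat the first row as the header are assumptions.

```python
import csv
import io


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown pipe table; the first row becomes the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of cells.
    padded = [row + [""] * (width - len(row)) for row in rows]

    def render(cells):
        # Escape pipes so cell content cannot break the table structure.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [render(padded[0]), render(["---"] * width)]
    lines.extend(render(row) for row in padded[1:])
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_markdown("name,size\nmain.do,120\n"))
```

Using the csv module rather than splitting on commas keeps quoted fields that contain commas intact.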

matlab_convert_fig.m

MATLAB script that converts .fig files to PNG format, processing all figure files in the current directory.

Links: Source | Help

matlab_convert_mat2csv.m

MATLAB script that converts .mat files to CSV format, extracting all variables as separate CSV files.

Links: Source | Help

mk_tex_table.sh

Converts standalone LaTeX table files to complete PDF documents with comprehensive formatting packages.

Links: Source | Help

Tools to check for various things

These are usually not run directly; the Pipelines run them.

Stata_scan_code/

Directory containing Stata code scanning tools and packages for analyzing Stata scripts and dependencies.

Links: Source | Help

scan_pkg.jl

Julia package scanner that identifies and lists packages used in Julia files via using and import statements.

Links: Source | Help
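
The idea of detecting packages from using and import statements can be sketched as follows. This is a Python illustration of the technique, not a port of scan_pkg.jl; the regular expression and the function name are assumptions, and multi-line statements are not handled.

```python
import re

# Match a line that starts with `using` or `import` and capture the rest of it.
STATEMENT = re.compile(r"^\s*(?:using|import)\s+(.+)$", re.MULTILINE)


def julia_packages(source: str) -> list:
    """Return the sorted top-level package names referenced in Julia source text."""
    packages = set()
    for match in STATEMENT.finditer(source):
        clause = match.group(1).split("#", 1)[0]  # drop a trailing comment
        clause = clause.split(":", 1)[0]          # `using A: f, g` -> `A`
        for item in clause.split(","):
            tokens = item.strip().split()
            if not tokens:
                continue
            name = tokens[0]                      # drop `as Alias`
            if name.startswith("."):
                continue                          # relative (local) module
            packages.add(name.split(".")[0])      # `A.B` -> `A`
    return sorted(packages)


if __name__ == "__main__":
    print(julia_packages("using DataFrames, CSV\nimport Statistics: mean\n"))
```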

check_r_deps.R

R script that finds and outputs all R package dependencies as CSV from a project directory.

Links: Source | Help

check_rds_files.R

R script for checking RDS files (R data files), designed to run automatically without manual changes.

Links: Source | Help

install.R

R package installation utility with version control; provides a pkgTest() function to install and require packages.

Links: Source | Help

summarize_data.py

Data summarization script that reads CSV metadata and calculates total bytes by directory level.

Links: Source | Help
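
A minimal sketch of totalling bytes by directory level is shown below. The column names path and bytes, the function name, and the meaning of level (the number of leading directory components to keep) are all assumptions made for illustration; the actual input format of summarize_data.py may differ.

```python
import csv
import io
from collections import defaultdict
from pathlib import PurePosixPath


def bytes_by_directory(csv_text: str, level: int = 1) -> dict:
    """Sum file sizes per directory, keeping only the first `level` path components."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Directory components of the file's parent, e.g. ("data", "raw").
        parents = PurePosixPath(row["path"]).parent.parts
        # Files in the top-level directory are grouped under ".".
        key = "/".join(parents[:level]) or "."
        totals[key] += int(row["bytes"])
    return dict(totals)


if __name__ == "__main__":
    sample = "path,bytes\ndata/raw/a.csv,100\ndata/clean/b.csv,50\ncode/main.do,10\n"
    print(bytes_by_directory(sample, level=1))
```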

Ad-hoc Data Analysis and Comparison Tools

compare_manifests.py

Python script that compares two SHA256 manifest files to identify overlaps in filenames, checksums, and complete records.

Links: Source | Help
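
The three kinds of overlap named above (filenames, checksums, complete records) can be sketched with set intersections. The sketch assumes sha256sum-style manifest lines of the form "<digest>  <path>"; the function names and the returned keys are hypothetical and not taken from compare_manifests.py.

```python
def parse_manifest(text: str) -> set:
    """Parse "<digest>  <path>" lines into a set of (digest, path) records."""
    records = set()
    for line in text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, path = parts
        # sha256sum marks binary-mode entries with a leading '*'.
        if path.startswith("*"):
            path = path[1:]
        records.add((digest.lower(), path))
    return records


def compare_manifests(text_a: str, text_b: str) -> dict:
    """Report filenames, checksums, and full records present in both manifests."""
    a, b = parse_manifest(text_a), parse_manifest(text_b)
    return {
        "same_name": sorted({p for _, p in a} & {p for _, p in b}),
        "same_checksum": sorted({d for d, _ in a} & {d for d, _ in b}),
        "same_record": sorted(a & b),
    }


if __name__ == "__main__":
    print(compare_manifests("aa  x.csv\nbb  y.csv\n", "aa  z.csv\ncc  y.csv\n"))
```

A shared checksum under a different name points to a renamed or duplicated file, while a shared name with a different checksum points to changed content.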

generate_png_diff.sh

Generates visual diffs for modified PNG images by comparing them against their git repository versions.

Links: Source | Help

summarize_diff_stats.py

Parses and summarizes statistical differences from files, extracting numerical values and filenames.

Links: Source | Help

Pipeline and Workflow Tools

pipeline-steps1-4.sh

Combined pipeline script that handles multiple steps of the openICPSR download process.

Links: Source | Help

run_scanner.sh

Runs the Stata code scanner on the ICPSR directory: reads the configuration, then executes the scanning operations.

Links: Source | Help

sbatch-shell.sh

SLURM batch job script template for running Stata jobs on HPC clusters with resource specifications.

Links: Source | Help

Configuration and Setup Tools

linux-system-info.sh

System information collector that displays OS details, processor info, and memory availability.

Links: Source | Help

update_tools.sh

Tool updater that downloads the latest replication template files from GitHub and copies them to the template directory.

Links: Source | Help

Document Processing Tools

prepare-revision.py (inactive)

Processes Markdown files by replacing code block content in Appendix sections while maintaining headers.

Links: Source | Help

Configuration Files

requirements-scanner.txt

Python requirements file for scanner tools.

Links: Source

requirements.txt

Python requirements file for general tools.

Links: Source

template.tex

LaTeX template file for document generation.

Links: Source