World Bank Reproducible Research Repository Downloads#
Note
Based on Version 1.0.0
The download_worldbank.py
script automatically downloads replication packages, verification reports, and documentation from the World Bank Reproducible Research Repository.
Overview#
This script resolves World Bank DOIs to download three key components:
README.pdf - Project documentation
Reproducibility verification report - Quality assessment document
Replication package - Complete code and data archive (automatically extracted)
Download files from World Bank Reproducible Research Repository
positional arguments:
doi_or_id World Bank repository identifier (DOI suffix, DOI, or DOI URL)
options:
-h, --help show this help message and exit
--output OUTPUT Output directory (default: current directory)
--dry-run Show what would be downloaded without actually downloading
--version show program's version number and exit
Installation#
The script requires Python 3.6+ with the requests
library:
pip install requests
Usage#
Basic Usage#
(Assumes python3
is the command for Python 3 on your system; replace with python
as needed)
# Download using DOI suffix
python3 tools/download_worldbank.py azav-8915
# Download using full DOI
python3 tools/download_worldbank.py https://doi.org/10.60572/azav-8915
# Download using DOI without URL
python3 tools/download_worldbank.py 10.60572/azav-8915
Options#
# Preview what will be downloaded (dry run)
python3 tools/download_worldbank.py azav-8915 --dry-run
# Download to specific directory
python3 tools/download_worldbank.py azav-8915 --output /path/to/downloads
# Show help
python3 tools/download_worldbank.py --help
Output Structure#
Downloads are organized in a directory named wb-[DOI_SUFFIX]/
:
wb-azav-8915/
├── README.pdf # Project documentation
├── reproducibility-wb-azav-8915.2025-08-28.pdf # Verification report (dated)
├── wb-azav-8915.zip # Original replication package
└── PP_WLD_2020_PRWP-9501_prg_v01/ # Extracted replication package
├── code/
├── data/
└── ...
Examples#
Example 1: Basic Download#
$ python3 tools/download_worldbank.py azav-8915
🏷️ DOI suffix: azav-8915
📂 Output directory: wb-azav-8915
🌐 Resolving DOI: https://doi.org/10.60572/azav-8915
🔄 Redirect 1: https://doi.org/10.60572/azav-8915 -> https://reproducibility.worldbank.org/index.php/catalog/study/PP_WLD_2020_PRWP-9501_v01
✅ Final URL: https://reproducibility.worldbank.org/index.php/catalog/study/PP_WLD_2020_PRWP-9501_v01
🔍 Detected study URL format, checking for HTTP refresh redirect...
✅ Found HTTP Refresh header: 0;url=https://reproducibility.worldbank.org/index.php/catalog/39
🔄 Following HTTP refresh to: https://reproducibility.worldbank.org/index.php/catalog/39
✅ Successfully redirected to catalog: https://reproducibility.worldbank.org/index.php/catalog/39
📦 Catalog ID: 39
🔍 Found download IDs: ['81', '85']
📦 Found 2 file(s) to download:
⬇️ Downloading: README.pdf
✅ Downloaded: 180.4 KB of 180.4 KB (100.0%)
⬇️ Downloading: wb-azav-8915.zip
✅ Downloaded: 58.4 MB of 58.4 MB (100.0%)
📦 Unzipping: wb-azav-8915.zip
✅ Unzipped to: wb-azav-8915
🎉 Download process completed!
✅ Successfully downloaded: 2 files
Example 2: Dry Run Preview#
$ python3 tools/download_worldbank.py 101y-vn15 --dry-run
📦 Found 2 file(s) to download:
1. Download ID 892: zip (329428 bytes)
Content-Disposition: attachment; filename="README.pdf"
2. Download ID 895: zip (202086235 bytes)
Content-Disposition: attachment; filename="PP_CIV_2025_368.zip"
🔍 Dry run completed. No files were downloaded.
Technical Details#
DOI Resolution Process#
The script follows this resolution chain:
DOI Redirect:
https://doi.org/10.60572/azav-8915
→ Study URLStudy URL:
/catalog/study/PP_WLD_2020_PRWP-9501_v01
(returns HTTP Refresh header)HTTP Refresh:
0;url=https://reproducibility.worldbank.org/index.php/catalog/39
Catalog Page:
/catalog/39
(contains download links)
File Type Detection#
The script identifies files using HTTP Content-Disposition headers and content types:
README files: Detected by filename containing “README” → saved as
README.pdf
Replication packages: ZIP files → saved as
wb-[DOI_SUFFIX].zip
and extractedVerification reports: Additional PDFs → saved with date suffix
Progress Indicators#
Spinner animation during network operations (disabled in CI environments)
Real-time download progress with percentage and transfer rates
File processing status for each download and extraction
Supported Input Formats#
Format |
Example |
Description |
---|---|---|
DOI Suffix |
|
Just the unique identifier |
DOI |
|
Standard DOI format |
DOI URL |
|
Full URL |
Error Handling#
The script handles common issues gracefully:
Missing files: Continues downloading other files if one fails
Network timeouts: Retries with appropriate delays
Invalid DOIs: Clear error messages with expected formats
Existing directories: Prevents overwriting (must remove manually)
Integration#
Continuous Integration#
The script detects CI environments via the CI
environment variable and:
Disables animated progress spinners
Uses simplified output formatting
Maintains full download functionality
Git Integration#
For automated workflows, the script can be integrated with git operations:
# Download and add to git
python3 tools/download_worldbank.py azav-8915
git add wb-azav-8915/
git commit -m "Add World Bank replication package azav-8915"
Troubleshooting#
Common Issues#
Directory already exists
# Error: wb-azav-8915 already exists - please remove prior to downloading
rm -rf wb-azav-8915
python3 tools/download_worldbank.py azav-8915
Network connectivity issues
Check internet connection
Try again after a few minutes (temporary server issues)
Use
--dry-run
to test DOI resolution without downloading
Invalid DOI format
# Error: Could not extract DOI suffix from: invalid-doi
# Use one of these formats:
python3 tools/download_worldbank.py azav-8915 # DOI suffix
python3 tools/download_worldbank.py 10.60572/azav-8915 # DOI
python3 tools/download_worldbank.py https://doi.org/10.60572/azav-8915 # DOI URL
Limitations#
Internet connection required for DOI resolution and downloads
Disk space: Replication packages can be large (50MB - 200MB+ typical)
World Bank specific: Only works with World Bank Reproducible Research Repository DOIs (10.60572/*)
Version Information#
This script is part of the replication template toolkit and follows semantic versioning. Check the repository for the latest version and updates.