download_box_private.py - Download files from private Box folders#
Description#
This script downloads files from a private Box folder using JWT authentication. It’s designed for secure access to private research data stored in Box with proper authentication credentials.
Usage#
python tools/download_box_private.py [SUBFOLDER]
Arguments#
SUBFOLDER (optional) - Subfolder identifier; the script downloads from ‘aearep-SUBFOLDER’. Use the number from the tag of the main Jira ticket (for example, 1234 for aearep-1234). If omitted, it is deduced from the current directory name.
Example#
python tools/download_box_private.py 1234 # Download from subfolder 'aearep-1234'
or
cd /path/to/aearep-1234
python tools/download_box_private.py # Download from subfolder 'aearep-1234'
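The fallback to the current directory name can be pictured with the following minimal sketch. It is not the script's actual code: the function name deduce_subfolder and the regular expression are hypothetical, and it assumes the directory is named exactly like the Jira tag (aearep- followed by digits).

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern: a directory called "aearep-<digits>".
_TAG = re.compile(r"aearep-(\d+)", re.IGNORECASE)


def deduce_subfolder(arg: Optional[str], cwd: Path) -> Optional[str]:
    """Return the Box subfolder name (e.g. 'aearep-1234'), or None if unknown."""
    if arg:
        # Accept both "1234" and "aearep-1234" on the command line.
        return arg if arg.lower().startswith("aearep-") else f"aearep-{arg}"
    # No argument given: fall back to the name of the current directory.
    match = _TAG.fullmatch(cwd.name)
    return f"aearep-{match.group(1)}" if match else None


if __name__ == "__main__":
    print(deduce_subfolder("1234", Path.cwd()))
    print(deduce_subfolder(None, Path("/path/to/aearep-1234")))
```

Returning None rather than raising leaves it to the caller to decide how to report a directory name that does not match.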
Requirements#
Python >= 3.9
boxsdk: Box Python SDK
Valid Box JWT application credentials
Installing Box SDK#
This can be done with conda, pip, or pipx.
The above dependencies can be installed by executing
pip install -r requirements.txt
in any recent Bitbucket repository updated with “tools” newer than June 7, 2025.
Ideally, these should be installed in your main Python environment, since you will be re-using this regularly. You can also install in a virtual environment.
If using conda, you can install boxsdk with:
conda install boxsdk
pip install "boxsdk[jwt]" # the [jwt] extra is pip syntax; run this inside the conda environment
Currently, Box is migrating from boxsdk to box-sdk-gen. This script works only with boxsdk, not the newer box-sdk-gen.
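To see which of the two SDKs is present in the active environment, a small check along these lines may help. It is only a sketch: the function name sdk_status is hypothetical, and it assumes the newer SDK is imported as box_sdk_gen.

```python
from importlib import metadata, util


def sdk_status() -> dict:
    """Map each Box SDK distribution to its installed version, or None if absent."""
    status = {}
    # (distribution name, importable module name); the second pair is an assumption.
    for dist, module in (("boxsdk", "boxsdk"), ("box-sdk-gen", "box_sdk_gen")):
        if util.find_spec(module) is None:
            status[dist] = None  # module cannot be imported here
            continue
        try:
            status[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            status[dist] = "unknown"  # importable, but no distribution metadata found
    return status


if __name__ == "__main__":
    print(sdk_status())
```

If boxsdk maps to None, the script's import will fail in that environment.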
Using Right Credentials#
To permanently set the proper credentials on BioHPC, modify your ~/.bashrc so that it defines the Box environment variables. The values (or the (SECRETURL)) can be obtained from a supervisor.
Download Environment Variables File#
Download the environment variable setup file onto BioHPC:
wget (SECRETURL) -O ~/envvars.txt
The first time, adjust your ~/.bashrc to read the envvars.txt:
echo "source ~/envvars.txt" >> ~/.bashrc
Then, load the variables into your current session:
source ~/.bashrc
This ensures that all required Box environment variables are available in your session.
Python Environment Setup#
To run the script, you need a recent version of Python (>= 3.10). There are two ways to get one: standard Python on a compute node, or a conda environment created from a login node.
You can use standard Python on BioHPC compute nodes. Get a terminal through Slurm:
srun --pty bash -l
Load the necessary modules, and install the necessary packages:
module load python/3.12.7
cd /path/to/aearep-1234 # adjust to the project at hand
pip install -r requirements.txt
You will need to run this every time from a compute node, as the more recent Python versions are not available on the login nodes.
Create a Conda Environment (from BioHPC login node)
To avoid conflicts with existing Python installations, create a dedicated conda environment:
Load conda if needed
module load anaconda # or follow BioHPC-specific instructions to enable conda
Create environment with Python 3.11
conda create --name download python=3.11
Activate the environment
You will need to do this every time after you log in, if you intend to use the script:
conda activate download
Install Box SDK
Follow the conda install instructions from above to install the necessary packages.
⚠️ Important: Some newer versions of boxsdk are incompatible. If you get import errors (No module named ‘boxsdk’), uninstall the current version and install a compatible version:
conda uninstall boxsdk
conda install -c conda-forge boxsdk=3.14.0
Install Additional Dependencies
If you encounter authentication errors related to JWT, install pyjwt:
pip install pyjwt
Environment Variables#
Required for authentication:
BOX_FOLDER_PRIVATE - Box folder ID to download from
BOX_PRIVATE_KEY_ID - Box JWT public key ID
BOX_ENTERPRISE_ID - Box enterprise ID
BOX_CLIENT_ID - Box client ID (optional if using config file)
BOX_CLIENT_SECRET - Box client secret (optional if using config file)
Optional configuration:
BOX_CONFIG_PATH - Directory containing the Box config file
BOX_OUTPUT_DIR - Directory to download files to (default: ./restricted)
BOX_PRIVATE_JSON - Base64 encoded content of the Box config JSON file
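Before running the script, it can be useful to confirm that the variables listed above are actually set in the session. The sketch below is one way to do that; the function name missing_box_vars is hypothetical, and treating BOX_PRIVATE_JSON like a config file (so that the client ID and secret become optional) is an assumption, not documented behaviour.

```python
import os
from typing import List, Mapping

REQUIRED = ("BOX_FOLDER_PRIVATE", "BOX_PRIVATE_KEY_ID", "BOX_ENTERPRISE_ID")
# Listed above as optional when a config file is used.
NEEDED_WITHOUT_CONFIG = ("BOX_CLIENT_ID", "BOX_CLIENT_SECRET")


def missing_box_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of expected Box variables that are unset or empty."""
    # Assumption: either form of config file makes client ID and secret optional.
    has_config = bool(env.get("BOX_CONFIG_PATH") or env.get("BOX_PRIVATE_JSON"))
    names = REQUIRED if has_config else REQUIRED + NEEDED_WITHOUT_CONFIG
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_box_vars(os.environ)
    print("All set" if not missing else "Missing: " + ", ".join(missing))
```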
Base64 Configuration#
To provide the entire Box configuration as a base64-encoded string:
# Generate base64 config
cat config.json | base64
# Set environment variable
export BOX_PRIVATE_JSON="base64_encoded_string_here"
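The same round trip can be checked in Python. This is a minimal sketch with hypothetical helper names (encode_config, decode_config) and placeholder values; it shows only the encoding, not how the script itself reads BOX_PRIVATE_JSON.

```python
import base64
import json


def encode_config(config: dict) -> str:
    """Serialize a config dictionary to JSON and base64-encode it."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


def decode_config(value: str) -> dict:
    """Reverse encode_config: base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


if __name__ == "__main__":
    # Placeholder values only; a real Box config file holds the actual credentials.
    sample = {"enterpriseID": "0000", "boxAppSettings": {"clientID": "placeholder"}}
    encoded = encode_config(sample)
    print(decode_config(encoded) == sample)
```

With its default settings, base64.b64decode discards characters outside the base64 alphabet, so the line-wrapped output of the base64 command decodes the same way.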
Authentication Methods#
1. Environment Variables#
Set individual Box API credentials as environment variables.
2. Config File#
Place the Box JWT configuration file in the directory specified by BOX_CONFIG_PATH.
3. Base64 Encoded Config#
Provide the entire configuration as a base64-encoded string in BOX_PRIVATE_JSON.
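One way to picture how the three methods could coexist is a simple precedence check on the environment. The sketch below is illustrative only: the function name choose_auth_method and the order of precedence (base64 first, then config file, then individual variables) are assumptions, not the script's documented behaviour.

```python
import os
from typing import Mapping


def choose_auth_method(env: Mapping[str, str]) -> str:
    """Pick an authentication method from the variables that are set."""
    # Assumed precedence; the real script may order these differently.
    if env.get("BOX_PRIVATE_JSON"):
        return "base64"
    if env.get("BOX_CONFIG_PATH"):
        return "config_file"
    if env.get("BOX_CLIENT_ID") and env.get("BOX_CLIENT_SECRET"):
        return "env_vars"
    raise RuntimeError("No Box credentials found in the environment")


if __name__ == "__main__":
    try:
        print(choose_auth_method(os.environ))
    except RuntimeError as err:
        print(err)
```

With boxsdk, a decoded configuration dictionary would typically be passed to JWTAuth.from_settings_dictionary, and a configuration file to JWTAuth.from_settings_file.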
Features#
JWT Authentication: Secure access using Box JWT authentication
Flexible Configuration: Multiple authentication methods for different environments
Organized Downloads: Downloads to organized folder structure
Error Handling: Comprehensive error handling for API failures
Logging: Configurable logging for debugging and monitoring
Output Structure#
restricted/          # Default output directory (configurable)
└──                  # Downloaded files from specified subfolder
    ├── file1.txt
    ├── file2.pdf
    └── ...
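A download loop that produces such a directory could look roughly like the sketch below. It is not the script's implementation: the function names are hypothetical, nested folders are skipped for brevity, and the client argument is assumed to be an authenticated boxsdk Client (only its folder(...).get_items() and file(...).download_to(...) calls are used).

```python
from pathlib import Path
from typing import List, Mapping


def resolve_output_dir(env: Mapping[str, str]) -> Path:
    """Use BOX_OUTPUT_DIR if set, otherwise the documented default ./restricted."""
    return Path(env.get("BOX_OUTPUT_DIR") or "./restricted")


def download_files(client, folder_id: str, out_dir: Path) -> List[Path]:
    """Download every file directly inside a Box folder into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for item in client.folder(folder_id).get_items():
        if item.type != "file":
            continue  # this sketch does not recurse into nested folders
        target = out_dir / item.name
        with open(target, "wb") as stream:
            # Write the file content from Box into the local file.
            client.file(item.id).download_to(stream)
        saved.append(target)
    return saved
```

Passing the client in as an argument keeps authentication separate from the download logic, which also makes the loop easy to exercise with a stand-in object.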
Box Application Setup#
To use this script, a technical person / supervisor needs to set up the following:
Box Developer Account: Create at https://developer.box.com/
JWT Application: Create a new JWT application in Box Developer Console
Authentication Keys: Download the JWT configuration file
Folder Access: Ensure the application has access to the target folder
Security Considerations#
Store JWT credentials securely
Use environment variables in production
Limit application permissions to necessary scopes
Regularly rotate authentication credentials