download_box_private.py - Download files from private Box folders#

Description#

This script downloads files from a private Box folder using JWT authentication. It’s designed for secure access to private research data stored in Box with proper authentication credentials.

Usage#

python tools/download_box_private.py [SUBFOLDER] 

Arguments#

  • SUBFOLDER (optional) - Subfolder identifier (downloads from ‘aearep-SUBFOLDER’). This will be the tag of the main Jira ticket, such as aearep-1234. If empty, will be deduced from the current directory name.

Example#

python tools/download_box_private.py 1234  # Download from subfolder 'aearep-1234'

or

cd /path/to/aearep-1234
python tools/download_box_private.py   # Download from subfolder 'aearep-1234'

Requirements#

  • Python >= 3.9

  • boxsdk: Box Python SDK

  • Valid Box JWT application credentials

Installing Box SDK#

  • This can be done through conda, or in pip, or pipx.

  • The above dependencies can be installed by executing

pip install -r requirements.txt

in any recent Bitbucket repository updated with “tools” newer than June 7, 2025.

Ideally, these should be installed in your main Python environment, since you will be re-using this regularly. You can also install in a virtual environment.

If using conda, you can install boxsdk with:

conda install boxsdk
conda install boxsdk[jwt]
  • Currently, Box is updating boxsdk to box-sdk-gen. This script will work only with boxsdk, not the newer box-sdk-gen.

Using Right Credentials#

To permantently set the proper credentials on BioHPC, you can modify your ~/.bashrc profile, to include the box environmental variables. These values (or the (SECRETURL)) can be obtained from a supervisor.

Download Environment Variables File#

Download the environment variable setup file onto BioHPC:

wget (SECRETURL) -O ~/envvars.txt

The first time, adjust your ~/.bashrc to read the envvars.txt:

echo "source ~/envvars.txt" >> ~/.bashrc

Then, load the variables into your current session:

source ~/.bashrc

This ensures that all required Box environment variables are available in your session.

Python Environment Setup#

To run the script, you need a newer version of Python (>= 3.10). There are two ways.

You can use standard Python on BioHPC compute nodes. Get a terminal through Slurm:

srun --pty  bash -l

Load the necessary modules, and install the necessary packages:

module load python/3.12.7
cd /path/to/aearep-1234 # adjust to the project at hand
pip install -r requirements.txt

You will need to run this every time from a compute node, as the more recent Python versions are not available on the login nodes.

Create a Conda Environment (from BioHPC login node)

To avoid conflicts with existing Python installations, create a dedicated conda environment:

Load conda if needed

module load anaconda  # or follow BioHPC-specific instructions to enable conda

Create environment with Python 3.11

conda create --name download python=3.11

Activate the environment

You will need to do this every time after you log in, if you intend to use the script:

conda activate download

Install Box SDK

Follow the conda install instructions from above to install the necessary packages.

⚠️ Important: Some newer versions of boxsdk are incompatible. If you get import errors (No module named ‘boxsdk’), uninstall the current version and install a compatible version:

conda uninstall boxsdk
conda install -c conda-forge boxsdk=3.14.0

Install Additional Dependencies

If you encounter authentication errors related to JWT, install pyjwt:

pip install pyjwt

Environment Variables#

Required for authentication:

  • BOX_FOLDER_PRIVATE - Box folder ID to download from

  • BOX_PRIVATE_KEY_ID - Box JWT public key ID

  • BOX_ENTERPRISE_ID - Box enterprise ID

  • BOX_CLIENT_ID - Box client ID (optional if using config file)

  • BOX_CLIENT_SECRET - Box client secret (optional if using config file)

Optional configuration:

  • BOX_CONFIG_PATH - Directory containing the Box config file

  • BOX_OUTPUT_DIR - Directory to download files to (default: ./restricted)

  • BOX_PRIVATE_JSON - Base64 encoded content of the Box config JSON file

Base64 Configuration#

To convert the entire Box configuration as a base64-encoded string:

# Generate base64 config
cat config.json | base64

# Set environment variable
export BOX_PRIVATE_JSON="base64_encoded_string_here"

Authentication Methods#

1. Environment Variables#

Set individual Box API credentials as environment variables.

2. Config File#

Place Box JWT configuration file in the directory specified by BOX_CONFIG_PATH.

3. Base64 Encoded Config#

Provide entire configuration as base64-encoded string in BOX_PRIVATE_JSON.

Features#

  • JWT Authentication: Secure access using Box JWT authentication

  • Flexible Configuration: Multiple authentication methods for different environments

  • Organized Downloads: Downloads to organized folder structure

  • Error Handling: Comprehensive error handling for API failures

  • Logging: Configurable logging for debugging and monitoring

Output Structure#

restricted/                 # Default output directory (configurable)
└──                         # Downloaded files from specified subfolder
    ├── file1.txt
    ├── file2.pdf
    └── ...

Box Application Setup#

To use this script, a technical person / supervisor needs to set up the following:

  1. Box Developer Account: Create at https://developer.box.com/

  2. JWT Application: Create a new JWT application in Box Developer Console

  3. Authentication Keys: Download the JWT configuration file

  4. Folder Access: Ensure the application has access to the target folder

Security Considerations#

  • Store JWT credentials securely

  • Use environment variables in production

  • Limit application permissions to necessary scopes

  • Regularly rotate authentication credentials