Cluster Execution#

This tutorial demonstrates how to scale template matching to large datasets. When processing dozens or hundreds of tomograms, manually creating and submitting individual jobs becomes impractical. The pytme_runner automates this workflow by discovering datasets, generating cluster scripts, and managing job submission.

Dataset Organization#

For this tutorial, we extend the ribosome picking example to a larger dataset. Your project directory will typically look like this:

project_directory/
├── tomograms/                      # Tomograms
│   ├── TS_037_10.00Apx.rec
│   ├── TS_041_10.00Apx.rec
│   └── TS_045_10.00Apx.rec
├── metadata/                       # Metadata files
│   ├── TS_037.mdoc                 # Can also be Warp/M XMLs
│   ├── TS_041.mdoc                 # or tomostar STAR files
│   └── TS_045.mdoc
├── masks/                          # Optional tomogram masks
│   ├── TS_037_mask.mrc
│   ├── TS_041_mask.mrc
│   └── TS_045_mask.mrc
└── templates/
    ├── emd_3228_resampled.mrc      # 80S ribosome template
    └── emd_3228_resampled_mask.mrc # 80S ribosome mask

The batch runner automatically extracts tomogram identifiers by removing technical suffixes like pixel size information (_10.00Apx) and matches files across directories.
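
Conceptually, the identifier extraction works like the following shell sketch (for illustration only, not the runner's actual implementation):

# Sketch: strip the extension, then the trailing pixel-size suffix
fname="TS_037_10.00Apx.rec"
base="${fname%.rec}"      # TS_037_10.00Apx
tomo_id="${base%_*Apx}"   # TS_037
echo "$tomo_id"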

Basic Batch Processing#

The batch processing command identifies all tomograms and metadata files using glob patterns:

pytme_runner \
    --tomograms "project_directory/tomograms/*.rec" \
    --metadata "project_directory/metadata/*.mdoc" \
    --template templates/emd_3228_resampled.mrc \
    --template-mask templates/emd_3228_resampled_mask.mrc \
    --particle-diameter 300 \
    --output-dir ribosome_batch_001/results \
    --script-dir ribosome_batch_001/scripts \
    --dry-run

Note

The quotation marks are required so the glob patterns are passed to pytme_runner verbatim rather than being expanded by the shell. If your tomogram names end with .mrc, adapt the glob pattern to "project_directory/tomograms/*.mrc".
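
For example, the two invocations below differ in who expands the pattern:

# Quoted: the pattern itself reaches pytme_runner, which performs the globbing
pytme_runner --tomograms "project_directory/tomograms/*.rec" ...

# Unquoted: the shell expands the pattern into individual paths first
pytme_runner --tomograms project_directory/tomograms/*.rec ...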

This command will:

  1. Discover all .rec files in the tomograms directory

  2. Match each tomogram with its corresponding .mdoc metadata file

  3. Generate individual SLURM scripts for each valid pair

The generated scripts can be submitted manually, as shown below, or automatically through the runner by omitting the --dry-run flag.
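
A manual submission could look like this (assuming the generated scripts carry a .sh extension):

# Submit every generated script to SLURM
for script in ribosome_batch_001/scripts/*.sh; do
    sbatch "$script"
done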

The scripts generated by the runner will generally follow this pattern:

#!/bin/bash

# SLURM directives
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=256G
#SBATCH --time=08:00:00
#SBATCH --partition=gpu-el8
#SBATCH --gres=gpu:1
#SBATCH --job-name=pytme_TS_037

# Environment setup
module load pyTME

# Run template matching
match_template \
    --target project_directory/tomograms/TS_037_10.00Apx.rec \
    --template templates/emd_3228_resampled.mrc \
    --template-mask templates/emd_3228_resampled_mask.mrc \
    --output results/TS_037/TS_037_10.00Apx.pickle \
    --particle-diameter 300 \
    --tilt-angles project_directory/metadata/TS_037.mdoc \
    --amplitude-contrast 0.08 \
    --spherical-aberration 27000000.0 \
    --acceleration-voltage 300

Note

Currently, only SLURM scripts are supported. Feel free to open an issue if you require a different scheduler, or add support yourself by inheriting from the ExecutionBackend class defined in pytme_runner.

Advanced Processing Options#

For production runs, you may want to include additional filters similar to those described in the ribosome picking tutorial:

pytme_runner \
    --tomograms "project_directory/tomograms/*.rec" \
    --metadata "project_directory/metadata/*.mdoc" \
    --masks "project_directory/masks/*mask.mrc" \
    --template templates/emd_3228_resampled.mrc \
    --template-mask templates/emd_3228_resampled_mask.mrc \
    --particle-diameter 300 \
    --lowpass 40 \
    --tilt-weighting relion \
    --whiten-spectrum \
    --amplitude-contrast 0.08 \
    --spherical-aberration 2.7 \
    --voltage 300 \
    --cpus 8 \
    --memory 256 \
    --gpu-count 1 \
    --time-limit "08:00:00" \
    --output-dir results/ribosome_batch_001

Compared to the basic run above, this now includes:

  • Tomogram masks to exclude problematic regions (a mask-pairing sanity check is sketched after this list)

  • Lowpass filtering to 40 Ångström

  • Missing wedge correction with RELION-style tilt weighting

  • Spectral whitening to enhance weak signals

  • Resource specifications appropriate for your cluster
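
As mentioned in the list above, a quick sanity check of the tomogram-mask pairing might look like this (a sketch assuming the naming scheme from the directory layout at the top of this tutorial):

# Sketch: report tomogram identifiers without a matching mask file
for tomo in project_directory/tomograms/*.rec; do
    id=$(basename "$tomo" | sed 's/_[0-9.]*Apx\.rec$//')
    [ -e "project_directory/masks/${id}_mask.mrc" ] || echo "missing mask: $id"
done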

Tip

You can also switch between compute backends via --backend. By default, the runner will use cupy.

Processing Subsets#

To process only specific tomograms, create a list file:

# Create tomogram selection
echo "TS_037" > selected_tomos.txt
echo "TS_041" >> selected_tomos.txt

# Process only selected tomograms
pytme_runner \
    --tomograms "project_directory/tomograms/*.rec" \
    --metadata "project_directory/metadata/*" \
    --template templates/emd_3228_resampled.mrc \
    --tomo-list selected_tomos.txt \
    --particle-diameter 300
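
Instead of typing identifiers by hand, you can also derive candidates from the files on disk and then trim the list to the subset you need (a sketch assuming the suffix convention shown earlier):

# Sketch: write one tomogram identifier per line
for f in project_directory/tomograms/*.rec; do
    basename "$f" | sed 's/_[0-9.]*Apx\.rec$//'
done > selected_tomos.txt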

Output Structure#

Results are organized in the following manner:

ribosome_batch_001/results
├── TS_037_10.00Apx.pickle # Template matching results
├── TS_037_12345.out       # SLURM logs
├── TS_041_10.00Apx.pickle
├── TS_041_12346.out
├── TS_045_10.00Apx.pickle
└── TS_045_12347.out
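
A quick way to verify that the batch completed is to compare the number of result files against the number of input tomograms:

# Both counts should match once all jobs have finished
ls ribosome_batch_001/results/*.pickle | wc -l
ls project_directory/tomograms/*.rec | wc -l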

Mixed Formats#

You can mix formats by adapting the glob patterns, for instance for the metadata:

pytme_runner \
    --tomograms "project_directory/tomograms/*.rec" \
    --metadata "project_directory/metadata/*" \
    --template templates/emd_3228_resampled.mrc \
    --particle-diameter 300

The metadata/* pattern will match .mdoc, .xml, .star, and other supported formats, automatically pairing each tomogram with its corresponding metadata file. However, note that when multiple metadata files exist for a given tomogram, the runner defaults to the first one it encounters.
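
To spot such cases before submitting, you can compare metadata basenames (a sketch assuming each metadata file uses the tomogram identifier as its stem):

# Sketch: print identifiers that occur with more than one extension
ls project_directory/metadata | sed 's/\.[^.]*$//' | sort | uniq -d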

Monitoring Progress#

Use standard SLURM commands to monitor your batch jobs:

# Check all your jobs
squeue --me

# Count running/pending jobs
squeue --me | grep pytme | wc -l

# Check specific job details
scontrol show job 12345

# Monitor resource usage
sacct -j 12345 --format=JobID,JobName,MaxRSS,Elapsed
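
The SLURM logs written next to the results (see Output Structure above) can also be checked in one pass:

# List log files that mention an error, case-insensitively
grep -il "error" ribosome_batch_001/results/*.out

# Follow the log of a specific running job
tail -f ribosome_batch_001/results/TS_037_12345.out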

Environment Configuration#

Different clusters require different environment setups. Configure this for your specific cluster:

# Using environment modules (default)
pytme_runner --environment-setup "module load pyTME; export PYTHONPATH" ...

# Using conda environments
pytme_runner --environment-setup "source ~/.bashrc; conda activate pytme_env" ...

# Complex setup with GPU modules
pytme_runner --environment-setup "module load CUDA/11.7; conda activate pytme" ...