Core-hours ≈ Wall-clock time × Number of CPU cores
Example
A program that runs for 2 hours on 8 cores consumes 2 × 8 = 16 core-hours
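The arithmetic above is trivial but worth scripting once you track many jobs. A minimal sketch, using the example figures from this section:

```python
# Core-hours = wall-clock time (hours) x number of CPU cores
wall_hours = 2      # example: program ran for 2 hours...
cores = 8           # ...on 8 cores
core_hours = wall_hours * cores
print(core_hours)   # -> 16
```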
# Example output from /usr/bin/time -v
Elapsed (wall clock) time: 0:02:15
User time (seconds): 120.45
System time (seconds): 12.33
Maximum resident set size (kbytes): 2048000
Understanding this relationship helps optimize resource requests
Scaling Patterns: linear vs. quadratic
# Simple scaling test
./my_program --data 2percent_sample.csv # Measure time
./my_program --data 4percent_sample.csv # Should take ~2x if linear
./my_program --data 8percent_sample.csv # Should take ~4x if linear
If 8% data takes 10× longer than 2% data, you don’t have linear scaling!
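You can turn that check into a number: if runtime scales as t ∝ n^k, two measurements give you the exponent k. A sketch with hypothetical timings matching the "10× longer" case above:

```python
import math

# Hypothetical timings (seconds) for the 2% and 8% samples above
t_2pct, t_8pct = 60.0, 600.0   # 8% data took 10x longer than 2%

# Fit t ~ n^k between the two sizes: k = log(t2/t1) / log(n2/n1)
k = math.log(t_8pct / t_2pct) / math.log(8 / 2)
print(round(k, 2))  # -> 1.66, clearly worse than linear (k = 1)
```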
Amdahl’s Law
#!/bin/bash
dataset="my_data.csv"
for cores in 1 2 4 8 16; do
echo "Testing with $cores cores..."
/usr/bin/time -f "cores=$cores wall=%E cpu=%U+%S" \
my_parallel_program --input $dataset --threads $cores \
2>> scaling_results.log
done
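Amdahl's Law itself predicts the best speedup you can expect from the core counts tested above: if a fraction p of the program parallelises, speedup on n cores is 1 / ((1 − p) + p/n). A sketch with an assumed p = 0.9:

```python
# Amdahl's law: ideal speedup on n cores, given parallel fraction p
def amdahl_speedup(p, n):
    return 1.0 / ((1 - p) + p / n)

# With 90% of the work parallelisable (an assumed figure),
# 8 cores give well under an 8x speedup:
print(round(amdahl_speedup(0.9, 8), 2))   # -> 4.71
```

Comparing this curve against your measured scaling_results.log tells you when adding cores stops paying off.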
Example Results
Tip
Document environment (software versions, modules)
Run end-to-end tests (include all steps)
# Basic timing with detailed output
/usr/bin/time -v my_program input.dat
# Custom format for logging
/usr/bin/time -f 'wall=%E user=%U sys=%S maxRSS=%M' my_program input.dat
Elapsed
: Wall-clock time

User/System time
: CPU time breakdown

Maximum resident set size
: Peak memory (KB)

Benchmarking my code with different data sizes
Scenario: Image processing pipeline
Goal: Process 10,000 images, estimate resources needed
# Create subsets of your full dataset
mkdir test_data_5pct test_data_10pct test_data_15pct
# Randomly sample (adjust numbers for your case)
shuf -n 500 full_image_list.txt > test_data_5pct/image_list.txt
shuf -n 1000 full_image_list.txt > test_data_10pct/image_list.txt
shuf -n 1500 full_image_list.txt > test_data_15pct/image_list.txt
Tip
Use random sampling to ensure representative subsets
#!/bin/bash
module load languages/python/3.7.12
for size in 5 10 15; do
echo "=== Testing ${size}% dataset ==="
for run in {1..3}; do
echo "Run $run..."
/usr/bin/time -f "${size}pct run$run: wall=%E user=%U sys=%S maxRSS_KB=%M" \
python image_pipeline.py \
--input test_data_${size}pct/ \
--output results_${size}pct_run${run}/ \
2>> timing_results.log
done
done
# Extract timing data
grep "wall=" timing_results.log
# Results (%M reports peak RSS in KB):
# 5pct run1: wall=0:02:15 user=2:01.45 sys=0:12.33 maxRSS_KB=2048000
# 5pct run2: wall=0:02:18 user=2:03.12 sys=0:13.01 maxRSS_KB=2051000
# ...
# 15pct run1: wall=0:04:45 user=4:15.22 sys=0:28.11 maxRSS_KB=2055000
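The wall times above are in `%E` format (h:mm:ss or m:ss.ss), which is awkward to average or plot directly. A small helper to convert them to seconds:

```python
# Convert a /usr/bin/time %E value (h:mm:ss or m:ss.ss) to seconds
def wall_to_seconds(wall):
    sec = 0.0
    for part in wall.split(":"):
        sec = sec * 60 + float(part)
    return sec

print(wall_to_seconds("0:02:15"))   # -> 135.0
print(wall_to_seconds("0:04:45"))   # -> 285.0
```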
Analysis:
5% data: ~135 seconds
10% data: ~285 seconds
15% data: ~580 seconds
Doubling the data (5% → 10%) roughly doubles the runtime, consistent with linear scaling, but the 15% run takes ~580 seconds against the ~405 seconds a linear trend predicts, so treat a linear extrapolation as a lower bound.
100% estimate (if linear, extrapolating from the 10% run): ~2850 seconds ≈ 47.5 minutes
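The linear extrapolation is a one-liner, assuming runtime proportional to dataset size:

```python
# Linear extrapolation from the 10% benchmark above
t_10pct = 285.0                    # seconds for the 10% subset
estimate = t_10pct * (100 / 10)    # assume time proportional to data size
print(estimate, estimate / 60)     # -> 2850.0 seconds, 47.5 minutes
```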
If the relationship is not linear, you can use Excel to add a trendline to your chart and choose to display the equation.
Benchmark with a quadratic trend
A polynomial of order 2 or 3 is generally a good option for capturing non-linear trends.
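You don't need Excel for this: three benchmark points determine an order-2 polynomial exactly. A sketch using the illustrative timings from this section (Lagrange interpolation; an Excel trendline does a least-squares version of the same idea):

```python
# Exact quadratic through three benchmark points, evaluated at 100%
xs = [5.0, 10.0, 15.0]            # dataset size (%)
ts = [135.0, 285.0, 580.0]        # wall time (seconds), from the runs above

def quad(x):
    """Lagrange interpolation through the three (xs, ts) points."""
    total = 0.0
    for i in range(3):
        term = ts[i]
        for j in range(3):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

print(quad(100))   # -> 27780.0 seconds, ~7.7 hours if the trend holds
```

The gap between this and the linear estimate is exactly why you benchmark before requesting resources.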
Forecasting the runtime with all the data
Important
Always benchmark on compute nodes, not login nodes!
#!/bin/bash
#SBATCH --account=ABC012345
#SBATCH --partition=test
#SBATCH --job-name=benchmark
#SBATCH --output=benchmark_out_%j.txt
#SBATCH --error=benchmark_err_%j.txt
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=00:30:00
module load languages/python/3.7.12
export OMP_NUM_THREADS=8 # Match requested CPUs
echo "Starting benchmark at $(date)"
echo "Running on node: $(hostname)"
/usr/bin/time -v srun --cpu-bind=cores \
python my_program.py --input test_10pct.csv --threads 8
# Get job statistics
sacct -j JOBID --format=JobID,Elapsed,TotalCPU,MaxRSS,ReqCPUS,State -P
# Example output:
# JobID|Elapsed|TotalCPU|MaxRSS|ReqCPUS|State
# 12345|00:15:30|01:58:45|2048000K|8|COMPLETED
Resource Analysis:
Wall time: 15.5 minutes
CPU time: 118.75 minutes
Peak memory: ~2GB
Core-hours: 15.5 min × 8 cores = 124 core-minutes ≈ 2.07 core-hours
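With `sacct -P` output, that analysis is easy to script. A sketch that parses the pipe-separated line shown above:

```python
# Parse the pipe-separated sacct line from the example above
line = "12345|00:15:30|01:58:45|2048000K|8|COMPLETED"
fields = ["JobID", "Elapsed", "TotalCPU", "MaxRSS", "ReqCPUS", "State"]
job = dict(zip(fields, line.split("|")))

# Elapsed is hh:mm:ss -> hours, then core-hours = wall hours x requested CPUs
h, m, s = (int(x) for x in job["Elapsed"].split(":"))
wall_hours = h + m / 60 + s / 3600
core_hours = wall_hours * int(job["ReqCPUS"])
print(round(core_hours, 2))   # -> 2.07
```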
The same benchmarking principles used for CPU-based tasks apply generally to GPU workloads.
However, GPUs often have longer warm-up times. Consider factors like batch size, data transfer overhead, and GPU memory usage, which can significantly affect runtime.
To get accurate estimates, benchmarks should be run with sufficiently large datasets to reflect realistic performance.
Isambard-AI: using 1 GPU for 1 hour would be counted as 0.25 hours of resource use
Scenario
Note
Step-by-step calculation:
Full problem: 5× larger (20% → 100%)
Time per run: 45 min × 5 = 225 min = 3.75 hours
Core-hours per run: 3.75 × 8 = 30 core-hours
Base cost: 30 × 50 runs = 1,500 core-hours
Safety factors:
Total request: 1,500 × 3.6 = 5,400 core-hours
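The step-by-step calculation above, written out so each intermediate value is visible:

```python
# Cost estimate from the scenario above
time_per_run_min = 45 * 5               # 20% -> 100% is 5x larger: 225 min
hours_per_run = time_per_run_min / 60   # 3.75 hours
core_hours_per_run = hours_per_run * 8  # 30 core-hours on 8 cores
base = core_hours_per_run * 50          # 50 runs -> 1500 core-hours
total = base * 3.6                      # combined safety factor from above
print(base, total)                      # -> 1500.0 5400.0
```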
For queries related to these training materials, email jgi-training@bristol.ac.uk
For other HPC queries, email hpc-help@bristol.ac.uk
Happy computing! 🚀