MD Performance Guide - Compute Canada

BENCHMARK DETAILS

ID=151

Dataset: 6n4o
Software: PMEMD.cuda.MPI (amber/20.9-20.15-gomklc-2020a-avx512)
Resource: 3 tasks, 1 core per task, 1 node, 3 GPUs, no NVLink
CPU: Xeon E5-2650 (Sandy Bridge), 2.2 GHz
GPU: Tesla-P100-PCIE-12GB, 16 cores/GPU
Simulation speed: 7.97 ns/day
Efficiency: 17.8 %
Site: Cedar
Date: Feb. 7, 2022, 11:35 p.m.

Submission script:

#!/bin/bash
#SBATCH --nodes=1 --ntasks=3 --gpus-per-node=p100:3
#SBATCH --mem-per-cpu=2000 --time=1:0:0

INPFILE=pmemd_prod.in
STEPS=40000
# End of user input
LOGFILE=production_${SLURM_JOBID}.log 
module --force purge
ml StdEnv/2020  gcc/9.3.0 cuda/11.0 openmpi/4.0.3 amber/20.9-20.15
# Print resource info
echo "${SLURM_NODELIST} running on ${SLURM_NTASKS} tasks"
grep "model name" /proc/cpuinfo | uniq
# Run simulation
srun pmemd.cuda.MPI -O -i "$INPFILE" -o "$LOGFILE" -p prmtop.parm7 -c restart.rst7
#rm  $TMPFILE $LOGFILE mdinfo mdcrd restrt

Notes:

Inconsistent performance. On one day, the speed on nodes cdr340 and cdr341 was only about 9 ns/day. Two days later the speed was back to normal.

Compare jobs 22449024 and 22503974. Note the abnormal RAM usage of the slow job 22449024. For job 22449024, nvidia-smi showed only three running processes, while normal jobs had four.
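
The process-count check mentioned in the notes can be scripted. The following is a minimal sketch, assuming it is run on the compute node inside the job allocation; count_gpu_procs is a hypothetical helper name, and the expected count of four is taken only from the note above.

```shell
#!/bin/bash
# count_gpu_procs (hypothetical helper): count the non-empty lines on stdin.
# Each line is expected to describe one GPU compute process.
count_gpu_procs() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Intended use on a GPU node (not executed here):
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader | count_gpu_procs
# A result lower than the count seen in normal jobs (four, per the notes)
# would flag the job for closer inspection.
```

The counting is kept separate from the nvidia-smi call so that saved nvidia-smi output from a slow job and a normal job can be compared the same way.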


Script to calculate speed:

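#!/bin/bash
# Number of MD steps in the benchmark run
STEPS=40000
# Collect the "NonSetup CPU time" line (seconds in field 6) from each log
grep "NonSetup CPU time" production_*.log > tmp
# ns/day = steps * 0.0864 / seconds, where 0.0864 = 3.6*2.4*0.01
awk -v steps="$STEPS" '{ print 3.6*2.4*steps*0.01/$6 }' tmp
rm tmp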
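
The constant 3.6*2.4*0.01 in the script above equals 0.0864, which is 86400 s/day times 0.001 ps/step divided by 1000 ps/ns, so it hard-codes dt=0.001 from the input file. A minimal sketch of the same conversion with the time step as an explicit argument is shown below; ns_per_day is a hypothetical helper name, not part of Amber, and like the original script it treats the reported time as elapsed seconds.

```shell
#!/bin/bash
# ns_per_day STEPS DT_PS SECONDS   (hypothetical helper, not part of Amber)
# Convert a step count, a time step in ps, and a run time in seconds
# into a simulation speed in ns/day, printed with two decimals.
ns_per_day() {
    awk -v steps="$1" -v dt="$2" -v secs="$3" 'BEGIN {
        ns   = steps * dt / 1000    # simulated time, ps -> ns
        days = secs / 86400         # run time, s -> days
        printf "%.2f\n", ns / days
    }'
}

# Example call: 40000 steps of 0.001 ps that took 432 s
#   ns_per_day 40000 0.001 432
```

With dt=0.001 this reduces to steps*0.0864/seconds, the expression used in the awk one-liner.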

Simulation input file:

Benchmark
 &cntrl
  imin=0,irest=1,ntx=5,
  nstlim=20000,dt=0.001,
  ntc=2,ntf=2,
  cut=8.0, 
  ntpr=100, ntwx=1000,
  ntb=2, ntp=1, taup=2.0,
  ntt=3, gamma_ln=2.0, temp0=300.0,
 /
 &ewald
  nfft1=128,
  nfft2=128,
  nfft3=128,
 /
