MOLECULAR DYNAMICS PERFORMANCE GUIDE - COMPUTE CANADA

BENCHMARK DETAILS

CPUY: CPU years per 1-microsecond simulation. GPUY: GPU years per 1-microsecond simulation. T: tasks. C: cores. N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n4o (239,131 atoms).
ID  Software        Module             Toolchain  Arch  Data  Speed     CPUeff  CPUY  GPUY   T  C  N  GPU          NVLink  Site
46  PMEMD.cuda.MPI  amber/20.12-20.15  gofbc      avx2  6n4o  8.71e+01  63.2    0.0   0.063  2  1  1  2 A100-SXM4  Yes     Narval
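The GPUY value in the row above can be reproduced from the reported speed. The following is a minimal sketch, assuming that GPUY is the number of GPUs multiplied by the time needed to simulate 1 microsecond (1000 ns), expressed in 365-day years. The function name gpu_years is hypothetical and is not part of the benchmark scripts.

```shell
#!/bin/bash
# Hypothetical helper, not part of the benchmark scripts.
# Assumes GPUY = n_gpus * (1000 ns / speed in ns/day) / 365 days per year.
gpu_years() {
    # $1 = speed in ns/day, $2 = number of GPUs
    awk -v speed="$1" -v gpus="$2" \
        'BEGIN { printf "%.3f\n", gpus * 1000 / speed / 365 }'
}

# Values from benchmark ID 46: 87.1 ns/day on 2 GPUs.
gpu_years 87.1 2
```

With these values the function prints 0.063, which agrees with the GPUY column.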
  • Benchmark submission script:
    #!/bin/bash
    #SBATCH --nodes=1 --ntasks=2 --gpus-per-node=a100:2
    #SBATCH --mem-per-cpu=2000 --time=1:0:0
    # Usage: sbatch $0
    INPFILE=pmemd_prod.in
    STEPS=40000
    TMPFILE=tf_${SLURM_NTASKS}
    LOGFILE=production_${SLURM_NTASKS}.log
    module --force purge
    ml StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 amber/20.12-20.15
    # Print resource info
    echo "${SLURM_NODELIST} running on ${SLURM_NTASKS} tasks"
    grep "model name" /proc/cpuinfo | uniq
    # Run the benchmark
    srun pmemd.cuda_SPFP.MPI -O -i $INPFILE -o $LOGFILE -p prmtop.parm7 -c restart.rst7
    # Convert the elapsed time (field 6, in seconds) to ns/day for a 1 fs time step:
    # steps * 1e-6 ns/step * 86400 s/day = steps * 3.6*2.4*0.01
    grep "NonSetup CPU time" $LOGFILE > $TMPFILE
    echo -n "ns/day:"
    # Single quotes keep the shell from expanding $6 before awk sees it
    awk -v steps=$STEPS '{ print 3.6*2.4*steps*0.01/$6 }' $TMPFILE
  • Simulation input file:
    Benchmark
    &cntrl
    imin=0,irest=1,ntx=5,
    nstlim=20000,dt=0.001,
    ntc=2,ntf=2,
    cut=8.0,
    ntpr=100, ntwx=1000,
    ntb=2, ntp=1, taup=2.0,
    ntt=3, gamma_ln=2.0, temp0=300.0,
    /
    &ewald
    nfft1=128,
    nfft2=128,
    nfft3=128,
    /
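
The submission script sets STEPS separately from nstlim in the input file, and its awk expression hard-codes a 1 fs time step. As a sketch of one way to derive both from the input file instead, the helper functions below read nstlim and dt and convert an elapsed time to ns/day. The names mdin_value and ns_per_day are hypothetical and are not part of Amber or of the benchmark scripts. The parsing assumes the key=value form without spaces used in the input file above.

```shell
#!/bin/bash
# Hypothetical helpers; not part of Amber or the benchmark scripts.

# Print the first numeric value assigned to a key in an Amber input file.
# Assumes "key=value" with no spaces around the equals sign.
mdin_value() {
    # $1 = key (for example nstlim or dt), $2 = input file
    grep -o "$1=[0-9.]*" "$2" | head -n 1 | cut -d= -f2
}

# Convert a step count, a time step in ps, and elapsed seconds to ns/day.
ns_per_day() {
    # $1 = steps, $2 = dt in ps, $3 = elapsed seconds
    awk -v steps="$1" -v dt_ps="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", steps * dt_ps * 1e-3 * 86400 / secs }'
}

# Possible use inside the submission script (variable names as used there):
# STEPS=$(mdin_value nstlim "$INPFILE")
# DT=$(mdin_value dt "$INPFILE")
# SECS=$(awk '/NonSetup CPU time/ { print $6 }' "$LOGFILE")
# ns_per_day "$STEPS" "$DT" "$SECS"
```

Reading the values from the input file means a change to nstlim or dt is picked up by the speed calculation without editing the script.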