MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of Canada

BENCHMARK DETAILS

ID=283

  • Dataset: 9naw
  • Software: GROMACS.cuda.mpi (gromacs/2024.4-gofbc-2023a-avx512)
  • Resource: 2 tasks, 16 cores per task, 1 node, 2 GPUs, with NVLink
  • CPU: Xeon Gold 6448Y (Sapphire Rapids), 2.1 GHz
  • GPU: NVIDIA H100 (HBM3, 80 GB), 16 cores/GPU
  • Simulation speed: 26.911 ns/day
  • Efficiency: 68.8 %
  • Site: Rorqual
  • Date: Dec. 24, 2025, 2:03 p.m.
  • Submission script:
    #!/bin/bash
    #SBATCH --mem-per-cpu=4000
    #SBATCH --time=3:0:0
    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    #SBATCH --cpus-per-task=16
    #SBATCH --gpus-per-task=h100:1

    module load StdEnv/2023 gcc/12.3 openmpi/4.1.5 cuda/12.2 gromacs/2024.4

    WORKDIR=$(pwd)            # remember the submit directory
    cp * "$SLURM_TMPDIR"      # stage all input files to node-local scratch
    cd "$SLURM_TMPDIR"

    # Enable direct GPU-to-GPU communication between ranks
    export GMX_ENABLE_DIRECT_GPU_COMM=1

    # 2 thread-MPI ranks (one per GPU) with 16 OpenMP threads each;
    # -npme 1 dedicates one rank to PME, and non-bonded, PME, bonded
    # and update (integration) all run on the GPUs
    gmx mdrun \
        -ntmpi $SLURM_NTASKS \
        -ntomp $SLURM_CPUS_PER_TASK \
        -nb gpu \
        -pme gpu \
        -npme 1 \
        -update gpu \
        -bonded gpu \
        -noconfout \
        -nstlist 300 \
        -s topol.tpr
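
    To reproduce, submit this script from a directory that contains
    topol.tpr; the file name submit.sh below is a placeholder:

    sbatch submit.sh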
  • Notes:
    Multi-GPU performance:

    https://developer.nvidia.com/hpc-application-performance
    https://catalog.ngc.nvidia.com/orgs/hpc/containers/gromacs?version=2023.2
    NVIDIA reports good scaling of the STMV benchmark up to 8 GPUs.

    The key to scaling is enabling direct GPU communication together with cuFFTMp.

    https://developer.nvidia.com/blog/massively-improved-multi-node-nvidia-gpu-scalability-with-gromacs/

    At runtime, set:

    export GMX_ENABLE_DIRECT_GPU_COMM=1

    Build GROMACS with:

    -DGMX_OPENMP=ON \
    -DGMX_MPI=ON \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DGMX_GPU=CUDA \
    -DCMAKE_BUILD_TYPE=Release \
    -DGMX_DOUBLE=OFF \
    -DGMX_USE_CUFFTMP=ON \
    -DcuFFTMp_ROOT=$HPCSDK_LIBDIR
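
    A complete configure step with these options might look like the
    sketch below; the source/build directories, install prefix and make
    parallelism are placeholders, and $HPCSDK_LIBDIR must point at an
    NVIDIA HPC SDK installation that provides cuFFTMp:

    cd gromacs-2024.4 && mkdir -p build && cd build
    cmake .. \
    -DGMX_OPENMP=ON \
    -DGMX_MPI=ON \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DGMX_GPU=CUDA \
    -DCMAKE_BUILD_TYPE=Release \
    -DGMX_DOUBLE=OFF \
    -DGMX_USE_CUFFTMP=ON \
    -DcuFFTMp_ROOT=$HPCSDK_LIBDIR \
    -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-2024.4   # hypothetical prefix
    make -j 16 && make install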
  • Simulation input file:
    title = benchmark
    ; Run parameters
    integrator = md
    nsteps = 100000
    dt = 0.002
    ; Output control
    nstxout = 0
    nstvout = 0
    nstfout = 0
    nstenergy = 1000
    nstlog = 500
    nstxout-compressed = 5000
    compressed-x-grps = System
    ; Bond parameters
    continuation = yes
    constraint_algorithm = lincs
    constraints = h-bonds

    ; Neighborsearching
    cutoff-scheme = Verlet
    ns_type = grid
    nstlist = 10
    rcoulomb = 0.8
    rvdw = 0.8
    DispCorr = Ener ; analytic VdW correction
    ; Electrostatics
    coulombtype = PME
    pme_order = 4
    fourier-nx = 324
    fourier-ny = 324
    fourier-nz = 324
    ; Temperature coupling is on
    tcoupl = V-rescale
    tc-grps = system
    tau_t = 0.1
    ref_t = 300
    ; Pressure coupling is on
    pcoupl = Parrinello-Rahman
    pcoupltype = isotropic
    tau_p = 2.0
    ref_p = 1.0
    compressibility = 4.5e-5
    ; Periodic boundary conditions
    pbc = xyz
    ; Velocity generation
    gen_vel = no
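
    These parameters are the .mdp file from which topol.tpr was generated.
    A minimal sketch of that preprocessing step, assuming the file is
    saved as benchmark.mdp and the 9naw coordinates and topology are in
    conf.gro and topol.top (hypothetical file names):

    gmx grompp -f benchmark.mdp -c conf.gro -p topol.top -o topol.tpr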