MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of Canada

FIND THE OPTIMAL SUBMISSION PARAMETERS

SPEED UP SIMULATION

  • Finding optimal resources
    A poor choice of job submission or simulation parameters leads to poor performance and wasted computing resources. However, finding the optimal MD engine and submission parameters in a complex HPC environment with heterogeneous hardware is a daunting and time-consuming task. We developed this web portal to simplify it.
  • How fast will a simulation run?
    A glance at the chart of the maximum simulation speed of all MD executables tested on CC systems gives a quick idea of how long a simulation will take (see the sketch after this list).
  • Finding balance between speed and efficiency
    The fastest simulations may be inefficient due to poor parallel scaling to many CPUs or GPUs. Inefficient simulations consume more resources and lower your priority, resulting in longer queue times and less work done. The chart, color-coded by efficiency, helps you choose an optimal combination of speed and efficiency.
  • Exploring the database
    All benchmarks are available from the Explore menu. This menu offers tools for querying, filtering, and viewing benchmark data.
  • Viewing QM/MM benchmarks
    To view QM/MM benchmarks, select the simulation system 4cg1 in the Explore window.
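
As a back-of-the-envelope check, the short Python sketch below turns a benchmark speed in ns/day into an estimated wall-clock time. The 500 ns target length is a hypothetical example; 71.4 ns/day is the fastest GPU-accelerated entry in the benchmark table at the bottom of this page.

    # Estimate wall-clock time from a benchmark speed.
    # A minimal sketch: the 500 ns target is hypothetical; 71.4 ns/day is the
    # fastest GPU-accelerated entry in the benchmark table below.

    def wallclock_days(target_ns: float, speed_ns_per_day: float) -> float:
        """Days of continuous running needed to simulate target_ns nanoseconds."""
        return target_ns / speed_ns_per_day

    print(f"~{wallclock_days(500, 71.4):.1f} days")  # ~7.0 days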

Best Performance Of Each MD Engine Among All Clusters

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024

Cost Of Simulations Using Best Performing CPU-only MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024

    OPTIMIZING CPU USAGE

  • Cost of a simulation
    Choosing fast and efficient job submission parameters is a good approach, but the total amount of computing resources consumed by a job, which we call the cost, is more important. Three components define the cost: CPU time, GPU time, and RAM. We measure cost in core years and GPU years: a core year is the equivalent of using one CPU core continuously for a full year, and a GPU year is the equivalent of using one GPU continuously for a full year.
  • Minimizing queue time
    The scheduler controls resource usage to ensure that each user gets an equal share of resources. If you use more resources than the average user, the scheduler will put your jobs on hold so that other users can catch up. You can avoid these delays by minimizing the cost. Often you can reduce the cost significantly by choosing a setup that is only slightly slower than the fastest, but more expensive, one (see the sketch below).
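
To make the trade-off concrete, the sketch below compares two GPU-accelerated entries from the benchmark table at the bottom of this page. The speed and GPU-year values are copied from the table; only the percentage comparison is computed here.

    # Compare two GPU-accelerated options from the benchmark table below:
    # ID 25: NAMD3.cuda, 71.4 ns/day on 4 GPUs, 0.153 GPU years per microsecond
    # ID 45: PMEMD.cuda, 69.0 ns/day on 1 GPU, 0.040 GPU years per microsecond
    # A minimal sketch: the values are taken from the table, not computed here.

    fast  = {"name": "NAMD3.cuda (4 GPUs)", "speed_ns_day": 71.4, "gpu_years_per_us": 0.153}
    cheap = {"name": "PMEMD.cuda (1 GPU)",  "speed_ns_day": 69.0, "gpu_years_per_us": 0.040}

    slowdown = 1 - cheap["speed_ns_day"] / fast["speed_ns_day"]
    savings  = 1 - cheap["gpu_years_per_us"] / fast["gpu_years_per_us"]
    print(f"{slowdown:.0%} slower, {savings:.0%} cheaper")  # ~3% slower, ~74% cheaper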

Cost Of Simulations Using Best Performing GPU-accelerated MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024
    OPTIMIZING GPU USAGE

  • Understanding GPU equivalents
    We express GPU usage in GPU-equivalent years. A GPU equivalent is a bundle made up of a single GPU, several CPU cores, and some memory (CPU memory, not VRAM). The composition of a GPU equivalent varies because it is defined by the number of CPU cores and the amount of RAM per GPU in a compute node.
  • Benchmarking CPU-only MD Engines
    We calculate CPU usage in core-equivalent years. A core equivalent is a bundle made up of a single core and some memory associated with it. For most systems, one core equivalent includes 4000 MB of memory per core (both bundle types are sketched in code after this list).
  • Benchmarking GPU accelerated MD Engines
    For benchmarking, we use the optimal number of cores per GPU: the number that gives the fastest simulation without exceeding the maximum number of CPU cores per GPU in a GPU equivalent.
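
The sketch below illustrates one way this bundle accounting can work, under the assumption that usage is charged by whichever bundled resource the job consumes the largest share of. The bundle sizes (4000 MB per core; 12 cores and 128000 MB per GPU) are illustrative and vary by cluster, and the request sizes are hypothetical.

    # Rough accounting of core equivalents and GPU equivalents.
    # A sketch assuming usage is charged by the bundled resource with the largest
    # share; bundle sizes below are illustrative and differ between clusters.

    def core_equivalents(cores: int, mem_mb: float, mem_per_core_mb: float = 4000) -> float:
        """A job that asks for extra memory is charged as if it used more cores."""
        return max(cores, mem_mb / mem_per_core_mb)

    def gpu_equivalents(gpus: int, cores: int, mem_mb: float,
                        cores_per_gpu: int = 12, mem_per_gpu_mb: float = 128000) -> float:
        """A job that asks for extra cores or memory is charged extra GPU equivalents."""
        return max(gpus, cores / cores_per_gpu, mem_mb / mem_per_gpu_mb)

    # Hypothetical requests:
    print(core_equivalents(cores=4, mem_mb=32000))           # 8.0 - memory-bound request
    print(gpu_equivalents(gpus=1, cores=24, mem_mb=64000))   # 2.0 - core-bound request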

BENCHMARK RESULTS

CPUY: CPU years per 1 microsecond of simulation. GPUY: GPU years per 1 microsecond of simulation. CPUeff: CPU efficiency (%). T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n4o (239,131 atoms). The sketch after the table shows how CPUY and GPUY follow from the speed.

*More information is available by clicking an ID in the table below
ID | Software | Module | Toolchain | Arch | Data | Speed | CPU | CPUeff | CPUY | GPUY | T | C | N | GPU | NVLink | Site
25 | NAMD3.cuda | binary_pack/3.0a9 | - | - | 6n4o | 7.14e+01 | EPYC 7413 | 53.1 | 0.0 | 0.153 | 1 | 4 | 1 | 4x A100-SXM4 | Yes | Narval
45 | PMEMD.cuda | amber/20.12-20.15 | gofbc | avx2 | 6n4o | 6.90e+01 | EPYC 7413 | 100.0 | 0.0 | 0.04 | 1 | 1 | 1 | 1x A100-SXM4 | Yes | Narval
225 | OPENMM.cuda | openmm/8.0.0 | gofbc | avx2 | 6n4o | 6.85e+01 | EPYC 7413 | 33.7 | 0.0 | 0.16 | 1 | 4 | 1 | 4x A100-SXM4 | Yes | Narval
245 | OPENMM.cuda | openmm/8.1.1 | gofbc | avx2 | 6n4o | 5.45e+01 | Xeon Gold 5418Y | 52.6 | 0.0 | 0.101 | 1 | 2 | 1 | 2x H100 NVL | Yes | Argo
105 | PMEMD.cuda.MPI | amber/20.9-20.15 | gomklc | avx512 | 6n4o | 5.37e+01 | Xeon Silver 4216 | 32.4 | 0.0 | 0.204 | 4 | 1 | 1 | 4x V100-SXM2 | Yes | Cedar
132 | OPENMM.cuda | openmm/7.7.0 | gofbc | avx2 | 6n4o | 5.30e+01 | EPYC 7413 | 43.2 | 0.0 | 0.155 | 1 | 3 | 1 | 3x A100-SXM4 | Yes | Narval
233 | OPENMM.cuda | openmm/8.0.0 | gofbc | avx512 | 6n4o | 4.18e+01 | Xeon Gold 6148 | 36.1 | 0.0 | 0.262 | 1 | 4 | 1 | 4x V100-SXM2 | Yes | Beluga
9 | PMEMD.cuda | amber/20.9-20.15 | gomklc | avx512 | 6n4o | 4.03e+01 | Xeon Gold 6148 | 100.0 | 0.0 | 0.068 | 1 | 1 | 1 | 1x V100-PCIE | No | Siku
21 | NAMD2.ucx | namd-ucx/2.14 | iimkl | avx2 | 6n4o | 3.99e+01 | EPYC 7532 | 28.0 | 87.91 | 0.0 | 1280 | 1 | 20 | 0 | Yes | Narval
90 | GROMACS.cuda | gromacs/2021.2 | gomklc | avx512 | 6n4o | 3.90e+01 | Xeon Gold 6148 | 100.0 | 0.0 | 0.07 | 1 | 8 | 1 | 1x V100-PCIE | No | Siku
Date Updated: Oct. 6, 2024, 2:17 p.m.
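
The CPUY and GPUY columns follow from the reported speed and the allocated hardware. The sketch below reproduces two entries from the table above, assuming cost = (time to simulate 1 microsecond) x (allocated cores or GPUs) and a 365-day year; small rounding differences are expected.

    # Reproduce the CPUY/GPUY columns from the reported speed.
    # A sketch assuming cost = (days to simulate 1 microsecond) x (allocated units)
    # and a 365-day year; small rounding differences are expected.

    DAYS_PER_YEAR = 365
    NS_PER_MICROSECOND = 1000

    def years_per_microsecond(speed_ns_per_day: float, units: int) -> float:
        """Core years (or GPU years) consumed per 1 microsecond of trajectory."""
        days = NS_PER_MICROSECOND / speed_ns_per_day
        return units * days / DAYS_PER_YEAR

    # ID 45 (PMEMD.cuda, 69.0 ns/day on 1 GPU) -> GPUY
    print(f"{years_per_microsecond(69.0, 1):.3f} GPU years")      # ~0.040

    # ID 21 (NAMD2.ucx, 39.9 ns/day on 1280 cores) -> CPUY
    print(f"{years_per_microsecond(39.9, 1280):.1f} core years")  # ~87.9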