MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of Canada

FIND THE OPTIMAL SUBMISSION PARAMETERS

SPEED UP SIMULATIONS

  • Finding optimal resources
    A poor choice of job submission or simulation parameters leads to poor performance and wasted computing resources. However, finding the optimal MD engine and submission parameters in a complex HPC environment with heterogeneous hardware is a daunting and time-consuming task. We developed this web portal to simplify it.
  • How fast will a simulation run?
    The chart of the maximum simulation speed of every MD executable tested on CC systems gives a quick idea of how long a simulation will take. A short sketch at the end of this list shows how to turn a benchmarked speed into an estimated wall time.
  • Finding balance between speed and efficiency
    The fastest simulations may be inefficient because of poor parallel scaling to many CPUs or GPUs. Inefficient simulations consume more resources and lower your scheduling priority, resulting in longer queue times and less work done. The chart, color-coded by efficiency, helps you choose an optimal combination of speed and efficiency.
  • Exploring the database
    All benchmarks are available from the Explore menu. This menu offers tools for querying, filtering, and viewing benchmark data.
  • Viewing QM/MM benchmarks
    To view QM/MM benchmarks, select the simulation system 4cg1 in the Explore window.
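
The conversion from a benchmarked speed to an expected wall time is simple arithmetic; the minimal Python sketch below illustrates it (the speed and trajectory length are illustrative values, not taken from the database).

    def days_to_finish(target_ns, speed_ns_per_day):
        """Wall-clock days needed to produce target_ns of trajectory at the given speed."""
        return target_ns / speed_ns_per_day

    # Example: a 1 microsecond (1000 ns) trajectory at 100 ns/day takes about 10 days.
    print(days_to_finish(1000, 100.0))  # -> 10.0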

Best Performance Of Each MD Engine Among All Clusters

*Data from simulation of 6n4o system with ~240,000 atoms, updated Sept. 7, 2023

Cost Of Simulations Using Best Performing CPU-only MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Sept. 7, 2023

    OPTIMIZING CPU USAGE

  • Cost of a simulation
    Choosing fast and efficient job submission parameters is a good approach, but what matters even more is the total amount of computing resources used for a job, which we call the cost. Three components define cost: CPU time, GPU time, and RAM. We measure cost in core years and GPU years: a core year is the equivalent of using one CPU core continuously for a full year, and a GPU year is the equivalent of using one GPU continuously for a full year. A worked sketch follows this list.
  • Minimizing queue time
    The scheduler controls resource usage to ensure that every user gets a fair share of resources. If you use more resources than the average user, the scheduler holds your jobs back so that other users can catch up. You can avoid these delays by minimizing cost, and you can often reduce cost substantially by choosing a setup that is only slightly slower than the fastest, most expensive one.
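
The cost calculation itself is straightforward. The Python sketch below is a minimal illustration (not part of the portal): it computes core years and GPU years from a target trajectory length, a benchmarked speed, and the resources requested, and it is consistent with the CPUY/GPUY values reported for rows 69 and 204 of the benchmark table below.

    def simulation_cost(target_ns, speed_ns_per_day, n_cores, n_gpus=0):
        """Return (core_years, gpu_years) needed to produce target_ns of trajectory."""
        wall_days = target_ns / speed_ns_per_day
        return n_cores * wall_days / 365.0, n_gpus * wall_days / 365.0

    # 1000 ns at 104 ns/day on 1280 CPU cores (row 69 below)
    print(simulation_cost(1000, 104.0, 1280))          # -> about 33.7 core years, 0 GPU years
    # 1000 ns at 101 ns/day on 24 cores and 2 GPUs (row 204 below)
    print(simulation_cost(1000, 101.0, 24, n_gpus=2))  # -> about 0.65 core years, 0.054 GPU years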

Cost Of Simulations Using Best Performing GPU-accelerated MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Sept. 7, 2023
    OPTIMIZING GPU USAGE

  • Understanding GPU equivalents
    We express GPU usage in GPU-equivalent years. A GPU equivalent is a bundle made up of a single GPU, several CPU cores, and some memory (CPU memory, not VRAM). The composition of a GPU equivalent varies between clusters because it is defined by the number of CPU cores and the amount of RAM per GPU in a compute node.
  • Benchmarking CPU-only MD Engines
    We express CPU usage in core-equivalent years. A core equivalent is a bundle made up of a single core and the memory associated with it. On most systems, one core equivalent includes 4000 MB of memory per core (see the sketch after this list).
  • Benchmarking GPU accelerated MD Engines
    For benchmarking we use the optimal number of cores per GPU: the number that gives the fastest simulation without exceeding the number of CPU cores per GPU in a GPU equivalent.
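
A minimal Python sketch of equivalent-based accounting follows, assuming the usual "largest fraction of the bundle" rule. Only the 4000 MB per core figure comes from this page; the per-GPU core and memory numbers are placeholders, since the composition of a GPU equivalent differs between clusters.

    def core_equivalents(cores, mem_mb, mem_per_core_mb=4000):
        """Charge by whichever the job uses more of: cores or its share of memory."""
        return max(cores, mem_mb / mem_per_core_mb)

    def gpu_equivalents(gpus, cores, mem_mb, cores_per_gpu=12, mem_per_gpu_mb=125000):
        """Charge by the most-consumed component of the GPU bundle (placeholder bundle sizes)."""
        return max(gpus, cores / cores_per_gpu, mem_mb / mem_per_gpu_mb)

    # Example: 1 GPU with 24 cores counts as 2 GPU equivalents if the bundle holds
    # 12 cores per GPU (hypothetical bundle; check your cluster's documentation).
    print(gpu_equivalents(1, 24, 48000))  # -> 2.0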

BENCHMARK RESULTS

CPUY: CPU years per 1 microsecond of simulation. GPUY: GPU years per 1 microsecond of simulation. | T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step: 1 fs. Measured with dataset 6n4o (239,131 atoms).

*More information is available by clicking an ID in the table below
ID | Software | Module | Toolchain | Arch | Data | Speed | CPU | CPUeff | CPUY | GPUY | T | C | N | GPU | NVLink | Site
69 | GROMACS.mpi | gromacs/2021.2 | gomkl | avx512 | 6n4o | 104 | Xeon Gold 6248 | 18.1 | 33.72 | 0.0 | 1280 | 1 | 32 | 0 | No | Siku
204 | GROMACS.cuda | gromacs/2023.2 | gofbfc | avx2 | 6n4o | 101 | EPYC 7413 | 57.1 | 0.0 | 0.054 | 1 | 24 | 1 | 2 x A100-SXM4 | Yes | Narval
47 | PMEMD.cuda.MPI | amber/20.12-20.15 | gofbc | avx2 | 6n4o | 99.7 | EPYC 7413 | 36.2 | 0.0 | 0.11 | 4 | 1 | 1 | 4 x A100-SXM4 | Yes | Narval
203 | GROMACS.cuda.mpi | gromacs/2023.2 | gofbfc | avx2 | 6n4o | 98.4 | EPYC 7413 | 55.7 | 0.0 | 0.056 | 2 | 12 | 1 | 2 x A100-SXM4 | Yes | Narval
184 | GROMACS.cuda | gromacs/2022.3 | gofbc | avx2 | 6n4o | 85.6 | EPYC 7413 | 100.0 | 0.0 | 0.032 | 1 | 12 | 1 | 1 x A100-SXM4 | Yes | Narval
101 | GROMACS.cuda | gromacs/2021.4 | gofbc | avx2 | 6n4o | 85.6 | EPYC 7413 | 100.0 | 0.0 | 0.032 | 1 | 12 | 1 | 1 x A100-SXM4 | Yes | Narval
240 | NAMD3.cuda | binary_pack/3.0b3 | - | - | 6n4o | 77.0 | EPYC 7413 | 45.7 | 0.0 | 0.142 | 1 | 4 | 1 | 4 x A100-SXM4 | Yes | Narval
83 | GROMACS.mpi | gromacs/2021.4 | gofb | avx2 | 6n4o | 74.9 | EPYC 7532 | 29.7 | 18.72 | 0.0 | 512 | 1 | 8 | 0 | Yes | Narval
25 | NAMD3.cuda | binary_pack/3.0a9 | - | - | 6n4o | 71.4 | EPYC 7413 | 53.1 | 0.0 | 0.153 | 1 | 4 | 1 | 4 x A100-SXM4 | Yes | Narval
45 | PMEMD.cuda | amber/20.12-20.15 | gofbc | avx2 | 6n4o | 69.0 | EPYC 7413 | 100.0 | 0.0 | 0.04 | 1 | 1 | 1 | 1 x A100-SXM4 | Yes | Narval
Date Updated: Sept. 7, 2023, 12:31 a.m.