MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of Canada

FIND THE OPTIMAL
SUBMISSION PARAMETERS

SPEED UP SIMULATION

  • Finding optimal resources
    A poor choice of job-submission or simulation parameters leads to poor performance and wasted computing resources. However, finding the optimal MD engine and submission parameters in a complex HPC environment with heterogeneous hardware is a daunting and time-consuming task. We developed this web portal to simplify it.
  • How fast will a simulation run?
    A glance at the chart of the maximum simulation speeds of all MD executables tested on CC systems gives a rough idea of how long a simulation will take.
  • Finding balance between speed and efficiency
    The fastest simulations may be inefficient because of poor parallel scaling across many CPUs or GPUs. Inefficient simulations consume more resources and lower your priority, resulting in longer queue times and less work done. The efficiency color-coding in the chart helps you choose an optimal combination of speed and efficiency.
  • Exploring the database
    All benchmarks are available from the Explore menu, which offers tools for querying, filtering, and viewing benchmark data.
  • Viewing QM/MM benchmarks
    To view QM/MM benchmarks, select simulation system 4cg1 in the Explore window.
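The chart's speeds (reported in ns/day) convert directly into an expected wall-clock time for a simulation of a given length. A minimal sketch of that conversion (the function name and example numbers are illustrative, not taken from the portal):

```python
# Convert a benchmark speed (ns/day) into the expected number of
# wall-clock days needed to simulate a target trajectory length.

def wall_clock_days(target_ns: float, speed_ns_per_day: float) -> float:
    """Days of continuous running needed to simulate target_ns."""
    return target_ns / speed_ns_per_day

# e.g. a 100 ns simulation at 7.24 ns/day:
print(f"{wall_clock_days(100, 7.24):.1f} days")  # 13.8 days
```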

Best Performance Of Each MD Engine Among All Clusters

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024

Cost Of Simulations Using Best Performing CPU-only MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024

    OPTIMIZING CPU USAGE

  • Cost of a simulation
    Choosing fast and efficient job submission parameters is a good approach, but the total amount of computing resources a job consumes, which we call its cost, matters more. Three components define cost: CPU time, GPU time, and RAM. We measure cost in core-years and GPU-years: a core-year is the equivalent of using one CPU core continuously for a full year, and a GPU-year is the equivalent of using one GPU continuously for a full year.
  • Minimizing queue time
    The scheduler controls resource usage to ensure that each user gets a fair share of resources. If you use more resources than an average user, the scheduler will put your jobs on hold so that other users can catch up. You can avoid delays by minimizing cost. Often you can reduce cost significantly by choosing submission parameters that are only slightly slower than the fastest, but far less expensive.
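The core-year arithmetic above can be sketched in a few lines (the function name and example numbers are illustrative):

```python
# Cost of a CPU-only job in core-years:
# core-years = cores used * wall-clock time in years.

def core_years(cores: int, wall_days: float) -> float:
    """Core-years consumed by `cores` running for `wall_days` days."""
    return cores * wall_days / 365.0

# e.g. 64 cores running continuously for 30 days:
print(round(core_years(64, 30), 2))  # 5.26
```

The same arithmetic applies to GPU-years, with the GPU count in place of the core count.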

Cost Of Simulations Using Best Performing GPU-accelerated MD Engines

*Data from simulation of 6n4o system with ~240,000 atoms, updated Oct. 6, 2024
    OPTIMIZING GPU USAGE

  • Understanding GPU equivalents
    We express GPU usage in GPU-equivalent years. A GPU equivalent is a bundle made up of a single GPU, several CPU cores, and some memory (CPU memory, not VRAM). The composition of a GPU equivalent varies because it is defined by the number of CPU cores and the amount of RAM per GPU in a compute node.
  • Benchmarking CPU-only MD Engines
    We calculate CPU usage in core-equivalent years. A core equivalent is a bundle made up of a single CPU core and the memory associated with it. On most systems, one core equivalent includes 4000M (4000 MB) of memory per core.
  • Benchmarking GPU accelerated MD Engines
    For benchmarking, we use the optimal number of cores per GPU: the number needed for the fastest simulation, without exceeding the maximum number of CPU cores per GPU in a GPU equivalent.
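One common way to account for such bundles is to charge a job the largest of its three resource requests, each expressed as a fraction of the per-GPU bundle. The sketch below assumes that "largest of" accounting model; the bundle composition (12 cores and 48 GB per GPU) is illustrative, not a specific cluster's:

```python
# Sketch of GPU-equivalent accounting, assuming a job is charged the
# largest of its bundled resource requests (GPUs, CPU cores, RAM).
# cores_per_gpu and mem_gb_per_gpu are illustrative defaults.

def gpu_equivalents(gpus: int, cores: int, mem_gb: float,
                    cores_per_gpu: int = 12,
                    mem_gb_per_gpu: float = 48.0) -> float:
    """GPU equivalents charged for a job's resource request."""
    return max(gpus,
               cores / cores_per_gpu,
               mem_gb / mem_gb_per_gpu)

# A job asking for 1 GPU, 24 cores, and 32 GB counts as 2 GPU
# equivalents here, because 24 cores is twice the per-GPU allotment:
print(gpu_equivalents(1, 24, 32))  # 2.0
```

This is why requesting extra cores or memory beyond the bundle increases a GPU job's cost even though the GPU count stays the same.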

BENCHMARK RESULTS

CPUY: CPU-years per 1 microsecond of simulation. GPUY: GPU-years per 1 microsecond of simulation. T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n4o (239,131 atoms).
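The CPUY column can be reproduced from the Speed column and the total core count (tasks × cores per task); small differences from the tabulated values come from rounding. A sketch (the function name is illustrative):

```python
# CPU-years per 1 microsecond of simulated time, derived from the
# benchmark speed and the total number of cores the job used.

def cpu_years_per_us(speed_ns_per_day: float, total_cores: int) -> float:
    """CPU-years to simulate 1 us at the given speed on total_cores."""
    days_per_us = 1000.0 / speed_ns_per_day  # 1 us = 1000 ns
    return total_cores * days_per_us / 365.0

# ID 1 in the table: 2.71 ns/day on 1 task x 40 cores:
print(round(cpu_years_per_us(2.71, 40), 2))  # 40.44 (table shows 40.42)
```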

*More information is available by clicking an ID in the table below
ID  | Software       | Module              | Toolch | Arch   | Data | Speed    | CPU            | CPUeff | CPUY  | GPUY  | T   | C  | N | GPU         | NVLink | Site
13  | PMEMD.mpi      | amber/20.9-20.15    | gomkl  | avx512 | 6n4o | 7.24e+00 | Xeon Gold 6248 | 45.5   | 60.53 | 0.0   | 160 | 1  | 3 | 0           | No     | Siku
119 | NAMD2.cuda.ofi | namd-ofi-smp/2.14   | iimklc | avx2   | 6n4o | 6.33e+00 | Xeon E5-2650   | 46.8   | 0.0   | 1.731 | 4   | 6  | 1 | 4 P100-PCIE | No     | Cedar
1   | NAMD2.omp      | namd-multicore/2.14 | iimkl  | avx512 | 6n4o | 2.71e+00 | Xeon Gold 6248 | 63.5   | 40.42 | 0.0   | 1   | 40 | 1 | 0           | No     | Siku
16  | NAMD2.omp      | namd-multicore/2.14 | iimkl  | avx2   | 6n4o | 1.47e+00 | EPYC 7532      | 68.6   | 37.32 | 0.0   | 1   | 20 | 1 | 0           | No     | Narval
Date Updated: Oct. 6, 2024, 2:17 p.m.