MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of CANADA

ADVANCED SEARCH

APPLIED FILTERS: QUICK, 4cg1
EXPLORING THE DATABASE

  • Default view
    No filters are applied when the page is first loaded. All benchmarks are selected and sorted by simulation speed. For clarity, the chart on the right displays only the top 30 benchmarks.
  • Selecting benchmarks
    A subset of benchmarks can be selected using a custom chain of filters. Selected database entries can be downloaded as CSV files for further analysis or viewed in the Benchmark Details table at the bottom of the page.
  • Detailed views
    A detailed view of each database entry can be opened from the Benchmark ID and Software ID search forms. Detailed views include submission commands and simulation input files. View example: PMEMD @Narval (benchmark ID=46).
  • Parallel efficiency
    Efficiency is computed as PS/(SS * N), where PS is the speed of the parallel program, SS is the speed of the serial program, and N is the number of CPUs or GPUs.
  • Viewing parallel speedup and efficiency
    To plot parallel speedup and efficiency against the number of CPU/GPU equivalents, select exactly one software package and one cluster. View example: GROMACS @Narval.

  • Viewing QM/MM benchmarks
    To view QM/MM benchmarks, select simulation system 4cg1.
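
The efficiency definition PS/(SS * N) given under "Parallel efficiency" can be sketched in a few lines of Python. The function names and the sample speeds are illustrative assumptions and do not come from the database.

```python
def parallel_speedup(parallel_speed, serial_speed):
    """Speedup PS/SS of a parallel run relative to the serial run."""
    if serial_speed <= 0:
        raise ValueError("serial_speed must be positive")
    return parallel_speed / serial_speed


def parallel_efficiency(parallel_speed, serial_speed, n_units):
    """Efficiency PS/(SS * N), where N is the number of CPUs or GPUs."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return parallel_speedup(parallel_speed, serial_speed) / n_units


# Hypothetical numbers: a serial run at 10 ns/day and a 4-GPU run at 30 ns/day.
speedup = parallel_speedup(30.0, 10.0)
efficiency = parallel_efficiency(30.0, 10.0, 4)
print(f"speedup = {speedup:.1f}x, efficiency = {efficiency:.1%}")
# speedup = 3.0x, efficiency = 75.0%
```

An efficiency of 100% would mean that N units run exactly N times faster than one unit; values below that show how much of the added hardware is lost to overhead.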

Performance Chart For Selected Benchmarks

*Data updated Oct. 6, 2024

Cost Of GPU-accelerated Simulations

OPTIMIZING GPU USAGE

  • Parallel scaling to multiple GPUs
    Parallel scaling to multiple GPUs depends strongly on the combination of software, hardware, and simulation parameters. Simulations often do not run faster on multiple GPUs (PMEMD @Cedar example). Simulations on nodes with a direct interconnect between GPUs (NVLink) are more likely to benefit from multiple GPUs, but efficiency decreases and cost rises as GPUs are added (NAMD3 @Cedar example).
  • Benchmarking GPU-accelerated MD engines
    For benchmarking we use the optimal number of CPU cores per GPU: the number that gives the fastest simulation without exceeding the maximum number of CPU cores per GPU in a GPU equivalent.
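
The rule for picking the number of cores per GPU can be sketched as a small selection over trial runs. This is only an illustration: the trial speeds, the cap of 12 cores per GPU, and the tie-breaking choice (prefer fewer cores when speeds are equal) are assumptions, not values or policy taken from this page.

```python
def optimal_cores_per_gpu(speed_by_cores, max_cores_per_gpu):
    """Pick the core count with the highest speed, subject to the per-GPU cap.

    speed_by_cores maps a core count to the measured speed in ns/day.
    Ties are broken in favour of fewer cores (an assumption of this sketch).
    """
    allowed = {c: s for c, s in speed_by_cores.items() if c <= max_cores_per_gpu}
    if not allowed:
        raise ValueError("no trial run fits within the core limit")
    return min(allowed, key=lambda cores: (-allowed[cores], cores))


# Hypothetical trial runs: cores per GPU -> speed in ns/day.
trial_speeds = {1: 80.0, 4: 110.0, 8: 121.0, 12: 121.0, 16: 123.0}

best = optimal_cores_per_gpu(trial_speeds, max_cores_per_gpu=12)
print(best)  # 8
```

Here the 16-core run is fastest but exceeds the cap, so it is excluded; 8 and 12 cores tie at 121 ns/day, and the smaller count is chosen.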

BENCHMARK RESULTS

CPUY: CPU years per 1-microsecond simulation. GPUY: GPU years per 1-microsecond simulation. T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n40 (239,131 atoms).
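
As a rough check on how GPUY relates to speed, the sketch below converts a speed in ns/day and a GPU count into GPU years per microsecond. The formula is inferred from the GPUY definition above and assumes a 365-day year; the page itself does not state either. The inputs are the rounded speeds and GPU counts of benchmark IDs 193 and 197 in the results table.

```python
NS_PER_MICROSECOND = 1000.0
DAYS_PER_YEAR = 365.0  # assumption: the year length used by the site is not stated


def resource_years_per_microsecond(speed_ns_per_day, n_units):
    """CPU or GPU years needed to simulate 1 microsecond on n_units devices."""
    if speed_ns_per_day <= 0 or n_units <= 0:
        raise ValueError("speed and unit count must be positive")
    wall_days = NS_PER_MICROSECOND / speed_ns_per_day
    return n_units * wall_days / DAYS_PER_YEAR


# Rounded speed (ns/day) and GPU count for benchmark IDs 193 and 197.
gpuy_4 = resource_years_per_microsecond(1.60e-02, 4)
gpuy_16 = resource_years_per_microsecond(2.46e-02, 16)
print(round(gpuy_4, 1), round(gpuy_16, 1))  # 684.9 1781.9
```

These values are close to the tabulated 685.058 and 1784.881; the small differences are consistent with the speeds being rounded to three significant figures. Going from 4 to 16 GPUs raises the speed by a factor of about 1.5 while the cost in GPU years grows by a factor of about 2.6.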

*More information is available by clicking the ID in the table below
ID  | Software              | Module        | Toolchain | Arch | Data | Speed    | CPU              | CPUeff | CPUY | GPUY     | T  | C | N | GPU         | NVLink | Site
197 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 2.46e-02 | EPYC 7413        | 23.3   | 0.0  | 1784.881 | 16 | 1 | 4 | 16A100-SXM4 | Yes    | Narval
196 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 2.44e-02 | EPYC 7413        | 30.9   | 0.0  | 1347.032 | 12 | 1 | 3 | 12A100-SXM4 | Yes    | Narval
195 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 2.18e-02 | EPYC 7413        | 41.5   | 0.0  | 1004.82  | 8  | 1 | 2 | 8A100-SXM4  | Yes    | Narval
194 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 1.99e-02 | EPYC 7413        | 50.5   | 0.0  | 824.581  | 6  | 1 | 2 | 6A100-SXM4  | Yes    | Narval
191 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 1.63e-02 | Xeon Silver 4216 | 26.8   | 0.0  | 2689.498 | 16 | 1 | 4 | 16V100-SXM2 | Yes    | Cedar
193 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 1.60e-02 | EPYC 7413        | 60.8   | 0.0  | 685.058  | 4  | 1 | 1 | 4A100-SXM4  | Yes    | Narval
127 | SANDER.QUICK.cuda.MPI | ambertools/21 | gofbc     | avx2 | 4cg1 | 1.58e-02 | EPYC 7413        | 24.7   | 0.0  | 2779.807 | 16 | 1 | 4 | 16A100-SXM4 | Yes    | Narval
190 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 1.53e-02 | Xeon Silver 4216 | 33.4   | 0.0  | 2152.968 | 12 | 1 | 3 | 12V100-SXM2 | Yes    | Cedar
199 | SANDER.QUICK.cuda.MPI | ambertools/21 | gofbc     | avx2 | 4cg1 | 1.44e-02 | EPYC 7413        | 30.1   | 0.0  | 2282.344 | 12 | 1 | 3 | 12A100-SXM4 | Yes    | Narval
189 | SANDER.QUICK.cuda.MPI | ambertools/23 | gofbc     | avx2 | 4cg1 | 1.35e-02 | Xeon Silver 4216 | 44.4   | 0.0  | 1622.019 | 8  | 1 | 2 | 8V100-SXM2  | Yes    | Cedar
Date Updated: Oct. 6, 2024, 2:17 p.m.