MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of CANADA

ADVANCED SEARCH

APPLIED FILTERS: GROMACS.mpi, Narval, 6n4o
    EXPLORING THE DATABASE

  • Default view
    When the page first loads, no filters are applied: all benchmarks are selected and sorted by simulation speed. For clarity, the chart on the right displays only the top 30 benchmarks.
  • Selecting benchmarks
    A subset of benchmarks can be selected using a custom chain of filters. Selected database entries can be downloaded as CSV files for further analysis or viewed in the Benchmark Details table at the bottom of the page.
  • Detailed views
    A detailed view of each database entry can be accessed from the Benchmark ID and Software ID search forms. Detailed views include submission commands and simulation input files. View example: PMEMD @Narval (benchmark ID=46).
  • Parallel efficiency
    Efficiency is computed as PS/(SS * N), where PS is the speed of the parallel program, SS is the speed of the serial program, and N is the number of CPUs or GPUs.
  • Viewing parallel speedup and efficiency
    To plot parallel speedup and efficiency against the number of CPU/GPU equivalents, select only one software package and one cluster. View example: GROMACS @Narval.

  • Viewing QM/MM benchmarks
    To view QM/MM benchmarks, select simulation system 4cg1.
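The efficiency formula above can be sketched in a few lines of Python. The function names are illustrative and not part of the database. Because the page does not list a serial speed, the example backs one out of the 64-core GROMACS.mpi @Narval entry in the benchmark results (19.0 ns/day at 60.2% efficiency) and uses it to recompute the efficiency of the 512-core entry (74.9 ns/day); treat the implied serial speed as an estimate, not a measured value.

```python
def parallel_efficiency(ps: float, ss: float, n: int) -> float:
    """Efficiency = PS / (SS * N), as defined in this guide."""
    if ss <= 0 or n <= 0:
        raise ValueError("serial speed and N must be positive")
    return ps / (ss * n)


def implied_serial_speed(ps: float, efficiency: float, n: int) -> float:
    """Invert the efficiency formula: SS = PS / (efficiency * N)."""
    if efficiency <= 0 or n <= 0:
        raise ValueError("efficiency and N must be positive")
    return ps / (efficiency * n)


# Values taken from the benchmark results on this page (speeds in ns/day).
ss = implied_serial_speed(ps=19.0, efficiency=0.602, n=64)
eff_512 = parallel_efficiency(ps=74.9, ss=ss, n=512)

print(f"implied serial speed: {ss:.3f} ns/day")
print(f"efficiency on 512 cores: {eff_512:.1%}")
```

The recomputed 512-core efficiency can be compared with the CPUeff column of the corresponding table row as a consistency check.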

Performance Chart For Selected Benchmarks

*Data updated Sept. 7, 2023

Cost Of CPU-only Simulations


    OPTIMIZING CPU USAGE

  • Submitting CPU-only simulations
    CPU-only simulations reach performance comparable to GPU-accelerated ones only with hundreds of CPU cores. Jobs requesting that many cores commonly wait in the queue for up to several days before the resources become available, especially if a long run time is requested.
  • Benchmarking CPU-only MD Engines
    We measure CPU usage in core-equivalent years. A core equivalent is a bundle made up of a single core and some memory associated with it. On most systems, one core equivalent includes 4000M of memory per core.
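As a rough illustration of the core-equivalent bundle, the sketch below converts a job request into core equivalents. It assumes that a job is counted by whichever is larger, its core count or its memory divided by the per-core bundle; that rule and the function name are assumptions for illustration, and only the 4000M figure comes from the text above.

```python
MEM_PER_CORE_EQUIVALENT_MB = 4000  # bundle size quoted above for most systems


def core_equivalents(cores: int, memory_mb: float,
                     mem_per_ce_mb: float = MEM_PER_CORE_EQUIVALENT_MB) -> float:
    """Core equivalents of a request: the larger of the core count and
    the memory expressed in per-core bundles (assumed accounting rule)."""
    if cores <= 0 or memory_mb < 0:
        raise ValueError("cores must be positive and memory non-negative")
    return max(cores, memory_mb / mem_per_ce_mb)


# 64 cores with 4000M each fit exactly into 64 core equivalents.
print(core_equivalents(cores=64, memory_mb=64 * 4000))
# Doubling the memory per core doubles the core equivalents under this rule.
print(core_equivalents(cores=64, memory_mb=64 * 8000))
```

Under this assumption, a simulation that stays within 4000M per core is accounted simply by its core count.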

BENCHMARK RESULTS

CPUY: CPU years per 1 microsecond of simulation | GPUY: GPU years per 1 microsecond of simulation | T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n4o (239,131 atoms).

*More information is available by clicking an ID in the table below
ID  Software     Module          Toolch  Arch  Data  Speed     CPU        CPUeff  CPUY   GPUY  T     C  N   GPU  NVLink  Site
83  GROMACS.mpi  gromacs/2021.4  gofb    avx2  6n4o  7.49e+01  EPYC 7532  29.7    18.72  0.0   512   1  8   0    Yes     Narval
74  GROMACS.mpi  gromacs/2021.4  gofb    avx2  6n4o  5.06e+01  EPYC 7532  40.2    13.85  0.0   256   1  4   0    Yes     Narval
87  GROMACS.mpi  gromacs/2021.4  gofb    avx2  6n4o  4.58e+01  EPYC 7532  9.1     61.29  0.0   1024  1  16  0    Yes     Narval
73  GROMACS.mpi  gromacs/2021.4  gofb    avx2  6n4o  2.95e+01  EPYC 7532  46.8    11.9   0.0   128   1  2   0    Yes     Narval
71  GROMACS.mpi  gromacs/2021.4  gofb    avx2  6n4o  1.90e+01  EPYC 7532  60.2    9.25   0.0   64    1  1   0    Yes     Narval
Date Updated: Sept. 7, 2023, 12:31 a.m.
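
The CPUY column can be approximately reproduced from the speed and core count of each row. The sketch below is an assumption about how the figure is derived, not the database's own code: it takes the total core count as T × C, converts 1 microsecond (1000 ns) of simulation into days at the reported speed, and uses a 365-day year. The function name is illustrative.

```python
DAYS_PER_YEAR = 365.0  # assumed length of a CPU year


def cpu_years_per_microsecond(speed_ns_per_day: float, cores: int) -> float:
    """Core-years needed to simulate 1 microsecond (1000 ns)."""
    if speed_ns_per_day <= 0 or cores <= 0:
        raise ValueError("speed and cores must be positive")
    days_needed = 1000.0 / speed_ns_per_day
    return cores * days_needed / DAYS_PER_YEAR


# (benchmark ID, speed in ns/day, total cores = T x C, CPUY listed in the table)
rows = [
    (83, 74.9, 512, 18.72),
    (74, 50.6, 256, 13.85),
    (87, 45.8, 1024, 61.29),
    (73, 29.5, 128, 11.90),
    (71, 19.0, 64, 9.25),
]

for bench_id, speed, cores, listed in rows:
    estimate = cpu_years_per_microsecond(speed, cores)
    print(f"ID {bench_id}: estimated {estimate:.2f}, listed {listed:.2f}")
```

Small differences between the estimates and the listed values are expected, since the speeds in the table are rounded to three significant figures.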