MD Performance Guide - Compute Canada

EXPLORING THE DATABASE

Default view
No filters are applied when the page first loads. All benchmarks are selected and sorted by simulation speed. For clarity, the chart on the right displays only the top 30 benchmarks.
Selecting benchmarks
A subset of benchmarks can be selected using a custom chain of filters. Selected database entries can be downloaded as CSV files for further analysis or viewed in the Benchmark Details table at the bottom of the page.
Detailed views
A detailed view of each database entry can be accessed from the Benchmark ID and Software ID search forms. Detailed views include submission commands and simulation input files. View example: PMEMD @Narval (benchmark ID=46).
Parallel efficiency
Efficiency is computed as PS/(SS * N), where PS is the speed of the parallel program, SS is the speed of the serial program, and N is the number of CPUs or GPUs.
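The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code used by the database: the function names and the sample speeds are hypothetical. Speedup is taken as PS/SS, and efficiency as speedup divided by N.

```python
def parallel_speedup(parallel_speed, serial_speed):
    """Return PS / SS for two speeds given in the same unit (e.g. ns/day)."""
    if serial_speed <= 0:
        raise ValueError("serial_speed must be positive")
    return parallel_speed / serial_speed


def parallel_efficiency(parallel_speed, serial_speed, n_units):
    """Return PS / (SS * N), where N is the number of CPUs or GPUs."""
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return parallel_speedup(parallel_speed, serial_speed) / n_units


if __name__ == "__main__":
    # Hypothetical speeds, not taken from the database.
    serial = 2.0     # ns/day on 1 CPU
    parallel = 12.0  # ns/day on 8 CPUs
    eff = parallel_efficiency(parallel, serial, 8)
    print(f"speedup = {parallel_speedup(parallel, serial):.1f}")
    print(f"efficiency = {eff:.2f} ({100 * eff:.0f}%)")
```

An efficiency of 1.0 means perfect linear scaling. The benchmark table appears to report efficiency as a percentage, so multiply by 100 when comparing.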
Viewing parallel speedup and efficiency
To plot parallel speedup and efficiency against the number of CPU/GPU equivalents, select only one software package and one cluster. View example: GROMACS @Narval.
Viewing QM/MM benchmarks
To view QM/MM benchmarks, select simulation system 4cg1.

^*Data updated March 14, 2026

Submitting CPU-only simulations
CPU-only simulations reach performance comparable to GPU-accelerated ones only with hundreds of CPU cores. Such jobs commonly wait in the queue for up to several days before that many cores become available, especially if a long run time is requested.
Benchmarking CPU-only MD Engines
We calculate CPU usage in core equivalents per year. A core equivalent is a bundle made up of a single core and some memory associated with it. On most systems, one core equivalent includes 4000M of memory.
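As a rough illustration of the bundle idea, the sketch below converts a job request into core equivalents. It assumes a job is counted by whichever is larger: its core count, or its memory expressed in 4000M bundles. That counting rule and the function name are assumptions made for illustration; this page states only the bundle size.

```python
MEM_PER_CORE_MB = 4000  # memory per core equivalent on most systems (from the text)


def core_equivalents(cores, memory_mb, mem_per_core_mb=MEM_PER_CORE_MB):
    """Return the number of core equivalents for a job request.

    Assumption (not stated on this page): the job counts as the larger of
    its core count and its memory divided by the per-core memory bundle.
    """
    if cores < 1 or memory_mb < 0:
        raise ValueError("cores must be >= 1 and memory_mb must be >= 0")
    return max(cores, memory_mb / mem_per_core_mb)


if __name__ == "__main__":
    # Hypothetical job requests.
    print(core_equivalents(16, 16000))  # memory fits within 16 bundles
    print(core_equivalents(4, 32000))   # memory-heavy job
```

Under this assumption, a job that requests more than 4000M per core counts as more core equivalents than it has cores.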

OPTIMIZING GPU USAGE

Parallel scaling to multiple GPUs
Parallel scaling to multiple GPUs depends strongly on the combination of software, hardware, and simulation parameters. Simulations often do not run faster on multiple GPUs (PMEMD @Cedar example). Simulations on nodes with a direct interconnect between GPUs (NVLink) are more likely to benefit from multiple GPUs, but efficiency decreases and cost goes up with the number of GPUs (NAMD3 @Cedar example).
Benchmarking GPU accelerated MD Engines
For benchmarking we use the optimal number of cores per GPU: the number that gives the fastest simulation without exceeding the maximum number of CPU cores per GPU in a GPU equivalent.

CPU_Y: CPU years per 1-microsecond simulation. GPU_Y: GPU years per 1-microsecond simulation. T: tasks. C: cores. N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n4o (239,131 atoms).

^*More information is available by clicking an ID in the table below
ID	Software	Module	Toolchain	Arch	Data	Speed	CPU	CPU_eff	CPU_Y	GPU_Y	T	C	N	GPU	NV_Link	Site
156	OPENMM.cuda	openmm/7.7.0	gofbc	avx512	6n4o	1.95e+01	Xeon Gold 6248	23.2	0.0	0.561	1	4	1	4_RTX6000	No	Siku
172	NAMD2.ucx	namd-ucx/2.14	iimkl	avx2	6n4o	1.93e+01	Xeon E5-2683	50.3	72.68	0.0	512	1	16	0	No	Graham
71	GROMACS.mpi	gromacs/2021.4	gofb	avx2	6n4o	1.90e+01	EPYC 7532	60.2	9.25	0.0	64	1	1	0	Yes	Narval
280	NAMD3.cuda	namd-multicore/3.0.1	gfbfc	avx512	9naw	1.89e+01	EPYC 9654	62.3	0.0	0.435	1	3	1	3_H100-HBM3	Yes	Rorqual
81	GROMACS.cuda	gromacs/2021.4	gofbc	avx2	6n4o	1.87e+01	EPYC 7413	22.8	0.0	0.146	1	4	1	1_A100-SXM4	Yes	Narval
50	NAMD3.cuda	binary_pack/3.0a9	-	-	6n4o	1.85e+01	Xeon Silver 4216	100.0	0.0	0.148	1	1	1	1_V100-SXM2	Yes	Cedar
54	NAMD3.cuda	binary_pack/3.0a9	-	-	6n4o	1.85e+01	Xeon E5-2650	52.5	0.0	0.593	1	4	1	4_P100-PCIE	No	Cedar
292	GROMACS.cuda.mpi	gromacs/2024.4	gofbc	avx512	9naw	1.81e+01	Xeon Platinum 8570	100.0	0.0	0.151	1	14	1	1_H100-HBM3	Yes	Nibi
306	GROMACS.ROCm	container/2026.0	SYCL (Adaptive CPP)	avx512	9naw	1.80e+01	AMD Instinct Accelerator MI300A	100.0	0.0	0.152	1	24	1	1_MI300A	No	Nibi
38	NAMD3.cuda	binary_pack/3.0a9	-	-	6n4o	1.79e+01	Xeon Gold 6248	62.9	0.0	0.307	1	2	1	2_RTX6000	No	Siku
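The CPU_Y and GPU_Y columns can be approximated from the Speed column. The sketch below is a reconstruction, not the site's own code. It assumes a 365-day year and assumes the resource count is tasks × cores for CPU runs and the number of GPUs for GPU runs. The function name is hypothetical.

```python
DAYS_PER_YEAR = 365          # assumption: calendar year of 365 days
NS_PER_MICROSECOND = 1000.0  # 1 microsecond of simulated time = 1000 ns


def resource_years_per_microsecond(speed_ns_per_day, n_units):
    """Estimate CPU (or GPU) years needed to simulate 1 microsecond.

    speed_ns_per_day: simulation speed from the Speed column
    n_units: number of CPU cores (assumed T * C) or number of GPUs
    """
    if speed_ns_per_day <= 0:
        raise ValueError("speed_ns_per_day must be positive")
    wall_days = NS_PER_MICROSECOND / speed_ns_per_day
    return n_units * wall_days / DAYS_PER_YEAR


if __name__ == "__main__":
    # ID 172: NAMD2.ucx on Graham, 512 tasks x 1 core, 19.3 ns/day
    cpu_y = resource_years_per_microsecond(19.3, 512 * 1)
    # ID 81: GROMACS.cuda on Narval, 1 A100 GPU, 18.7 ns/day
    gpu_y = resource_years_per_microsecond(18.7, 1)
    print(f"CPU_Y estimate for ID 172: {cpu_y:.2f}")
    print(f"GPU_Y estimate for ID 81:  {gpu_y:.3f}")
```

With these assumptions the estimates land close to the tabulated 72.68 CPU years (ID 172) and 0.146 GPU years (ID 81). Small differences are expected because the Speed column is rounded to three significant figures.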

Date Updated: March 14, 2026, 2:13 p.m.