MD Performance Guide - Compute Canada

Finding optimal resources
A poor choice of job submission or simulation parameters leads to poor performance and a waste of computing resources. However, finding optimal MD engines and submission parameters in a complex HPC environment with heterogeneous hardware is a daunting and time-consuming challenge. We developed this web portal to simplify this task.
How fast will a simulation run?
A quick look at the chart of the maximum simulation speed of all MD executables tested on CC systems offers a quick idea of how long a simulation will take.
Finding balance between speed and efficiency
The fastest simulations may be inefficient due to poor parallel scaling to many CPUs or GPUs. Inefficient simulations consume more resources and impact priority, resulting in longer times in queue and less work done. Color-coded by efficiency chart allows choosing an optimal combination of speed and efficiency.
Exploring the database
All benchmarks are available from the Explore menu . This menu offers tools for querying, filtering, and viewing benchmark data.
Viewing QM/MM benchmarks
To view QM/MM benchmarks select simulation system 4cg1 in the Explore window .

^*Data from simulation of 6n4o system with ~240,000 atoms, updated Dec. 24, 2025

^*Data from simulation of 6n4o system with ~240,000 atoms, updated Dec. 24, 2025

Cost of a simulation
Choosing fast and efficient job submission parameters is a good approach, but the total amount of computing resources used for a job, as we call it, cost, is more important. Three components define cost: CPU time, GPU time, and RAM. We measure cost in core years and GPU years. Core year is the equivalent of using one CPU core continuously for a full year and GPU year is the equivalent of using one GPU continuously for a full year.
Minimizing queue time
Scheduler controls resource usage to ensure that each user gets equal resources. If you use more resources than an average user, the scheduler will put your jobs on hold so that other users can catch up. You can avoid delays by minimizing the cost. Often you can significantly reduce cost by choosing simulations that are only slightly slower than the fastest but expensive ones .

^*Data from simulation of 6n4o system with ~240,000 atoms, updated Dec. 24, 2025

OPTIMIZING GPU USAGE

Understanding GPU equivalents
We express GPU usage in GPU equivalents per year. GPU equivalent is a bundle made up of a single GPU, several CPU cores, and some memory (CPU memory, not VRAM) . Composition of GPU equivalents is variable because it is defined by a number of CPU cores and RAM per GPU in a compute node.
Benchmarking CPU-only MD Engines
We calculate CPU usage in core equivalents per year. Core equivalent is a bundle made up of a single core, and some memory associated with it . For most of the systems one core equivalent includes 4000M per core.
Benchmarking GPU accelerated MD Engines
For benchmarking we use the optimal number of cores per GPU (the number needed for the fastest simulation time but not exceeding the maximum number of CPU cores per GPU in a GPU equivalent).

CPU_Y: CPU years per 1 microsecond long simulation. GPU_Y: GPU years per 1 microsecond long simulation. | T: tasks | C: cores | N: nodes. Speed is in ns/day. Integration step = 1 fs. Measured with dataset 6n40 (239,131 atoms).

^*More information is available by clicking ID in the table above
ID	Software	Module	Toolch	Arch	Data	Speed	CPU	CPU_eff	CPU_Y	GPU_Y	T	C	N	GPU	NV_Link	Site
258	PMEMD.cuda.mpi	amber/24.3	gofbc	avx512	6n4o	1.69e+02	EPYC 9654	38.4	0.0	0.065	4	1	1	4_H100-HBM3	Yes	Rorqual
254	GROMACS.cuda.mpi	gromacs/2024.4	gofbc	avx512	6n4o	1.62e+02	Xeon Gold 6448Y	50.2	0.0	0.034	2	16	1	2_H100-HBM3	Yes	Rorqual
252	GROMACS.mpi	gromacs/2024.4	gofb	avx512	6n4o	1.52e+02	EPYC 9654	15.0	20.8	0.0	1152	1	6	0	No	Rorqual
242	GROMACS.cuda.mpi	gromacs/2024.1	gofbfc	avx512	6n4o	1.49e+02	Xeon Gold 5418Y	50.0	0.0	0.037	2	12	1	2_H100-NVL	Yes	Argo
241	GROMACS.cuda	gromacs/2024.1	gofbfc	avx512	6n4o	1.48e+02	Xeon Gold 5418Y	100.0	0.0	0.018	1	12	1	1_H100-NVL	Yes	Argo
259	PMEMD.cuda	amber/24.3	gofbc	avx512	6n4o	1.10e+02	EPYC 9654	100.0	0.0	0.025	1	1	1	1_H100-HBM3	Yes	Rorqual
262	NAMD3.cuda	namd-multicore/3.0.1	gfbfc	avx512	6n4o	1.09e+02	EPYC 9654	40.8	0.0	0.1	1	4	1	4_H100-HBM3	Yes	Rorqual
69	GROMACS.mpi	gromacs/2021.2	gomkl	avx512	6n4o	1.04e+02	Xeon Gold 6248	18.1	33.72	0.0	1280	1	32	0	No	Siku
204	GROMACS.cuda	gromacs/2023.2	gofbfc	avx2	6n4o	1.01e+02	EPYC 7413	57.1	0.0	0.054	1	24	1	2_A100-SXM4	Yes	Narval
47	PMEMD.cuda.MPI	amber/20.12-20.15	gofbc	avx2	6n4o	9.97e+01	EPYC 7413	36.2	0.0	0.11	4	1	1	4_A100-SXM4	Yes	Narval

Date Updated: Dec. 24, 2025, 6:16 p.m.