MOLECULAR DYNAMICS PERFORMANCE GUIDE - Digital Research Alliance of CANADA

BENCHMARK DETAILS

ID=130

  • Dataset: 6n4o
  • Software: OPENMM.cuda (openmm/7.7.0-gofbc-2020.1.403.114-avx2)
  • Resource: 1 tasks, 2 cores, nodes, 2 GPUs, with NVLink
  • CPU: EPYC 7413 (Milan), 2.65 GHz
  • GPU: NVIDIA-A100-SXM4-40GB, 12 cores/GPU
  • Simulation speed: 36.474289264586 ns/day
  • Efficiency: 44.7 %
  • Site: Narval
  • Date: Aug. 30, 2023, 1:01 a.m.
  • Submission script:

    #!/bin/bash
    #SBATCH -c2 --gpus-per-node=a100:2
    #SBATCH --mem-per-cpu=4000 --time=1:0:0

    module purge
    ml StdEnv/2020 gcc/9.3.0 cuda/11.4 openmpi/4.0.3 python/3.8.10
    ml openmm/7.7.0 netcdf/4.7.4 hdf5/1.10.6 mpi4py/3.0.3

    virtualenv ${SLURM_TMPDIR}/env
    source ${SLURM_TMPDIR}/env/bin/activate
    pip install --no-index netCDF4 parmed

    python openmm_input.py

  • Notes:

    Benchmark time: 32.05213788059185 ns/day, Using 2 GPUs, ng11105
    Benchmark time: 35.36828282446389 ns/day, Using 2 GPUs, ng10103
    Benchmark time: 35.54908796849859 ns/day, Using 2 GPUs, ng30602
    Benchmark time: 36.123564634703705 ns/day, Using 2 GPUs, ng30104
    Benchmark time: 36.194952011856905 ns/day, Using 2 GPUs, ng10404
    Benchmark time: 36.474289264586 ns/day, Using 2 GPUs, ng30104

  • Simulation input file:

    #!/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.8.10/bin/python

    # FILE openmm_input.py

    import openmm as mm
    import openmm.app as app
    from openmm.unit import *
    import sys, time, netCDF4, ctypes
    from parmed import load_file
    from parmed.openmm import StateDataReporter, NetCDFReporter

    CUDA_SUCCESS = 0
    cuda = ctypes.CDLL("libcuda.so")
    device = ctypes.c_int()
    nGpus = ctypes.c_int()
    name = b" " * 100
    devID=[]

    result = cuda.cuInit(0)
    if result != CUDA_SUCCESS:
    print("CUDA is not available")
    quit()
    cuda.cuDeviceGetCount(ctypes.byref(nGpus))
    for i in range(nGpus.value):
    result = cuda.cuDeviceGet(ctypes.byref(device), i)
    result = cuda.cuDeviceGetName(ctypes.c_char_p(name), len(name), device)
    print("Device %s: %s" % (device.value, name.split(b"\0", 1)[0].decode()))
    devID.append(device.value)
    dvi=",".join(str(i) for i in devID)

    amber_sys=load_file("prmtop.parm7", "restart.rst7")
    ncrst=app.amberinpcrdfile.AmberInpcrdFile("restart.rst7")

    nsteps=100000
    system=amber_sys.createSystem(
    nonbondedMethod=app.PME,
    ewaldErrorTolerance=0.0004,
    nonbondedCutoff=8.0*angstroms,
    constraints=app.HBonds,
    removeCMMotion = True,
    )

    #PME Grids:
    #EwaldTolerance=0.00040 Grid=[144, 144, 144]
    #EwaldTolerance=0.00045 Grid=[140, 140, 140]
    #EwaldTolerance=0.00050 Grid=[135, 135, 135]
    #EwaldTolerance=0.00065 Grid=[128, 128, 128]

    integrator = mm.LangevinIntegrator(300*kelvin, 1.0/picoseconds, 1.0*femtoseconds,)
    barostat = mm.MonteCarloBarostat(1.0*atmosphere, 300.0*kelvin, 25)
    system.addForce(barostat)

    platform = mm.Platform.getPlatformByName("CUDA")
    prop = dict(CudaPrecision="mixed", DeviceIndex=dvi)

    sim = app.Simulation(amber_sys.topology, system, integrator, platform, prop)
    sim.context.setPositions(amber_sys.positions)
    sim.context.setVelocities(ncrst.velocities)

    sim.reporters.append(
    StateDataReporter(
    sys.stdout,
    400,
    step=True,
    time=False,
    potentialEnergy=True,
    kineticEnergy=True,
    temperature=True,
    volume=True
    )
    )

    sim.reporters.append(
    NetCDFReporter(
    "trajectory.nc",
    50000,
    crds=True
    )
    )

    print("Running dynamics")
    start = time.time()
    sim.step(nsteps)
    elapsed=time.time() - start
    benchmark_time=3.6*2.4*nsteps*0.01/elapsed
    print(f"Elapsed time: {elapsed} sec\nBenchmark time: {benchmark_time} ns/day")