The world’s most powerful supercomputer has raised the bar for the number of atoms that can be included in a molecular dynamics simulation, reaching a size and speed 1,000 times greater than any previous simulation of its kind.
The simulation was carried out by a team of researchers from the University of Melbourne, the Department of Energy’s Oak Ridge National Laboratory, AMD and QDX who used the Frontier supercomputer to calculate the dynamics of more than 2 million electrons.
Additionally, this is the first time-resolved quantum chemistry simulation to exceed an exaflop (over a quintillion, or a billion billion, calculations per second) using double-precision arithmetic. The roughly 16 decimal digits provided by double precision are computationally demanding, but the extra precision is needed for many scientific problems. In addition to setting a new benchmark, this achievement also provides a model for improving algorithms to solve larger and more complex scientific problems on top-of-the-line exascale supercomputers.
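As a concrete aside (an illustrative snippet, not part of the team’s software), the short Python example below shows the precision gap the article refers to: single precision keeps roughly 7 significant decimal digits, while double precision keeps roughly 16.

```python
import numpy as np

x = 1.0 / 3.0  # a value with no exact binary representation

# float32 (single precision) retains roughly 7 significant decimal digits;
# float64 (double precision) retains roughly 16.
single = float(np.float32(x))   # round-trip the value through 32-bit storage
double = float(np.float64(x))

print(f"single precision: {single:.20f}")
print(f"double precision: {double:.20f}")
```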
“This is going to be a game changer for many areas of science, but especially for drug discovery using high-precision quantum mechanics,” said Giuseppe Barca, an associate professor and programmer at the University of Melbourne. “Historically, researchers have hit a wall when trying to simulate the physics of molecular systems with very precise models because there just wasn’t enough computing power. So they’ve been limited to simulating small molecules, but many of the interesting problems we want to solve involve large models.”
In drug design, for example, Barca explains that large proteins are often responsible for diseases. These proteins can contain up to 10,000 atoms, which is too many to simulate without using very rough models that operate with reduced precision. Models with reduced precision are more computationally efficient but produce less accurate results.
Modern drug design requires the ability to simultaneously model large proteins and pair them with libraries of small molecules designed to bind to the large protein and prevent it from working. Most molecular simulations use simplified models to represent the forces between atoms. However, these force-field approximations do not account for some essential quantum mechanical phenomena such as bond breaking and bond formation—both important for chemical reactions—and other complex interactions.
To overcome these limitations, Barca and his team developed EXESS, or Extreme-scale Electronic Structure System. Instead of going through the tedious process of scaling existing codes that had been written for previous-generation petascale machines, Barca decided to write new code specifically designed for exascale systems like Frontier with hybrid architectures—a combination of central processing units (CPUs) and graphics processing unit (GPU) accelerators.
Common methods in quantum mechanics, such as density functional theory, estimate the energy of a molecule from the density distribution of its electrons. In contrast, EXESS uses wave function theory, or WFT, an approach to quantum mechanics that works directly with the Schrödinger equation to calculate the behavior of electrons from their wave functions. EXESS incorporates MP2, a specific type of WFT that adds an extra layer of precision by including interactions between electron pairs.
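For reference, the MP2 correction mentioned above has a standard textbook form (the general expression, not a detail taken from the team’s results): it adds to the mean-field energy a sum over every pair of occupied orbitals excited into virtual orbitals, which is where the explicit electron-pair interactions enter.

```latex
% Standard spin-orbital expression for the MP2 correlation energy:
% <ij||ab> are antisymmetrized two-electron integrals and the
% \varepsilon terms are Hartree--Fock orbital energies.
E_{\mathrm{MP2}} = \frac{1}{4} \sum_{ij}^{\mathrm{occ}} \sum_{ab}^{\mathrm{virt}}
  \frac{\left| \langle ij \| ab \rangle \right|^{2}}
       {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
```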
The Schrödinger equation offers a much higher level of precision in simulations. However, solving it requires so much computing power and time that these simulations have until now been limited to small systems. This is partly because, as the number of atoms in a simulation grows, the time required to simulate them grows even faster.
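To make that scaling concrete (these are the standard complexity figures for the methods named above, not numbers reported by the team), the cost of canonical MP2 grows roughly with the fifth power of system size, so doubling the number of atoms multiplies the work by about 32.

```latex
% Conventional cost scaling with system size N (roughly proportional
% to the number of atoms/basis functions):
%   Hartree--Fock:  t \propto N^{4}
%   canonical MP2:  t \propto N^{5}
t_{\mathrm{MP2}} \propto N^{5}, \qquad \frac{t(2N)}{t(N)} = 2^{5} = 32
```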
“Being able to accurately predict the behavior and model the properties of atoms, whether in larger molecular systems or with greater fidelity, is critical to developing new and more advanced technologies, including improved drug treatments, medical materials and biofuels,” said Dmytro Bykov, group leader of Chemistry and Materials Informatics at ORNL. “That’s why we created Frontier, to push the boundaries of computing and do what hasn’t been done before.”
Far beyond petaflops
Bykov and Barca began working together several years ago as part of the Exascale Computing Project, or ECP, DOE’s research, development and deployment effort focused on creating the world’s first high-performance exascale computing ecosystem. The duo’s collaboration in the ECP focused on optimizing decades-old code to run on next-generation supercomputers built with entirely new hardware and architectures. Their goal with EXESS was not only to write new molecular dynamics code for exascale machines, but also to create a simulation that would put Frontier to the test.
The HPE Cray EX Frontier supercomputer, located at the Oak Ridge Leadership Computing Facility, or OLCF, is currently ranked No. 1 on the TOP500 list of the world’s fastest supercomputers after reaching a peak performance of 1.2 exaflops. Frontier has 9,408 nodes with more than 8 million processing cores from a combination of 3rd Gen AMD EPYC™ processors and AMD Instinct™ MI250X GPUs.
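A bit of back-of-envelope arithmetic, sketched below in Python and assuming the standard Frontier node layout of four MI250X GPUs per node, ties the figures in this paragraph to the GPU count quoted later in the article.

```python
# Rough arithmetic linking the quoted Frontier figures.
# Assumes 4 AMD Instinct MI250X GPUs per node (the standard Frontier node layout).

nodes = 9_408
gpus_per_node = 4                 # assumed node configuration
peak_exaflops = 1.2               # double-precision figure quoted above

total_gpus = nodes * gpus_per_node
flops_per_gpu = peak_exaflops * 1e18 / total_gpus

print(f"total GPUs: {total_gpus:,}")                              # ~37,600 ('over 37,000 GPUs')
print(f"average FP64 throughput per GPU: {flops_per_gpu / 1e12:.1f} TFLOPS")
```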
The team’s efforts on Frontier have been a huge success. They ran a series of simulations using 9,400 Frontier compute nodes to calculate the electronic structure of different proteins and organic molecules containing hundreds of thousands of atoms.
Frontier’s colossal computing power allowed the research team to surpass the limits of previous molecular dynamics simulations with quantum mechanical precision. This was the first time a quantum chemistry simulation of more than 2 million electrons had exceeded exaflop speed using double-precision arithmetic.
This isn’t the first time the team has raised the bar for this type of simulation. Before working with Frontier, they achieved similar success on the 200-petaflop Summit supercomputer, Frontier’s predecessor, also located at the OLCF. In addition to being 1,000 times larger and faster, the exascale simulations can also predict how chemical reactions unfold over time, something the team previously lacked the computing power to do.
Average simulation run times ranged from minutes to hours. The new algorithm let the team compute each time step (essentially a snapshot of the system’s atomic interactions) far faster than previous methods allowed. For example, time steps for protein systems with thousands of electrons can now be completed in just 1 to 5 seconds.
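As a purely illustrative calculation (the 1-to-5-second figure comes from the article, but the trajectory length below is hypothetical), per-step timings translate into total run times as follows.

```python
# Illustrative only: converting per-step wall time into a total simulation time.
# seconds_per_step comes from the reported 1-5 s range; n_steps is hypothetical.

seconds_per_step = 5
n_steps = 10_000                  # hypothetical trajectory length, not from the article

total_hours = seconds_per_step * n_steps / 3600
print(f"{n_steps:,} steps at {seconds_per_step} s/step ≈ {total_hours:.1f} hours of wall time")
```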
Time steps are crucial to understanding how some processes evolve naturally over time. This resolution will help researchers better understand how drug molecules can bind to disease-causing proteins, how catalytic reactions can be used to recycle plastics, how to better produce biofuels, and how to design biomedical materials.
“I can’t describe how difficult it was to get to that scale, both from a molecular and computational perspective,” Barca said. “But it would have been pointless to do those calculations using anything less than double precision. So it was all or nothing.”
“Two of the biggest challenges in this achievement were designing an algorithm that could push Frontier to its limits and ensuring that the algorithm could run on a system with over 37,000 GPUs,” Bykov added. “The solution meant using more computing components, and every time you add more, it also means there’s a greater chance that one of those components will fail at some point. The fact that we used the entire system is incredible, and it was remarkably efficient.”
On a personal note, Barca added, after he and his team worked around the clock for weeks in preparation, the calculation that broke the double-precision exaflop barrier for scientific applications came on the last day of their Frontier allocation with the very last calculation of the simulation. It was recorded at 3 a.m. — shortly after Barca fell asleep for the first time in a long time.
The team is now working on preparing their results for scientific publication. They then plan to use the high-precision simulations to train machine learning models and integrate artificial intelligence into the algorithm. These improvements will provide an entirely new level of sophistication and efficiency for solving even larger and more complex problems.
This research was funded by the DOE Office of Science Advanced Scientific Computing Research program. OLCF is a DOE Office of Science user facility.
UT-Battelle manages ORNL for the DOE Office of Science, the largest supporter of basic research in the physical sciences in the United States. The DOE Office of Science is working to address some of the most pressing challenges of our time. For more information, visit energy.gov/science.