
Rapid advances in computational chemistry methods are speeding up the prediction of molecular and material structures.

Researchers at MIT have devised a novel computational chemistry method that aims to streamline high-throughput molecular screening, a critical process in which chemical accuracy is key to identifying new molecules and materials with advantageous properties.

Creating new materials was once a matter of blind trial and error. For centuries, alchemists attempted to manufacture gold from concoctions of lead, mercury, and sulfur, mixing ingredients in the hope of hitting on the perfect combination. Even renowned scientists such as Tycho Brahe, Robert Boyle, and Isaac Newton dabbled in this futile pursuit.

Thankfully, materials science has advanced drastically since then. For the past 150 years, researchers have benefited from the periodic table of elements, which tells them that different elements possess distinct properties and can't arbitrarily transform into one another. Moreover, in the past decade or so, machine learning tools have significantly boosted our capacity to determine the structure and physical properties of various molecules and substances. A study led by Ju Li, the Tokyo Electric Power Company Professor of Nuclear Engineering at MIT, promises to take materials design to the next level. The findings of the research are reported in the December 2024 issue of Nature Computational Science.

Currently, most machine-learning models used to characterize molecular systems rely on density functional theory (DFT), a quantum mechanical method that calculates the total energy of a molecule or crystal by examining the electron density distribution: essentially, the average number of electrons found in a given volume around the molecule. Although the approach has been highly successful, it has its drawbacks: "First, the accuracy isn't always exceptional. And, second, it only provides information about the lowest total energy of the molecular system," Li comments.

These limitations have encouraged Li's team to rely on a different computational chemistry technique, known as coupled-cluster theory, or CCSD(T). Li refers to CCSD(T) as "the gold standard of quantum chemistry." The precision of CCSD(T) calculations is significantly greater than what you get from DFT calculations, and they can be as reliable as experimental results. However, performing these calculations on a computer is slow, Li explains, "and the scaling is bad: If you double the number of electrons in the system, the computations become 100 times costlier." For that reason, CCSD(T) calculations have generally been confined to molecules with about 10 atoms or fewer. Anything beyond that would simply take too long.
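Li's scaling remark can be made concrete with a short sketch. Canonical CCSD(T) cost is commonly quoted as growing roughly as the seventh power of system size; that exponent is a textbook figure assumed here, not stated in the article, but it reproduces the "about 100 times costlier" factor for a doubling of the electron count:

```python
# Illustrative sketch of CCSD(T)'s steep cost scaling.
# Assumption: cost ~ N**7 in the number of electrons N (a commonly
# quoted figure for canonical CCSD(T), not taken from this article).

def relative_cost(n_electrons: int, exponent: int = 7) -> float:
    """Cost of a calculation relative to a one-electron baseline."""
    return float(n_electrons ** exponent)

# Doubling the electron count multiplies the cost by 2**7 = 128,
# in line with Li's "100 times costlier" estimate.
ratio = relative_cost(20) / relative_cost(10)
print(ratio)  # 128.0
```

The same arithmetic explains why CCSD(T) has been confined to roughly 10-atom molecules: each doubling of system size costs two orders of magnitude more compute.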

That's where machine learning comes into play. Li and his colleagues have designed a neural network with a novel architecture to speed up CCSD(T) calculations using approximation techniques. Their neural network model can provide information about a molecule that goes beyond just its energy: "In previous work, people have used multiple different models to assess different properties," says MIT PhD student Hao Tang. "Here we use just one model to evaluate all of these properties, which is why we call it a 'multi-task' approach."

This "Multi-task Electronic Hamiltonian network," or MEHnet, sheds light on several electronic properties, such as the dipole and quadrupole moments, electronic polarizability, and the optical excitation gap: the energy needed to move an electron from the ground state to the lowest excited state. "The excitation gap affects the optical properties of materials," Tang explains, "because it determines the frequency of light that can be absorbed by a molecule." Another advantage of their CCSD(T)-trained model is that it can reveal properties of not only ground states, but also excited states. The model can also predict a molecule's infrared absorption spectrum, which is related to its vibrational properties.
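The multi-task idea, one shared molecular representation feeding several property heads, can be sketched in toy form. Everything below (layer sizes, weights, descriptor) is an illustrative assumption, not the actual MEHnet architecture:

```python
import numpy as np

# Toy sketch of a multi-task model: a single shared representation
# of a molecule feeds one output head per property. All names and
# dimensions are hypothetical, for illustration only.

rng = np.random.default_rng(0)

def shared_features(descriptor: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One shared hidden representation (a single tanh layer)."""
    return np.tanh(W @ descriptor)

# One linear head per property, all reading the same shared features.
heads = {
    "energy": rng.normal(size=(1, 8)),
    "dipole": rng.normal(size=(3, 8)),          # a vector property
    "excitation_gap": rng.normal(size=(1, 8)),
}

W_shared = rng.normal(size=(8, 16))
descriptor = rng.normal(size=16)                 # toy molecular descriptor

features = shared_features(descriptor, W_shared)
predictions = {name: head @ features for name, head in heads.items()}
print({k: v.shape for k, v in predictions.items()})
# {'energy': (1,), 'dipole': (3,), 'excitation_gap': (1,)}
```

The design point is that the shared trunk is trained once against all targets, so every property benefits from the same learned representation, rather than maintaining a separate model per property.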

The strength of their approach owes a lot to the network architecture. Drawing on the work of MIT Assistant Professor Tess Smidt, the team is using an E(3)-equivariant graph neural network, where nodes represent atoms, and the edges that connect the nodes represent bonds between atoms. They also employ customized algorithms that incorporate physics principles directly into their model.
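The graph representation described above, atoms as nodes and bonds as edges, can be sketched with a minimal adjacency list. The ethanol bond list below is an illustrative example; a real E(3)-equivariant network would additionally carry 3D atomic coordinates and rotation-equivariant features on each node:

```python
# Minimal sketch of a molecular graph: nodes are atoms, edges are
# bonds. Ethanol (C2H5OH) is used as a hand-written example; this is
# not the team's code, just an illustration of the data structure.

atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]  # heavy atoms first
bonds = [(0, 1), (1, 2),                # C-C and C-O backbone
         (0, 3), (0, 4), (0, 5),       # CH3 hydrogens
         (1, 6), (1, 7),               # CH2 hydrogens
         (2, 8)]                       # O-H hydrogen

# Build an adjacency list: for each atom, the atoms it is bonded to.
adjacency = {i: [] for i in range(len(atoms))}
for a, b in bonds:
    adjacency[a].append(b)
    adjacency[b].append(a)

print(adjacency[0])  # [1, 3, 4, 5] -- the CH3 carbon's neighbors
```

An equivariant network then passes messages along these edges in a way that transforms consistently under rotations and translations of the molecule, which is what "E(3)-equivariant" refers to.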

The researchers tested their model on known hydrocarbon molecules, outperforming DFT counterparts and closely matching experimental results from published literature. Materials discovery specialist Qiang Zhu from the University of North Carolina at Charlotte is impressed by the progress made so far: "Their method enables effective training with a small dataset, while achieving superior accuracy and computational efficiency compared to existing models," he says. "This is exciting work that illustrates the powerful synergy between computational chemistry and deep learning, offering fresh ideas for developing more accurate and scalable electronic structure methods."

The MIT-based group began by analyzing light nonmetallic elements such as hydrogen, carbon, nitrogen, oxygen, and fluorine, from which organic compounds can be made, and has since moved on to heavier elements such as silicon, phosphorus, sulfur, chlorine, and even platinum. After being trained on small molecules, the model can be generalized to handle larger and larger molecules. "Previously, most calculations were limited to analyzing hundreds of atoms with DFT and just tens of atoms with CCSD(T) calculations," Li says. "Now we're talking about dealing with thousands of atoms and, eventually, perhaps tens of thousands."

While they are currently evaluating known molecules, the model can be used to characterize molecules that haven't been discovered yet and to predict the properties of hypothetical materials composed of different molecules. "The idea is to use our theoretical tools to pick out promising candidates that satisfy a particular set of criteria before suggesting them to an experimentalist to verify," Tang says.

Looking ahead, Zhu is optimistic about the potential applications. "This approach holds the potential for high-throughput molecular screening," he says. "That's a task where achieving chemical accuracy can be essential for identifying novel molecules and materials with desirable properties." Once they demonstrate the ability to analyze large molecules with perhaps tens of thousands of atoms, Li says, "we should be able to invent new polymers or materials" that might be used in drug design or in semiconductor devices. The examination of heavier transition metal elements could lead to the development of new materials for batteries - an area of urgent need.

The future, as Li sees it, is wide open: "It's no longer just about one area. Our ambition, ultimately, is to cover the whole periodic table with CCSD(T)-level accuracy, but at lower computational cost than DFT. This should enable us to solve a wide range of problems in chemistry, biology, and materials science. It's hard to know, at present, just how wide that range might be."

This work was supported by the Honda Research Institute. Hao Tang acknowledges support from the MathWorks Engineering Fellowship. The calculations in this work were performed, in part, on the Matlantis high-speed universal atomistic simulator, the Texas Advanced Computing Center, the MIT SuperCloud, and the National Energy Research Scientific Computing Center.

  1. The researchers' findings are reported in the December 2024 issue of Nature Computational Science.
  2. The study was led by Ju Li, the Tokyo Electric Power Company Professor of Nuclear Engineering at MIT.
  3. The accuracy of DFT calculations isn't always exceptional, according to Li.
  4. DFT also only provides information about the lowest total energy of the molecular system, Li comments.
  5. Their neural network model, the Multi-task Electronic Hamiltonian network (MEHnet), provides information about a molecule beyond just its energy.
  6. The excitation gap affects the optical properties of materials because it determines the frequency of light that can be absorbed by a molecule, Tang explains.
  7. The strength of their approach owes a lot to the network architecture, which uses an E(3)-equivariant graph neural network.
  8. The study's results show the method's potential for high-throughput molecular screening, Zhu says.
  9. Once they demonstrate the ability to analyze large molecules with perhaps tens of thousands of atoms, Li says, they should be able to invent new polymers or materials.
  10. The examination of heavier transition metal elements could lead to the development of new materials for batteries, an area of urgent need.
