Researchers have developed a platform that combines automated experiments with artificial intelligence to predict how chemicals will react with each other, which could speed up the design process for new drugs.
Predicting how molecules will react is crucial to discovering and making new pharmaceuticals, but historically this has been a process of trial and error, and reactions often fail. To predict how molecules will react, chemists typically simulate electrons and atoms in simplified models, a process that is computationally expensive and often inaccurate.
Now, researchers from the University of Cambridge have developed a data-driven approach, inspired by genomics, where automated experiments are combined with machine learning to understand chemical reactivity, greatly speeding up the process. They called their approach, which was validated on a dataset of more than 39,000 drug-related reactions, the chemical “reactome.”
Their results, reported in the journal Nature Chemistry, are the product of a collaboration between Cambridge and Pfizer.
“The reactome could change the way we think about organic chemistry,” said Dr Emma King-Smith of Cambridge’s Cavendish Laboratory, first author of the paper. “A deeper understanding of chemistry could allow us to make pharmaceuticals and so many other useful products much faster. But more fundamentally, the understanding we hope to create will benefit anyone who works with molecules.”
The reactome approach picks out relevant correlations between reactants, reagents and reaction yield from the data, and highlights gaps in the data itself. The data are generated from very fast, or high-throughput, automated experiments.
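The idea of mining high-throughput data for component–yield correlations can be sketched in miniature: encode each reaction by its components, fit a simple model to the measured yields, and read off which components are associated with high or low yield, while untested component combinations surface as gaps. This is an illustrative sketch only, not the authors' model; all component names and yields below are invented.

```python
import numpy as np
from itertools import product

# Toy high-throughput dataset: (reactant, catalyst) -> measured yield.
# All names and numbers are invented for illustration.
reactions = [
    ("aryl_bromide",  "Pd_cat_A", 0.82),
    ("aryl_bromide",  "Pd_cat_B", 0.75),
    ("aryl_chloride", "Pd_cat_A", 0.31),
    ("aryl_chloride", "Pd_cat_B", 0.31),
    ("aryl_iodide",   "Pd_cat_A", 0.90),
]

reactants = sorted({r for r, _, _ in reactions})
catalysts = sorted({c for _, c, _ in reactions})

def one_hot(reactant, catalyst):
    """Encode a reaction as concatenated one-hot vectors of its components."""
    v = np.zeros(len(reactants) + len(catalysts))
    v[reactants.index(reactant)] = 1.0
    v[len(reactants) + catalysts.index(catalyst)] = 1.0
    return v

X = np.array([one_hot(r, c) for r, c, _ in reactions])
y = np.array([yld for _, _, yld in reactions])

# Least-squares fit: each coefficient estimates how strongly that component
# is associated with reaction yield across the dataset.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, w in zip(reactants + catalysts, coef):
    print(f"{name:14s} {w:+.2f}")

# Gaps in the data: component combinations that were never tried.
tried = {(r, c) for r, c, _ in reactions}
gaps = [combo for combo in product(reactants, catalysts) if combo not in tried]
print("untested combinations:", gaps)
```

Here a linear fit stands in for the machine learning model; the point is only the workflow of turning a table of automated-experiment results into component-level associations plus an explicit list of unexplored conditions.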
“High-throughput chemistry has changed the game, but we thought there was a way to reveal a deeper understanding of chemical reactions than can be seen from the initial results of a high-throughput experiment,” said King-Smith.
“Our approach reveals hidden relationships between reaction components and outcomes,” said Dr Alpha Lee, who led the research. “The dataset we trained the model on is huge; it will help bring the process of chemical discovery from trial and error into the era of big data.”
In a related paper, published in Nature Communications, the team developed a machine learning approach that allows chemists to introduce precise transformations into predefined regions of a molecule, enabling faster drug design.
The approach allows chemists to make late design changes to complex molecules without having to build them from scratch. Building a molecule in the lab is usually a multi-step process, like building a house. If chemists want to vary the core of a molecule, the conventional route is to rebuild it entirely, like tearing down the house and starting over. However, core variations are key to drug design.
A class of reactions known as late-stage functionalization reactions attempts to introduce chemical transformations directly onto the core of a molecule, avoiding the need to start from scratch. However, it is difficult to make late-stage functionalization selective and controllable: there are usually many regions of the molecule that can react, and the outcome is difficult to predict.
“Late-stage functionalizations can produce unpredictable results, and current modeling methods, including our own expert intuition, are not perfect,” said King-Smith. “A more predictive model would give us the opportunity for better control.”
The researchers developed a machine learning model that predicts where a molecule would react and how the location of the reaction varies as a function of different reaction conditions. This enables chemists to find ways to precisely tailor the core of a molecule.
“We trained the model on a large body of spectroscopic data, effectively teaching the model general chemistry, before fine-tuning it to predict these complex transformations,” said King-Smith. This approach allowed the team to overcome the limitation of low data: relatively few late-stage functionalization reactions are reported in the scientific literature. The team experimentally validated the model on a diverse set of drug-like molecules and was able to accurately predict sites of reactivity under different conditions.
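The pretrain-then-fine-tune recipe described here, learning general structure from a large auxiliary dataset and then adapting to a small target dataset, can be sketched with a toy regression example. Everything below is an invented illustration of the general transfer-learning idea, not the paper's actual model: "pretraining" fits weights on abundant surrogate data, and "fine-tuning" shrinks the small-data solution toward those pretrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

# Invented setup: a large auxiliary task ("general chemistry") whose true
# weights are close to, but not identical to, the scarce target task's weights.
w_target = rng.normal(size=d)
w_aux = w_target + 0.1 * rng.normal(size=d)

# Step 1: "pretrain" on abundant auxiliary data.
X_aux = rng.normal(size=(500, d))
y_aux = X_aux @ w_aux + 0.05 * rng.normal(size=500)
w_pre, *_ = np.linalg.lstsq(X_aux, y_aux, rcond=None)

# Step 2: "fine-tune" on only 5 labeled target examples by regularizing
# toward the pretrained weights:
#   w = argmin ||Xw - y||^2 + lam * ||w - w0||^2
def ridge_toward(X, y, w0, lam=1.0):
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * w0)

X_small = rng.normal(size=(5, d))
y_small = X_small @ w_target + 0.05 * rng.normal(size=5)

w_transfer = ridge_toward(X_small, y_small, w_pre)       # with pretraining
w_scratch = ridge_toward(X_small, y_small, np.zeros(d))  # from scratch

# Evaluate both on held-out target data.
X_test = rng.normal(size=(200, d))
y_test = X_test @ w_target
mse = lambda w: float(np.mean((X_test @ w - y_test) ** 2))
print(f"from scratch: {mse(w_scratch):.3f}  with pretraining: {mse(w_transfer):.3f}")
```

With only five target examples in ten dimensions, the from-scratch fit is badly underdetermined, while the pretrained starting point carries the model most of the way; this is the low-data advantage the quote describes, in cartoon form.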
“The application of machine learning in chemistry is often limited by the problem that the amount of data is small compared to the vastness of chemical space,” said Lee. “Our approach, designing models that learn from large datasets that are similar but not identical to the problem we are trying to solve, addresses this fundamental low-data challenge and could unlock advances beyond late-stage functionalization.”
The research was supported in part by Pfizer and the Royal Society.