13-Jun-2022 - Westfälische Wilhelms-Universität Münster

More Data in Chemistry

Clearer reporting of negative experimental results would improve reaction planning in chemistry

Databases containing huge amounts of experimental data are available to researchers across a wide variety of chemical disciplines. However, a team of researchers has found that artificial intelligence (AI) and machine-learning models trained on these data fail to reliably predict the yields of new syntheses. Their study, published in the journal Angewandte Chemie, suggests that this is largely because scientists tend not to report failed experiments.

Although AI-based models have been particularly successful in predicting molecular structures and material properties, they return rather inaccurate predictions of product yields in synthesis, as Frank Glorius and his team at Westfälische Wilhelms-Universität Münster, Germany, have discovered.

The researchers attribute this failure to the data used to train AI systems. “Interestingly, the prediction of reaction yields (reactivity) is much more challenging than the prediction of molecular properties. Reactants, reagents, quantities, conditions, the experimental execution—all determine the yield, and thus, the problem of yield prediction becomes very data-intensive,” explains Glorius. Despite the huge amount of published literature and results, the researchers therefore concluded that the existing data are not suitable for accurately predicting the expected yield.

The problem is not simply a lack of experiments. Instead, the team identified three possible sources of bias in the data. First, the results of chemical syntheses may be flawed by experimental error. Second, when planning their experiments, chemists may, consciously or unconsciously, introduce bias based on personal experience and a reliance on well-established methods. Finally, because only reactions with a positive outcome are thought to contribute to progress, failed reactions are reported less often.

To find out which of these three factors had the greatest influence, Glorius and the team deliberately altered the datasets for four different, commonly used (and therefore data-rich) organic reactions. They artificially increased the experimental error, reduced the size of the datasets, or removed negative results. Their investigations showed that experimental error had the smallest influence on the model, while the lack of negative results had a fundamental effect.
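To make the three manipulations more concrete, here is a minimal Python sketch of how a yield dataset could be perturbed in each way. It is not the authors' code: the record format (reaction ID plus yield in percent), the Gaussian noise model, and the 5 % yield cut-off used to define a "negative" result are all hypothetical choices made for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple

# Hypothetical record format: (reaction_id, yield_percent).
Record = Tuple[str, float]


def add_noise(data: List[Record], sigma: float, seed: int = 0) -> List[Record]:
    """Simulate extra experimental error: add Gaussian noise to each yield,
    clipped to the physical range 0-100 %."""
    rng = random.Random(seed)
    return [(rid, min(100.0, max(0.0, y + rng.gauss(0.0, sigma)))) for rid, y in data]


def subsample(data: List[Record], fraction: float, seed: int = 0) -> List[Record]:
    """Simulate fewer experiments: keep a random fraction of the records."""
    rng = random.Random(seed)
    k = int(len(data) * fraction)
    return rng.sample(data, k)


def drop_negatives(data: List[Record], threshold: float = 5.0) -> List[Record]:
    """Simulate unreported failures: remove reactions whose yield falls below
    a hypothetical 'failed reaction' threshold (default 5 %)."""
    return [(rid, y) for rid, y in data if y >= threshold]


if __name__ == "__main__":
    data = [("r1", 0.0), ("r2", 3.0), ("r3", 45.0), ("r4", 88.0)]
    print(mean(y for _, y in data))                  # 34.0
    print(mean(y for _, y in drop_negatives(data)))  # 66.5
```

The toy numbers hint at why missing failures matter: dropping the two low-yield entries nearly doubles the average yield a model would see during training, so it learns an overly optimistic picture of the reaction.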

The group hopes that these findings will encourage scientists to report failed experiments as consistently as their successes. This would increase the data available for training AI, ultimately speeding up reaction planning and making experimentation more efficient. Glorius adds: “Machine learning in (molecular) chemistry will increase efficiency dramatically and fewer reactions will have to be run to achieve a certain goal, for example, an optimization. This will empower chemists and will help them to make chemical processes—and the world—more sustainable.”
