13-Jun-2022 - Westfälische Wilhelms-Universität Münster

More Data in Chemistry

Clearer reporting of negative experimental results would improve reaction planning in chemistry

Databases containing huge amounts of experimental data are available to researchers across a wide variety of chemical disciplines. However, a team of researchers has found that the available data is poorly suited to predicting the yields of new syntheses with artificial intelligence (AI) and machine learning. Their study, published in the journal Angewandte Chemie, suggests that this is largely because scientists tend not to report failed experiments.

Although AI-based models have been particularly successful in predicting molecular structures and material properties, they return rather inaccurate predictions of product yields in synthesis, as Frank Glorius and his team of researchers at Westfälische Wilhelms-Universität Münster, Germany, have discovered.

The researchers attribute this failure to the data used to train AI systems. “Interestingly, the prediction of reaction yields (reactivity) is much more challenging than the prediction of molecular properties. Reactants, reagents, quantities, conditions, the experimental execution—all determine the yield, and thus, the problem of yield prediction becomes very data-intensive,” explains Glorius. So, despite the huge amounts of available literature and results, the researchers came to realize that the data is not fit for accurate predictions of the expected yield.

The problem is not simply a lack of experiments. Rather, the team identified three possible sources of bias in the data. First, the results of chemical syntheses may be flawed due to experimental error. Second, when chemists plan their experiments, they may, consciously or unconsciously, introduce bias based on personal experience and reliance on well-established methods. Finally, since only reactions with a positive outcome are believed to contribute to progress, failed reactions are reported less frequently.

To find out which of these three factors had the greatest influence, Glorius and the team purposely altered the datasets for four different, commonly used (and therefore data-rich) organic reactions. They artificially increased the experimental error, reduced the size of the datasets, or removed negative results from the data. Their investigations showed that experimental error had the smallest influence on the model, while the lack of negative results had a fundamental impact.
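The three kinds of dataset alteration described above can be illustrated with a minimal sketch. This is not the team's code: the function names, the toy records, the noise level, and the 20 % yield threshold used to define a "negative" result are all assumptions made purely for illustration.

```python
import random

def add_yield_noise(records, sigma, rng):
    """Mimic experimental error: add Gaussian noise to each yield, clipped to 0-100 %."""
    return [(rxn, min(100.0, max(0.0, y + rng.gauss(0.0, sigma))))
            for rxn, y in records]

def subsample(records, fraction, rng):
    """Mimic a smaller dataset: keep a random fraction of the records."""
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

def drop_negative_results(records, threshold):
    """Mimic reporting bias: discard reactions whose yield is below a threshold."""
    return [(rxn, y) for rxn, y in records if y >= threshold]

rng = random.Random(0)

# Hypothetical toy data: (reaction label, yield in percent)
data = [("rxn_a", 0.0), ("rxn_b", 5.0), ("rxn_c", 42.0),
        ("rxn_d", 78.0), ("rxn_e", 95.0), ("rxn_f", 12.0)]

noisy = add_yield_noise(data, sigma=5.0, rng=rng)            # more experimental error
smaller = subsample(data, fraction=0.5, rng=rng)             # fewer data points
positives_only = drop_negative_results(data, threshold=20.0)  # failed reactions removed
```

A yield-prediction model could then be trained on each altered dataset and compared with one trained on the unaltered data to see which alteration degrades the predictions most.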

The group hopes that these findings will encourage scientists to always report failed experiments as well as their successes. This would improve data availability for training AI, ultimately helping to speed up planning and make experimentation more efficient. Glorius adds: “Machine learning in (molecular) chemistry will increase efficiency dramatically and fewer reactions will have to be run to achieve a certain goal, for example, an optimization. This will empower chemists and will help them to make chemical processes, and the world, more sustainable.”
