22-Feb-2021 - Technische Universität München

Artificial intelligence deciphers genetic instructions

Deep learning algorithms reveal the rules of gene regulation

With the help of artificial intelligence (AI), a German-American team of scientists has deciphered some of the more elusive instructions encoded in DNA. Their neural network, trained on high-resolution maps of protein-DNA interactions, uncovers subtle DNA sequence patterns throughout the genome, thus providing a deeper understanding of how these sequences are organized to regulate genes.

Artificial intelligence algorithms are extremely powerful at fitting massive and complex datasets. But interpreting them, that is, rationalizing how the machine arrives at a specific prediction when presented with a given input, is notoriously hard. This black-box behavior hampers wide acceptance of AI in medical diagnostics, where justifications matter, and restrains its utility in the natural sciences, where understanding mechanisms is the goal.

Now, an interdisciplinary team of biologists and computational researchers from the Technical University of Munich, the Stowers Institute for Medical Research and Stanford University has shown that neural networks such as those used for facial recognition, combined with newly developed model interpretation techniques, can be used to decipher complex instructions encoded in DNA.

One of the big unsolved problems in biology is the genome’s second code, its regulatory code. The DNA bases encode not only the instructions for how to build proteins, but also when and where to make these proteins in an organism. 

The regulatory code is read by proteins called transcription factors that bind to short stretches of DNA called motifs. However, how particular combinations and arrangements of motifs specify regulatory activity is an extremely complex problem that has been hard to pin down.

DNA binding experiments and computational modeling going hand in hand

The key was to perform transcription factor-DNA binding experiments and computational modeling at the highest possible resolution, down to the level of individual DNA bases. The increased resolution allowed the team not only to train highly accurate neural network models, but also to extract the key elements and patterns from the models, including transcription factor binding motifs and the combinatorial rules by which they function together as code. 

“Neural networks are black boxes, but they can be interrogated digitally. So, with a large number of virtual experiments we figured out the rules the neural net learned,” says first author Dr. Žiga Avsec, a member of the group of Julien Gagneur, professor of computational molecular medicine at the Technical University of Munich. Together with Anshul Kundaje, professor at Stanford University, he created the first version of the model when he visited Stanford as a guest scientist.

Applied to master regulators of stem cell differentiation and confirmed experimentally by CRISPR genome editing, the approach revealed complex rules involving precise positioning along the DNA double helix and a specific ordering of events.

“This was extremely satisfying,” says project leader Julia Zeitlinger, investigator at the Stowers Institute and professor at the University of Kansas Medical Center, “as the results fit beautifully with existing experimental results, and also revealed novel insights that surprised us.” 
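The "virtual experiments" mentioned in this section can be illustrated with a minimal sketch of in silico mutagenesis: change one base at a time and record how much a model's output drops. Everything here is an assumption made for illustration. `toy_model` is a hypothetical stand-in scorer, not BPNet, and `GATTACA` is a placeholder motif, not a real transcription factor motif.

```python
# Hypothetical stand-in for a trained network. This is NOT BPNet; it simply
# rewards the best partial match to a placeholder motif.
MOTIF = "GATTACA"  # placeholder motif, chosen for illustration only


def toy_model(seq: str) -> float:
    """Return a made-up binding score in [0, 1]: best fraction of MOTIF matched by any window."""
    best = 0
    for start in range(len(seq) - len(MOTIF) + 1):
        window = seq[start:start + len(MOTIF)]
        matches = sum(a == b for a, b in zip(window, MOTIF))
        best = max(best, matches)
    return best / len(MOTIF)


def in_silico_mutagenesis(seq: str, model) -> list:
    """For each position, return the largest score drop caused by any single-base substitution."""
    baseline = model(seq)
    drops = []
    for i, ref in enumerate(seq):
        worst = 0.0
        for alt in "ACGT":
            if alt == ref:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            worst = max(worst, baseline - model(mutated))
        drops.append(worst)
    return drops


seq = "CCCC" + MOTIF + "CCCC"
drops = in_silico_mutagenesis(seq, toy_model)
# Positions where a mutation lowers the score are the ones the model relies on.
important = [i for i, d in enumerate(drops) if d > 0]
print(important)
```

In this toy setting, the positions flagged as important are exactly those covered by the placeholder motif. With a real trained network, the same loop would be run against the network's prediction rather than a hand-written scorer.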

A pattern becomes visible: how Nanog binds to DNA

For example, the researchers found that a well-studied transcription factor called Nanog binds cooperatively to DNA when multiples of its motif are present in a periodic fashion such that they appear on the same side of the spiraling DNA helix. 

“There has been a long trail of experimental evidence that such motif periodicity sometimes exists in the regulatory code,” Zeitlinger says. “However, the exact circumstances were elusive, and Nanog had not been a suspect. Discovering that Nanog has such a pattern, and seeing additional details of its interactions, was surprising because we did not specifically search for this pattern.”

“This is the key advantage of using neural networks for this task. A classic computational model is built on hand-crafted, rigid rules to ensure that it can be interpreted,” says Avsec. “However, biology is extremely rich and complicated. By abandoning the need to interpret individual parameters, we can train much more flexible and nuanced models that capture any biological phenomena, including those yet unknown.”
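The idea of motifs sitting "on the same side" of the helix, described earlier in this section, can be made concrete with a small sketch. It assumes a helical repeat of roughly 10.5 base pairs per turn for B-form DNA, a textbook value that the article itself does not state. The motif string, the tolerance and the function names are hypothetical.

```python
# Approximate helical repeat of B-form DNA in base pairs per turn.
# Assumption for this sketch; the article does not give a number.
HELICAL_TURN_BP = 10.5


def find_motif_starts(seq: str, motif: str) -> list:
    """Return the start index of every (possibly overlapping) occurrence of motif in seq."""
    starts = []
    i = seq.find(motif)
    while i != -1:
        starts.append(i)
        i = seq.find(motif, i + 1)
    return starts


def same_face(pos_a: int, pos_b: int, tolerance_bp: float = 1.5) -> bool:
    """True if the spacing between two positions is close to a whole number of helical turns."""
    spacing = abs(pos_b - pos_a)
    offset = spacing % HELICAL_TURN_BP
    return min(offset, HELICAL_TURN_BP - offset) <= tolerance_bp


motif = "GATTACA"  # placeholder, not a real transcription factor motif
seq = motif + "C" * 14 + motif  # two copies whose starts are 21 bp apart (two turns)
starts = find_motif_starts(seq, motif)
print(starts, same_face(starts[0], starts[1]))
```

Two sites spaced 21 bp apart are about two full turns from each other and so face the same way, whereas sites 5 bp apart sit on roughly opposite sides. This is only a geometric check. It says nothing about whether binding is actually cooperative, which in the study was inferred from the model and confirmed experimentally.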

A powerful bottom-up approach

This neural net model – named BPNet for Base Pair Network – is a powerful bottom-up approach similar to facial recognition in images, where a neural network first detects edges in the pixels, then learns how edges form facial elements like the eye, nose or mouth, and finally how facial elements together form a face. 

Instead of learning from pixels, BPNet learns from the raw DNA sequence and learns to detect sequence motifs and eventually the higher-order rules by which the elements predict the base-resolution binding data. 
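To make the pixel analogy concrete, here is a minimal sketch of how raw DNA is commonly presented to a neural network and how a single convolution-style filter can respond to a motif. One-hot encoding and the hand-written filter are assumptions for illustration. The article does not describe BPNet's actual input format or layers, and in a trained network the filter weights would be learned rather than written by hand.

```python
BASES = "ACGT"


def one_hot(seq: str) -> list:
    """Encode each base as a 4-element vector (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def scan(encoded: list, filt: list) -> list:
    """Slide a filter (width x 4 weights) along the sequence and return one activation per window."""
    width = len(filt)
    activations = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * filt[offset][channel]
        activations.append(total)
    return activations


# Hand-written filter that responds most strongly to the placeholder pattern "GAT".
filt = one_hot("GAT")
acts = scan(one_hot("CCGATCC"), filt)
best = acts.index(max(acts))
print(acts, best)
```

The activation peaks at the window where the pattern occurs, which is the DNA counterpart of an edge detector firing on an edge in an image. Deeper layers can then combine such responses into higher-order rules.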

Both the Zeitlinger Lab and the Kundaje Lab are already using BPNet to reliably identify binding motifs for other cell types, relate motifs to biophysical parameters and learn other structural features in the genome such as those associated with DNA packaging. To enable other scientists to use BPNet and adapt it for their own needs, the researchers have made the entire software framework available with documentation and tutorials.

“This work is a technological tour de force,” says Julien Gagneur. “It combines deep learning modeling of genome-wide assays down to single-nucleotide resolution with advanced explainable AI techniques that make it possible to interpret what the ‘black box’ has learned. The methodology will help biologists study the full regulatory grammar.”
