02-Dec-2021 - Max-Planck-Institut für molekulare Genetik

Exploring the current paradigm of gene regulation

How much tissue-specific information is contained in enhancer sequences?

How do cells know when to activate a certain gene? This information is encoded in the sequence of the DNA, but our understanding of this code is incomplete. Researchers have now tested how much information can be extracted from sequence data to predict which gene is active in which tissue.

A good storyteller knows exactly which anecdotes will bring a story’s characters to life. By telling the right story at the right time, our genome gives rise to hundreds of different cell types, each with a characteristic life story that breathes an individual identity into the cell.

DNA snippets scattered across the genome harbor the code that directs the script of a cell’s life, successively switching genes on and off. Sequences called enhancers play an outstanding role in this process. They attract transcription factor proteins that start the expression of genes, thereby “enhancing” their activity. In some cases, they are located far away from the gene they activate.

Researchers Philipp Benner and Martin Vingron from the Max Planck Institute for Molecular Genetics (MPIMG) set out to decipher the instructions of the activation patterns in distinct cell types and embryonic tissues of the mouse.

With a series of statistical and bioinformatic analyses, the scientists identified several hundred tissue-specific DNA subsequences, or “codewords,” in enhancers that guide transcription factors. The analyses not only confirmed sequences already known from other studies but also identified many new ones. The results have been published in several articles in NAR Genomics and Bioinformatics and the Journal of Computational Biology.

Training a model

“Today, researchers assume that all the information is in the DNA sequence, including information for specific cell types, tissues, and organs,” says Martin Vingron, Director at the MPIMG. According to the prevailing theory, transcription factor proteins recognize “codewords” in enhancers that are specific for a certain cell type, allowing the genome to tell a cell’s story by jumping to the right chapters. “We wanted to see how far this approach would take us and test its limits,” says Vingron.

The researchers developed a program that identifies the DNA sequences a cell recognizes in order to activate genes in a tissue-specific way. They achieved this by training a statistical model on existing experimental data that told it which enhancer is active in which tissue. Specifically, they used sequencing data from eight tissues of the embryonic mouse, including heart, lung, brain, and liver.

Learning to predict

By comparing sequence data between the tissues, the program learned to recognize sequence patterns in enhancers that are characteristic of certain tissues.

This told the researchers how much cell type-specific regulatory information is actually contained in the DNA sequence of enhancers, explains Philipp Benner, who is a postdoctoral researcher in Vingron’s lab: “The better our algorithm can classify any given enhancer, the more information it contains about the tissue or cell types that it is responsible for.”
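The idea that classification accuracy reflects how much tissue information a sequence carries can be illustrated with a minimal sketch. It is not the researchers' program: it assumes a simple naive Bayes classifier over overlapping 3-letter subsequences (k-mers), and the sequences, tissue labels, and all function names are invented for illustration.

```python
import math
from collections import Counter
from itertools import product

K = 3  # k-mer length; an arbitrary choice for this toy example
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(seq, k=K):
    """Count the overlapping k-mers in one DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def train(labeled_seqs):
    """Fit per-tissue k-mer log-probabilities (multinomial naive Bayes)."""
    totals = {}
    for tissue, seq in labeled_seqs:
        totals.setdefault(tissue, Counter()).update(kmer_counts(seq))
    model = {}
    for tissue, counts in totals.items():
        denom = sum(counts.values()) + len(ALL_KMERS)  # add-one smoothing
        model[tissue] = {km: math.log((counts[km] + 1) / denom) for km in ALL_KMERS}
    return model


def predict(model, seq):
    """Return the tissue with the highest log-likelihood (equal priors assumed)."""
    counts = kmer_counts(seq)
    scores = {
        tissue: sum(n * logp[km] for km, n in counts.items() if km in logp)
        for tissue, logp in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, labeled_seqs):
    """Fraction of sequences whose tissue is predicted correctly."""
    hits = sum(predict(model, seq) == tissue for tissue, seq in labeled_seqs)
    return hits / len(labeled_seqs)


# Made-up toy "enhancers"; real data would be thousands of longer sequences.
TRAIN = [
    ("heart", "GATAAGATAA"), ("heart", "TTGATAAGCC"),
    ("liver", "CCAAAGTCCA"), ("liver", "GTCCAAAGTC"),
]
HELD_OUT = [("heart", "ACGATAAT"), ("liver", "TTCCAAAGG")]

model = train(TRAIN)
print(accuracy(model, HELD_OUT))
```

Accuracy on sequences the model has not seen during training is the quantity of interest: if it stays close to chance, the sequence features used by the model carry little tissue information.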

The statistical classifiers can also identify DNA subsequences that might underlie cell type-specific gene activation. In fact, Benner found several hundred new codewords in addition to patterns that have been identified in other studies.
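One simple way to read candidate codewords out of sequence data is to rank subsequences by how much more often they occur in the enhancers of one tissue than in those of another. The sketch below assumes a smoothed log-odds score over 3-letter subsequences; it is not the published method, and the sequences and function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count overlapping k-mers across a list of DNA sequences."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return counts


def top_codewords(target_seqs, background_seqs, k=3, n=3):
    """Rank k-mers by smoothed log-odds of target versus background frequency."""
    target = kmer_counts(target_seqs, k)
    background = kmer_counts(background_seqs, k)
    vocab = sorted(set(target) | set(background))
    t_total = sum(target.values()) + len(vocab)  # add-one smoothing
    b_total = sum(background.values()) + len(vocab)
    scores = {
        km: math.log((target[km] + 1) / t_total)
        - math.log((background[km] + 1) / b_total)
        for km in vocab
    }
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda km: (-scores[km], km))[:n]


# Made-up toy sequences, for illustration only.
heart = ["GATAAGATAA", "TTGATAAGCC"]
liver = ["CCAAAGTCCA", "GTCCAAAGTC"]

print(top_codewords(heart, liver))
print(top_codewords(liver, heart, n=2))
```

Because each score belongs to a single named subsequence, the ranking can be inspected directly, which is the sense in which such a model is interpretable.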

“Overall, we established a strong and, most importantly, an interpretable model,” says Benner.

Reaching the limits

“With our advanced methods, the predictions are promising but far from perfect,” says Vingron. “Our results indicate that we might really have only a fragmentary understanding of the actual cell type-specific regulatory code.”

It is possible that not all the required information is contained in the DNA sequence of enhancers and that some of it is distributed elsewhere in the genome. Some cross-references in the storybook of the genome might still be hidden in other regulatory sequences, such as promoter regions located close to the gene itself.
