
New software processes huge amounts of single-cell data

Comprehensive analysis of large gene-expression datasets

Helmholtz Zentrum München

Visualization of gene expression patterns of murine brain cells generated with Scanpy.

13-Feb-2018: Scientists from the Helmholtz Zentrum München have developed a program that is able to help manage enormous datasets. The software, named Scanpy, is a candidate for analyzing the Human Cell Atlas.

“It’s about analyzing gene-expression data of a large number of individual cells,” explains lead author Alex Wolf of the Institute of Computational Biology (ICB) at Helmholtz Zentrum München. He developed Scanpy together with his colleague Philipp Angerer in the Machine Learning Group of Prof. Dr. Dr. Fabian Theis. In addition to his position at Helmholtz Zentrum, Theis is also a professor of mathematical modelling of biological systems at the Technical University of Munich. “New technical advances generate several orders of magnitude more data with a correspondingly greater information content,” Theis says. “However, the historically evolved software infrastructure for gene-expression analysis simply wasn’t designed to cope with the new challenges. New analytic methods are therefore needed.”

The race for the Human Cell Atlas

According to Theis, a major international research project could also benefit from the software. A team of international scientists is compiling a reference database, called the Human Cell Atlas, which holds data on the gene activity of all human cell types. “For this project, and in a growing number of other projects in which databases are combined, it is important to have scalable software,” says Theis. It is therefore no surprise that Scanpy is currently a candidate for helping to analyze the Human Cell Atlas.

“The publication of Scanpy marks the first software that allows comprehensive analysis of large gene-expression datasets with a broad range of machine-learning and statistical methods,” explains Wolf, describing the achievement. “The software is already being used by a number of groups around the world, notably at the Broad Institute of MIT and Harvard.”

Technologically, the application breaks new ground in two ways. First, whereas biostatistics programs are traditionally written in the programming language R, Scanpy is based on Python, the dominant language in the machine learning community. Second, graph-based algorithms lie at the heart of Scanpy. The usual approach regards each cell as a point in gene-expression space, with one coordinate per gene. Scanpy's algorithms work on a graph instead: rather than characterizing a single cell by its expression values for thousands of genes, the system characterizes each cell by its closest neighbors, much like the connections in a social network. In fact, to identify cell types, Scanpy uses the same algorithms that Facebook uses to identify communities.
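The neighbor-based idea described above can be illustrated with a small Python sketch. This is not Scanpy's code or API: the toy expression values, the function names `knn_graph` and `connected_groups`, and the choice of k are all assumptions made for illustration, and connected components of the neighbor graph stand in for the more sophisticated community-detection algorithms the article mentions.

```python
import math
from collections import deque

def knn_graph(cells, k):
    """Map each cell index to the indices of its k nearest cells (Euclidean distance)."""
    graph = {}
    for i, a in enumerate(cells):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(cells) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph

def connected_groups(graph):
    """Group cells that are linked through neighbor relations.

    Simplified stand-in for community detection: it returns the connected
    components of the symmetrized neighbor graph.
    """
    undirected = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            undirected[j].add(i)  # make every neighbor link bidirectional
    seen, groups = set(), []
    for start in undirected:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            group.append(node)
            for nbr in undirected[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(sorted(group))
    return groups

# Hypothetical toy data: 12 cells, 3 genes each, forming two made-up cell types.
type_a = [(float(i), 50.0, 0.0) for i in range(6)]  # gene 2 highly expressed
type_b = [(float(i), 0.0, 50.0) for i in range(6)]  # gene 3 highly expressed
cells = type_a + type_b

graph = knn_graph(cells, k=2)
groups = connected_groups(graph)
print(groups)  # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

Once the neighbor graph is built, the grouping step never looks at the expression values again; it only follows links between neighboring cells, which is the shift in perspective the article describes.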

Original publication:
Wolf, A. et al.; "Scanpy: large-scale single-cell gene expression data analysis"; Genome Biology; 2018

Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt GmbH


