SMART: Facial recognition for molecular structures

Chen Zhang, Bill Gerwick

The SMART cluster map based on training results of 2,054 HSQC spectra over 83,000 iterations, with inset boxes representing different compound classes.

09-Nov-2017: An interdisciplinary team of researchers at the University of California San Diego has developed a method to identify the molecular structures of natural products that is significantly faster and more accurate than existing methods. The method works like facial recognition for molecular structures: it takes a piece of spectral data unique to each molecule and runs it through a deep learning neural network to place the unknown molecule in a cluster of molecules with similar structures.

The patent-pending system is called "SMART," which stands for Small Molecule Accurate Recognition Technology, and has the potential to accelerate the molecular structure identification process tenfold. This development could represent a paradigm shift in the chemical analysis, pharmaceutical and drug discovery fields, since 70 percent of all FDA-approved drugs are based on natural products from sources such as soil microorganisms, terrestrial plants and, increasingly, marine life forms such as algae.

This work represents a collaboration between the UC San Diego Jacobs School of Engineering and the UC San Diego Scripps Institution of Oceanography.

"The structure of a molecule is the enabling information," said Bill Gerwick, professor of oceanography and pharmaceutical sciences at UC San Diego's Scripps Institution of Oceanography. "You have to have the structure for any FDA approval. If you want to have intellectual property you have to patent that structure, if you want to make analogs of that molecule you need to know what the starting molecule is--it's a critical piece of information."

Chen Zhang is a nanoengineering Ph.D. student at the UC San Diego Jacobs School of Engineering. Zhang said that determining a molecule's structure can be a bottleneck in the natural product research process, with experts taking months or even years to determine the correct and complete structure. Although each molecule and its identification timeline are different, the SMART approach gives researchers an early clue as to which family a new molecule belongs to, drastically reducing the time it takes to characterize a new natural product.

"The way we were able to accelerate the process is by essentially using facial recognition software to look at the key piece of information we obtain on the molecules," Gerwick explained. The key piece of information the team uses is a heteronuclear single quantum coherence nuclear magnetic resonance (HSQC NMR) spectrum. It produces a topological map of spots that reveal which protons in the molecule are attached directly to which carbon atoms, and is unique to every molecule.
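To make the idea of an HSQC spectrum as a 2D image concrete, here is a minimal Python sketch that bins a list of cross-peaks (proton shift, carbon shift) into a small grid. The function name `rasterize_hsqc`, the ppm ranges, the 8x8 grid size and the peak values are all assumptions chosen for illustration; the article does not describe how the team prepared its images.

```python
def rasterize_hsqc(peaks, h_range=(0.0, 12.0), c_range=(0.0, 200.0), size=(8, 8)):
    """Bin (1H ppm, 13C ppm) cross-peaks into a binary grid.

    Rows index the 13C axis and columns index the 1H axis.
    Ranges and grid size are illustrative assumptions.
    """
    rows, cols = size
    grid = [[0] * cols for _ in range(rows)]
    for h_ppm, c_ppm in peaks:
        in_h = h_range[0] <= h_ppm <= h_range[1]
        in_c = c_range[0] <= c_ppm <= c_range[1]
        if not (in_h and in_c):
            continue  # skip peaks outside the chosen window
        # Scale each shift to a cell index, clamping the upper edge into the last cell
        col = min(int((h_ppm - h_range[0]) / (h_range[1] - h_range[0]) * cols), cols - 1)
        row = min(int((c_ppm - c_range[0]) / (c_range[1] - c_range[0]) * rows), rows - 1)
        grid[row][col] = 1
    return grid


# Hypothetical peak list, not taken from any real compound
example_peaks = [(1.2, 20.0), (3.7, 55.0), (7.3, 128.0)]
image = rasterize_hsqc(example_peaks)
for line in image:
    print("".join("#" if cell else "." for cell in line))
```

Each `#` marks a cell containing at least one proton-carbon correlation. A grid like this is the kind of input an image-based network can consume.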

Zhang and Gerwick teamed up with Gary Cottrell, a computer science and engineering professor at the UC San Diego Jacobs School of Engineering, to develop a deep learning system trained with thousands of HSQC spectra pulled from the literature. This convolutional neural network takes a 2D image of the HSQC NMR spectrum of an unknown molecule and maps it to a point in a 10-dimensional space, where it lands near molecules with similar structures. That placement makes it easier for researchers to elucidate the unknown molecule's structure.
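Once spectra are mapped into such a space, finding candidate relatives of an unknown molecule amounts to a nearest-neighbor lookup. The sketch below assumes the embeddings already exist and uses plain Euclidean distance; the compound names, the vectors and the helper `nearest_neighbors` are hypothetical, and the article does not say which distance measure SMART uses.

```python
import math


def nearest_neighbors(query, library, k=3):
    """Return the k library entries closest to `query` by Euclidean distance."""
    ranked = sorted(library.items(), key=lambda item: math.dist(query, item[1]))
    return [(name, math.dist(query, vector)) for name, vector in ranked[:k]]


# Hypothetical 10-dimensional embeddings of known compounds
library = {
    "compound_A": [0.0] * 10,
    "compound_B": [1.0] * 10,
    "compound_C": [0.1] * 10,
}

# Hypothetical embedding of an unknown molecule
query = [0.08] * 10

hits = nearest_neighbors(query, library, k=2)
for name, distance in hits:
    print(f"{name}: {distance:.3f}")
```

The closest known compounds suggest which structural family the unknown probably belongs to, which is the early clue described above.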

"Chen took this approach to getting NMR spectra of over 4,000 compounds from the literature by literally cutting out the images from the PDFs of the papers," Cottrell said. "It was an awesome effort! Even so, this is normally not enough data to train a deep network, but we used a technology called a Siamese network, in which you train on pairs of images. This amplifies your training set by roughly the square of the number of compounds in a family, and is what made this project feasible."
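The amplification Cottrell describes comes from counting pairs rather than single images: a family of n compounds yields n(n-1)/2 same-family pairs, which grows roughly as the square of n. The following sketch builds labeled pairs for a Siamese-style training set under that assumption. The family names, compound names and the helper `make_pairs` are hypothetical, and the team's actual pairing scheme may differ.

```python
from itertools import combinations


def make_pairs(families):
    """Build (a, b, label) pairs: label 1 for same family, 0 for different families."""
    labeled = [(name, family) for family, members in families.items() for name in members]
    return [
        (a, b, int(family_a == family_b))
        for (a, family_a), (b, family_b) in combinations(labeled, 2)
    ]


# Hypothetical compound families
families = {
    "family_X": ["x1", "x2", "x3", "x4"],
    "family_Y": ["y1", "y2", "y3"],
}

pairs = make_pairs(families)
same_family = sum(label for _, _, label in pairs)
print(len(pairs), "pairs in total,", same_family, "from the same family")
```

Here seven spectra produce 21 training pairs, nine of them same-family. With families of dozens or hundreds of compounds, the number of pairs far exceeds the number of spectra.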

This collaboration is the first time Gerwick has mentored an engineering student, and the exchange of ideas proved fruitful.

"It's been a wonderful interaction. UC San Diego has something really quite magical about it, and that is the depth of collaboration that occurs between departments--it's phenomenal," Gerwick said. "When you try and thoughtfully take from another discipline something that is maybe even commonplace in that discipline and apply it in a new and unique way in our discipline, it's an opportunity to really have this kind of paradigm-shifting thing. And I think this technology, with some advancement, could be a real paradigm shift in the way we do all kinds of chemistry and chemical analysis."

The team will get that chance for advancement thanks to a $550,000 grant from the National Institutes of Health to develop efficient methods that facilitate the automated structural classification, feature discovery and structure elucidation of natural products and to build an infrastructure that interacts with data input from the community.

Original publication:
Chen Zhang, Yerlan Idelbayev, Nicholas Roberts, Yiwen Tao, Yashwanth Nannapaneni, Brendan M. Duggan, Jie Min, Eugene C. Lin, Erik C. Gerwick, Garrison W. Cottrell & William H. Gerwick; "Small Molecule Accurate Recognition Technology (SMART) to Enhance Natural Products Research"; Scientific Reports; 2017
