A mutational signature linking bladder cancer and smoking has been discovered thanks to a new AI tool

Researchers from the University of California, San Diego have for the first time discovered a pattern of DNA mutations that links bladder cancer to smoking. The discovery was made possible by a powerful new machine learning tool the team developed to find patterns of mutations caused by carcinogens and other DNA-altering processes.

The work, published September 23 in Cell Genomics, could help researchers identify environmental factors, such as exposure to tobacco smoke and UV radiation, that cause cancer in some patients.

Each of these environmental exposures alters DNA in a unique way, generating a specific pattern of mutations, called a mutational signature. If a signature is found in the DNA of a patient’s cancer cells, the cancer can be traced back to the exposure that created that signature. Knowing which mutational signatures are present could also lead to more personalized treatments for a patient’s specific cancer.

In this study, researchers discovered a mutational signature in the DNA of bladder cancer that is linked to smoking. The finding is significant because a mutational signature of smoking had previously been detected in lung cancer but, until now, not in bladder cancer.

“There is strong epidemiological evidence linking bladder cancer to smoking. We even see a specific mutational signature in other tissues – such as the mouth, esophagus and lungs – that are directly exposed to tobacco carcinogens,” said study lead author Ludmil Alexandrov, Professor of Bioengineering and Cellular and Molecular Medicine at UC San Diego. “The fact that we couldn’t find this signature in the bladder was strange.”

Alexandrov and his colleagues now show that there is a mutational signature of smoking in bladder cancer, and that it differs from the signature found in lung cancer. Moreover, they show that this signature is also present in the normal bladder tissue of tobacco smokers who have not developed bladder cancer. The signature was not found in bladder tissue from non-smokers.

“What this signature tells us is that certain mutations in your DNA are due to exposure to tobacco smoke,” said study co-first author Marcos Diaz-Gay, a postdoctoral researcher in Alexandrov’s laboratory. “It doesn’t necessarily mean you have cancer. But the more you smoke, the more mutations accumulate in your cells and the more you increase your risk of developing cancer.”

Enabled by next-generation machine learning

The researchers found the tobacco signature with a next-generation machine learning tool developed by Alexandrov’s lab. The team says it is the most advanced automated bioinformatics tool for extracting mutational signatures directly from large amounts of genetic data.

“This is a powerful machine learning approach to recognize mutation patterns and separate them from genomic data,” Alexandrov said. “It takes those patterns and deciphers them, so we can see what the mutational signatures are and match them to their meaning.”

He likened the machine learning approach to picking out individual conversations at a cocktail party.

“You have multiple groups of people talking all around you, and you’re only interested in hearing certain people talking,” he said. “Our tool basically helps you do that, but with cancer genetic data. Many people around the world are exposed to different environmental mutagens, and some of these exposures leave imprints on their genomes. This tool sifts through all of this data to identify the processes that cause the mutations.”
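The cocktail-party analogy can be made concrete with a small sketch. The article does not describe the tool's internals, so the code below is only an illustration of one widely used way to separate mixed non-negative patterns: non-negative matrix factorization (NMF) with multiplicative updates. The toy data (four mutation categories, five samples), the function names, and the choice of NMF itself are assumptions made for this example, not a description of the team's implementation.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, rank, iterations=1000, seed=0):
    """Approximate V (types x samples) as W (types x rank) times H (rank x samples).

    Uses multiplicative updates for the squared-error objective, which keep
    every entry of W and H non-negative.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(V), len(V[0])
    eps = 1e-9  # guards against division by zero
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    H = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return W, H

def relative_error(V, W, H):
    """Frobenius norm of (V - WH) divided by the Frobenius norm of V."""
    R = matmul(W, H)
    num = sum((v - r) ** 2 for vrow, rrow in zip(V, R) for v, r in zip(vrow, rrow))
    den = sum(v * v for vrow in V for v in vrow)
    return (num / den) ** 0.5

# Hypothetical toy data: 2 hidden "signatures" over 4 mutation categories,
# mixed in different amounts across 5 samples.
true_signatures = [[0.7, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.7]]
true_exposures = [[100, 0, 50, 80, 20], [0, 100, 50, 20, 80]]
V = matmul(true_signatures, true_exposures)  # observed mutation counts

W, H = nmf(V, rank=2)
print("relative reconstruction error:", relative_error(V, W, H))
```

In this picture, the columns of `W` play the role of the individual "voices" (candidate signatures) and the columns of `H` say how loudly each voice is heard in each sample. Real analyses work with many more mutation categories and samples, and must also decide how many signatures to look for, which this sketch fixes in advance.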

The tool was used to analyze 23,827 sequenced human cancers. It found four mutational signatures, including that of smoking-related bladder cancer, that had not been detected by any other tool. The other three signatures, found in stomach, colon, and liver cancers, require further study to determine what processes caused them.

To show the power of their tool, the researchers compared it to 13 existing bioinformatics tools. The tools were evaluated on their ability to extract mutational signatures from more than 80,000 synthetic cancer samples. The tool developed by Alexandrov’s team outperformed all the others. It detected 20 to 50% more true positive signatures, with five times fewer false positive signatures. It even performed well when analyzing noisy data, where other tools failed.
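Because synthetic samples are built from known signatures, a benchmark of this kind can count how many extracted signatures match a real one (true positives) and how many match nothing (false positives). The sketch below shows one simple way such a tally might be computed. The cosine-similarity measure, the 0.9 threshold, the greedy matching, and all names are assumptions chosen for illustration; the article does not specify the scoring procedure the researchers used.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_extraction(extracted, truth, threshold=0.9):
    """Greedily match each extracted signature to an unused ground-truth one.

    An extracted signature counts as a true positive if its best remaining
    match reaches the (hypothetical) similarity threshold, otherwise as a
    false positive. Ground-truth signatures left unmatched are false negatives.
    """
    unmatched = set(range(len(truth)))
    true_pos = false_pos = 0
    for sig in extracted:
        best = max(unmatched,
                   key=lambda i: cosine_similarity(sig, truth[i]),
                   default=None)
        if best is not None and cosine_similarity(sig, truth[best]) >= threshold:
            unmatched.remove(best)
            true_pos += 1
        else:
            false_pos += 1
    return {"true_positives": true_pos,
            "false_positives": false_pos,
            "false_negatives": len(unmatched)}

# Hypothetical example: two known signatures; a tool reports one close match
# and one flat, spurious pattern.
truth = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]
extracted = [[0.68, 0.22, 0.10, 0.0], [0.25, 0.25, 0.25, 0.25]]
result = score_extraction(extracted, truth)
print(result)
```

Greedy matching depends on the order of the extracted signatures; a fuller evaluation would likely use an optimal one-to-one assignment instead.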

“In bioinformatics, this is the first time that such comprehensive benchmarking has been performed at this scale for mutational signature extraction,” Diaz-Gay said. “It’s a huge undertaking, comparing many tools across many datasets.”

Such a feat is also costly, Alexandrov noted. “Thanks to funding from Cancer Research UK, we were able to do this in-depth technical assessment, which is not commonly done.”

Creating a more user-friendly, personalized tool

The team’s ultimate goal is to create a web-based tool that more researchers can use, so that more patients can be profiled.

“Right now, this tool requires expertise in bioinformatics to make it work,” Alexandrov said. “What we want is to create a web-friendly version, where researchers can just drop in a patient’s mutations, and that immediately gives you the full set of mutational signatures and the processes that caused them.”

“Our idea for the future is to leverage this tool to analyze patients at the individual level,” Diaz-Gay said.

Paper title: “Discovery of new mutational signatures by de novo extraction with SigProfilerExtractor.”

This work was supported by Cancer Research UK and the National Institutes of Health.