Part of a gene is better than none at identifying a type of microbe. But for Rice University computer scientists, one part wasn’t nearly enough in their pursuit of a program to identify all species in a microbiome.
Emu, their microbial community profiling software, effectively identifies bacterial species by using long DNA sequences spanning the entire length of the gene under investigation.
The Emu project, led by computer scientist Todd Treangen and graduate student Kristen Curry of Rice’s George R. Brown School of Engineering, facilitates the analysis of a key gene that microbiome researchers use to identify types of bacteria that could be harmful or beneficial to humans and the environment.
Their target, 16S, is a subunit of the rRNA (ribosomal ribonucleic acid) gene, the use of which was developed in 1977 by Carl Woese. This region is highly conserved in bacteria and archaea and also contains variable regions crucial for separating different genera and species.
“It’s often used for microbiome analysis because it’s present in all bacteria and most archaea,” said Curry, in her third year in the Treangen group. “There are regions that have been conserved over the years that make it easy to target. In DNA sequencing, we need parts of it that are the same in all bacteria so we know what to look for, and then we need parts to be different, so we can tell bacteria from each other.”
The Rice team’s study, with collaborators in Germany and at the Houston Methodist Research Institute, Baylor College of Medicine and Texas Children’s Hospital, appears in the journal Nature Methods.
“Years ago, we tended to focus on bad bacteria — or what we thought was bad — and we didn’t really care about the others,” Curry said. “But there’s been a shift over the last 20 years where we think some of those other bacteria hanging out might mean something.
“That’s what we call the microbiome, all the microscopic organisms in an environment,” she said. “Frequently studied environments include water, soil and the intestinal tract, and microbes have been shown to affect crops, carbon sequestration and human health.”
Emu, named for its job of “expectation maximization”, analyzes entire 16S sequences of bacteria processed by an Oxford Nanopore MinION handheld sequencer and uses advanced error correction to identify species based on nine distinct “hypervariable regions”.
“With prior technology, we could only read part of the 16S gene,” Curry said. “It has about 1,500 base pairs, and with short-read sequencing you can sequence up to 25%-30% of this gene. However, you really need the entire gene to achieve species-level precision.”
But even the latest technology isn’t perfect, allowing errors to slip into sequences.
“Although error rates have fallen in recent years, there can still be up to 10% error in an individual DNA sequence, while species can be separated by a handful of differences in their 16S gene,” said Treangen, an assistant professor of computer science who specializes in detecting infectious diseases. “Distinguishing sequencing errors from real differences was the main computational challenge of this research project.
“One problem is that many of the errors are not random, meaning they can occur repeatedly at specific positions and then look like real differences rather than a sequence error,” he said.
“Another problem is that there can be thousands of bacterial species in a given sample, creating a complex mixture of microbes that can exist in abundances well below the sequencing error rate,” Treangen said. “This means that we cannot simply rely on ad hoc cutoffs to distinguish signal from error.”
Instead, Emu learns to distinguish between signal and error by comparing a large number of long reads, first against a template and then against each other, and iteratively refines its error correction as it profiles microbial communities. In the experiments performed, Emu produced significantly fewer false positives than other approaches when analyzing the same data sets.
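The iterative refinement described above follows the general pattern of expectation maximization. As a rough illustration only, and not Emu’s actual code, the Python sketch below estimates species proportions from per-read likelihoods. The function name `em_abundances`, the toy likelihood values and the fixed iteration count are all assumptions made for this example; in a real pipeline the likelihoods would presumably be derived from comparing each read against reference sequences.

```python
from typing import Dict, List


def em_abundances(
    read_likelihoods: List[Dict[str, float]], n_iter: int = 50
) -> Dict[str, float]:
    """Estimate species proportions from per-read values of P(read | species).

    Hypothetical helper for illustration; not part of the Emu software.
    """
    species = sorted({s for read in read_likelihoods for s in read})
    # Start from a uniform guess over every candidate species.
    abundance = {s: 1.0 / len(species) for s in species}

    for _ in range(n_iter):
        expected_counts = {s: 0.0 for s in species}
        for read in read_likelihoods:
            # E-step: probability that this read came from each species,
            # given the current abundance estimates.
            weights = {s: abundance[s] * p for s, p in read.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # the read supports no species; skip it
            for s, w in weights.items():
                expected_counts[s] += w / norm

        # M-step: new abundances are the normalized expected read counts.
        assigned = sum(expected_counts.values())
        if assigned == 0.0:
            break  # nothing could be assigned; keep the previous estimate
        abundance = {s: expected_counts[s] / assigned for s in species}

    return abundance


if __name__ == "__main__":
    # Toy input (made-up numbers): species C only ever gets a weak match.
    toy_reads = [{"A": 0.9, "C": 0.1}] * 8 + [{"B": 0.9, "C": 0.1}] * 2
    for name, share in em_abundances(toy_reads).items():
        print(f"{name}: {share:.4f}")
```

In this toy input, species C is supported only by weak matches of the kind a sequencing error might produce, and the repeated updates push its estimated share toward zero without any fixed cutoff.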
“Long reads represent a disruptive technology for microbiome research,” Treangen said. “The goal of Emu was to harness all the information about the entire 16S gene, without masking anything, to see if we could achieve more accurate calls at the genus or species level. And that’s exactly what we achieved with Emu, thanks to a fruitful, multidisciplinary collaboration.”
Alexander Dilthey, a professor of genomic microbiology and immunity at Heinrich Heine University, Düsseldorf, Germany, is co-corresponding author of the article.
Kristen Curry, Emu: Species-level microbial community profiling of complete 16S rRNA Oxford Nanopore sequencing data, Nature Methods (2022). DOI: 10.1038/s41592-022-01520-4. www.nature.com/articles/s41592-022-01520-4
Provided by Rice University
Citation: Emu software uses common gene to profile microbial communities (2022, June 30) retrieved June 30, 2022 from https://phys.org/news/2022-06-emu-software-common-gene-profile.html