Since the human genome was first mapped in the 1980s, science has unlocked many mysteries of the human body. In that time, the technology surrounding the project has evolved so that the daily amount of data being produced is doubling every seven months. By the year 2030, genomics will likely be generating zettabytes (10²¹ bytes) of data every year.
Arkansas Research Alliance (ARA) Academy member Dr. David Ussery, a professor of bioinformatics at the University of Arkansas for Medical Sciences, is using this data to fight diseases, including COVID-19.
AMP: Using broad strokes, tell us about the nature of your research.
Dr. Ussery: My research centers around genome sequences. In the context of UAMS, we’re interested in developing high-throughput methods for sequencing and high-throughput methods for computational analysis of microbial genomes, in terms of human health, as well as human genomes and cancer and other diseases.
AMP: To a finer point, your research is focused on bioinformatic analysis of bacterial genomes. How does this work interact with the COVID-19 pandemic?
Dr. Ussery: There is a lot that can be learned from genome sequences, whether they are from viruses, bacteria or humans. For example, currently in the news is a new ‘strain’ of SARS-CoV-2 that is supposedly much more virulent. Actually, from looking at the genome sequences of lots of coronavirus strains, what we can see is that this new strain is not really that different, and most likely just a particular strain that was part of a super-spreader event. It’s not that the virus is suddenly more easily spread, but rather that, by chance, this particular variant is one that has infected lots of people. Correlation is not the same as causation.
In terms of the relationship between my work with bacterial genomics and the COVID-19 pandemic, actually, I work with microbial genomics, which includes viral genomes as well as bacterial genomes. In the past two years (2019 and 2020), I have published several papers that were only about viruses, including one about the mumps virus outbreak based in Springdale. In general, we are interested in understanding microbial ecosystems which include viruses, bacteria and fungi.
AMP: Tell us about the Arkansas Center for Genomic and Ecological Medicine (ArC-GEM).
Dr. Ussery: The idea behind ArC-GEM is pretty simple — use genome sequences as the best unique identifier of an organism to track disease outbreaks. Basically, a pathogen from a clinical isolate can be sequenced, and from that, the doctor can quickly know several important things. What is it? How do I treat it? Is this part of an outbreak that we should be monitoring?
For example, we have sequenced about 40 mumps genomes from the outbreak in Springdale and used them to compare the strain there to other outbreaks. As a result, we could map where the Springdale strain came from (Iowa) and other places where it went (Washington state). In a sense, the SARS-CoV-2 genome is just another genome among many that we want to monitor and track in terms of genomic epidemiology. We can also use methods similar to those we’re using to look for COVID-19 to search wastewater for other viral (and microbial) genomes, so that we are better able to detect and track outbreaks.
AMP: The bacteria inside your gut can directly influence a number of fascinating functions in the body. What advances in medicine can your work help facilitate?
Dr. Ussery: There are many diseases that can be caused or exacerbated by the microbial community in different parts of the body. In your gut, there are some bacteria that can affect moods, influence the immune response (for better or worse), affect cardiovascular health and even cause (or help prevent) cancer. This means that understanding the microbial community, or ecosystem, is very important for treating patients better.
AMP: What could you do with, say, “astronomical” levels of funding? And what difference would that make for Arkansas?
Dr. Ussery: This is actually a great question. A paper published a few years ago, “Big Data: Astronomical or Genomical?”, argued that with the continuing decline in the cost of sequencing, genome sequences are now dominating big data. Currently, about 1 petabyte (10¹⁵ bytes) of new sequence data is deposited to public databases every two weeks. This is increasing rapidly, and soon we will have zettabytes (10²¹ bytes) of new genome sequences to work with every year.
But we will need computational infrastructure to deal with this “astronomical” amount of data. So, I’d invest heavily in building up high-throughput computational infrastructure, as well as novel methods that will scale to massive amounts of genomic data. In the NSF’s $24 million DART proposal, which was to build up computational infrastructure throughout the state of Arkansas, the first figure lists four areas of industry engagement: transportation; retail and e-commerce; marketing and behavior; and genomics. I think there’s a real need to invest both in the hardware and in the people who maintain and run the big computers.
The UAMS College of Medicine recently calculated its return on investment on a series of internal grants it has funded. It found that roughly for every dollar invested, there was about $15 in grant funding received. So, this is a good investment, in my opinion!
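The data-growth figures quoted in this interview can be sanity-checked with a little arithmetic. The sketch below is a back-of-envelope estimate, not a forecast: it assumes that the roughly 1 petabyte deposited every two weeks keeps doubling every seven months (the rate the introduction cites for daily data production) and asks how long yearly deposits would take to reach one zettabyte. The helper name `years_to_zettabyte` and its parameters are hypothetical, chosen only for illustration.

```python
import math

PETABYTE = 10**15   # bytes
ZETTABYTE = 10**21  # bytes


def years_to_zettabyte(pb_per_fortnight=1.0, doubling_months=7.0):
    """Back-of-envelope estimate (hypothetical helper, not from the article).

    Assumes a constant doubling time applies to the volume of new
    sequence data deposited in public databases.
    """
    # 52 weeks / 2 = 26 two-week periods per year.
    current_per_year = pb_per_fortnight * PETABYTE * 26
    # Number of doublings needed for yearly deposits to reach 1 ZB.
    doublings = math.log2(ZETTABYTE / current_per_year)
    # Convert doublings into years at the assumed doubling time.
    years = doublings * doubling_months / 12
    return current_per_year / PETABYTE, doublings, years


if __name__ == "__main__":
    pb_year, doublings, years = years_to_zettabyte()
    print(f"Current deposits: about {pb_year:.0f} PB per year")
    print(f"Doublings needed to reach 1 ZB per year: {doublings:.1f}")
    print(f"Years at a 7-month doubling time: {years:.1f}")
```

Under these assumptions the script reports about 26 PB per year today, roughly 15 doublings still to go, and a little under nine years to get there, which is broadly in line with the introduction's projection of zettabytes per year by 2030. Changing `doubling_months` shows how sensitive that date is to the assumed growth rate.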
AMP: What do you want the Arkansas business community and public officials to know about your research and how it’s making a difference?
Dr. Ussery: My research is about genomics and big data. This area is now dominating the total amount of data being produced, and there’s a real need for developing high-throughput computational methods for dealing with this. We are developing methods for rapid detection and monitoring of diseases, based on genomic sequences. This includes detecting COVID-19 levels in wastewater, as well as sequencing SARS-CoV-2 genomes from Arkansas clinical isolates.
AMP: What opportunities do you see for converting your research to jobs?
Dr. Ussery: Currently, there’s an NIH grant to look for COVID-19 in Arkansas wastewater, and I think this is something that is really needed. It would be great if we could monitor a whole community on a daily basis, and drill down to specific geographic locations and find hot spots for infection. Potentially, there are new jobs for people who use genome sequences for epidemiology.
For example, poultry industries could use genome sequencing to quickly identify and isolate pathogenic strains of viruses or bacteria, preventing costly recalls. All of my students have had the luxury of choosing which job they want; there are more jobs than people in my field.
The ARA Academy of Scholars and Fellows is a community of strategic research leaders that strives to maximize the value of discovery and progress in the state. Learn more at ARAlliance.org.