What Viral Evolution Can Teach Us About the Coronavirus Pandemic

Illustration by Adam Ferriss

A scientist I know has a tattoo on his arm of a sketch from Darwin’s notebooks. Spidery lines radiate outward from the center, branching and splitting again and again, ending with labels: A, B, C, and D. The diagram is simple, but the idea is profound: a tree of life, with lines of descent that connect you and me to everything that lives and has ever lived on Earth.

If I had to pick a single image to show how evolutionary biologists like me think about what we do, my choice would be the same. Evolutionary trees are narratives of history written in genetic code, records of family lineages that stretch into the distant past. For anyone who knows how to look, the past, present, and possible futures of the new coronavirus can be found in its evolutionary tree. We are uncovering the tree now, bit by bit, and, with the help of new technologies that make it cheaper and faster to sequence viral genomes, we are reading the virus’s tree faster than we have in any epidemic before. The question now is: Can we read it fast enough to make a difference as we race to limit the virus’s spread?

In the earliest days of the epidemic, the virus’s evolutionary tree hinted at the seriousness of the problem we would face. On January 10th, ten days after the world first heard about a cluster of mysterious pneumonias in Wuhan, the first genome sequence from the outbreak told us something important: the virus responsible was new to humankind. The disease it causes, COVID-19 (short for “coronavirus disease 2019”), has symptoms that resemble those of influenza, pneumonia, and the common cold, and, as that name suggests, the virus belongs to the coronavirus family. But the virus had not infected humans before, which means that we have limited immune defenses to stop its spread.

A second important finding came soon after the first. Early reports from the World Health Organization had suggested that the virus had jumped multiple times from its animal host into humans, with limited person-to-person spread. But as more viral genomes were sequenced, in mid-January, their evolutionary tree told a different story. When researchers compared genomes from the COVID-19 outbreak to coronaviruses found in bats, pangolins, and other animals, they found that the viruses from the outbreak clustered tightly in a single branch of the tree, recent descendants of a single spillover event in November or December. We still don’t know the origin of this spillover event. But, as we now know too well, the story the tree told was correct: the new virus spreads easily from person to person.

In late January, a man flew from Wuhan to Washington State and got sick, becoming the first known case of COVID-19 in the United States. For weeks afterward, we learned of similar cases in which the virus had been acquired abroad; at the time, each seemed to be an isolated offshoot. That changed at the end of February, when researchers in Seattle identified a patient who got COVID-19 without travelling abroad or knowing someone who had. (Several of those researchers are my collaborators and mentors, and this article draws heavily from their work, particularly that of the virologist Trevor Bedford.)

The genomes of the Washington cases told a startling story: the viruses from the Wuhan traveller and the community-transmission case clustered together on their own branch of the coronavirus tree, the second virus a direct descendant of the first. Three mutations had arisen along the transmission chain to distinguish the two viruses, ticks of a genetic clock marking the time that had elapsed. The viral genomes showed that after the coronavirus reached Washington State, in late January, it grew into its own branch of the tree, spreading silently through the Seattle area for weeks. Working from the cryptic transmission revealed by the evolutionary tree, researchers estimated that the virus might have infected five hundred to six hundred people in Washington State by early March, far more than the eighteen cases reported at the time.
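The genetic-clock reasoning can be made concrete with a back-of-the-envelope calculation. The sketch below assumes an average substitution rate of roughly 0.001 substitutions per site per year and a genome of about 29,900 nucleotides; neither figure comes from this article, and real analyses estimate rates with statistical models and report wide uncertainty rather than dividing by a single constant. The function names are hypothetical, chosen for illustration.

```python
# Back-of-the-envelope molecular clock: a sketch under stated assumptions.
# ASSUMED parameters (not from the article): a substitution rate of about
# 0.001 substitutions per site per year and a genome of about 29,900 nt.

DAYS_PER_YEAR = 365.25


def mutations_per_day(rate_per_site_per_year: float, genome_length: int) -> float:
    """Expected number of new mutations per genome per day."""
    return rate_per_site_per_year * genome_length / DAYS_PER_YEAR


def days_elapsed(n_mutations: int,
                 rate_per_site_per_year: float = 1e-3,
                 genome_length: int = 29_900) -> float:
    """Point estimate of the time needed to accumulate n_mutations.

    Mutations arrive roughly at random (approximately a Poisson process),
    so the true elapsed time can differ from this estimate by weeks.
    """
    return n_mutations / mutations_per_day(rate_per_site_per_year, genome_length)


if __name__ == "__main__":
    per_year = mutations_per_day(1e-3, 29_900) * DAYS_PER_YEAR
    print(f"~{per_year:.0f} mutations per year")        # prints: ~30 mutations per year
    print(f"3 mutations ~ {days_elapsed(3):.0f} days")  # prints: 3 mutations ~ 37 days
```

Under these assumed parameters, the virus accumulates about thirty mutations a year, or roughly two or three a month, so three mutations correspond to something on the order of five weeks of transmission. Because mutations arrive at random, the real interval could be considerably shorter or longer.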

In the month since we learned that the virus was spreading in Seattle, the coronavirus tree has grown explosively, its tips splitting and branching as the virus continues to spread. In Washington State, nearly five thousand new cases were diagnosed in March, most of them descendants of the first Washington case. This branch has grown thick enough to seed its own outbreaks. We have found its descendants in New York, California, Connecticut, Minnesota, and Wisconsin, among the few states to publish viral genome sequences so far. The Washington branch has also sparked clusters of cases as far away as Iceland and Australia, a testament to our interconnected world. (As each of these branches has spread, it has accumulated its own mutations, a normal consequence of copying errors during viral replication. So far, there is no evidence that these mutations change how the virus infects people.)

Elsewhere in the coronavirus tree, branches have spread across the globe with ease. In the United States, the tree shows that the coronavirus crossed our borders and took hold multiple times. Seattle’s outbreak is fuelled by several distinct branches, evidence that the virus has taken multiple paths from Wuhan to Washington State. Most cases sequenced from the massive New York City outbreak belong to a single branch whose closest relatives are in Europe, not China, suggesting that the virus crossed the Atlantic rather than the Pacific Ocean to arrive on the East Coast. Nationwide, the virus has continued to replicate, in defiance of the patchwork restrictions put forth to stop it.
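What it means for a branch’s “closest relatives” to be in Europe can be illustrated with a deliberately crude proxy: count the sites at which two aligned genomes differ (the Hamming distance) and pick the reference sequence with the fewest differences. The researchers’ trees are built with full phylogenetic methods that model how mutations accumulate along lineages; this sketch swaps in simple nearest-neighbor matching instead. The sequences, labels, and function names below are hypothetical, made-up fragments, not real viral data.

```python
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_relative(query: str, references: Dict[str, str]) -> Tuple[str, int]:
    """Return (label, distance) for the reference sequence nearest to query.

    A crude stand-in for phylogenetic placement: it ignores shared ancestry,
    repeated mutations at the same site, and sequencing errors.
    """
    best_label = min(references, key=lambda label: hamming(query, references[label]))
    return best_label, hamming(query, references[best_label])


# HYPOTHETICAL toy fragments; real SARS-CoV-2 genomes are ~30,000 nt long.
references = {
    "China-like": "ACGTACGTACGTACGT",
    "Europe-like": "ACGTTCGTACGAACGT",
}
new_york_sample = "ACGTTCGTACGAACTT"

label, distance = closest_relative(new_york_sample, references)
print(label, distance)  # prints: Europe-like 1
```

The toy query differs from the “Europe-like” fragment at one site and from the “China-like” fragment at three, so it is matched to the former. Real placement relies on a tree rather than raw counts, because two sequences can look similar by chance without sharing a recent ancestor.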

In the Ebola and Zika outbreaks of the past several years, evolutionary trees revealed the viruses’ patterns of spread around the world, but sometimes not until months or years after the outbreaks began. The new coronavirus has spread far faster, but the pace of science has sped up as well. Between January and early April, more than twenty-five hundred COVID-19 genomes were published, making it possible to track how the virus has spread and evolved in almost real time.

These advances raise the tantalizing possibility that knowledge of viral evolution can alter the course of this pandemic. It may already have made a difference in Seattle, where genome sequencing of the first two known Washington cases alerted researchers to weeks of silent viral spread, an insight that positive test results alone could not have provided. It was, in part, these clues about a larger, hidden outbreak that prompted swift social-distancing measures, long before test results began to catch up.

For now, during this period of extreme social distancing, a major priority is to implement widespread diagnostic testing to learn where and how quickly the virus is spreading. Eventually, expanded testing and contact tracing may tell us who should stay home and who might safely go back to work, letting the economy restart. But the case of Seattle points toward a more distant future in which routine genome sequencing can also guide our public-health response. Someday, genome sequencing may help us figure out who transmits the virus to whom. It might teach us how often the virus spreads through close contact, as opposed to passing encounters. And once social distancing lets up, expanded diagnostic testing paired with genome sequencing can alert us to the virus’s likely resurgences.

Realizing these possibilities would require heroic efforts from overwhelmed public-health agencies to radically expand viral surveillance, and to combine sophisticated bioinformatic analyses with traditional, shoe-leather epidemiology. But there are also glimmers of hope. Before my colleagues found a coronavirus outbreak in their back yards, they built the Seattle Flu Study to sequence thousands of influenza samples, allowing them to track local transmission chains as the flu virus circulated around Puget Sound. Now, in partnership with local public-health agencies and the University of Washington, they are using this infrastructure to launch a massive coronavirus-surveillance project. They are testing samples from both sick and healthy people around the greater Seattle area and sequencing all the coronaviruses they find.

Perhaps, through work like this, we will find clues about how to stop the virus’s spread, so that our knowledge of evolution can help us shape the virus’s evolutionary tree. These are bold ideas, but stranger things have happened in these past few months.

