SARS-CoV-2: How fast is the virus changing and what does it mean?
Updated: Nov 18, 2020
Melanie Matheu, PhD
About me: I am a PhD trained Immunologist / Biophysicist. Disclosure: I am the founder of a company that is developing SARS-CoV-2 neutralizing antibodies.
Today, July 26th, 2020, 16.5 million people have tested positive for COVID-19 and 652,039 people have died. It has been 4 1/2 months since COVID-19 was declared a global pandemic. For comparison, influenza typically causes 250,000 to 500,000 deaths annually.
The accuracy of how COVID-19 deaths are counted (or not) has been a matter of debate; I found this article instructive.
Scientific research, some excellent, some hastily done, has flooded the pre-print and peer review publication system at a rate of about 92 per day (Figure 1), making it difficult for experts and novices alike to decipher the trajectory of this pandemic.
Figure 1. Number of peer-reviewed SARS-CoV-2 research articles found using search term “SARS-CoV-2”, hosted by www.PubMed.gov as of 12 noon, July 26th, 2020.
In addition, mixed messages and information that is often little more than conspiracy theory have persistently seeded confusion. As we experience a resurgence in COVID-19 cases, the simple question remains: Will we be able to get ahead of this virus?
From my perspective as an immunologist, the critical component of this question is not how many doses of vaccine we can produce. It is whether we are producing the right vaccines, antibodies, or small molecules to tackle this rapidly changing virus.
Current Vaccines and Antibody Therapies: Will They Be Enough?
To understand the implications of a changing virus, we must understand how and why vaccines and antibodies work.
The immune system is composed of the adaptive and the innate branches. The innate branch is non-specific (not tailored) to pathogens and works like a general barrier or protection system. Once a virus breaches the innate immune system’s defenses (skin, mucous membranes, as well as neutrophils, NK, and NKT cells), the T and B cells, the primary actors in the adaptive immune response, take over. Both T and B cells play important and independent roles in stopping a viral infection.
Once activated, T cells directly kill virus-infected cells and offer support to B cells. B cells, through a Darwinian selection process of guess-and-check, develop specific antibodies to the virus over a couple of days. Some B cells become plasma B cells, which mass-produce antibodies and pour them into circulation to neutralize free-floating viruses looking to infect another cell (Figure 2).
After T and B cells are activated and successfully clear a viral infection, a few of the activated, virus-specific cells become long-lived memory cells. These long-lived memory cells reside in the body and reactivate when challenged with the same pathogen, preventing reinfection.
Memory T and B cells have memorized an ‘epitope’, a surface protein signature that is unique to the virus or antigen. It’s almost like remembering the edge of a ridgeline or the shape of a unique building, but in 3D and with some chemistry added in: this is a physical interaction between the protein-based recognition region of the immune cell and the protein-based virus surface. Epitope recognition by our immune system behaves like a sensitive thumbprint scanner or facial recognition system that the memory cells use to verify the identity of a virus before reactivating. Memory immune cells react quickly when they encounter the virus that they have ‘memorized’, reducing the time to clear infection so significantly that most people do not feel sick again. This is where the word immunity comes from: immunitas, the Latin word for ‘exemption’ or ‘freedom from’.
Typically one T or B cell is specific to one epitope, but often our bodies will generate many virus-responsive T and B cells to ‘cover all bases’, recognizing many of the surface epitopes.
An antibody binds to a specific region on an antigen called an epitope. A single antigen can have multiple epitopes for different, specific antibodies. https://courses.lumenlearning.com/microbiology/chapter/polyclonal-and-monoclonal-antibody-production/
Viruses can escape a memory immune system response through changes or mutations in an epitope region. This foils the sensitive recognition system of immune cells. Some epitope regions of viruses mutate faster than others. Selective pressure exerted by an immune system can encourage mutations to become more prominent: a virus that mutates is not neutralized by the host immune system and survives to infect more people. Regions that have a slow or low rate of mutation are called ‘conserved’ epitopes and make great vaccine or antibody therapeutic targets because it is more likely that they will work in a large number of infections. Some mutations are not on the surface and don’t change the shape of the virus protein, and thus don’t usually change the structure of the epitope. Occasionally T and B cells rely on the same epitope or on overlapping epitopes, and a change in that epitope can foil both of those parts of the adaptive immune response.
So how exactly do vaccines and antibodies work?
Vaccines work by training the immune system, both B and T cells, to recognize one or more epitopes such that the memory response can prevent infection when you are exposed to the virus. Sometimes vaccines use the whole virus, which produces a more varied response since there are many epitopes. This is typically a more robust vaccine, as pathogens don’t often mutate all epitope regions at once.
Antibodies are produced by B cells and can render a virus unable to infect cells (neutralize it). Antibodies both alert T cells and buy the immune system time to catch up to a viral infection. Antibodies are effective in stopping initial infections (if they are high enough in concentration), and they can also be used therapeutically once a patient is already sick to help shut down the virus and prevent it from infecting new cells. Virus-specific antibodies, such as those found in convalescent serum from recovered COVID-19 patients, as well as manufactured antibodies, can be used therapeutically and prophylactically.
In sum, a vaccine or antibody therapy will be effective as long as the immune system, or the antibody (surrogate immunity), continues to recognize the target epitopes. The problem is that epitopes are changing. Let’s go through that data.
Moving the Therapeutic Goalposts: SARS-CoV-2 is Changing
The publicly available GISAID database has collected 72,705 genetic sequences of the SARS-CoV-2 virus from December 24th, 2019 to July 25th, 2020. Unique virus sequences can be visualized in part (not all mutations are shown at the same time for clarity) on NextStrain.org (Figure 3). Note that this data also includes a few mink-derived sequences reported by the Netherlands, where workers on a mink farm infected mink and the mink then infected other workers. More here about zoonotic spread of SARS-CoV-2 from humans to mink, and then back to humans at the same facility.
SARS-CoV-2 virus mutations shown by Clade and region where the virus was isolated. Image from NextStrain.org
The GISAID database represents a small sampling (~0.45%, total isolates/total reported cases) of the viruses circulating in the global population, but for our purposes is used to represent larger viral trends as it is the largest, most complete, publicly accessible database.
I’ve parsed and processed the GISAID data on SARS-CoV-2. Among the 72,705 SARS-CoV-2 viruses isolated from human patients, 51,192 have one or more amino acid changes in the original S protein (about 70% of the total viral isolates).
Among these 51,192 isolates with S protein changes, there are 2,434 unique amino acid mutations.
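As a quick check on the proportions quoted above, the arithmetic can be sketched in a few lines of Python. The variable names are mine, and the case count is the 16.5 million figure from the start of this article, which may not be the exact denominator used for the ~0.45% estimate.

```python
# Figures quoted in the text (GISAID snapshot, late July 2020).
total_isolates = 72_705          # sequenced SARS-CoV-2 isolates
isolates_with_s_change = 51_192  # isolates with one or more amino acid changes in S
reported_cases = 16_500_000      # assumption: case count quoted at the top of the article


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


sampling_pct = percent(total_isolates, reported_cases)
s_change_pct = percent(isolates_with_s_change, total_isolates)

print(f"Share of reported cases that were sequenced: {sampling_pct:.2f}%")
print(f"Share of isolates with an S protein change:  {s_change_pct:.1f}%")
```

With these inputs the sequenced share comes out a little under half a percent and the share of isolates with an S change at roughly 70%, in line with the numbers above.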
What is the S protein? To infect a human (or animal), SARS-CoV-2 uses its S or Spike protein (Figure 4) to bind to the ACE2 receptor of cells. The virus then enters the cell, where it replicates. To date, all reported neutralizing antibodies (that I have found) are directed to the S protein of SARS-CoV-2. This means that the immune system recognizes epitopes on the S protein and uses the signature of the S protein to stop the virus.
Figure 4. S1 Protein from SwissProt: https://swissmodel.expasy.org/interactive/7dVLxC/models/03
What is an amino acid? Amino acids are the building blocks of proteins; a string of amino acids creates a peptide or a longer protein. A single amino acid change at, for example, position 100 means that the protein differs at position 100. Amino acids are abbreviated as single capital letters, so H100Y means that at position 100 a Histidine (H) was changed to a Tyrosine (Y). Note that a single genetic point mutation (RNA or DNA) does not always lead to an amino acid change. Several codons (3 DNA or RNA nucleotides each) code for the same amino acid. A change in the DNA or RNA that does not lead to an amino acid change is considered a silent mutation.
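To make the notation concrete, here is a minimal Python sketch that splits a label such as H100Y into its three parts and checks whether a codon change is silent. The function names are my own, and the codon table is only a small excerpt of the standard genetic code chosen for illustration.

```python
import re
from typing import NamedTuple


class AminoAcidChange(NamedTuple):
    original: str     # one-letter code before the change, e.g. "H"
    position: int     # position in the protein, e.g. 100
    replacement: str  # one-letter code after the change, e.g. "Y"


def parse_change(label: str) -> AminoAcidChange:
    """Parse a label like 'H100Y' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single amino acid change: {label!r}")
    original, position, replacement = match.groups()
    return AminoAcidChange(original, int(position), replacement)


# Excerpt of the standard genetic code (RNA codons), for illustration only.
CODON_TABLE = {
    "GAU": "D", "GAC": "D",  # Aspartic acid
    "GGU": "G", "GGC": "G",  # Glycine
    "CAU": "H", "CAC": "H",  # Histidine
    "UAU": "Y", "UAC": "Y",  # Tyrosine
}


def is_silent(codon_before: str, codon_after: str) -> bool:
    """A point mutation is silent if both codons encode the same amino acid."""
    return CODON_TABLE[codon_before] == CODON_TABLE[codon_after]


print(parse_change("H100Y"))
print(is_silent("GAU", "GAC"))  # D -> D: the amino acid is unchanged
print(is_silent("GAU", "GGU"))  # D -> G: the amino acid changes
```

The first codon pair differs by one nucleotide but still encodes Aspartic acid, which is what makes that change silent.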
In general, an increase in unique virus isolates has been occurring over time (Figure 5). SARS-CoV-2 is currently estimated to have a rate of 23.6 nucleotide substitutions per year. In a recent study in Brazil where COVID-19 cases have been surging, substitution rates are reported as high as 33 sites per year. The number of mutations found in the wild may be in part due to how widespread COVID-19 has become.
Figure 5. An increase in the number of virus isolates over time, denoted by color, that contain one or more nucleotide substitutions (left), and the geographic location where the isolate was collected (right). Graph was generated on the public website NextStrain.org on July 25th, 2020. Note that NextStrain.org provides a sampling of acquired data for visualization clarity.
Table 1. Estimated rate of nucleotide substitutions per site per year in pathogenic viruses. Data from NextStrain.org, June 25th, 2020.
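The whole-genome figures above (substitutions per year) can be put on the same per-site scale as Table 1 by dividing by the genome length. The sketch below assumes a genome length of 29,903 nucleotides, the length of the Wuhan-Hu-1 reference sequence; that number is my assumption and is not taken from the NextStrain data.

```python
# Assumption: length of the SARS-CoV-2 reference genome (Wuhan-Hu-1) in nucleotides.
GENOME_LENGTH_NT = 29_903


def per_site_rate(substitutions_per_year: float,
                  genome_length: int = GENOME_LENGTH_NT) -> float:
    """Convert whole-genome substitutions/year to substitutions/site/year."""
    return substitutions_per_year / genome_length


global_rate = per_site_rate(23.6)  # global estimate quoted in the text
brazil_rate = per_site_rate(33)    # upper estimate from the Brazil study

print(f"Global: {global_rate:.2e} substitutions per site per year")
print(f"Brazil: {brazil_rate:.2e} substitutions per site per year")
```

Both values land on the order of one substitution per thousand sites per year.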
Within the 1,273 amino acid long SARS-CoV-2 S protein, an average of 1.67 amino acid changes per site have been recorded.
Taking a closer look at the S protein, 290 unique amino acid changes were found in the receptor binding domain (RBD, defined here as amino acids 333–527). The RBD is the protein surface used by the virus to bind to cells. It is also where the majority of neutralizing antibodies reported to date bind to neutralize the virus.
Fewer mutations were found in the RBD (average 1.5 amino acid changes per site), relative to outside of the RBD (1.71, statistical significance to be determined). Amino acid changes are not equally distributed and relative ‘hot-spots’ where amino acid changes show up more often, occur both inside and outside of the RBD (blue, Figure 6).
Figure 6. Each dot represents a unique amino acid change in the S protein as reported in GISAID.
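The RBD average can be reproduced from the two numbers given above: the count of unique changes and the number of sites in the domain. This is only a sketch of that one calculation, with variable names of my choosing.

```python
# RBD boundaries as defined in this article (inclusive).
RBD_FIRST, RBD_LAST = 333, 527

rbd_sites = RBD_LAST - RBD_FIRST + 1  # number of amino acid positions in the RBD
rbd_unique_changes = 290              # unique RBD amino acid changes quoted in the text

rbd_changes_per_site = rbd_unique_changes / rbd_sites

print(f"RBD sites: {rbd_sites}")
print(f"Average unique changes per RBD site: {rbd_changes_per_site:.2f}")
```

The result rounds to the 1.5 changes per site reported for the RBD.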
There are also many mutations outside of the RBD that may alter other antibody recognition sites, similar to dominos knocking something far away into a different place. Also, mutations that have no effect on the RBD can increase virus fitness, as was seen with the D614G mutation.
Let’s take a closer look at D614G
A SARS-CoV-2 virus that changed amino acid position 614 from D (Aspartic acid: a charged, polar, acidic, hydrophilic amino acid) to G (Glycine: a small, very flexible non-polar, neutral amino acid) was first isolated and reported to the GISAID database in January, 2020.
SARS-CoV-2 viruses containing this mutation became globally dominant in about three months (Figure 7). About 75% of SARS-CoV-2 genomes isolated in July have the D614G mutation. Although the D614G mutation does not increase severity of disease, it does lead to a significant increase in infectivity, possibly 10 times higher than the original SARS-CoV-2. However, mutations that do increase severity, and that might otherwise reduce viral fitness, may gain a selective advantage when combined with D614G.
Figure 7. NextStrain.org data showing the prevalence of the S protein amino acid change D614G among viral clades over time (upper left), prevalence over time from the first isolation January 1st, 2020 (upper right), and current global distribution of D614G in virus isolates submitted to NextStrain.org (map, bottom). Gold color represents G at amino acid position 614, and the teal color represents D at amino acid position 614.
Once the D614G mutation was introduced to Europe, it became the dominant strain in about three weeks. A similar pattern occurred upon introduction to the United States, where local (East Coast) dominance of D614G was achieved in about 2 weeks.
Graphical Abstract from: Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus, Korber et al. 2020 CELL.
If D614G doesn’t increase severity of disease, why do we care?
An amino acid change that is found both in association with the D614G mutation and independently of D614G may be important when making a broad assessment of virus fitness. A mutation found independently of D614G may indicate that there is no reduction in virus fitness, as the virus has been successful in spreading. Combination with D614G, which increases the infectious capacity of the virus, ensures that a mutation will be distributed as long as it is not severely detrimental to the fitness of the virus.
In total, 641 S amino acid mutations are found independently of D614G and 1,492 are found in association with D614G. A further 301 amino acid mutations are found both independently of and in association with D614G (Figure 8).
Figure 8. Shows the number of amino acid changes found in the S protein sequence that are associated with D614 (original strain of SARS-CoV-2, count = 641), G614 (gain of function mutation, count = 1,492), or both (count = 301)
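The grouping behind this kind of Venn diagram is a pair of set operations. The sketch below uses a toy data set: H655Y is placed in both backgrounds because it is reported in both later in this article, while the `toy_` labels are hypothetical placeholders. It also checks that the three counts, read as mutually exclusive groups (my assumption), add up to the 2,434 unique mutations reported earlier.

```python
def partition_by_background(with_d614: set, with_g614: set) -> dict:
    """Split mutations into those seen only with D614, only with G614, or with both."""
    return {
        "D614 only": with_d614 - with_g614,
        "G614 only": with_g614 - with_d614,
        "both": with_d614 & with_g614,
    }


# Toy example: labels starting with "toy_" are hypothetical placeholders.
seen_with_d614 = {"H655Y", "toy_1"}
seen_with_g614 = {"H655Y", "toy_2", "toy_3"}

groups = partition_by_background(seen_with_d614, seen_with_g614)
for name, members in groups.items():
    print(name, sorted(members))

# Consistency check on the counts reported in Figure 8.
reported_counts = {"D614 only": 641, "G614 only": 1_492, "both": 301}
total_unique = sum(reported_counts.values())
print(f"Total unique S protein mutations: {total_unique}")
```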
In the RBD (receptor binding domain) D614G is found in association with 201 unique RBD amino acid changes and 89 of the RBD mutations are found independently of D614G. A total of 13 unique amino acid changes or sets of changes in the RBD are present in the population both with and without D614G (Figure 9).
Figure 9. Shows the number of amino acid changes found in the receptor binding domain (RBD) of the S protein sequence that are associated with D614 (original strain of SARS-CoV-2, count = 89), G614 (gain of function mutation, count = 201), or both (count = 13)
Interestingly, mutations within the whole S protein and those confined to the RBD are more often associated with the gain of function mutation D614G. This indicates that D614G is indeed enhancing the spread of mutations.
How many amino acid changes does it take to thwart a neutralizing antibody?
One amino acid change in an epitope region can render a neutralizing antibody that targets that epitope ineffective. This occurs regardless of the antibody’s origin (vaccine response, man-made therapeutic, or natural exposure). However, vaccines and natural exposure both typically induce a polyclonal (multi-antibody) response, in which several different antibodies that recognize distinct epitopes are generated. Even so, some combinations of human-derived antibodies tested (discussed below) have also been rendered ineffective by an S mutation in vitro.
In sum, changes in the RBD of the S protein have been found in every major region of the globe. This introduces the interesting question: Is this virus already responding to selective pressures from the human immune response?
Image of SARS-CoV-2 antibodies binding to the RBD of the S protein from: Convergent antibody responses to SARS-CoV-2 in convalescent individuals, Robbiani et al. Nature, 2020.
Are There Escape Mutants Among Us? Yes.
An escape mutant is defined here as a change in the virus protein sequence that leads to a neutralizing antibody no longer being effective.
In a recent study published in Science that characterizes human patient-derived antibodies produced by Regeneron, scientists were able to demonstrate the emergence of virus escape mutants within a single viral passage (culture) in vitro (in the lab). All but three of the mutants that escaped neutralization have already been isolated from COVID-19 patients and recorded in the GISAID database.
Let’s take a closer look at the most prominent escape mutant, H655Y.
First, Regeneron isolated human antibodies by screening the B cells of infected patients, a common method utilized by many human antibody generating companies. These are natural human antibodies that were circulating in the population and in general, convergence (trending towards similarity) of neutralizing antibodies has been reported (Nature, 2020 Robbiani et al.).
In the article from Regeneron, mutation H655Y, an amino acid change from H (His, Histidine) to Y (Tyr, Tyrosine) at position 655 in the S protein, emerged in the lab. This single mutation created a virus that was no longer neutralized by Regeneron antibodies or the combinations of antibodies tested (Figure 2 from Baum et al. Science, 2020).
“Deep sequencing of passaged virus identifies escape mutations”, Figure 2, Baum et al. Science, 2020
But does H655Y exist in the wild? Yes.
The H655Y mutation was first isolated independently of D614G in the human population on January 26th, 2020. On April 1st, 2020, the H655Y mutation was found in conjunction with D614G. This indicates it will continue to spread well. In total, 9 unique S protein mutations include the H655Y mutation. Interestingly, this mutation does not occur in the receptor binding domain of the S protein, demonstrating the difficulty in predicting which mutations will evade neutralizing antibodies that recognize the RBD.
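Checking whether a mutation falls inside the RBD is a simple range test on its position. Here is a small sketch using the RBD boundaries defined earlier in this article (amino acids 333–527); the helper names are mine.

```python
import re

RBD_FIRST, RBD_LAST = 333, 527  # RBD boundaries as defined in this article


def position_of(label: str) -> int:
    """Extract the position from a label such as 'H655Y'."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", label)
    if match is None:
        raise ValueError(f"not a single amino acid change: {label!r}")
    return int(match.group(1))


def in_rbd(label: str) -> bool:
    """True if the changed position lies inside the RBD (inclusive bounds)."""
    return RBD_FIRST <= position_of(label) <= RBD_LAST


for label in ("D614G", "H655Y"):
    print(label, "inside RBD" if in_rbd(label) else "outside RBD")
```

Both positions sit past the end of the RBD, so a position check like this cannot by itself predict which mutations will escape RBD-binding antibodies.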
Virus mutations capable of evading antibody responses to the original Wuhan strain of SARS-CoV-2 are already endemic, and have been for months. Additionally, with a few short cycles of viral replication under selective pressure from a neutralizing antibody, escape mutations are readily created in the laboratory setting.
Conserved Epitope Region: A Silver Bullet?
Coaxing the human body to produce an antibody to a conserved (low mutation rate) epitope region of the SARS-CoV-2 virus would certainly reduce if not eliminate most viral spread. However, this may be harder than it sounds. Multi-antibody responses produced by the immune system are typically linked to better outcomes, and there has yet to be a conserved region of the S RBD reported. Some antibody binding regions that have been reported to be conserved are found outside of the RBD, but are they truly conserved?
A conserved region between SARS-CoV and SARS-CoV-2 RBD of S was reported along with a neutralizing antibody that binds the region in Nature, Pinto et al. May 18th, 2020. Doing a quick search of the GISAID database, however, reveals that 13 of the 19 amino acids bound by the reported neutralizing antibody are mutated in circulating viruses.
In another research article, published May 8th, 2020 in Science, Yuan et al. identify a conserved epitope region between SARS-CoV and SARS-CoV-2 that is bound by a neutralizing antibody. Today, per the GISAID database, 19 of the 28 amino acids that make up the reported SARS-CoV/CoV-2 conserved epitope region are mutated and in circulation. These mutations occasionally occur independently, but are often associated with up to 8 other mutations, many including D614G.
I have yet to find a literature-reported conserved epitope region that does not have over 60% of its sites mutated in circulating viruses.
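The 60% figure follows directly from the two site counts quoted above. A minimal sketch of the calculation, with dictionary keys that are just my labels for the two papers:

```python
# (mutated sites, total sites in the reported epitope), as quoted in the text.
reported_epitopes = {
    "Pinto et al. (Nature)": (13, 19),
    "Yuan et al. (Science)": (19, 28),
}

mutated_pct = {
    name: 100.0 * mutated / total
    for name, (mutated, total) in reported_epitopes.items()
}

for name, pct in mutated_pct.items():
    print(f"{name}: {pct:.1f}% of epitope sites mutated in circulating viruses")
```

Both reported regions come out at roughly two thirds of their sites mutated.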
How is SARS-CoV-2 Changing So Quickly, and what does it mean?
SARS-CoV-2, like other coronaviruses, is capable of recombination (swapping portions of the genetic sequence between different viruses inside the cell). The earliest report of this potentially occurring was in a patient in Belgium. This initial report was later identified as a sequencing error in the GISAID database. At the time of the report, however, it was speculated that the patient was infected with two distinct strains of the virus. Recently, additional sequence analysis has supported genetic recombination between SARS-CoV-2 viruses that have co-infected the same cell: 2019 Novel Coronavirus Is Undergoing Active Recombination.
It is tempting to speculate that the robust nucleic acid substitution rate and long-lived infection may support in-host recombination events. This would allow SARS-CoV-2 to persist in evading the human immune system in real time. Given the oddly long pre-cytokine storm period demonstrated in severe infection (10–14 days), immune system evasion that ‘buys’ the virus time may be at play. Outside of the S protein, several other proteins expressed by the virus act as interferon antagonists: Fung et al. 2020, Chen et al. 2020. Interferons are potent immune-modulating molecules that typically activate the immune response. Many of the anti-inflammatory antibodies for autoimmune disease on the market are indeed interferon antagonists.
The D614G mutation leads to a significantly higher virus titer within a patient, which could support a higher rate of recombination events. In vivo recombination within a patient may also explain long-lived re-emergence of symptoms within patients that have recovered.
Regardless of whether in-host recombination occurs or not, recombination events are common in coronaviruses. This allows for rapid evolution and possibly immune system escape, making it difficult for therapeutics that target a specific viral sequence to get ahead of the mutations.
A Sobering Future:
SARS-CoV-2 is unlikely to be completely eliminated. My prediction, which I also wrote about in March, is that the virus will persist and will not experience mutational drift towards a less virulent strain. Relatively few people become deathly ill, and the incubation time during which the virus can be spread before people experience severe illness is relatively long. Therefore, the virus has little to no evolutionary pressure to evolve towards a less virulent strain.
It is likely that SARS-CoV-2 will continue to circulate as a highly infectious and far worse version of a seasonal flu-like infection, following not a seasonal timeline but an escape mutant timeline. Eventually, through vaccination and exposure, most humans will have enough of an immune response to multiple epitopes that the virus, despite mutations, may become mild. But this process will take several years. It is also possible that over time, again several years, a large number of antibodies that bind conserved epitopes will force viral escape mutations towards a less biologically fit (virulent) version of the virus.
It is difficult to predict a priori whether a vaccine or antibody therapy will work and to what extent it will be effective. What is certain is that 1) target proteins of vaccine and antibody therapy efforts have undergone changes in native circulating viruses, and 2) antibody escape mutants occur within 1 to 2 passages of viruses in the presence of neutralizing antibodies. This may occur naturally in the body as well.
For now, please keep social distancing, wear a mask, and stay safe.