None of the mutations currently documented in the SARS-CoV-2 virus appear to increase its transmissibility, according to a UCL-led study.
The analysis of virus genomes from over 15,000 Covid-19 patients from 75 countries is published today as a pre-print on bioRxiv and has not yet been peer-reviewed.
The findings build on a peer-reviewed study published in Infection, Genetics and Evolution earlier this month that characterised patterns of diversity emerging in the genome of SARS-CoV-2, the coronavirus causing the ongoing pandemic of the Covid-19 disease.
Lead author Professor Francois Balloux (UCL Genetics Institute) said: “As a growing number of mutations have been documented, scientists are rapidly trying to find out if any of them could make the virus more infectious or deadly, as it’s vital to understand such changes as early as possible.
“We employed a novel technique to determine whether viruses with the new mutation are actually transmitted at a higher rate, and found that none of the candidate mutations appear to be benefiting the virus.”
Coronaviruses, like other RNA viruses, can develop mutations in three different ways: through copying errors during viral replication, through interactions with other viruses infecting the same cell (recombination or reassortment), or through host RNA modification systems that are part of host immunity (that is, the infected person’s own immune system).
Most mutations are neutral, while others are advantageous or detrimental to the virus. Both neutral and advantageous mutations can become more common as they get passed down to descendant viruses.
The research team from UCL, Cirad and the Université de la Réunion, and the University of Oxford, have so far identified 6,822 mutations in SARS-CoV-2 across the global dataset.
For 273 of the mutations, there is strong evidence that they have occurred repeatedly and independently. Of those, the researchers homed in on 31 mutations which have occurred at least 10 times independently during the course of the pandemic.
To test if the mutations increase the transmission of the virus carrying them, the researchers modelled the virus’s evolutionary tree, and analysed whether a particular mutation was becoming increasingly common within a given branch of the evolutionary tree – that is, testing whether, after a mutation first develops in a virus, descendants of that virus outperform closely related viruses that don’t carry it.
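The comparison between descendants that carry a mutation and their closest relatives that do not can be illustrated with a minimal sketch. The `Emergence` record, the log-ratio score and the toy counts are hypothetical choices made for illustration; they are not taken from the study, and the statistic used in the pre-print may differ.

```python
from dataclasses import dataclass
from math import log2
from statistics import mean


@dataclass
class Emergence:
    """One independent appearance of a mutation on the tree (hypothetical record)."""
    carriers: int      # sampled descendants that carry the mutation
    sister_clade: int  # sampled descendants of the closest lineage without it


def log_ratio(e: Emergence) -> float:
    # Positive: carriers left more sampled descendants than their sister lineage.
    # A pseudocount of 1 avoids division by zero for empty clades.
    return log2((e.carriers + 1) / (e.sister_clade + 1))


def mean_log_ratio(emergences) -> float:
    # Average over all independent emergences of the same mutation.
    return mean(log_ratio(e) for e in emergences)


# Toy counts, invented for illustration only.
toy = [Emergence(3, 5), Emergence(7, 7), Emergence(1, 3)]
score = mean_log_ratio(toy)
print(score)
```

Under this toy scoring, a value near zero would be read as neutral and a negative value as mildly detrimental, mirroring the kind of conclusion described above.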
The researchers found no evidence that any of the common mutations are increasing the virus’s transmissibility. Instead, they found that some common mutations are neutral, but most are mildly detrimental to the virus.
The mutations analysed included one in the virus spike protein called D614G, which has been widely reported as being a common mutation which may make the virus more transmissible. The new evidence finds that this mutation is in fact not associated with increased viral transmission.
The researchers found that most of the common mutations appear to have been induced by the human immune system, rather than being the result of the virus adapting to its novel human host.
First author Dr Lucy van Dorp (UCL Genetics Institute) said: “It is only to be expected that a virus will mutate and eventually diverge into different lineages as it becomes more common in human populations, but this does not necessarily imply that any lineages will emerge that are more transmissible or harmful.”
Funding: The study was supported by the Newton Fund UK-China NSFC initiative and the Biotechnology and Biological Sciences Research Council (BBSRC).
SARS-CoV-2 is an RNA coronavirus responsible for the pandemic of the Severe Acute Respiratory Syndrome (COVID-19). RNA viruses are characterized by a high mutation rate, up to a million times higher than that of their hosts.
Virus mutagenic capability depends upon several factors, including the fidelity of viral enzymes that replicate nucleic acids, such as the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp). Mutation rate drives viral evolution and genome variability, thereby enabling viruses to escape host immunity and to develop drug resistance.
We analyzed 220 genomic sequences from the GISAID database derived from patients infected by SARS-CoV-2 worldwide from December 2019 to mid-March 2020. The SARS-CoV-2 reference genome was obtained from the GenBank database. Genome alignment was performed using Clustal Omega. Mann–Whitney and Fisher-Exact tests were used to assess statistical significance.
We characterized 8 novel recurrent mutations of SARS-CoV-2, located at positions 1397, 2891, 14408, 17746, 17857, 18060, 23403 and 28881. Mutations at positions 2891, 3036, 14408, 23403 and 28881 are predominantly observed in Europe, whereas those located at positions 17746, 17857 and 18060 are exclusively present in North America.
We noticed for the first time a silent mutation in RdRp gene in England (UK) on February 9th, 2020 while a different mutation in RdRp changing its amino acid composition emerged on February 20th, 2020 in Italy (Lombardy).
Viruses with RdRp mutation have a median of 3 point mutations [range: 2–5], otherwise they have a median of 1 mutation [range: 0–3] (p value < 0.001).
These findings suggest that the virus is evolving and European, North American and Asian strains might coexist, each of them characterized by a different mutation pattern. The contribution of the mutated RdRp to this phenomenon needs to be investigated.
To date, several drugs targeting RdRp enzymes are being employed for SARS-CoV-2 infection treatment. Some of them have a predicted binding moiety in a SARS-CoV-2 RdRp hydrophobic cleft, which is adjacent to the 14408 mutation we identified.
Consequently, it is important to study and characterize SARS-CoV-2 RdRp mutation in order to assess possible drug-resistance viral phenotypes. It is also important to recognize whether the presence of some mutations might correlate with different SARS-CoV-2 mortality rates.
Identification of recurrent mutation hotspots in different geographic areas
A database of 220 complete SARS-CoV-2 patient-isolated genome sequences, randomly collected from the GISAID database, was aligned and compared to the WSM SARS-CoV-2 reference genome.
In particular, 5 patient-isolated genomes were submitted to the GISAID database in December 2019 (2.3%), 67 in January 2020 (30.45%), 67 in February 2020 (30.45%) and 81 (36.8%) up to the 13th of March 2020. About 33.6% of complete genomes belong to patients younger than 44 years, which is the average age of the patients included in the database. The majority of patients are men (55.5%).
We divided our dataset into 4 geographic areas: Asia, Oceania, Europe, North America (Fig. 1).
Within each area we performed alignment analysis comparing patients’ genomes with the reference sequence. The Asian group comprises genomes obtained from patients located in China, Japan, South-East-Asia and India.
The Oceanian group comprises genomes from Australian patients, whereas the European one includes every genome obtained from patients located in any of the European states (Spain, Portugal, United Kingdom, Netherlands, Italy, Germany, Switzerland, France, Luxembourg, Sweden, Finland, Denmark and Belgium). Finally, the North American group contains genomes from US and Canadian patients.
We evaluated the distribution of SARS-CoV-2 mutations through different geographic areas (see Fig. 1), calculating the mutation frequency within these 4 geographic areas, by normalizing the number of genomes carrying a given mutation per geographic area.
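The normalization described here can be sketched as follows: for each area, the number of genomes carrying a mutation is divided by the number of genomes from that area. This is a minimal illustration; the `(area, positions)` input layout and the toy records are assumptions, not the authors' data format.

```python
from collections import Counter, defaultdict


def mutation_frequency(genomes):
    """genomes: iterable of (area, set_of_mutated_positions) pairs (assumed layout)."""
    totals = Counter()               # genomes per geographic area
    carriers = defaultdict(Counter)  # per area: genomes carrying each position
    for area, positions in genomes:
        totals[area] += 1
        for pos in positions:
            carriers[area][pos] += 1
    # Normalize carrier counts by the number of genomes in the same area.
    return {
        area: {pos: n / totals[area] for pos, n in carriers[area].items()}
        for area in totals
    }


# Toy records, invented for illustration only.
toy = [
    ("Europe", {3036, 14408}),
    ("Europe", {14408}),
    ("Asia", {8782}),
    ("Asia", set()),
]
freq = mutation_frequency(toy)
print(freq)
```

Normalizing per area keeps regions with many sequenced genomes from dominating the comparison.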
We confirmed the occurrence of mutations located at positions 3036, 8782, 11083, 28144 and 26143 [23–25, 33]. Moreover, we highlighted the presence of additional “conserved mutations” in all the geographic areas, taking into account only those occurring more than 10 times in our database.
Those with a lower occurrence were not reported. These mutations were found at positions 1397, 2891, 14408, 17746, 17857, 18060, 23403 and 28881, belonging to the ORF1ab (1397 nsp2, 2891 nsp3, 14408 RdRp, 17746 and 17857 nsp13, 18060 nsp14), S (23403, spike protein) and ORF9a (28881, nucleocapsid phosphoprotein) sequences, respectively.
We found that 3 out of the 13 most frequent mutations (positions 3036, 8782 and 18060) were silent, whereas one mutation (position 11083) was outside the ORF sequence. On the other hand, mutations 1397, 2891, 14408, 17746, 17857, 23403, 26143, 28144 and 28881 resulted in amino acid changes as follows: 1397 (V to I), 14408 (P to L), 17746 (P to L), 17857 (C to Y), 23403 (D to G), 26143 (G to V), 28144 (L to S). The mutation located at position 28881 is a double codon mutation that induces the substitution of two amino acids, namely 28881 (R to K) and (G to R).
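The distinction between silent mutations and amino acid changes can be illustrated with the standard genetic code. This is only a sketch: the codons are illustrative choices consistent with the reported P to L and D to G changes, not codons read from the reference genome, and `classify` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(codon, offset, new_base):
    """Substitute one base in a codon and report the effect on the amino acid."""
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "silent" if before == after else "amino acid change"
    return before, after, kind


# Illustrative codons (assumptions, not taken from the genome sequence).
print(classify("CCT", 1, "T"))  # proline codon, second base C to T
print(classify("GAT", 1, "G"))  # aspartate codon, second base A to G
print(classify("CCT", 2, "C"))  # third-base change within the proline codon family
```

Changes at the third codon position are often silent because several codons encode the same amino acid, whereas a change at the second position alters the amino acid.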
The new amino acid present in 1397 (V to I), 14408 (P to L), 17746 (P to L), 17857 (C to Y), 26143 (G to V) and 28144 (L to S) had a similar isoelectric point compared to the original amino acid present in the reference protein sequences, with the exception of the mutations at positions 23403 (D to G), 28881 (R to K) and 28881 (G to R), where the mutated amino acid has a significantly different isoelectric point.
Further studies are needed to determine whether these mutations have an impact on the proteins’ function and structure. We noted that the number and the occurrence of each mutation increase in genomes found outside Asia, reaching a maximum in genomes found in Europe and North America.
We also noted that the viral strains found in Europe and North America are derived from the L-“strain” that originated in Asia.
Characterization of geographically distinct hotspots over time
In order to determine the appearance of each mutation, we analyzed each genome from each geographic area over time, by classifying them according to the timing of sample collection, as indicated in the GISAID database.
According to this analysis, 6 time subgroups were defined, namely December 2019 (genomes from 5 patients), 1st–15th Jan. 2020 (genomes from 15 patients), 16th–31st Jan. 2020 (genomes from 52 patients), 1st–15th Feb. 2020 (genomes from 13 patients), 16th–29th Feb 2020 (genomes from 55 patients) and 1st–13th Mar 2020 (genomes from 80 patients).
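The classification into time subgroups can be sketched as a lookup on the sample collection date. The period boundaries follow the text, except that the start of the December 2019 window (1 December) is an assumption; `period_of` is a hypothetical helper.

```python
from datetime import date

# (label, first day, last day) for the six subgroups; December start is assumed.
PERIODS = [
    ("December 2019", date(2019, 12, 1), date(2019, 12, 31)),
    ("1-15 Jan 2020", date(2020, 1, 1), date(2020, 1, 15)),
    ("16-31 Jan 2020", date(2020, 1, 16), date(2020, 1, 31)),
    ("1-15 Feb 2020", date(2020, 2, 1), date(2020, 2, 15)),
    ("16-29 Feb 2020", date(2020, 2, 16), date(2020, 2, 29)),
    ("1-13 Mar 2020", date(2020, 3, 1), date(2020, 3, 13)),
]

# Patients per subgroup, as reported in the text.
PATIENTS = [5, 15, 52, 13, 55, 80]


def period_of(collected):
    """Return the subgroup label for a collection date, or None if outside all windows."""
    for label, start, end in PERIODS:
        if start <= collected <= end:
            return label
    return None


print(period_of(date(2020, 2, 20)))
print(sum(PATIENTS))  # should equal the 220 genomes in the database
```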
The number of mutations (normalized by the population taken into account for each period of time) increases over time during viral spread out of Asia (see Fig. 2). No mutations were observed in the Asian genomes analyzed in December 2019.
Interestingly, a different pattern of mutations was observed in Europe between January and February, when a new mutation, at position 14408, emerged (depicted in red). This mutation is located in the RdRp gene. Also starting from February 2020, the emergence of additional new mutations is observed (23403, 28881 and 2891; black, electric blue and dark green, respectively). Over time, we also noted an increase in the frequency of mutation 3036 (orange), already present in mid-January (2.2%).
Moreover, a different pattern of hotspot mutations is clearly distinguishable in viral genomes detected in North American patients starting from March 2020, when the outbreak of positive cases was reported in the US and Canada.
In this group, three novel mutations were reported (17746, 17857 and 18060; light blue, purple and light pink, respectively). Interestingly, viral genomes from North American patients that carry the RdRp mutation (14%) do not carry any of the European-specific mutations.
Mutation hotspot pattern after February 9th, 2020
Given the importance of RdRp for viability and replication of RNA viruses, mutations in this gene are statistically less likely to occur.
However, in some cases, such as in poliovirus, episodes of drug resistance induced by a point mutation in RdRp have been reported. In our database, a silent RdRp mutation (nt 14804) first appears on February 9th, 2020 in the UK (England), while a different RdRp mutation (nt 14408, amino acid P to L) is first observed in Italy (Lombardy) on February 20th, 2020, when the WHO website reported a dramatic increase in the number of infected patients in Europe.
We evaluated the increase/decrease of each mutation frequency before and after February 9th, 2020 across the different geographic areas (Fig. 3). In particular, we observed a strong increase (+60.5%) of genomes carrying the 14408 mutation (affecting RdRp) in Europe, together with an increase of genomes carrying the 3036 mutation (+61.7%), the 23403 mutation (+48.1%) and the 28881 mutation (+29.6%) (see upper table Fig. 3).
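A before/after comparison of this kind can be sketched as a difference between two carrier frequencies. The text does not state whether the reported changes are percentage points or relative changes; the sketch assumes percentage points, and the counts in the example are invented.

```python
def frequency_change(before_carriers, before_total, after_carriers, after_total):
    """Change in carrier frequency between two periods, in percentage points (assumed)."""
    before = 100 * before_carriers / before_total
    after = 100 * after_carriers / after_total
    return after - before


# Toy counts, invented for illustration: 0 of 20 genomes before, 12 of 40 after.
change = frequency_change(0, 20, 12, 40)
print(f"{change:+.1f}")
```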
Simultaneous occurrence of RdRp mutation with other mutations
Next, we analyzed genomes collected after February 9th 2020, when mutation in RdRp gene was reported in the database for the first time. For the purpose of analysis, we divided the genomes into two groups: group 1 contains genomes with mutation in position 14408 (RdRp) (n = 53, 4 North America and 49 European), and group 2 without RdRp mutation (n = 84).
Genomes in group 1 showed an increased number of mutations compared to group 2. In particular, group 1 shows 6 genomes with two mutations (11.3%), 25 genomes with three mutations (47.2%), 21 genomes with four mutations (39.6%), and 1 genome with 5 mutations (1.9%). In group 1, the most reported mutations are the ones in positions 3036, 14408, 23403 and 28881.
Regarding genomes in group 2, 20 do not carry any mutations (23.8%), 25 genomes have a single mutation (29.8%), 19 genomes have two mutations (22.6%), 6 genomes have three mutations (7.1%), 9 genomes have four mutations (10.7%), 2 and 3 genomes have five and six mutations respectively (2.4% and 3.6%). In group 2, the most reported mutations are located at positions 8782, 11083, 17746 and 17857.
The difference between the two groups in the distribution of the number of mutations is statistically significant (Fisher-Exact test, p value < 0.001). In particular, groups 1 and 2 differ significantly in the proportion of genomes having 0, 1, 3 and 4 mutations (Fisher-Exact test, p < 0.001) (Fig. 4). This difference, instead, is not significant when the number of mutations is 2, 5 or 6.
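One plausible way to run such a per-count comparison is a 2x2 Fisher exact test for each number of mutations (group membership against having exactly k mutations). The sketch below uses the genome counts reported above and a pure-Python two-sided test; how the authors actually built their contingency tables is not stated, so this construction is an assumption.

```python
from math import comb

# Genomes per number of mutations, as reported for the two groups.
GROUP1 = {2: 6, 3: 25, 4: 21, 5: 1}                     # with the 14408 mutation, n = 53
GROUP2 = {0: 20, 1: 25, 2: 19, 3: 6, 4: 9, 5: 2, 6: 3}  # without it, n = 84


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


def p_for_count(k):
    """Compare the share of genomes with exactly k mutations between the groups."""
    a, c = GROUP1.get(k, 0), GROUP2.get(k, 0)
    return fisher_two_sided(a, sum(GROUP1.values()) - a, c, sum(GROUP2.values()) - c)


for k in range(7):
    print(k, p_for_count(k))
```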
We found that viral strains with RdRp mutation have a median of 3 point mutations [range: 2–5], whereas viral strains with no RdRp mutation have a median of 1 mutation [range: 0–3] (p value < 0.001, Mann–Whitney test). The different distribution between the two groups relative to the number of mutations is statistically significant (Fig. 4).
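The two medians can be recomputed from the genome counts reported for each group, and the Mann–Whitney U statistic can be obtained by comparing every pair of genomes across the groups. This is a minimal sketch that computes the medians and U only; it does not derive a p-value, and the helper names are hypothetical.

```python
from statistics import median

# Genomes per number of mutations, as reported for the two groups.
GROUP1 = {2: 6, 3: 25, 4: 21, 5: 1}                     # with the 14408 mutation
GROUP2 = {0: 20, 1: 25, 2: 19, 3: 6, 4: 9, 5: 2, 6: 3}  # without it


def expand(counts):
    """Turn {mutations: genomes} into one value per genome."""
    return [k for k, n in sorted(counts.items()) for _ in range(n)]


def mann_whitney_u(xs, ys):
    """U statistic for xs: each pair with x > y counts 1, each tie counts 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


g1, g2 = expand(GROUP1), expand(GROUP2)
print(median(g1), median(g2))

u1 = mann_whitney_u(g1, g2)
u2 = mann_whitney_u(g2, g1)
print(u1, u2)  # the two U values always sum to len(g1) * len(g2)
```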
We also analyzed the most frequent mutations detected: the ones at positions 3036, 23403 and 28881 (in Europe), and the ones at positions 17746, 17857 and 18060 (in North America). Viral genomes carrying each one of these mutations were compared with viral genomes without mutations, using the Mann–Whitney test for pairwise group comparisons.
Genomes carrying mutations in positions 3036, 23403, 28881, 17746, 17857 and 18060 show a median of 3–4 mutations (range [2:5]), whereas genomes carrying none of them have a median of 1 or 2 mutations (range [0:3], p-value < 0.001, Mann–Whitney test). This difference is statistically significant and implies that if one of those mutations is present, other mutations are more likely to occur.
Homology study of mutant RdRp protein
Among all the mutation sites analyzed, the RdRp mutant is particularly interesting given that the enzyme is directly involved in viral replication and its fidelity determines the mutagenic capabilities of SARS-CoV-2. Due to the high homology between the RdRps of SARS-CoV and SARS-CoV-2, we aligned the SARS-CoV-2 RdRp reference sequence with the reported catalytic site sequence of SARS-CoV RdRp.
The amino acid substitution 323 (P to L) (due to nucleotide mutation 14408) falls outside the catalytic site, in a region that in SARS-CoV is reported to be an Interface Domain, a still poorly characterized surface structure, supposedly implicated in the interaction with other proteins which may regulate the activity of RdRp.
In this regard, it is well known that SARS-CoV RdRp forms a hollow cylinder-like supercomplex with nsp7 and nsp8, which confer processivity to RdRp. Additionally, the replication supercomplex interacts with nsp14, an exonuclease having the Nidovirales-typical proofreading capability.
This activity is important in the context of the mutation rate and for controlling the fidelity in RNA replication. However, critical RdRp residues involved in this interaction are still to be identified, and for this reason further studies are needed to assess the possible role of mutation 14408 concerning RdRp fidelity.
16. Kirchdoerfer RN, Ward AB. Structure of the SARS-CoV nsp12 polymerase bound to nsp7 and nsp8 co-factors. Nat Commun. 2019. doi: 10.1038/s41467-019-10280-3.
23. Tang X, Wu C, Li X, Song Y, Yao X, Wu X, Duan Y, Zhang H, Wang Y, Qian Z, Cui J, Lu J. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev. 2020. doi: 10.1093/nsr/nwaa036.
25. Phan T. Genetic diversity and evolution of SARS-CoV-2. Infect Genet Evol. 2020. doi: 10.1016/j.meegid.2020.104260.
33. Najjar M, Suebsuwong C, Ray SS, Thapa RJ, Maki JL, Nogusa S, Shah S, Saleh D, Gough PJ, Bertin J, Yuan J, Balachandran S, Cuny GD, Degterev A. Structure guided design of potent and selective ponatinib-based hybrid inhibitors for RIPK1. Cell Rep. 2015. doi: 10.1016/j.celrep.2015.02.052.