Experts say it’s likely that some version of the disease will linger for years. But what it will look like in the future is less clear.
Will the coronavirus, which has already killed more than 2 million people worldwide, eventually be eliminated by a global vaccination campaign, like smallpox?
Will dangerous new variants evade vaccines? Or will the virus stick around for a long time, transforming into a mild annoyance, like the common cold?
Eventually, the virus known as SARS-CoV-2 will become yet “another animal in the zoo,” joining the many other infectious diseases that humanity has learned to live with, predicted Dr. T. Jacob John, who studies viruses and was at the helm of India’s efforts to tackle polio and HIV/AIDS.
But no one knows for sure. The virus is evolving rapidly, and new variants are popping up in different countries. The risk of these new variants was underscored when Novavax Inc. found that the company’s vaccine did not work as well against mutated versions circulating in Britain and South Africa. The more the virus spreads, experts say, the more likely it is that a new variant will become capable of eluding current tests, treatments and vaccines.
For now, scientists agree on the immediate priority: Vaccinate as many people as quickly as possible. The next step is less certain and depends largely on the strength of the immunity offered by vaccines and natural infections and how long it lasts.
“Are people going to be frequently subject to repeat infections? We don’t have enough data yet to know,” said Jeffrey Shaman, who studies viruses at Columbia University. Like many researchers, he believes chances are slim that vaccines will confer lifelong immunity.
If humans must learn to live with COVID-19, the nature of that coexistence depends not just on how long immunity lasts, but also how the virus evolves.
This question of what happens next attracted Jennie Lavine, a virologist at Emory University, who is co-author of a recent paper in Science that projected a relatively optimistic scenario: After most people have been exposed to the virus—either through vaccination or surviving infections—the pathogen “will continue to circulate, but will mostly cause only mild illness,” like a routine cold.
While immunity acquired from other coronaviruses – like those that cause the common cold or SARS or MERS – wanes over time, symptoms upon reinfection tend to be milder than the first illness, said Ottar Bjornstad, a co-author of the Science paper who studies viruses at Pennsylvania State University.
“Adults tend not to get very bad symptoms if they’ve already been exposed,” he said.
The prediction in the Science paper is based on an analysis of how other coronaviruses have behaved over time and assumes that SARS-CoV-2 continues to evolve, but not quickly or radically.
The 1918 flu pandemic could offer clues about the course of COVID-19. That pathogen was an H1N1 virus with genes that originated in birds, not a coronavirus. At the time, no vaccines were available.
The U.S. Centers for Disease Control and Prevention estimates that a third of the world’s population became infected. Eventually, after infected people either died or developed immunity, the virus stopped spreading quickly. It later mutated into a less virulent form, which experts say continues to circulate seasonally.
“Very commonly the descendants of flu pandemics become the milder seasonal flu viruses we experience for many years,” said Stephen Morse, who studies viruses at Columbia University.
As new variants emerge – some more contagious, some more virulent and some possibly less responsive to vaccines – scientists are reminded how much they don’t yet know about the future of the virus, said Mark Jit, who studies viruses at the London School of Hygiene and Tropical Medicine.
“We’ve only known about this virus for about a year, so we don’t yet have data to show its behavior over five years or 10 years,” he said.
Of the more than 12 billion coronavirus vaccine shots being made in 2021, rich countries have bought about 9 billion, and many have options to buy more. This inequity is a threat since it will result in poorer countries having to wait longer for the vaccine, during which time the disease will continue to spread and kill people, said Ian MacKay, who studies viruses at the University of Queensland.
That some vaccines seem less effective against the new strains is worrisome, but since the shots provide some protection, vaccines could still be used to slow or stop the virus from spreading, said Ashley St. John, who studies immune systems at Duke-NUS Medical School in Singapore.
Dr. Gagandeep Kang, an infectious diseases expert at Christian Medical College at Vellore in southern India, said the evolution of the virus raises new questions: At what stage does the virus become a new strain? Will countries need to re-vaccinate from scratch? Or could a booster dose be given?
“These are questions that you will have to address in the future,” Kang said.
The future of the coronavirus may contrast with other highly contagious diseases that have been largely beaten by vaccines that provide lifelong immunity—such as measles. The spread of measles drops off after many people have been vaccinated.
But the dynamic changes over time with new births, so outbreaks tend to come in cycles, explained Dr. Jayaprakash Muliyil, who studies epidemics and advises India on virus surveillance.
Unlike children with measles, kids infected with COVID-19 don’t always exhibit clear symptoms and could still transmit the disease to vulnerable adults. That means countries cannot let their guard down, he said.
Another unknown is the long-term impact of COVID-19 on patients who survive but are incapacitated for months, Kang said.
The “quantification of this damage” – how many people can’t do manual labor or are so exhausted that they can’t concentrate – is key to understanding the full consequences of the disease.
“We haven’t had a lot of diseases that have affected people on a scale like this,” she said.
In December 2019, China notified the World Health Organization (WHO) that several people with severe pneumonia had been admitted to an intensive care unit at Jin-Yin-Tan Hospital in Wuhan City, in China’s Hubei Province [1–3]. It was soon established that these patients were infected with a virus never observed in humans before.
This novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which belongs to family Coronaviridae, genus Betacoronavirus and subgenus Sarbecovirus, has a positive single-stranded RNA linear genome of 29.9 kb [1–6]. Together with SARS-CoV and MERS-CoV, SARS-CoV-2 is one of the CoVs that can cause severe disease in the human population.
Betacoronaviruses, such as SARS-CoV or MERS-CoV, have a propensity for host jumping from various mammal species to humans. Similar to these other coronaviruses, SARS-CoV-2 certainly has a zoonotic origin, sharing 96.2% identity with a CoV bat strain, RaTG13, which has been found in horseshoe bats.
Given the current situation, it is essential to monitor the diversity of this new human pathogen and its potential implications for pathogenicity and infectivity. The diversity of SARS-CoV-2 is the result of the conjunction of patterns of variability at the population and the intra-host levels, which are products of selective, stochastic and spatio-temporal processes.
The phylogenetic analysis of the consensus genomic sequences of SARS-CoV-2 obtained from around the globe reveals a structure determined by geographical and temporal patterns of transmission. The largest clade in the SARS-CoV-2 phylogeny is defined by the presence of the 614G mutation in the spike (S) glycoprotein.
The SARS-CoV-2 S protein binds to the angiotensin-converting enzyme 2 (ACE2) on the surface of the human cell membrane mediating the fusion and entry of the virus. The study of the evolution of this protein in the outbreaks of other coronaviruses suggests that it plays a major role in the interspecies jump and in the adaptation to the ACE2 receptor, determining the infectivity of the virus [10,11].
In various countries, the viral sequences bearing the D614G mutation have become predominant after introduction, suggesting an adaptive advantage related to infectivity [12–14]. However, these conclusions have been contested in other studies [15,16]. Another gene responsible for the current phylogenetic structuring of SARS-CoV-2 is ORF1.
This gene is composed of two open reading frames (ORFs) (a and b) coding for 16 nonstructural proteins (nsps) that compose the viral replication–transcription complex, including the RNA-dependent RNA polymerase (nsp12). ORF1 shows the highest number of missense mutations in SARS-CoV-2, mainly in the nsp3 gene, a pattern that has also been observed in MERS-CoV.
One of the main mutations identified in the SARS-CoV-2 ORF1b gene is P314L, which occurs simultaneously with the D614G mutation of the S gene. It is expected that the ORF1 gene acquired the adaptive mutations necessary to adjust the viral replication machinery to the new host, as shown in the adaptation of avian influenza virus to mammalian hosts [18,19].
Therefore, the P314L mutation may accelerate viral replication. However, to our knowledge, no studies have addressed this hypothesis.
The nucleocapsid (N) protein is a structural protein involved in packaging the viral RNA. The nucleocapsid of SARS-CoV-2, together with the S protein, modulates the antibody response. Two important mutations in this gene, R203K and G204R, are found in sequences carrying the D614G mutation in the S gene and the P314L mutation in the ORF1b gene.
Therefore, these mutations may be related to the interaction of the N protein with the membrane protein of SARS-CoV-2.
The intra-host diversity of RNA viruses is associated with the quasispecies concept. A quasispecies is a cloud of diverse variants that are genetically linked through mutation, that interact cooperatively on a functional level, and that collectively contribute to the characteristics of the population.
Deep sequencing has revealed evidence of quasispecies in SARS-CoV [24–27] and MERS-CoV [28–31]. This intra-host diversity contributes to the adaptation of these viruses to the human host. Analysis of the MERS-CoV sequence shows an out-of-frame deletion, leading to the loss of a large part of the S2 subunit of S protein and resulting in the production of a shortened protein bearing only 801 amino acids.
Although this deletion is expected to lead to the production of defective viruses, alternatively, this mutation may block spike-specific MERS-CoV neutralizing antibodies. During the outbreak of MERS-CoV in the Republic of Korea in 2015, the virus presenting the D510G and I529T mutations at different intra-host frequencies in the receptor-binding domain (RBD) of the S protein showed increased resistance against neutralizing monoclonal antibodies and a reduced sensitivity to antibody-mediated neutralization.
A major issue in the current pandemic is to determine the diversity of the SARS-CoV-2 quasispecies and its potential contributions to population diversity and virus adaptation. We therefore studied the dynamics of the diversity of intra-host variants over a six-week period using public next-generation sequencing (NGS) SARS-CoV-2 sequences from the State of Victoria (Australia).
The diversity of SARS-CoV-2 in Victoria is a snapshot of global diversity, because (i) most infected patients acquired the virus abroad and imported it into Australia and (ii) the epidemiological analysis shows that onward transmission of the contagion was limited.
We analyzed the intra-host diversity of the S, N and ORF1 genes of 210 samples from the State of Victoria collected between February and April 2020. First, we described the frequency and presence of synonymous and nonsynonymous intra-host single-nucleotide variants (iSNVs) in the SARS-CoV-2 genes. Then, we studied the changes in the diversity of shared iSNVs over time. Finally, we analyzed the distribution of the diversity of iSNVs in the different clades of the phylogenetic tree of consensus sequences. Our results show evidence of iSNV transmission and of changes in this diversity over time.
Materials and Methods
In total, 217 samples of PRJNA613958 BioProject and related metadata were recovered from the NCBI website using the SRA-Toolkit (http://ncbi.github.io/sra-tools/). This BioProject involves more than 1000 Australian NGS samples from the State of Victoria. Our dataset represents a subsample of this BioProject obtained between 31 January 2020 and 8 April 2020. Our selection criteria involved sequences obtained using NextSeq 550 technology only.
Identification of iSNVs
iSNVs were identified in the ORF1a, ORF1b, S and N genes of the SARS-CoV-2 genome in each of the samples. We considered iSNVs to be those with a median alternative allele frequency (AAF) between 5% and 50%. The bioinformatics pipeline involved the following steps: low-quality read trimming with Trimmomatic; alignment of the reads using Bowtie2 with the SARS-CoV-2 reference sequence; conversion of the SAM alignment files to BAM files using samtools; sorting the BAM files and removing duplicate sequences with MarkDuplicates (http://broadinstitute.github.io/picard/). VirVarSeq scripts derived a consensus sequence from the alignments obtained in the last step.
The trimmed reads were realigned to the consensus sequence using Bowtie2. VirVarSeq identified the variants and their frequencies. This pipeline was semi-automated with Snakemake.
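The read-processing steps described above can be laid out as an ordered list of commands. The sketch below only builds the command lines and runs nothing. Paired-end input, every file name, the jar paths and the Trimmomatic settings are assumptions made for illustration, and the VirVarSeq consensus and variant-calling steps are left out because the text does not give their invocation.

```python
from typing import List


def build_alignment_commands(sample: str, index: str) -> List[List[str]]:
    """Return trimming, alignment, conversion, sorting and duplicate-removal
    commands for one sample, in order. All paths are hypothetical and
    nothing is executed here."""
    r1, r2 = f"{sample}_1.fastq.gz", f"{sample}_2.fastq.gz"
    t1, t2 = f"{sample}_1.trim.fastq.gz", f"{sample}_2.trim.fastq.gz"
    u1, u2 = f"{sample}_1.unpaired.fastq.gz", f"{sample}_2.unpaired.fastq.gz"
    return [
        # 1. Low-quality read trimming (window and length values are assumed).
        ["java", "-jar", "trimmomatic.jar", "PE", r1, r2, t1, u1, t2, u2,
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # 2. Alignment of the trimmed reads to the reference index.
        ["bowtie2", "-x", index, "-1", t1, "-2", t2, "-S", f"{sample}.sam"],
        # 3. SAM to BAM conversion.
        ["samtools", "view", "-b", "-o", f"{sample}.bam", f"{sample}.sam"],
        # 4. Coordinate sorting.
        ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.bam"],
        # 5. Duplicate removal with Picard MarkDuplicates.
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"I={sample}.sorted.bam", f"O={sample}.dedup.bam",
         f"M={sample}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
    ]
```

Keeping the commands as plain lists makes the order of the steps explicit and lets each one be inspected before it is handed to a workflow manager such as Snakemake.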
In each of the genes, the iSNVs were identified only in samples having a minimum coverage of 30 reads for 90% of the positions analyzed. Genomic positions with fewer than 30 reads and/or a Phred score lower than 20 were discarded for variant identification. We only considered variants supported by at least 5 reads. In parallel, we identified iSNVs in the SARS-CoV-2 genome with V-Phaser2. Only iSNVs satisfying the aforementioned quality criteria and also identified by V-Phaser2 were included in the analysis.
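The variant-level criteria above can be combined into a single filter. This is a minimal sketch, not the authors' implementation: the `Variant` record and function names are hypothetical, the AAF bounds are assumed to be inclusive, and the second caller's output is represented simply as a set of (position, alternative base) pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    position: int    # 1-based genomic position
    alt: str         # alternative nucleotide
    depth: int       # total reads covering the position
    alt_reads: int   # reads supporting the alternative allele
    phred: float     # quality score at the position


def passes_quality(v: Variant) -> bool:
    """Depth >= 30, Phred >= 20, >= 5 supporting reads, AAF within 5-50%."""
    if v.depth < 30 or v.phred < 20 or v.alt_reads < 5:
        return False
    aaf = v.alt_reads / v.depth
    return 0.05 <= aaf <= 0.50  # inclusive bounds are an assumption


def select_isnvs(calls: Iterable[Variant],
                 second_caller: Set[Tuple[int, str]]) -> List[Variant]:
    """Keep variants that pass the quality filter and were also reported
    by the second, independent caller."""
    return [v for v in calls
            if passes_quality(v) and (v.position, v.alt) in second_caller]
```

Requiring agreement between two callers is the conservative choice described in the text: a variant that passes every threshold is still dropped if the second caller did not report it.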
Overall, 210 of these samples matched our criteria for at least one of the SARS-CoV-2 genes under analysis, 5 samples were eliminated for not presenting sufficient coverage and 2 for presenting more than 100 variants (outliers). Figure S1 presents the coverage and depth of the four SARS-CoV-2 genes in the 210 samples. The median of reads for each of the positions varied between 92 and 3100 for the S gene, 76 and 4160 for ORF1a, 392 and 4324 for ORF1b and from 318 to 2987 for the N gene. The median Phred score in the positions with iSNVs was 34 (IQR (33–35)) and the median number of reads representing a specific iSNV was 40 (IQR (23–120)).
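The sample-level inclusion rule can be sketched in the same way. The function names are hypothetical; the thresholds (30 reads at 90% of positions, and exclusion of samples with more than 100 variants) are those stated in the text, and per-position depths are assumed to be available as a list of integers.

```python
from typing import Sequence


def gene_is_covered(depths: Sequence[int],
                    min_depth: int = 30,
                    min_fraction: float = 0.90) -> bool:
    """True when at least `min_fraction` of the positions of a gene
    have `min_depth` reads or more."""
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


def keep_sample(depths: Sequence[int], n_variants: int,
                max_variants: int = 100) -> bool:
    """Keep a sample for a gene if coverage is sufficient and the number
    of variants does not mark it as an outlier."""
    return gene_is_covered(depths) and n_variants <= max_variants
```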
Temporal Dynamics of iSNV Diversity
The patterns of temporal variation in the diversity of synonymous and nonsynonymous iSNVs were represented using the ggplot2 and EvoFreq packages.
Identification of the iSNV Haplotypes
To determine if nonsynonymous iSNVs cosegregate in the same sequences (haplotypes), all reads spanning the specific region of the SARS-CoV-2 genome containing these variants were identified using the pysam (https://pysam.readthedocs.io/en/latest/faq.html) module of Python from the alignment of reads to the reference sequence of this virus. At each position of the region of interest, the reads and the respective nucleotides were identified. With this information, we determined the proportion of the reads that carried the combination of variants of interest.
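The read-level calculation described above can be illustrated without any alignment library. In this sketch each read is assumed to have already been reduced to a mapping from reference position to observed base (in practice this information would come from the BAM file via pysam); the function name and data layout are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def haplotype_fraction(reads: Iterable[Mapping[int, str]],
                       haplotype: Dict[int, str]) -> float:
    """Fraction of reads spanning every haplotype position that carry all
    haplotype alleles. Returns 0.0 when no read spans the region."""
    spanning = 0
    carrying = 0
    for read in reads:
        if not all(pos in read for pos in haplotype):
            continue  # read does not cover the whole region of interest
        spanning += 1
        if all(read[pos] == base for pos, base in haplotype.items()):
            carrying += 1
    return carrying / spanning if spanning else 0.0
```

Only reads that cover all positions enter the denominator, so a read that carries one variant but ends before the next neither supports nor contradicts the haplotype.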
Validation of iSNVs Haplotypes with Other Sequence Datasets
The existence of the viral haplotypes in the S gene was verified in two additional sample subsets available at NCBI: 232 samples from PRJNA625551 BioProject and 120 samples from PRJNA610428 BioProject.
For this analysis, 863 sequences of SARS-CoV-2 from Victoria were recovered on 29 June 2020 from GISAID using the criteria of complete sequence and exclusion of low coverage. To improve the temporal signal of the phylogenetic analysis, 14 sequences from the Wuhan region collected during the month of January 2020 and the reference sequence from SARS-CoV-2 were also recovered from the GISAID website.
The sequences of the ORF1, S and N genes were concatenated and aligned using MAFFT. These alignments were visually inspected with Unipro UGENE. The phylogenetic and temporal signal of this alignment was analyzed according to the guidelines suggested in Mavian et al.
The phylogenetic signal was evaluated using iqtree with the likelihood mapping analysis. To explore the presence of a temporal signal, a phylogenetic reconstruction was applied using iqtree [47,48] software with the -m option set to TEST, allowing the identification of the best model for partitions representing the four genes and a bootstrap analysis of 1000 replicates.
The outliers were identified in this phylogenetic tree with the TempEst software, using the regression analysis of the phylogenetic distance of the tips to the root and the collection time. This pruned tree was scaled with treedater software using a strict molecular clock. The phylogenetic clusters were annotated using Pangolin software (https://github.com/cov-lineages/pangolin).
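The root-to-tip regression mentioned above amounts to an ordinary least-squares fit of tip distance on collection time. The sketch below is not TempEst itself: it assumes that dates (as decimal numbers) and root-to-tip distances have already been extracted from the tree, and the residual threshold used to flag outliers is a free parameter chosen by the analyst.

```python
from typing import List, Sequence, Tuple


def root_to_tip_fit(dates: Sequence[float],
                    distances: Sequence[float]
                    ) -> Tuple[float, float, List[float]]:
    """Least-squares fit of root-to-tip distance on collection date.
    Returns (slope, intercept, residuals)."""
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need two or more paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all collection dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(dates, distances)]
    return slope, intercept, residuals


def flag_outliers(residuals: Sequence[float], threshold: float) -> List[int]:
    """Indices of tips whose absolute residual exceeds `threshold`."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]
```

The slope estimates the substitution rate in the units of the inputs, and tips with large residuals are candidates for pruning before the tree is dated.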
Single nucleotide variants (SNVs) in GISAID and NGS consensus sequences were identified from the multiple alignment with the QSutils package in R. Mutations present in at least 1% of the samples were included in the study.
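The 1% prevalence rule can be sketched directly on an alignment. The analysis in the text used QSutils in R; the Python below is only an illustration of the same filtering idea, assuming sequences already aligned to the reference at equal length and ignoring gaps and ambiguous bases. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def snv_prevalence(reference: str,
                   aligned: Sequence[str],
                   min_fraction: float = 0.01
                   ) -> Dict[Tuple[int, str, str], float]:
    """Map (1-based position, reference base, alternative base) to the
    fraction of sequences carrying it, keeping SNVs found in at least
    `min_fraction` of the sequences."""
    counts: Counter = Counter()
    for seq in aligned:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to equal length")
        for i, (ref, alt) in enumerate(zip(reference, seq), start=1):
            # Skip gaps and ambiguity codes on either side.
            if alt != ref and ref in "ACGT" and alt in "ACGT":
                counts[(i, ref, alt)] += 1
    n = len(aligned)
    return {snv: c / n for snv, c in counts.items() if c / n >= min_fraction}
```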
Diversity of SARS-CoV-2 iSNVs
Our analysis showed that a significant percentage of the SARS-CoV-2 sequences of the S, N and ORF1 genes bore evidence of quasispecies diversity. This result corroborates previous studies that showed intra-host variability during epidemic outbreaks of SARS-CoV [24–27], MERS-CoV [28–31] and more recently SARS-CoV-2 [52–55]. The median number of iSNVs per patient estimated in our study was low in contrast to previous studies in which a high number of minority variants were estimated per sample [52–55].
However, our results were consistent with an extensive analysis of the available NGS data for SARS-CoV-2 from the NCBI. We believe that these differences are due to sequencing strategies, bioinformatics pipelines and filtering criteria that can lead to this type of discrepancy. Here, we opted for a conservative approach where only iSNVs identified using two different strategies were selected.
We found that the most frequent intra-host substitutions in Victoria are G > T, C > T, T > C and A > G. Previous studies have shown the important role that C > T and T > C mutations play in the dynamics of SNVs in consensus sequences of SARS-CoV-2. The C > T substitutions can be the product of a host defense mechanism mediated by enzymes of the APOBEC3 family [57,58]. A comparison of the intra- and inter-host substitution patterns in samples from two different cities in the United States found a high prevalence of C > T substitutions (except for intra-host diversity in one city).
The G > T transversion is the third most common type of substitution in the consensus sequences of SARS-CoV-2 worldwide, but in Asia and Oceania it is the second most common type. Likewise, our results indicate that the most frequent nucleotide substitutions were G > T and C > T, suggesting local differences in substitution patterns. The high frequency of G > T transversion in SARS-CoV-2 sequences is striking, because transitions are more likely than transversions. This transversion is probably initiated by 8-oxoguanine derived from a reactive oxygen species, implying the active role of oxidative stress in the emergence of this variation, a hypothesis that needs further investigation.
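The distinction drawn above between transitions and transversions follows from base chemistry: a transition swaps two purines (A, G) or two pyrimidines (C, T), while a transversion swaps a purine for a pyrimidine or the reverse. A minimal sketch of the classification and of a substitution tally, with hypothetical function names:

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution."""
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_family = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def substitution_spectrum(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count substitutions by type, labelled in the 'G > T' style."""
    return Counter(f"{ref} > {alt}" for ref, alt in pairs)
```

Under this classification C > T, T > C and A > G are transitions, whereas G > T is a transversion, which is why its high frequency stands out.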
We focused our analysis on the genes that played a major role in the adaptation of SARS-CoV and MERS-CoV [28–30] to the human host, and those in SARS-CoV-2 that modulate the antibody response. The majority of iSNVs in these genes were nonsynonymous. This predominance of nonsynonymous substitutions has already been documented for SARS-CoV-2, both at the consensus sequence level [57,58], and at the intra-host level [53,54]. As mentioned above, the C > T and G > T substitutions were predominant in the dataset. Because at least C > T may be the product of the action of the host’s enzymatic defense system, most nonsynonymous mutations in SARS-CoV-2 likely do not involve a selective advantage.
Although a significant proportion of SARS-CoV-2 quasispecies diversity may not represent adaptive variation, the virus is probably under selective pressure as a result of the interspecies jump to a new host. This adaptive process should be particularly evident in the proteins involved in the pathogenicity of the virus and in non-structural proteins that interact with the host’s immune system. Our data showed that the S gene has the highest density of nonsynonymous variants of the four genes analyzed.
The amino acid substitution G446V was identified in the RBD domain of the S protein involved in binding to the human ACE2 receptor. In addition, the G999C mutation was observed in the HR regions involved in membrane fusion during virus entry into the host cell. Zhang et al. suggest that the RBD domain and the HR regions played a determining role in the adaptation of SARS-CoV to the human host.
These authors identified two groups of amino acid sites under positive selection in consensus sequences: one related to the interspecies jump, mainly present in the RBD domain, and the other, involved in the adaptation to the new host and abundant in the HR region. Because our samples derive from the early stage of the adaptation of SARS-CoV-2 to the human host, it is difficult to affirm the adaptive value of the iSNVs identified in the important functional regions of the S gene. However, the emergence of this variability in regions critical for the pathogenicity of the virus requires spatial and temporal tracking.
The ORF1a gene showed a significant number of nonsynonymous iSNVs in the Late group, and some of these substitutions had AAFs greater than 20%. There is evidence for broad positive selection acting on the MERS-CoV ORF1a. This selective pressure on a gene encoding non-structural proteins may be related to the interaction of these proteins with the human immune system. Alternatively, the replication machinery encoded by the ORF1a gene may be an essential element in the adaptation of the virus to its new host, as established for the adaptation of avian influenza A viruses to mammalian hosts.
The N gene had few nonsynonymous iSNVs. We identified the A29039T variant that led to the substitution of lysine by a stop codon at position 256. A previous analysis that characterized evolution of the viral lineages and transmission in SARS-CoV-2, considering both the consensus information and the iSNVs, also found the A29039T variant in a significant proportion of the samples analyzed. This concordance of results raises a red flag in regard to the efficacy of a SARS-CoV-2 vaccine directed against the N protein, because the stop codon produced by A29039T affects the linker region suppressing the immunogenic domain of this protein.
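The link between the genomic coordinate 29039 and codon 256 of the N gene can be checked with simple arithmetic. The sketch below assumes that the N coding sequence starts at position 28274, as in the Wuhan-Hu-1 reference annotation (NC_045512.2); this coordinate is not given in the text and is introduced here only for illustration, as are the function names.

```python
from typing import Tuple

# Assumption: first nucleotide of the N coding sequence in the Wuhan-Hu-1
# reference annotation (NC_045512.2); the text does not state it.
N_GENE_START = 28274

STOP_CODONS = {"TAA", "TAG", "TGA"}
LYSINE_CODONS = {"AAA", "AAG"}


def codon_index(position: int, gene_start: int) -> Tuple[int, int]:
    """Return (1-based codon number, 0-based offset within the codon)."""
    offset = position - gene_start
    if offset < 0:
        raise ValueError("position lies upstream of the gene start")
    return offset // 3 + 1, offset % 3


def substitute(codon: str, offset: int, alt: str) -> str:
    """Return the codon with the base at `offset` replaced by `alt`."""
    return codon[:offset] + alt + codon[offset + 1:]
```

With the assumed start coordinate, position 29039 falls on the first base of codon 256, and replacing the first A of either lysine codon (AAA or AAG) with T gives TAA or TAG, both stop codons, which is consistent with the substitution described above.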
Viral Haplotypes and Quasispecies
Recent studies have demonstrated the presence of different haplotypes when comparing the diversity between the respiratory system and the intestinal tract [63,64]. Here, we identified four potential viral haplotypes in the investigated SARS-CoV-2 genes. Since we could not experimentally confirm the presence of these haplotypes, e.g., by digital PCR, we verified the presence of the most unexpected one, the N30G/S31Stop/F32Stop/R34P/G35R haplotype of the S gene, in a subset of samples from a North American cohort.
This haplotype was found in 10 of the 232 analyzed samples that were collected in the same time period as the Australian ones, with a frequency ranging from 3% to 32% (Table S5). Both American and Australian sequences were obtained with the ARTIC PCR-tiling strategy, which involves a high number of overlapping amplicons of ~400 bp.
We observed that this haplotype fell within the target region of one of the 218 ARTIC primers. It has been suggested that non-removal of primers from sequencing reads could lead to an underestimation of the frequency of iSNVs. Trimming of the ARTIC primers from the reads did not affect the identification and frequency of the N30G/S31Stop/F32Stop/R34P/G35R haplotype in the S gene (Table S6).
In order to evaluate a potential bias related to the ARTIC procedure, we analyzed 120 additional samples from another dataset obtained by metagenomics. In this dataset, we were unable to recover any of the mutations that are part of the proposed haplotype in the S gene. However, sequencing depth was much lower in these data, and we noted a significant number of iSNVs per sample in this cohort, suggesting a poor quality of sequencing data.
Such a low sequencing depth makes this dataset unsuitable for the identification of minority variants, possibly explaining the non-identification of the haplotype (data not shown). We also verified that the haplotype did not fall into a region known to be prone to Illumina sequencing artifacts. Therefore, although it cannot be ruled out that this potential haplotype results from sequencing artifacts linked to the ARTIC amplicon strategy, its asymmetric distribution between the Early and Late groups of variants remains difficult to explain, as the ARTIC strategy was applied to all samples of the Australian cohort.
Andrés et al. (2020) identified several deletions upstream of the S1/S2 cleavage site of S protein, in a study that included patients with mild and severe COVID-19 symptoms. These deletions were present at low frequencies and led to in-frame stop codons. The presence of stop codons close to the cleavage site of S1/S2 led to the loss of S2 translation.
The authors proposed that the S1 subunit produced by this defective haplotype is released as a free protein in the extracellular space. This free S1 protein could bind to the human ACE2 cell receptor, thereby competing with complete viral particles and reducing the severity of infection.
In this scenario, transmission of the haplotypes bearing deletions represents a selective advantage since attenuation of the infection increases viral transmission. However, this study does not propose a molecular scenario to understand how these haplotypes with only S1 subunit are transmitted. Our analysis identified several other haplotypes with high frequencies, such as 853Stop/854A/856H in the ORF1a gene (~27%), supporting previous findings and raising the question of the role of these potential defective viral haplotypes within the quasispecies. Further experimental research is necessary to evaluate these hypotheses.
Changes in SARS-CoV-2 iSNVs Diversity over Time
We observed that the diversity of SARS-CoV-2 iSNVs changed over time, between patients. This change implied the emergence of a more heterogeneous pattern of diversity (Late group) that occasionally affected important antigenic regions of the virus proteins. The advent of this so-called Late group was concomitant with the epidemic peak in the State of Victoria and the related public health actions, such as the closing of the Australian border and the declaration of a state of emergency.
The Late-group population was enriched in patients who had acquired the virus through local transmissions. This relationship does not imply causality, and caution must be taken given the limited epidemiological and clinical information included in our analysis. Other epidemiological variables and information from transmission clusters may help clarify the emergence of diversity observed in the Late group.
It was shown that patients with severe COVID-19 symptoms present greater intra-host diversity than patients with mild symptoms. Furthermore, Kuipers et al. established that age is significantly associated with intra-host viral genetic diversity. The identification of these factors inducing new diversity is worth exploring in an in-depth analysis of existing genomic data integrated with extensive epidemiological and clinical information.
Transmission and Bottleneck of SARS-CoV-2 iSNVs
Almost one-third of SARS-CoV-2 quasispecies diversity in Victoria was shared between patients, suggesting host-to-host transmission. If potential artifacts are excluded, each iSNV would be shared by a median of three patients. Different studies suggest a relatively important genetic bottleneck in SARS-CoV-2 [55,64,72,73].
By analyzing the intra-host diversity of the S gene in two transmission clusters, Sun et al. showed a significant bottleneck that would lead to only 6% of the variants being stably transmitted. Such a narrow bottleneck has also been demonstrated in the analysis of household transmission, which also suggested that this transmission is governed by stochastic processes.
Our results suggest that although transmission may be limited, this process does not seem to be random. Less than one-third of the nonsynonymous variants were shared between samples; however, they represented 70% of the total number of occurrences. This nonrandom transmission of quasispecies diversity has been observed in other RNA viruses [74,75], and some of these variants could be expected to have played a role in the response of the virus to the immune system.
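The contrast drawn above, between the share of variants that are shared and the share of occurrences they account for, can be made concrete with a small summary function. This is an illustrative sketch with hypothetical names: the input maps each variant to the set of samples in which it was detected, and a variant counts as shared when it appears in at least two samples.

```python
from statistics import median
from typing import Dict, Set


def sharing_summary(carriers: Dict[str, Set[str]]) -> Dict[str, float]:
    """Summarize how variants are distributed across samples.
    `carriers` maps a variant ID to the set of sample IDs carrying it."""
    shared = {v: s for v, s in carriers.items() if len(s) >= 2}
    total_occ = sum(len(s) for s in carriers.values())
    shared_occ = sum(len(s) for s in shared.values())
    return {
        "fraction_of_variants_shared":
            len(shared) / len(carriers) if carriers else 0.0,
        "fraction_of_occurrences_shared":
            shared_occ / total_occ if total_occ else 0.0,
        "median_samples_per_shared_variant":
            median(len(s) for s in shared.values()) if shared else 0.0,
    }
```

A minority of variants can dominate the occurrence count whenever shared variants are carried by several samples each, which is the pattern reported in the text.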
Other factors may, at least partially, explain the shared diversity patterns of SARS-CoV-2 in Victoria. As suggested above, the action of APOBEC3 enzymes may lead to characteristic neutral or deleterious nucleotide substitution profiles in SARS-CoV-2. These enzymes act in specific sequence contexts, which causes the recurrence of substitutions, as suggested in SARS-CoV-2 genomic sequences. It is possible that a part of the iSNV diversity shared among patients is the product of this recurrence mediated by the host defense system.
This hypothesis could explain the presence of the same iSNVs in different phylogenetic clades. In contrast, some of the SNVs identified in the current study were shared exclusively by patients with the same or a close Pangolin lineage. It is therefore necessary to determine to what extent the diversity of iSNVs is due to transmission between patients or to de novo intra-host mechanisms in SARS-CoV-2.
Investigating whether iSNVs are transmitted, generated de novo or both, requires a large-scale longitudinal analysis of the evolution of intra and inter-host variability. It is likely that both transmission and de novo generation contribute to the diversity of SARS-CoV-2 quasispecies.
Here, we found evidence of intra-host quasispecies diversity in the NGS sequences of SARS-CoV-2 sampled in Victoria. This diversity was dynamic in time and possibly part of this variation was transmitted during the first epidemic episode in this Australian state.
Supplementary Materials: The following are available online at https://www.mdpi.com/1999-4915/13/1/133/s1, Figure S1: Coverage and depth of 210 SARS-CoV-2 samples, Figure S2: Comparison of the density distributions of synonymous and nonsynonymous iSNVs in four SARS-CoV-2 genes, Figure S3: Genomic distribution of nonsynonymous iSNVs in the ORF1a gene, Figure S4: Genomic distribution of nonsynonymous iSNVs in the ORF1b gene, Figure S5: iSNVs with distributions limited to specific Pangolin lineages, Table S1: iSNVs of 210 Australian samples, Table S2: Nucleotide substitutions observed in four SARS-CoV-2 genes from Australian samples, Table S3: Temporary groups of shared nonsynonymous iSNVs, Table S4: Correlation between the haplotype frequency and the alternative allele frequency, Table S5: Samples of PRJNA625551 BioProject presenting the haplotype N30G/S31Stop/F32Stop/R34P/G35R in the S gene, Table S6: Identification of the N30G/S31Stop/F32Stop/R34P/G35R haplotype with and without trimming ARTIC primers.
Author Contributions: Conceptualization, A.A., J.-C.A. and N.B.; methodology, A.A.; validation, A.A., J.-C.A. and N.B.; formal analysis, A.A.; investigation, A.A., J.-C.A. and N.B.; writing—original draft preparation, A.A.; writing—review and editing, A.A., J.-C.A. and N.B. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding. Institutional Review Board Statement: Not applicable. Informed Consent Statement: Not applicable.
Data Availability Statement: The pipeline for iSNVs identification is available in https://github.com/alexarmerov/SARS-CoV-2.
Acknowledgments: We are grateful to the COVID-19 genomics response team of Melbourne for generating the data used in the current analysis. This work was supported by the Shanghai Municipal Science and Technology Major Project (Grant No. 2019SHZDZX02) and benefited from the Montpellier Bioinformatics Biodiversity platform supported by the LabEx CeMEB, an ANR “Investissements d’avenir” program (ANR-10-LABX-04-01).
reference link: https://doi.org/10.3390/v13010133