A new study by researchers from Case Western Reserve University in Cleveland, USA, shows that all COVID-19 infections include a wide mix of SARS-CoV-2 virus variants.
The study findings were published in the peer-reviewed journal PLOS Genetics.
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010200
Our study has revealed significant within-host infection diversity of SARS-CoV-2 through full genome sequence analysis. We greatly appreciate the work of international data repositories such as GISAID and NextStrain for curating the millions of reported infections and believe that these repositories are a tremendous and invaluable asset for the monitoring of SARS-CoV-2.
The sheer amount of data that needs to be collected and annotated makes it understandable and perhaps necessary to represent these infections by a single consensus sequence. While the practice of reporting a single unambiguous sequence assists with tracking SARS-CoV-2 in regard to evolution and global distribution of lineages, emphasis on reporting the majority nucleotide signature significantly reduces our ability to detect emerging new variants that could ignite a new surge with public health impact.
Additionally, the majority consensus approach limits our ability to understand the complexity of infections, dynamic changes in the viral population over time within infected individuals, and between donor and recipient individuals across transmission events.
The collection of these samples preceded by one month the first reports from the State of Ohio that B.1.617.2 was present in the Cleveland area and occurred at a time when the City lifted its Civil Emergency Proclamation (May 28, 2021) and the State of Ohio rescinded its masking mandate and social distancing orders (June 2, 2021) [45].
Communicating knowledge of B.1.617.2 infection and transmission in Cleveland ahead of relaxing mitigation measures may have contributed to a more cautious public health strategy.
Additionally, nucleotides with biallelic mixtures observed repeatedly in multiple samples have included many of the functionally significant polymorphisms that have been reported in variants of concern. While widely distributed across the SARS-CoV-2 genome, biallelic mixtures are observed at amino acid positions within the SARS-CoV-2 spike protein’s receptor binding domain (e.g., L452R and T478K; B.1.617.2) that are prominent in lineage determination and associated with increased transmission efficiency (e.g., D614G).
Moreover, there are numerous examples of alternate alleles observed in this study that did not clear the 50% threshold for inclusion in conventional consensus sequences that have subsequently become included as lineage defining SNPs in Omicron BA.1 (ORF1a:T3255I; S:T95I; S:K417N; S:H655Y; M:D3G; N:P13L; N:R203K; N:G204R) and/or BA.2 (ORF1a:T3255I; ORF6:D61L; S:T19I; S:G142D; S:K417N; S:H655Y; N:P13L; N:R203K; N:G204R) variants.
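To make the 50% threshold concrete, the following is a minimal sketch of how a majority consensus call at one genome position hides a minor allele. It is an illustration only, not the study's pipeline: the function name `call_site`, the read-count input format, and the 5% reporting cut-off are all assumptions introduced here.

```python
def call_site(base_counts, min_alt_fraction=0.05):
    """Summarize one genome position from per-base read counts.

    base_counts: dict mapping a base ("A", "C", "G", "T") to its read count
                 (hypothetical input format, not taken from the study).
    min_alt_fraction: assumed cut-off below which minor alleles are ignored
                      as likely sequencing noise (illustrative value).

    Returns (consensus_base, minor_alleles), where minor_alleles maps each
    retained non-consensus base to its allele frequency.
    """
    depth = sum(base_counts.values())
    if depth == 0:
        # No coverage: report an ambiguous base and no minor alleles.
        return "N", {}

    # The consensus sequence keeps only the most frequent base.
    consensus = max(base_counts, key=base_counts.get)

    # Everything else is what a consensus-only report discards.
    minor_alleles = {
        base: count / depth
        for base, count in base_counts.items()
        if base != consensus and count / depth >= min_alt_fraction
    }
    return consensus, minor_alleles


# Hypothetical position with a 62:38 mixture of two bases.
consensus, minor = call_site({"T": 620, "G": 380})
print(consensus, minor)
```

In this made-up example the consensus sequence would record only `T`, even though more than a third of the reads carry `G`. Reporting the second element of the return value alongside the consensus is one simple way such mixtures could be preserved.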
In our assessment of transmission from infected donors to recipients we were also interested to see that the overall sequence diversity was largely retained. This appeared to be true regardless of the AAF:RAF mixtures (Fig 5B to 5D). These transmission plots provide both quantitative and qualitative perspectives on SARS-CoV-2 complexity.
Many of the AAF:RAF mixtures were observed to be present in donor and recipient sequences at nearly identical ratios. For the events captured in commuter vans (two independent events) where the direction of transmission was known to be from infected drivers to uninfected passengers, this suggests that the majority of SARS-CoV-2 strain diversity is included during transmission and subsequent infection complexity [35]. This appeared to be true even though transmission occurred within the context of circulating air. Maximum likelihood estimates of the bottlenecks among our transmission pairs (van driver to passengers as well as others; Fig 5A) were similar to or higher than those observed by Braun [32] and Lythgoe [31], but did not reach the highest values reported by Popa [29].
Finally, given reports that have suggested evolution of complex populations of SARS-CoV-2 in immunocompromised people [41–44], we were interested to compare infections between IM+ and IM- individuals. While normalizing for overall complexity by comparing infections that occurred within time-similar cohorts, we did not observe increased numbers of iSNVs in IM- patients.
As we indicated previously, we acknowledge limitations in this part of our study. We were not able to study single patients over extended time series, which other investigators have followed for up to one year [43]. Therefore, we did not have the opportunity to compare potential evolution of SARS-CoV-2 variants within individual IM+ and IM- patients over time. Our conclusions are limited by the fact that we studied a small sample size and were unable to control for potential confounding factors such as duration of infection, genetic differences, and COVID-19 treatment.
The continuing dynamics of the COVID-19 pandemic have become increasingly complex and unpredictable. Underlying the challenge of COVID-19 transmission among humans, Sender et al. estimate that every infected person carries 1 billion to 100 billion virions during peak infection [47].
As a result, despite proof-reading activity within the SARS-CoV-2 RdRp that would limit mutation, efficient virus replication and transmission increasingly favors dispersal of mutations across abundant globally distributed infections (>460,000,000 to-date [19,20]). As levels of immunity stimulated by infection and vaccination fluctuate in the human population ((a) no infection or vaccine exposure; (b) previous infection(s) and no vaccination; (c) no infections and vaccination(s); (d) previous infection(s) and vaccination(s)), the global SARS-CoV-2 population will encounter significant heterogeneity in acquired and natural human host defense mechanisms.
Given the global disparities influencing access to, and uptake of, vaccines, as well as availability and practice of non-pharmaceutical interventions, the virus has free range to generate thousands of deleterious as well as advantageous mutations. Therefore, significant opportunity exists for random chance to determine what future chapters of the pandemic will present.
This emphasizes the need to study genetic complexity of SARS-CoV-2 infections more intensively and requires increased examination to achieve a greater understanding of the capacity of this virus to evade our best efforts at mitigation. While there are efforts to sequence the virus and document how it is evolving, most of the scientific community recognizes that this effort needs to be increased [48–53]. Coupled with this effort, approaches to report characteristics of multi-strain complex infections must be considered to enable closer monitoring and full understanding of the virus’s evolutionary capacity.
Lest it be thought that under-reporting of the genetic diversity of infections occurs only with SARS-CoV-2 genome sequence data, consensus sequence reporting is the submission practice for most (if not all) infectious disease pathogens to national and international data repositories (https://VEuPathDB.org) from the US National Institute of Allergy and Infectious Diseases (NIAID) and the Wellcome Trust (example pathogens and vectors include biosecurity pathogens [e.g., Yersinia pestis (plague), Marburg virus, Ebola virus] and pathogens of global concern [e.g., Plasmodium, Mycobacterium tuberculosis]).
Adequately addressing the under-reporting of infection complexity represented in data repositories has significant potential to alter vitally important characteristics of infectious diseases, including assessment of drug treatment efficacy and resistance and vaccine escape.