The major genetic risk factor for severe COVID-19 is inherited from Neanderthals


Everybody loves Neandertals, those big-brained brutes we supposedly outcompeted and ultimately replaced using our sharp tongues and quick, delicate minds.

But did we really, though? Is it mathematically possible that we could yet be them, and they us?

By the same token, could not the impossibly singular Mitochondrial Eve, her contemporary Y-chromosome Adam, and even the “Out of Africa” hypothesis simply be convenient fictions paleogeneticists tell each other at conferences to give their largely arbitrary haplotype designations and subsequently derived evolutionary trees more credence?

Perhaps one of the best ways to try to answer this question is to ask what the coronavirus has to say about the issue. Svante Pääbo, director of the genetics department at the Max Planck Institute, certainly believes that Homo sapiens neanderthalensis, or just Homo neanderthalensis, if you prefer, is extinct. Pääbo, the son of 1982 Nobel laureate Sune Bergström, has made a nice living off of Neandertal bones, finding gene after gene that is distinctly “Neandertal.”

In 1997, Pääbo successfully sequenced mitochondrial DNA from a specimen found in Feldhofer grotto in the Neander valley. Fast-forwarding past a few recent PR disasters, the Germans were able to capture the productive Swede and set him upon the task of dealing with these inconvenient heirloom skeletons that kept showing up.

This September, Pääbo and colleague Hugo Zeberg announced that the major genetic risk factor for severe COVID-19 is inherited from Neanderthals. (We note that Nature publications prefer to include the h.) By any measure, this is a bold statement.

The team found that severe COVID-19 disease is associated with specific genetic variants in six genes within a 50K-base-pair-long region of chromosome 3 that derived directly from a Neanderthal heritage. Similar investigations have also identified a protective Neanderthal haplotype on chromosome (chr) 12 that reduces the risk of severe COVID-19, and a protective region on chromosome 9 that is associated with the ABO blood groups.

Not content to rest on their laurels, Pääbo and Zeberg have just kicked things up a notch. The pair recently reported on the bioRxiv preprint server that another exclusively Neandertal variant, this time in the promoter region of the DPP4 gene at chr2q24.2, is really pulling the strings on COVID susceptibility.

DPP4 is a widely expressed extracellular dipeptidyl peptidase involved in immune function and glucose metabolism. As it happens, DPP4 is also the receptor gene for the MERS coronavirus. Now we are getting somewhere.

Although other researchers have insisted DPP4 is not a SARS-CoV-2 receptor, it can be tough to ignore coincidental findings like this when therapeutic options are sorely needed. Inhibitors of DPP4 that are already used clinically to treat diabetes appear to have effects on COVID-19 patients.

Amidst the flurry of ongoing SARS genetic research, we reported on Monday that a handful of immune-associated gene variants including IFNAR2 and TYK2 also control COVID outcomes. Curiously, this study also identified DPP9, a sister gene of DPP4 residing at chr19p13.3, as a key mediator of inflammatory lung injury. DPP9 has a similar serine protease activity to DPP4, but differs in that it is not membrane-bound.

The DPP4 gene is not too far away from a long-defunct remnant centromere found nearby in the chr2q21.3–q22.1 region. There is also an additional vestigial telomere sitting down in the q13 band.

What are these structures doing here? If pressed for a one-line answer to the question of what is it that makes us human, an excellent answer is the fusion of two small ape chromosomes to make the human chr2. Do Neandertals have a fused chr2?

Of course they do. In fact, they seem to have the same version of the speech gene, FOXP2, which Pääbo put on the map in 2002. Human FOXP2, which differs from the chimp version in two key places, was famously mutated in the “KE” family from Britain, whose affected members had a specific disability in their use of consonants.

In the more recent COVID risk factor studies, Pääbo searched for single nucleotide polymorphisms using data from the 1000 Genomes Project, then checked with the COVID-19 Host Genetics Initiative to see if Neanderthal haplotypes for DPP4 associated with disease severity.

The problem with this line of work is that we don’t have that much sequence data to tell us what makes a Neandertal a Neandertal. There are just a few good genomes available, from skeletal remains ~120,000 years old and ~50,000 years old. These come from Europe and southern Siberia. These kinds of statistical shortfalls make normal folks suspicious when their 23andMe scorecard calls them out as 0.98, or 1.67 Neanderthal.

One point that the COVID epidemic is now successfully driving home is that blind medicine no longer cuts it. Blind medicine refers to anything done in the absence of personal patient sequence data. Above, we took a few cheap shots at paleogeneticists and their historical haplotype attributions.

This was for good reason, and we have a few more shots to take. When genomics data is given with respect to a reference sequence, problems can frequently arise. This is because, simply put, there is no such thing as a reference sequence – it, too, is completely arbitrary.

Updates and improvements are made to various reference sequences from time to time, but no true reference sequence will ever be had.

In contrast to the MERS DPP4 receptor, no ACE2 receptor variants have emerged as a risk locus for severe COVID-19. However, many of the other genes associated with the SARS-CoV-2 infection process and life cycle have come to light. For example, four variants (rs464397, rs469390, rs2070788 and rs383510) robustly affect expression of the TMPRSS2 serine protease in lung tissue. TMPRSS2-upregulating variants are present at higher frequencies in European and American populations than in Asian populations.

Perhaps of more immediate concern, now that vaccines are rolling out, is the question of whom the vaccine might help, and in some cases, whom it might harm. The latter prospect is usually framed in terms of the now well-known phenomenon of antibody-dependent enhancement (ADE). While for other diseases like dengue or respiratory syncytial virus, ADE is taken very seriously, these three bad words are usually dismissed quite handily in discussions of COVID. However, recent research now suggests that ADE in COVID is very much a thing.

In particular, researchers have found that some anti-spike monoclonal antibodies from COVID-19 patients, particularly those against the N-terminal domain (NTD) of the spike, dramatically enhanced the spike's binding capacity to ACE2, and therefore increased SARS-CoV-2 infectivity.

Mutational analysis was used to pinpoint a specific surface region of the NTD. All the patients studied had antibodies against this infectivity-enhancing site. As information about spike sequence mutations and ADE risk factors updates much faster than vaccine development times, it is important for the public to get information about the RNA vaccines currently being offered. Namely, what exact spike sequences are used to generate the vaccine?

Recent reports of new proliferations of spike mutants have raised further questions. How do the potentially vaccine-evading N501Y variant in the receptor binding domain or the double-NTD-deletion variants change the game? Or how does the new D614G spike variant that confers more efficient replication affect transmissibility and pathogenicity? Answers are coming in fast and furious, and they are to be ignored at one’s peril.

The coronavirus SARS-CoV-2 pandemic has caused considerable morbidity and mortality, claiming the lives of more than half a million people to date (WHO 2020). The disease caused by the virus, COVID-19, is characterized by a wide spectrum of severity of clinical manifestations, ranging from asymptomatic virus carriers to individuals experiencing rapid progression to respiratory failure (Vetter et al. 2020). Early in the pandemic it became clear that advanced age is a major risk factor, as well as male sex and some co-morbidities (Zhou et al. 2020).

These risk factors, however, do not fully explain why some have no or mild symptoms while others become seriously ill. Thus, genetic risk factors are being investigated. An early study (Ellinghaus et al. 2020) identified two genomic regions associated with severe COVID-19: one region on chromosome 3 containing six genes and one region on chromosome 9 that determines the ABO blood group.

A recently released dataset from the COVID-19 Host Genetics Initiative finds that the region on chromosome 3 is the only region significantly associated with severe COVID-19 at the genome-wide level (Fig. 1A), while the signal from the region determining ABO blood group is not replicated (The COVID-19 Host Genetics Initiative 2020).

The genetic variants which are associated with severe COVID-19 on chromosome 3 (chr3: 45,859,651-45,909,024, hg19) are all in high linkage disequilibrium (LD), i.e. they are all strongly associated with each other in the population (r2>0.99), and span 49.4 thousand bases (kb) (Fig. 1B).
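
As a quick check of the quoted length, the span can be recomputed from the two hg19 coordinates given above. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the study.

```python
# Coordinates quoted in the text (hg19, chromosome 3).
start = 45_859_651
end = 45_909_024

# Span as the difference of the two coordinates. An inclusive count of
# bases would be one larger, which does not change the rounded value.
span_bp = end - start
span_kb = span_bp / 1000

print(f"{span_bp} bp = {span_kb:.1f} kb")
```

The difference is 49,373 bases, which rounds to the 49.4 kb stated in the text.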

A haplotype of such length could be due to positive selection, to an unusually low recombination rate in the region, or to the haplotype having entered the human population by gene flow from Neandertals or Denisovans that occurred some 40,000 to 60,000 years ago (Sankararaman et al. 2012). Positive selection seems unlikely, at least under current conditions, when the odds ratio for respiratory insufficiency upon SARS-CoV-2 infection for heterozygous carriers of the haplotype is 1.70 (95% CI, 1.27 to 2.26, Ellinghaus et al. 2020).
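
To get a feel for the quoted odds ratio, one can check that the interval is roughly symmetric on the log scale and back out the standard error it implies. The sketch below assumes a conventional Wald-type interval (estimate plus or minus 1.96 standard errors on the log-odds scale); the source does not say how the interval was computed, so treat that as an assumption.

```python
import math

# Values quoted in the text (Ellinghaus et al. 2020).
odds_ratio = 1.70
ci_low, ci_high = 1.27, 2.26

# For an interval that is symmetric on the log scale, the geometric mean
# of the two bounds should land close to the point estimate.
geometric_mean = math.sqrt(ci_low * ci_high)

# Assumption: 95% CI = exp(log(OR) +/- 1.96 * SE).
standard_error = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z_score = math.log(odds_ratio) / standard_error

print(f"geometric mean of bounds: {geometric_mean:.2f}")
print(f"implied SE of log(OR):    {standard_error:.3f}")
print(f"implied z-score:          {z_score:.1f}")
```

The geometric mean of the bounds sits within rounding of 1.70, which is what one would expect if the interval was built on the log-odds scale.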

The recombination rate in the region is not unusually low (0.53 cM/Mb, Kong et al. 2002). We therefore investigated whether the haplotype may have come from Neandertals or Denisovans.

The previously identified lead risk insertion variant (rs11385942) (Ellinghaus et al. 2020) is present in all 33 DNA fragments covering this position in the Vindija 33.19 Neandertal, a ~50,000-year-old Neandertal from Croatia in southern Europe (Prüfer et al. 2017).

Of 14 single nucleotide variants in the 1000 Genomes Project that are in high LD (r2>0.99) with the insertion risk variant in Eurasian populations, 12 occur in a homozygous form in the Vindija 33.19 Neandertal (Fig. 1B). Four of these variants occur in the “Altai” as well as in the Chagyrskaya 8 Neandertals, both of whom come from the Altai Mountains in southern Siberia and are ~120,000 and ~50,000 years old, respectively (Table S1), while none occur in the Denisovan genome. Thus, the risk haplotype is similar to the corresponding genomic region in the Neandertal from Croatia and less similar to the Neandertals from Siberia.

We next investigated whether the risk haplotype of 49.4 kb might have been inherited by both Neandertals and present-day people from the common ancestors of the two groups, who lived on the order of 500,000 years ago (Prüfer et al. 2014). The longer a present-day haplotype shared with Neandertals is, the less likely this is to be the case, as recombination in each generation will tend to break up haplotypes into smaller segments.

Assuming a generation time of 29 years (Langergraber et al. 2012), the local recombination rate (0.53 cM/Mb), a split between Neandertals and modern humans 550,000 years ago (Prüfer et al. 2014), and interbreeding between the two groups ~50,000 years ago, and using a published equation (Huerta-Sánchez et al. 2014), we exclude that the Neandertal-like haplotype derives from the common ancestor (p = 0.0009).
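
The calculation can be sketched roughly as follows. The text does not spell out the equation, so this is a reconstruction under stated assumptions: the length of a segment surviving unbroken from the common ancestor is taken to follow a gamma distribution with shape 2 and scale L = 1/(r × t), where r is the recombination rate per base pair per generation and t is the number of generations separating the two lineages. The choice of t below (both branches counted from the split down to the time of interbreeding) is also an assumption, as are all function and variable names.

```python
import math


def prob_segment_at_least(length_bp, rate_cm_per_mb, generations):
    """Probability that a shared ancestral segment is at least length_bp long.

    Assumes a gamma distribution with shape 2 and scale 1 / (r * t), whose
    survival function has the closed form (1 + x) * exp(-x), x = length / scale.
    """
    # Convert cM/Mb to crossovers per base pair per generation.
    r_per_bp = rate_cm_per_mb * 0.01 / 1_000_000
    x = length_bp * r_per_bp * generations
    return (1 + x) * math.exp(-x)


# Parameters quoted in the text.
generation_years = 29
split_years = 550_000
interbreeding_years = 50_000
recombination_cm_per_mb = 0.53
haplotype_bp = 49_400

# Assumption: recombination acts along both lineages, each running from
# the population split down to the time of interbreeding.
generations = 2 * (split_years - interbreeding_years) / generation_years

p_value = prob_segment_at_least(haplotype_bp, recombination_cm_per_mb, generations)
print(f"p = {p_value:.4f}")
```

Under these assumptions the probability comes out on the order of 10^-3, the same order of magnitude as the p = 0.0009 reported in the text. The exact figure depends on how the branch length is counted, which the text does not specify.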

It thus entered the modern human population from Neandertals. Its close relationship to the Croatian Vindija 33.19 Neandertal is compatible with this Neandertal having been shown to be closer to the majority of the Neandertals who contributed DNA to present-day people than the other two Neandertals (Mafessoni et al. 2020).

A Neandertal DNA haplotype present in the genomes of people living today is expected to be more similar to a Neandertal genome than to other haplotypes in the current human population. To investigate the relationships of the 49.4 kb haplotype to Neandertal and to other human haplotypes we analysed all 5,008 genomes in the 1000 Genomes Project for this genomic region. We included all positions which are called in the Neandertal genomes and excluded variants found on only one chromosome and haplotypes seen only once in the 1000 Genomes data.

This resulted in 253 present-day haplotypes containing 450 variable positions. Figure 2 shows a phylogenetic tree relating such haplotypes found more than 10 times (see Fig. S1 for all haplotypes). We find that all risk haplotypes associated with the risk for severe COVID-19 form a clade with the three high-coverage Neandertal genomes. Within this clade, they are most closely related to the Vindija 33.19 Neandertal.
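
The two filtering steps described above (dropping variants found on only one chromosome, then dropping haplotypes seen only once) can be illustrated on toy data. This is not the authors' pipeline: the sequences, function names, and the representation of each chromosome as a string of alleles are all hypothetical, chosen only to make the logic concrete.

```python
from collections import Counter


def filter_haplotypes(chromosomes):
    """Drop singleton variants, then drop haplotypes seen only once."""
    n_sites = len(chromosomes[0])
    # Keep a site only if no allele at that site occurs on exactly one chromosome.
    kept_sites = [
        i for i in range(n_sites)
        if 1 not in Counter(c[i] for c in chromosomes).values()
    ]
    # Rebuild each chromosome over the kept sites and count identical haplotypes.
    counts = Counter("".join(c[i] for i in kept_sites) for c in chromosomes)
    return {hap: n for hap, n in counts.items() if n > 1}


def count_variable_positions(haplotypes):
    """Count the sites at which the retained haplotypes differ."""
    return sum(len(set(column)) > 1 for column in zip(*haplotypes))


# Toy data (hypothetical): six chromosomes typed at three positions.
toy = ["ACA", "ACA", "ACA", "TGA", "TGA", "AGT"]

kept = filter_haplotypes(toy)
print(kept, count_variable_positions(kept))
```

In the toy set, the third position carries an allele seen on a single chromosome and is removed; the haplotype that then occurs only once is dropped, leaving two haplotypes that differ at two positions.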

Among the individuals in the 1000 Genomes Project, the Neanderthal-derived risk haplotype is almost completely absent in Africa, consistent with gene flow from Neandertals into African populations having been limited and probably indirect (Chen et al. 2020). The Neandertal haplotype occurs in South Asia at a frequency of 30%, in Europe at 8%, among admixed Americans at 4% and at lower frequencies in East Asia.

The highest frequency occurs in Bangladesh, where more than half the population (63%) carries at least one copy of the Neandertal risk variant and 13% is homozygous for the variant. The Neandertal variant may thus be a substantial contributor to COVID-19 risk in certain populations.
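
The carrier figures for Bangladesh can be converted into an allele frequency with a little arithmetic, which makes them easier to compare with frequencies quoted per haplotype. The sketch below uses only the two percentages in the text; the Hardy-Weinberg comparison is an added sanity check and an assumption, not something the source reports.

```python
# Figures quoted in the text for Bangladesh.
carriers = 0.63      # carry at least one copy
homozygous = 0.13    # carry two copies

# Carriers with exactly one copy.
heterozygous = carriers - homozygous

# Each homozygote contributes two copies, each heterozygote one,
# out of two chromosomes per person.
allele_frequency = homozygous + heterozygous / 2

# Assumption: Hardy-Weinberg proportions, for comparison only.
expected_homozygous = allele_frequency ** 2
expected_carriers = 1 - (1 - allele_frequency) ** 2

print(f"implied allele frequency: {allele_frequency:.2f}")
print(f"expected homozygous:      {expected_homozygous:.2f}")
print(f"expected carriers:        {expected_carriers:.2f}")
```

The implied allele frequency is about 38%, and the Hardy-Weinberg expectations (roughly 14% homozygous, 62% carriers) sit close to the reported 13% and 63%.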

Currently it is not known what feature in the Neandertal-derived region confers risk for severe COVID-19 and whether the effects of any such feature are specific to current coronaviruses or indeed to any other pathogens.

Once this is elucidated, it may be possible to speculate about the susceptibility of Neandertals to relevant pathogens. However, in the current pandemic, it is clear that gene flow from Neandertals has tragic consequences.

Figure 1. Genetic variants associated with severe COVID-19. A) Manhattan plot of a genome-wide association study comprising 3,199 hospitalized COVID-19 patients and 897,488 population controls. Dashed line indicates genome-wide significance (p=5e-8). Data and figure modified from the COVID-19 Host Genetics Initiative. B) Linkage disequilibrium between a lead risk variant (rs11385942, Ellinghaus et al. 2020) and genetic variants in Eurasian populations. Red marks genetic variants where alleles are correlated to the risk variant (r2>0.1) and the correlated alleles match the Vindija 33.19 Neandertal genome. Note that some individuals carry a Neandertal-like haplotype that extends to ~400 kb. The x axis gives hg19 coordinates.

Figure 2. Phylogenetic tree relating DNA sequences covering the core Neandertal haplotype in 1000G individuals. The coloured box indicates Neandertal and risk haplotypes for severe COVID-19. Arabic numbers indicate bootstrap support (100 replicates). Phylogenies were constructed using maximum-likelihood and the Hasegawa-Kishino-Yano-85 model (Hasegawa et al. 1985). The tree is rooted with the inferred ancestral sequence of present-day humans from Ensembl (Yates et al. 2020). There are no heterozygous positions in this region in the three Neandertal genomes.

Figure 3. Geographical distribution of the Neandertal core haplotype conferring risk for severe COVID-19. Empty circles denote populations where the Neandertal haplotype is missing. Data from the 1000 Genomes Project.


  1. Chen, L. et al. (2020) Identifying and Interpreting Apparent Neanderthal Ancestry in African Individuals. Cell. doi: 10.1016/j.cell.2020.01.012
  2. Ellinghaus, D. et al. (2020) Genome-wide Association Study of Severe Covid-19 with Respiratory Failure. NEJM. doi: 10.1056/NEJMoa2020283
  3. The COVID-19 Host Genetics Initiative (2020) The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. doi: 10.1038/s41431-020-0636-6
  4. Hasegawa, M. et al. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. doi: 10.1007/BF02101694
  5. Huerta-Sánchez, E. et al. (2014) Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. doi: 10.1038/nature13408
  6. Langergraber, K.E. et al. (2012) Generation times in wild chimpanzees and gorillas suggest earlier divergence times in great ape and human evolution. Proceedings of the National Academy of Sciences. doi: 10.1073/pnas.1211740109
  7. Kong, A. et al. (2002) A high-resolution recombination map of the human genome. Nat. Genet. doi: 10.1038/ng917
  8. Mafessoni, F. et al. (2020) A high-coverage Neandertal genome from Chagyrskaya Cave. Proceedings of the National Academy of Sciences. doi: 10.1073/pnas.2004944117
  9. Prüfer, K. et al. (2014) The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. doi: 10.1038/nature12886
  10. Prüfer, K. et al. (2017) A high-coverage Neandertal genome from Vindija Cave in Croatia. Science. doi: 10.1126/science.aao1887
  11. Sankararaman, S. et al. (2012) The date of interbreeding between Neandertals and modern humans. PLoS Genet. doi: 10.1371/journal.pgen.1002947
  12. Vetter, P. et al. (2020) Clinical features of covid-19. BMJ. doi: 10.1136/bmj.m1470
  13. WHO Coronavirus disease (COVID-2019) situation report 2 July 2020.
  14. Yates, A.D. et al. (2020) Ensembl 2020. Nucleic Acids Research. doi: 10.1093/nar/gkz966
  15. Zhou, F. et al. (2020) Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. doi: 10.1016/S0140-6736(20)30566-3

More information: Hugo Zeberg et al. The MERS-CoV receptor gene is among COVID-19 risk factors inherited from Neandertals, bioRxiv (2020). DOI: 10.1101/2020.12.11.422139

