Researchers at Vanderbilt University Medical Center and colleagues have identified genetic factors that increase the risk for developing pneumonia and its severe, life-threatening consequences.
Their findings, published recently in the American Journal of Human Genetics, may aid efforts to identify patients with COVID-19 at greatest risk for pneumonia, and enable earlier interventions to prevent severe illness and death.
Despite the increasing availability of COVID-19 vaccines, it will take months to inoculate enough people to bring the pandemic under control, experts predict. In the meantime, thousands of Americans are hospitalized and die from COVID-19 each day.
“This study is so important because we performed analyses separately in participants of Caucasian ancestry as well as African ancestry to identify genetic risk factors contributing to pneumonia susceptibility and severity,” said Jennifer “Piper” Below, Ph.D., associate professor of Medicine and the paper’s corresponding author.
“Combined with systemic racism and socioeconomic factors that have been reported by others, these genetic risk differences may contribute to some of the disparities we observe in COVID-19 outcomes,” Below said.
The researchers conducted genome-wide association studies (GWAS) of more than 85,000 patients whose genetic information is stored in VUMC’s BioVU biobank and linked to “de-identified” electronic health records stripped of personal identifying information. GWAS can identify associations between genetic variations and disease.
With colleagues from the University of North Carolina at Chapel Hill, the University of Texas MD Anderson Cancer Center in Houston, and the Icahn School of Medicine at Mount Sinai in New York, the VUMC researchers identified nearly 9,000 cases of pneumonia in patients of European ancestry, and 1,710 cases in patients of African ancestry.
The strongest pneumonia association in patients of European ancestry was in the gene that causes cystic fibrosis (CF), a disease that produces abnormally thick mucus, leading to chronic infections and progressive respiratory failure.
In patients of African ancestry, the strongest pneumonia association was the mutation that causes sickle cell disease (SCD), a red blood cell disorder that increases the risk for pneumonia, influenza and acute respiratory infections.
Children with CF and SCD are at particular risk for severe disease if they contract COVID-19.
The researchers found that “carriers” who are unaffected by CF yet carry a copy of the CF gene had a heightened susceptibility to pneumonia, and those who are unaffected by SCD yet carry a copy of the SCD mutation were at increased risk for severe pneumonia.
Further studies will be needed to determine whether these carriers also bear “a silent, heightened risk for poor outcomes from COVID-19,” the researchers said.
To identify other genetic variations that increase pneumonia risk, they removed patients with CF and SCD from their analysis, repeated the GWAS, and used another technique called PrediXcan, which correlates gene expression data with traits and diseases in the electronic health record.
This time they found a pneumonia-associated variation in a gene called R3HCC1L in patients of European ancestry, and one near a gene called UQCRFS1 in patients of African ancestry.
The molecular function of R3HCC1L is unclear, but deletion of UQCRFS1 in mice disrupts part of their infection-fighting immune response.
“Although our understanding about the genetic mechanism of pneumonia is still limited, this study identified the novel candidate genes, R3HCC1L and UQCRFS1, and offered an insight for further host genetic studies of COVID-19,” said the paper’s first author, Hung-Hsin Chen, Ph.D., MS, a postdoctoral fellow in Below’s lab.
“Our findings may be applied to identify the individuals with high risk of severe pneumonia and develop a precise treatment for them,” Chen said.
The COVID-19 pandemic is a serious threat to public health; over 58 million confirmed cases from 191 countries have been reported (see “Coronavirus Disease 2019” and “COVID-19 Map” in Web resources). Pneumonia is a common complication of COVID-19 and may lead to acute respiratory distress syndrome (ARDS) and death.1
In the context of the COVID-19 pandemic, identifying factors that influence host susceptibility to and severity of pneumonia has never been more important. Several clinical factors and underlying conditions that influence susceptibility to and severity of community-acquired pneumonia in adults have been identified, including age, chronic bronchitis or chronic obstructive pulmonary disease (COPD), asthma, obesity, diabetes, and others.2
These risk factors have also been observed in COVID-19-associated pneumonia.3, 4, 5, 6 However, little is known about the role of the host genome in the susceptibility to and severity of pneumonia.
There is a paucity of studies interrogating host genetic susceptibility to and severity of pneumonia. Previous studies identified suggestive associations with childhood pneumonia, survival from sepsis due to pneumonia, and severe pneumonia following influenza A/H1N1 infection,7, 8, 9 but only three genome-wide significant associations with pneumonia have been mapped: one in the HLA class I region10 and two independent hits on chromosome 15 identified in a meta-analysis of the UK Biobank and FinnGen.11
To find additional genetic loci affecting pneumonia in existing data resources, we leveraged electronic health records (EHRs) from a large-scale biobank to identify genetic variants associated with susceptibility to and severity of pneumonia.
The Vanderbilt University Medical Center biobank (BioVU) comprises over 110,000 participants with linked EHRs genotyped on the Illumina Expanded Multi-Ethnic Genotyping Array (MEGAEX) or another genome-wide array. We defined pneumonia cases by using 81 pneumonia-related diagnosis codes (Supplemental box) from the International Classification of Diseases Ninth Revision, Clinical Modification (ICD-9 CM) and used the subjects without any diagnosis of pneumonia as population-based controls.
We used hospitalization status as a proxy for severity, determined via relevant Current Procedural Terminology codes recorded within 5 days of pneumonia diagnosis (Supplemental box). In 69,819 MEGAEX-genotyped European ancestry (EA) individuals, 8,889 individuals with pneumonia were identified, including 5,774 with pneumonia-associated hospitalization (inpatients). In 15,603 MEGAEX-genotyped African ancestry (AA) individuals, we identified 1,710 individuals with pneumonia, of which 1,043 were inpatients.
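The case, control, and inpatient definitions above can be sketched as a simple rule over coded EHR events. This is a minimal illustration, not the study's pipeline: the code sets are a hypothetical handful of ICD-9-CM and CPT codes standing in for the 81 diagnosis codes and the CPT codes listed in the Supplemental box, the `Event` record is an invented structure, and applying the 5-day window on either side of the diagnosis date is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical, illustrative code sets; the study's actual lists (81 ICD-9-CM
# pneumonia codes plus hospitalization CPT codes) are in its Supplemental box.
PNEUMONIA_ICD9 = {"481", "486"}              # e.g., pneumococcal / unspecified pneumonia
INPATIENT_CPT = {"99221", "99222", "99223"}  # e.g., initial hospital care
WINDOW = timedelta(days=5)                   # "within 5 days of pneumonia diagnosis"


@dataclass(frozen=True)
class Event:
    """One coded EHR event (hypothetical structure)."""
    system: str  # "ICD9" or "CPT"
    code: str
    day: date


def classify(events: list[Event]) -> str:
    """Label a patient 'control', 'outpatient', or 'inpatient'."""
    dx_days = [e.day for e in events if e.system == "ICD9" and e.code in PNEUMONIA_ICD9]
    if not dx_days:
        # No pneumonia diagnosis at all: population-based control.
        return "control"
    stay_days = [e.day for e in events if e.system == "CPT" and e.code in INPATIENT_CPT]
    # Assumption: the 5-day window is applied on either side of each diagnosis.
    if any(abs(stay - dx) <= WINDOW for dx in dx_days for stay in stay_days):
        return "inpatient"
    return "outpatient"
```

Under this sketch, the susceptibility analysis would compare inpatients plus outpatients against controls, and the severity analysis would compare inpatients against outpatients.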
In both EA and AA subjects, we observed a significantly higher prevalence of obesity, COPD, diabetes, asthma, and liver and renal diseases in subjects with a pneumonia diagnosis (Tables S1 and S2). We sought replication of our top findings in two independent datasets: UK Biobank (n = 451,305) and 7,985 non-overlapping BioVU subjects, who were genotyped on arrays other than MEGAEX (non-MEGAEX) (Tables S3–S5).
Genetic imputation in MEGAEX-genotyped subjects was conducted with minimac4 on the Michigan Imputation Server12 using the Haplotype Reference Consortium r1.1 reference panel. A total of 39,635,008 SNPs were imputed. In EA, only the 7,167,360 SNPs with an imputation info score greater than 0.4 and minor allele frequency (MAF) greater than 1% were used for further GWAS and genetically regulated expression (GReX) imputation. In AA, 13,633,982 SNPs passed quality-control filters.
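The variant-level filters above reduce to a simple predicate on each imputed variant. A minimal sketch follows; the `Variant` record and the example variants are hypothetical, and MAF is taken as the smaller of the two allele frequencies. Passing `min_maf=0.05` mirrors the stricter 5% cutoff the study applied to its smaller AA inpatient-versus-outpatient comparison.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical per-variant summary from imputation output."""
    vid: str
    info: float      # imputation info (quality) score
    alt_freq: float  # frequency of the alternate allele in the sample


def minor_allele_freq(alt_freq: float) -> float:
    """MAF is the frequency of the less common allele."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(v: Variant, min_info: float = 0.4, min_maf: float = 0.01) -> bool:
    """Keep variants with info > min_info and MAF > min_maf (strict inequalities)."""
    return v.info > min_info and minor_allele_freq(v.alt_freq) > min_maf


variants = [
    Variant("var1", 0.95, 0.021),  # passes the 1% cutoff, fails the 5% cutoff
    Variant("var2", 0.35, 0.300),  # fails: low imputation quality
    Variant("var3", 0.90, 0.995),  # fails: MAF = 0.5%
    Variant("var4", 0.80, 0.420),  # passes both cutoffs
]
kept = [v.vid for v in variants if passes_qc(v)]                       # ["var1", "var4"]
kept_strict = [v.vid for v in variants if passes_qc(v, min_maf=0.05)]  # ["var4"]
```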
A more stringent MAF cutoff was applied in the comparison of AA pneumonia inpatients versus outpatients because of the smaller sample size (n = 1,710; 7,594,451 variants with MAF > 5% and imputation info score > 0.4). Because BioVU includes genetically related individuals, we used a generalized estimating equations (GEE) model to perform the GWASs with SUGEN.13 Since SUGEN requires known family networks representing close relatedness within a dataset, we used PRIMUS to reconstruct non-directional family networks including all first- and second-degree relationships,14 and we used ERSA to verify families with more than five members.15
Among MEGAEX-genotyped subjects, 5,019 families (size 2 or greater) were identified in BioVU EAs and 1,699 families in BioVU AAs. We included age, sex, and three principal components to capture genetic ancestry as covariates in the association tests.
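PRIMUS performs full pedigree reconstruction, which is beyond a short example, but the notion of a non-directional family network can be sketched as the connected components of a graph whose edges join pairs of individuals estimated to be first- or second-degree relatives; in a GEE analysis such networks typically serve as the clusters within which observations may be correlated. The pairwise input below is hypothetical (in practice it would come from kinship estimates), and the union-find grouping is an illustrative stand-in, not PRIMUS's algorithm. Only groups of size 2 or greater are returned, matching how family counts are reported above.

```python
from collections import defaultdict


def family_networks(pairs, max_degree=2):
    """Group individuals into non-directional family networks.

    pairs: iterable of (id_a, id_b, degree), where degree is the estimated
    degree of relationship (1 = first degree, 2 = second degree, ...).
    Returns a list of sets, one per connected family of size >= 2.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b, degree in pairs:
        if degree <= max_degree:  # keep only close (first/second-degree) links
            union(a, b)

    groups = defaultdict(set)
    for person in list(parent):
        groups[find(person)].add(person)
    return [members for members in groups.values() if len(members) >= 2]


# Hypothetical pairs: F and G are third-degree relatives, so they are not linked.
families = family_networks([("A", "B", 1), ("B", "C", 2), ("D", "E", 1), ("F", "G", 3)])
```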
Separate GWASs for susceptibility and severity of pneumonia were performed in EA and AA, each of which identified a major locus.
In EA, rs113827944 (MAF = 2.1%) was significantly associated with both pneumonia susceptibility and severity (affected individuals versus control individuals, odds ratio [OR] = 1.84, p value = 1.2 × 10−36; inpatients versus outpatients, OR = 1.70, p value = 4.0 × 10−9).
Replication of this lead SNP was observed in BioVU non-MEGAEX-genotyped EA (affected individuals versus control individuals, OR = 1.92, p value = 2.4 × 10−6). Another SNP, rs334 (p.Glu7Val, MAF = 5.8%), was significantly associated with both susceptibility and severity in AA (affected individuals versus control individuals, OR = 1.63, p value = 3.5 × 10−13; inpatients versus outpatients, OR = 2.56, p value = 4.5 × 10−12) (Table 1, Figures 1 and 2, and Figures S1–S3).
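Table 1. Significant findings from discovery pneumonia GWAS, replication, and COVID-19 validation

| Gene | SNP | Chr | Pos | Pop | BioVU MEGAEX OR | BioVU MEGAEX p value | BioVU non-MEGAEX OR | BioVU non-MEGAEX p value | UK Biobank^a OR | UK Biobank^a p value | COVID-19 HGI^b OR | COVID-19 HGI^b p value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Susceptibility to pneumonia/COVID-19 | | | | | | | | | | | | |
| Severity of pneumonia/COVID-19 | | | | | | | | | | | | |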
The lead SNP in EA, rs113827944, is located in an intron of the cystic fibrosis transmembrane conductance regulator gene (CFTR [MIM: 602421]), the causal gene for cystic fibrosis (CF [MIM: 219700]). CF causes abnormally thick mucus that blocks airways, leading to chronic infections, persistent inflammation, airway remodeling, and progressive respiratory failure.16,17
Both acute and chronic lung infections are major contributors to morbidity and mortality in individuals with CF.18,19 In a study of 19,802 CF carriers (individuals with one defective copy of CFTR) and 99,010 control individuals, CF carriers were more likely than non-carriers to have pneumonia (OR = 1.16), a personal history of recurrent pneumonia (OR = 2.76), and other respiratory infections.20
The top finding in AA, rs334, is a nonsynonymous variant in hemoglobin subunit beta (HBB [MIM: 141900]) and the causal mutation for SCD, including sickle cell anemia (SCA [MIM: 603903]). The relationship between SCA and pneumonia risk has been previously described in epidemiological studies.21 Children with SCA are more likely to have pneumonia and influenza (OR = 7.38) and acute respiratory infections (OR = 1.29).22 Acute chest syndrome (ACS), a common complication of SCD that can be triggered by pneumonia and by vaso-occlusive crises, is the leading cause of death in individuals with SCA.23 Clinically differentiating between ACS and pneumonia can be difficult, and the two often overlap in the EHR.
Our two initial findings provide genetic support for previously observed epidemiological predictors of severe pneumonia risk, namely, the two autosomal recessive disorders CF and SCD. Previous studies have also reported associations between CFTR carrier status and CF-associated symptoms.20 In contrast, the same pediatric study that reported elevated risk in children with SCA found that SCA carriers had a slightly lower risk of pneumonia and influenza (OR = 0.93) than subjects with normal hemoglobin,22 although pneumonia severity was not studied.
We applied Fisher’s exact test to compare heterozygous carriers of the risk allele with individuals homozygous for the reference allele. Heterozygous carriers of rs113827944 (CFTR) were at greater risk of developing pneumonia (OR = 1.38, p value = 1.9 × 10−7), including in sensitivity analyses excluding all individuals with diagnosed CF (OR = 1.17, p value = 0.019; Table 2). Heterozygous carriers of rs334 (HBB) were at greater risk of pneumonia-associated hospitalization (OR = 1.55, p value = 0.023); sensitivity analyses excluding 138 individuals diagnosed with CF and SCD slightly attenuated this effect (OR = 1.49, p value = 0.109), most likely because of a reduction in power. The heterozygote effects observed here are consistent with an intermediate phenotype or an undetected second variant contributing to compound heterozygosity.20,24,25
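The heterozygote comparison above amounts to a 2 × 2 table (carriers versus reference homozygotes, by affected status) tested with Fisher's exact test. The function below is a minimal pure-Python sketch, not the authors' code: it reports the sample cross-product odds ratio and a two-sided p value that sums the hypergeometric probabilities of all tables no more likely than the observed one. The study's underlying counts are not given here, so the example uses a small textbook table; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Two-sided Fisher's exact test for the table
        [[a, b],   # heterozygous carriers: affected, unaffected
         [c, d]]   # reference homozygotes: affected, unaffected
    Returns (sample odds ratio, two-sided p value)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table survive rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

    if b * c == 0:
        odds_ratio = float("inf") if a * d else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p)


odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
```

For this toy table the odds ratio is 9 and the two-sided p value is 34/70 ≈ 0.486, which illustrates how a large odds ratio can still be non-significant when counts are small.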
To identify additional effects masked by the strong effects of CFTR and HBB, we repeated GWASs after removing individuals diagnosed with CF and SCD. In EA, after removal of individuals with CF, we identified a genome-wide significant signal in R3HCC1L (rs10786398, MAF = 30.7%, OR = 1.22, p value = 3.5 × 10−8, Table 1, and Figures 1 and 3). Replication of effects at this gene was observed in a distinct haplotype in BioVU AA (rs7086391, MAF = 16.9%, OR = 1.42, p value = 1.14 × 10−3, multiple-testing-corrected p value = 0.0490, Figure S4) and at a distinct sentinel variant in UK Biobank EA (rs884811, MAF = 44.2%, OR = 1.05, p value = 3.49 × 10−3, multiple-testing-corrected p value = 0.0198, Table 1).
We note that this region of the genome exhibits low linkage disequilibrium: the lead variants from our discovery analyses and from the UK Biobank validation had an R2 of 0.55 in the BioVU data despite their being less than 3 kb apart (Figure S5). We also observed effects of rs10786398 in the latest release of the COVID-19 Host Genetics Initiative GWAS comparing hospitalized individuals with COVID-19 to non-hospitalized affected individuals (OR = 1.19, MAF = 31.4%, p value = 0.037),26 indicating relevance of this variant for susceptibility to and severity of COVID-19.
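The R2 quoted above measures linkage disequilibrium between two variants. A common approximation when haplotype phase is not used is the squared Pearson correlation between the two variants' genotype dosages across individuals; the sketch below computes that quantity. Whether the reported value was computed exactly this way is not stated here, and the genotype vectors are toy data.

```python
def ld_r2(g1: list[float], g2: list[float]) -> float:
    """Squared Pearson correlation between two variants' genotype dosages (0-2),
    one entry per individual, in the same order for both variants."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two equal-length dosage vectors")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(g1, g2))
    v1 = sum((x - m1) ** 2 for x in g1)
    v2 = sum((y - m2) ** 2 for y in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic variant: r2 is undefined")
    return cov * cov / (v1 * v2)


# Toy data: partially correlated variants, then perfectly (negatively) correlated ones.
partial = ld_r2([0, 1, 2], [0, 2, 1])           # 0.25
perfect = ld_r2([0, 1, 2, 1], [2, 1, 0, 1])     # 1.0
```

Note that r2 ignores the sign of the correlation, so two variants can be in strong LD even when the risk alleles sit on opposite haplotypes.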
A previously published meta-analysis of lung function in UK Biobank and the SpiroMeta consortium provided additional validation of lead variants in R3HCC1L (Table S6).27 R3HCC1L encodes a coiled-coil domain-containing protein. Variants in its promoter region have been reported to be associated with inflammatory skin disease28 and body mass index.29,30 However, the molecular function of R3HCC1L is still unclear and more studies are needed to clarify its role in the pathogenesis of pneumonia.
In AA, after removal of individuals with CF and SCD, an additional SNP near UQCRFS1 (MIM: 191327) and non-coding RNA LOC105372352 (rs148218440, MAF = 2.4%; OR = 2.21, p value = 3.6 × 10−8; Figures S6 and S7) was identified in the comparison of AA pneumonia inpatients and age-, sex-, and ancestry-matched control individuals. UQCRFS1 encodes a Rieske iron-sulfur protein, which is part of the mitochondrial respiratory chain.31
Rare mutations in UQCRFS1 have been previously reported as the cause of a mitochondrial disorder (MIM: 618775).32 Further, Uqcrfs1 deletion in mice abolishes the mitochondrial reactive oxygen species required for antigen-specific T cell responses, a hallmark of the adaptive immune response.33 We did not observe replication of rs148218440 in BioVU non-MEGAEX AA (Table 1); however, this analysis was underpowered to detect an effect (n = 292 affected individuals and 1,145 control individuals, Table S4). UK Biobank did not have enough AA individuals with pneumonia to perform a replication analysis (n = 76 affected individuals and 9,272 control individuals, Table S5).
To further investigate the functional role of associated variation at our lead genes, we utilized PrediXcan to impute the genetically regulated expression (GReX) in MEGAEX-genotyped EA and AA with the tissue-specific models trained by GTEx V8.34,35 The association tests were conducted with SUGEN and adjusted for three principal components, sex, and age (Table S7 and Figures S8 and S9). The GReX of R3HCC1L in lungs is significantly associated with severity of pneumonia in EA (β = 1.140, p value = 2.0 × 10−7).
The association of HBB GReX with pneumonia susceptibility in EA offers further evidence of its role (β = 30.670, p value = 7.3 × 10−4). GReX of CFTR is not well captured in lung tissue;34 however, GReX of this gene in other tissues is significantly associated with pneumonia susceptibility in EA (heart left ventricle, β = 0.464, p value = 3.7 × 10−10).
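In PrediXcan, the genetically regulated expression of a gene is a weighted sum of an individual's genotype dosages at the variants in a tissue-specific prediction model, GReX = Σ w_i x_i. The sketch below computes that sum for one individual. The weights and dosages are hypothetical (not GTEx V8 model values), the allele-flipping step assumes biallelic variants on a consistent strand, and skipping model variants absent from the genotype data is a simplifying assumption. The resulting GReX values would then be tested against pneumonia status with covariates, which the study did in SUGEN and which is not shown here.

```python
def grex(weights: dict, dosages: dict) -> float:
    """Genetically regulated expression of one gene for one individual.

    weights: {variant_id: (effect_allele, weight)} from a tissue-specific model
    dosages: {variant_id: (counted_allele, dosage)} with dosage in [0, 2]
    """
    total = 0.0
    for vid, (effect_allele, weight) in weights.items():
        if vid not in dosages:
            # Simplifying assumption: model variants missing from the data are skipped.
            continue
        counted_allele, dosage = dosages[vid]
        if counted_allele != effect_allele:
            # Assumes a biallelic variant: count the model's effect allele instead.
            dosage = 2.0 - dosage
        total += weight * dosage
    return total


# Hypothetical model weights and one individual's hypothetical dosages.
model = {"var1": ("A", 0.5), "var2": ("G", -0.2), "var3": ("C", 1.0)}
person = {"var1": ("A", 2.0), "var2": ("T", 0.5)}
value = grex(model, person)  # 0.5 * 2.0 + (-0.2) * 1.5 = 0.7
```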
Moreover, we estimated the heritability of pneumonia susceptibility and severity in MEGAEX-genotyped EA. We used genome-wide complex trait analysis (GCTA)36 to construct a genetic relatedness matrix and calculated the proportion of phenotypic variance explained by the matrix. The SNP-based heritability (h2) of pneumonia susceptibility was estimated at 0.029, and at 0.026 after removing individuals diagnosed with CF (Table S8).
As expected, genetic variation explains a larger portion of the variance in pneumonia severity than in susceptibility (h2 = 0.121; after removing individuals with CF, h2 = 0.150), because in the severity analysis the affected individuals (inpatients) and control individuals (outpatients) all had pneumonia and therefore shared more similar exposures. Our results are similar to previous estimates in UK Biobank (0.075 for self-reported pneumonia, 0.15 for pneumonia inpatient).37
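The genetic relatedness matrix that GCTA builds can be sketched directly. With standardized genotypes, the relatedness between individuals j and k is the average over SNPs of (x_ij − 2p_i)(x_ik − 2p_i) / [2p_i(1 − p_i)], where p_i is the sample allele frequency of SNP i. The minimal version below applies this formula to every entry; GCTA itself uses a slightly different estimator for the diagonal, and the step that turns the matrix into an h2 estimate (restricted maximum likelihood fitting of a variance-component model) is not shown. The genotype matrix is toy data.

```python
from math import sqrt


def grm(genotypes: list[list[int]]) -> list[list[float]]:
    """genotypes[j][i] is the allele count (0, 1, or 2) of individual j at SNP i.

    Returns the N x N matrix A with
        A[j][k] = mean over SNPs i of (x_ij - 2p_i)(x_ik - 2p_i) / (2 p_i (1 - p_i)),
    using the same formula on the diagonal (GCTA's diagonal estimator differs).
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    # Sample frequency of the counted allele at each SNP.
    freqs = [sum(row[i] for row in genotypes) / (2 * n_ind) for i in range(n_snp)]
    keep = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]  # drop monomorphic SNPs
    if not keep:
        raise ValueError("no polymorphic SNPs")
    # Standardize each genotype: subtract 2p, divide by sqrt(2p(1 - p)).
    z = [
        [(row[i] - 2 * freqs[i]) / sqrt(2 * freqs[i] * (1 - freqs[i])) for i in keep]
        for row in genotypes
    ]
    m = len(keep)
    return [
        [sum(u * v for u, v in zip(z[j], z[k])) / m for k in range(n_ind)]
        for j in range(n_ind)
    ]


# Toy data: three individuals, three SNPs.
A = grm([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

Here the first two individuals carry opposite genotypes and get a negative relatedness (−4/3), while the all-heterozygous third individual sits at the population mean and is unrelated (0) to both.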
The data used for the present discoveries exhibit several key limitations. Extracting accurate phenotypes from EHR data is a well-known challenge38 and most likely results in some misclassification of cases. Such misclassification most likely reduces power to detect effects but is unlikely to introduce systematic bias that would invalidate results.
Additionally, distinguishing viral from bacterial pneumonia via clinical criteria is challenging, and etiologic confirmation (positive blood or pleural-fluid cultures for bacterial pneumonia, or positive nasopharyngeal samples or serology for viral pneumonia) was not widely available in BioVU. Despite these challenges, we detected four loci linked to pneumonia, including two novel regions (R3HCC1L and UQCRFS1). Further research will be required to differentiate the genetic architecture of specific pneumonia etiologies and to confirm the UQCRFS1 association, and functional studies will be needed to determine the biological mechanism underlying both signals.
In summary, we leveraged a large-scale biobank to identify genetic variants associated with the susceptibility and severity of pneumonia. Two clinically relevant Mendelian disease genes, CFTR and HBB, were implicated. These important genetic results indicate that individuals with CF and SCA are at heightened risk for development of and severe outcomes from pneumonia, an effect which may translate to COVID-19 outcomes. Heterozygous carriers of the CF risk allele demonstrated elevated risk of pneumonia susceptibility, and carriers of the SCD risk allele demonstrated elevated risk for pneumonia severity.
These findings may have important implications for genetically informed patient care in infectious lung disease. They are also critically important in the context of the COVID-19 pandemic, and future studies will be needed to establish whether these carriers exhibit a silent, heightened risk for poor outcomes from COVID-19 as well.
We also identified two additional pneumonia-related genes: R3HCC1L and UQCRFS1. Although we were most likely underpowered to replicate effects of UQCRFS1 variation in AA in our data resources, we successfully replicated R3HCC1L effects in two independent datasets and validated effects of R3HCC1L in independent GWASs of COVID-1926 and lung function.27
Characterizing host genome effects on infectious disease susceptibility and severity can offer important insight into the molecular etiology of risk; our findings may help elucidate pathophysiological processes for pneumonia, an important COVID-19 sequela.
reference link: https://www.cell.com/ajhg/fulltext/S0002-9297(20)30446-8
More information: Hung-Hsin Chen et al, Host genetic effects in pneumonia, The American Journal of Human Genetics (2020). DOI: 10.1016/j.ajhg.2020.12.010