In most cases, your genes have less than five percent to do with your risk of developing a particular disease, according to new research by University of Alberta scientists.
In the largest meta-analysis ever conducted, scientists have examined two decades of data from studies that examine the relationships between common gene mutations, also known as single nucleotide polymorphisms (SNPs), and different diseases and conditions.
And the results show that the links between most human diseases and genetics are shaky at best.
“Simply put, DNA is not your destiny, and SNPs are duds for disease prediction,” said David Wishart, professor in the University of Alberta’s Department of Biological Sciences and the Department of Computing Science and co-author on the study.
“The vast majority of diseases, including many cancers, diabetes, and Alzheimer’s disease, have a genetic contribution of 5 to 10 percent at best.”
The study also highlights some notable exceptions, including Crohn’s disease, celiac disease, and macular degeneration, which have a genetic contribution of approximately 40 to 50 percent.
“Despite these rare exceptions, it is becoming increasingly clear that the risks for getting most diseases arise from your metabolism, your environment, your lifestyle, or your exposure to various kinds of nutrients, chemicals, bacteria, or viruses,” explained Wishart.
Wishart and his research collaborators suggest that measuring metabolites, chemicals, proteins, or the microbiome provides a much more accurate measure of human disease risk and is also more accurate for diagnosis.
The findings fly in the face of many modern gene testing business models, which suggest that gene testing can accurately predict someone’s risk for disease.
“The bottom line is that if you want to have an accurate measure of your health, your propensity for disease or what you can do about it, it’s better to measure your metabolites, your microbes or your proteins, not your genes,” added Wishart.
“This research also highlights the need to understand our environment and the safety or quality of our food, air, and water.”
As genome-wide association studies (GWAS) have grown in size, often now numbering tens of thousands of research participants, the number of genes identified as contributing to disease susceptibility has correspondingly grown.
This is as true for Alzheimer’s Disease (AD) as it is for many other disorders, and bioinformatic and pathway analyses of this large number of susceptibility genes are providing a highly efficient method of proposing and prioritising underlying biological pathways for further study [1,2].
In some cases, this understanding adds to existing observations–such as the evidence from GWAS that pathways of inflammation are important in AD–whereas in other cases pathway analysis yields less expected findings, such as evidence that cholesterol synthesis and endocytic recycling are part of the pathological process [3].
However, such analyses have their limitations due to our incomplete understanding of the canonical molecular pathways. Namely, many pathways described as categorical entities in data sources such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and similar repositories are skeletal at best and contain only a fraction of the genes that might be involved in any given process. Furthermore, many genes are nominated in multiple different pathways through their pleiotropic function.
As an example, one of the most replicated associations with AD, the gene CLU encoding clusterin, is involved in processes as diverse as complement signaling, protein binding and chaperoning, and cell survival [4]. In the context of this incomplete understanding together with inherent molecular pathway complexity, determining the underlying biology of disease from GWAS studies alone becomes difficult and hence inevitably limited.
In an effort to address this limitation, we reasoned that it should be possible to hone pathway analysis by utilising orthogonal datasets. Specifically, we hypothesised that pathways are more relevant to disease aetiopathogenesis if diseases that shared pathways also shared morbidity.
Put another way, if two or more diseases are found to co-occur more often than would be expected by chance, and if those comorbid diseases also share molecular pathways, one would predict that those shared pathways are more likely to play a role in aetiopathogenesis. In order to test this reasoning, we combined pathway analysis of the GWAS associations from all diseases (as reported in the GWAS catalog; https://www.ebi.ac.uk/gwas/) together with a co-morbidity study from real-world data to identify shared pathological processes.
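The reasoning above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy counts and the pathway labels assigned to each disease are all hypothetical, and the observed-to-expected ratio is only one simple way to express "co-occurring more often than by chance".

```python
def comorbidity_ratio(n_total, n_a, n_b, n_both):
    """Observed-to-expected ratio of co-occurrence.

    Under independence, the expected number of people with both
    diseases is n_a * n_b / n_total. A ratio above 1 means the two
    diseases co-occur more often than chance alone would predict.
    """
    if min(n_total, n_a, n_b) <= 0:
        raise ValueError("n_total, n_a and n_b must be positive")
    expected = n_a * n_b / n_total
    return n_both / expected


def shared_pathways(pathways_a, pathways_b):
    """Pathways implicated in both diseases (plain set intersection)."""
    return set(pathways_a) & set(pathways_b)


# Hypothetical toy numbers, not taken from the study.
ratio = comorbidity_ratio(n_total=10_000, n_a=500, n_b=400, n_both=60)

# Hypothetical pathway assignments for two unnamed diseases.
common = shared_pathways(
    {"JAK-STAT signaling", "Complement cascade"},
    {"JAK-STAT signaling", "Insulin signaling"},
)

print(ratio)   # 3.0: three times more co-occurrence than expected
print(common)  # {'JAK-STAT signaling'}
```

In this toy case the two diseases are both comorbid (ratio above 1) and share a pathway, which is the combination the hypothesis treats as more likely to be relevant to aetiopathogenesis.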
We then tested the resulting pathway in observational and empirically derived genome wide expression datasets from human and rodent studies, and finally validated the results in empirical studies in rat models in vitro and in vivo.
The results, demonstrating a role for JAK-STAT signaling in AD, are in line with the known contribution of inflammatory processes to the disease, but they further nominate a specific target for therapy and provide a possible approach to interpretation of GWAS data for other disease areas.
Discussion
We have presented here a series of integrated analyses predicated on the underlying hypothesis that co-morbidity of disease can, in some cases, indicate shared genetic susceptibility to disease and that this is manifested most robustly at the level of pathways more than at the level of single genes.
By combining genome wide pathway association from all diseases together with their comorbidity with AD, we identify JAK-STAT signaling as a shared factor correlated with the degree of comorbidity. In the first data-driven phase utilizing gene association data we find this disease cluster to include a series of disorders of immunity and inflammation together with age-related macular degeneration (ARMD) and Type 1 Diabetes mellitus (T1DM).
The association with ARMD is particularly interesting as it has previously been found to be a risk factor for AD [24,25], because Aβ is a component of the drusen pathology in the retina of people with ARMD [26,27] and because the gene most associated with ARMD, CFH, encodes for a protein replicated as a biomarker of AD, complement factor H [28,29,30].
That our approach of using all GWAS data in a pathway clustering analysis identifies a relationship between AD and ARMD is all the more remarkable because this association, in our study, is not driven by CFH. The association we find with T1DM is also intriguing as, although T2DM is one of the most substantiated risk factors for AD [31], our data now suggest that there might also be a relationship with early onset T1DM that is worth further attention.
However, the most extensive association between shared pathways and disease we find is with disorders of immunity and inflammation. The role of inflammation in AD has been apparent for many years. This evidence is very extensive and comes from many directions [32].
It includes evidence that the use of non-steroidal anti-inflammatory drugs appears to decrease the risk of AD [33,34,35], post-mortem studies showing that inflammation is associated with AD pathology [36,37], and in vivo data showing that markers of inflammation are predominant amongst protein biomarkers both in AD and in pre-dementia conditions [38,39].
In addition, there is increasing evidence, from GWAS studies and from rare mutations, that genes encoding proteins involved in immunity are amongst the most consistently associated with disease [32]. However, when considered in isolation, the pathways and processes identified by AD genetic studies are predominantly those of complement signaling and microglial function [3,40,41].
These pathways are clearly important in disease, with very considerable evidence to support their role, but the approach we have used here, triangulating between GWAS, real-world and empirical data, and including all diseases and all genes, nominates a pathway as part of the aetiopathogenic process that is not identified by such AD gene-focused studies.
As in any ‘big data’ approach, there are limitations both to the datasets available and to our use of them. First, in using the GWAS catalogue as a primary data-source, we limit ourselves to disease-gene associations where a significant number of genes have been identified. We do this in order to provide sufficient power for analysis, but acknowledge that the limit of 25 susceptibility genes to enable a disease to enter analysis is both arbitrary and dependent on the size and numbers of studies that happen to have been conducted to date. Almost certainly, we miss information as a consequence of data limitation.
Secondly, by segregating genes into pathways we attempt to overcome the intrinsic limitation of GWAS studies, in that biology is mechanistically enacted at the level of pathway and not gene, let alone SNP. Given the large number of SNPs and genes in the human genome, two diseases may have no elements measured in GWAS, or even sequencing studies, in common and yet share an overlapping disease pathogenesis. Measuring association not with SNP but with multiple SNPs across a gene (‘gene-wide association’) is one attempt to overcome the limitation; here we go one step beyond this with a pathway-wide association approach.
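A small sketch may make the pathway-level idea concrete: two diseases can have no associated genes in common and still converge on the same pathway. Everything here is hypothetical (gene names, pathway names, the mapping and the helper functions); only the 25-gene entry threshold comes from the text, and this is not the authors' implementation.

```python
MIN_GENES = 25  # entry threshold described in the text


def diseases_with_enough_genes(disease_genes, min_genes=MIN_GENES):
    """Keep only diseases with at least min_genes associated genes."""
    return {d: g for d, g in disease_genes.items() if len(g) >= min_genes}


def pathways_for(genes, gene_to_pathways):
    """Collect every pathway that contains at least one of the genes."""
    hit = set()
    for gene in genes:
        hit.update(gene_to_pathways.get(gene, ()))
    return hit


# Hypothetical toy data: placeholder genes and pathways.
disease_genes = {
    "disease_1": {"gene_a", "gene_b"},
    "disease_2": {"gene_c", "gene_d"},
}
gene_to_pathways = {
    "gene_a": {"pathway_1"},
    "gene_b": {"pathway_2"},
    "gene_c": {"pathway_1"},
    "gene_d": {"pathway_3"},
}

# Gene-level comparison finds nothing in common ...
shared_genes = disease_genes["disease_1"] & disease_genes["disease_2"]

# ... while the pathway-level comparison does.
shared = (pathways_for(disease_genes["disease_1"], gene_to_pathways)
          & pathways_for(disease_genes["disease_2"], gene_to_pathways))

print(shared_genes)  # set()
print(shared)        # {'pathway_1'}
```

With the real threshold, both toy diseases would be excluded by `diseases_with_enough_genes`, since each has only two genes; the filter is shown separately so the overlap logic stays easy to read.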
However, in attempting to derive such information from the GWAS data, we are severely limited by current understanding of biological pathways, which is rudimentary at best. This limitation is bound to hinder our derivation of knowledge from information in this context. Thirdly, in seeking to identify diseases comorbid with AD, we have crudely utilised a dataset of concurrent diagnoses, taking no account of some of the confounds or other concerns of conventional epidemiology.
Indeed we cannot be sure whether the co-morbidity we observe is due to the disease itself or the drugs used to treat the disease. However, we note that similar claims-level analyses of real-world clinical data have recently proven valuable in studying genetic and environmental factors shared amongst diseases [42].
We accept the limitations of our approach described above. However, in mitigation of these potential limitations, the datasets we examine are huge; namely, all genetic studies with all genes and all diseases in the first phase, and a dataset of over six million people in the second. Furthermore, we would suggest that some confounds are less critical in the analysis we present here.
For example, in studies of risk and protection, understanding the direction of effect (whether it is the disease or the treatment that affects risk) is clearly fundamental. However, it becomes less important, potentially irrelevant, where we are determining simply whether the same processes are involved, as both disease and treatment will at some level and in some cases affect the same molecular pathways, which are the axis of our analysis. Finally, despite the limitations of deriving knowledge from data using this approach, the fact that the findings replicate in observational molecular studies in man and in experimental studies in rodents offers strong support to the results.
The JAK-STAT signaling pathway, nominated as a potential target for therapy through data-driven genomics and real-world data in this study, is a key regulator of the response to mediators of inflammation, including cytokines, chemokines [43] and microglia activation [44]. The binding of cytokines (interleukin, interferon and growth factors) and other ligands (such as hormones) to their receptors increases tyrosine kinase activity of Janus Kinases (JAKs, including TYK2), which in turn phosphorylate the receptors and recruit STATs which are themselves then phosphorylated.
Subsequent dimerization leads to nuclear translocation and transcription factor activity. Given this critical role in the modulation of the inflammatory response, it is not surprising that JAK-STAT signaling has previously been associated with inflammatory disease and targeted for therapeutics, some of which have been approved for clinical use [45].
The pathway has also been identified as of importance in relation to diabetes [46] but less often in relation to AD. Very high concentrations of Aβ have been shown to increase tyrosine phosphorylation and transcriptional activity in a Tyk2 dependent manner in rodent models, and tyrosine phosphorylation of STAT3 is increased in AD brain [47], while inhibition of JAK-STAT3 signaling inhibits activation of astrocytes and microglia in animal models of neurodegeneration [48].
JAK-STAT signaling has also been identified as a component of plasticity, specifically long-term depression (LTD) [49,50], interesting not least because LTD and long term potentiation are regulated, in opposite directions, by GSK3, the predominant tau-kinase implicated in AD pathogenesis [51,52,53].
In summary, a combined, sequential analysis of GWAS data agnostic to disease type, combined with real-world data of co-incidence of AD with other diseases, nominated JAK-STAT signaling amongst other pathways as a possible underlying pathogenetic mechanism shared across multiple diseases.
Remarkably, these diseases (inflammatory disorders, ARMD and diabetes) had previously been implicated as risk factors for AD. Adding to the weight of evidence for JAK-STAT signaling in AD, we subsequently found altered gene expression of the pathway in multiple human and rodent datasets and in empirical studies of Aβ exposure in rodents, both in vitro and in vivo. These data are not the first to nominate JAK-STAT signaling for therapeutic intervention [54], as experiments suggest that humanin and colivelin protect against AD-related neurotoxicity through their activity in JAK-STAT signaling.
However, the combination of genetic, human and rodent, observational and empirical data makes a strong case to pursue this pathway as a target for therapies for AD, not least because clinically approved compounds already exist. This further suggests that this novel integration of orthogonal data is a promising approach to find novel targets in complex disorders.
Source:
University of Alberta
Media Contacts:
Katie Willis – University of Alberta
Original Research: Open access
“Assessing the performance of genome-wide association studies for predicting disease risk”. Jonas Patron, Arnau Serra-Cayuela, Beomsoo Han, Carin Li, David Scott Wishart.
PLOS ONE doi:10.1371/journal.pone.0220215.