A new study by researchers at The University of Manchester suggests that the current statistical model used by GPs to predict a patient’s risk of developing cardiovascular disease (CVD) could be producing misleading results.
The research was published in the open-access journal Scientific Reports.
“Clinicians often use risk scores to predict a patient’s risk of developing a disease in the future,” says Professor Tjeerd Van Staa, lead researcher on the study. “An example is QRISK3, which is being used currently by GPs in England to decide whether to start treatment with a statin based on the predicted risk of CVD.
The threshold is a risk of 10 percent as recommended by the National Institute for Health and Care Excellence.”
The QRISK3 model is based on routinely collected data from patients’ electronic health records (EHRs).
Using this data set and information about the patient – such as their body mass index (BMI), their blood pressure, and where they live – QRISK calculates the risk of developing CVD in the next 10 years.
The main finding of the paper is that QRISK does not fully capture the variability between patients and medical practices, which leads to uncertainty in predicting individual CVD risk.
Part of the problem is that each practice has its own way of recording EHRs, due to differences in computer systems and clinical coding.
This means that the quality of data from different practices can vary, and this affects patients’ QRISK scores.
As Prof Van Staa and other researchers in the study explain: “A patient with a predicted risk of 10 percent of developing CVD in the next 10 years could have a risk between 7.2 percent and 13.7 percent, depending on which practice they came from.”
To investigate the accuracy of this calculation, the researchers used anonymized data from 3.6 million patients across 392 practices, available from the Clinical Practice Research Datalink (CPRD).
Another issue is that, while the QRISK calculation does use individual patient data such as blood pressure or family medical history, it doesn’t capture enough information on an individual level.
As a result, other factors, such as genetics or detailed lifestyle information, are not currently considered by the model.
According to Prof Van Staa and other researchers in this study, all of this missed data “may mean that a patient may have a much lower risk than predicted by QRISK3 (and may not require the statin) or may have a much higher risk than predicted (and not getting treated with a statin).”
The researchers are careful to note, however, that this does not mean the QRISK model itself is flawed; rather, the problem lies in applying it at the individual level.
The paper points out that “Risk prediction models based on routinely collected health data perform well for populations but with great uncertainty for individuals. Clinicians and patients need to understand this uncertainty.”
Cardiovascular disease (CVD) was the primary cause of death in the USA, Europe and China in 2017 [1]. Multiple studies have suggested that the identification of patients with high CVD risk is important in its prevention [2–5].
Risk prediction models are often used to predict CVD risk for individual patients [5].
Examples are the Framingham risk score (FRS) and QRISK, which provide risks of developing CVD in the next 10 years. These models use information on risk factors such as age, gender, body mass index (BMI), ethnicity, smoking history and disease histories [6,7].
FRS models perform well in the USA population, but the risk predictions may be problematic when applied to cohorts that differ substantially from the cohort used for model development [8].
In the UK, treatment guidelines for the primary prevention of CVD recommend the use of QRISK2 (second version) to identify patients with high CVD risk [9].
QRISK is based on routinely collected data from general practices in the UK [7].
Conventional approaches were used to measure discrimination and calibration in the overall population [7].
However, there can be substantial variation between general practices in the style of coding clinical information (coding style) and completeness of data recording [10].
Different coding dictionaries are also currently being used in UK primary care, as the EHR systems use either Read version 2 or CTV3 codes [11].
The patient case-mix (referring to a variation in risk factors for disease) may also vary between practices.
This variability in the underlying data sources is currently not routinely considered in the development of risk prediction models, but it could potentially lead to heterogeneity in the prediction model’s performance [12].
The objective of this study was to assess the level of generalisability of risk prediction models that are based on routinely collected data from EHRs, and to measure the effects of practice heterogeneity on the individual predictions of risk.
The QRISK3 prediction model (for the 10 year risk of CVD) was used as an exemplar.
Methods
Data source
This study used data from the Clinical Practice Research Datalink (CPRD) which is a database with anonymised EHRs from 674 GP practices in the UK.
The database includes 4.4 million patients (6.9% of the UK population) and is broadly representative of the UK general population in terms of age, gender and ethnicity [13].
CPRD includes patient records of demographics, symptoms, tests, diagnoses, therapies, health-related behaviours and referrals to secondary care.
Data from over half of the practices have been linked using unique patient identifiers to other datasets from secondary care, disease-specific cohorts and mortality records [13]. This study was restricted to 392 general practices that have been linked to Hospital Episode Statistics (HES), Office for National Statistics (ONS) and Townsend scores [7]. Over 1,700 publications have used CPRD data [14].
Previously, CPRD data has been used to externally validate QRISK2 [15].
QRISK prediction models
QRISK is a statistical model which is being used to predict a patient’s risk over 10 years of developing CVD (including coronary heart disease, stroke or transient ischaemic attack). The second version (QRISK2) was derived in 2008 using data from 355 practices in the QResearch database [16], and validated using data from 364 practices from the THIN database [17].
QRISK3, published in 2017, is the latest version and includes more clinical variables than QRISK2, such as migraine and chronic kidney disease [7].
The QRISK3 predicted risks were calculated using the open access algorithm [18]. The calculations were verified to match the predictions of the online calculator. This was done for different simulated patient groups in which each risk factor was changed in turn, so that changes in all QRISK3 risk factors were covered.
Study population
The study population was similar to that used for the development cohort for QRISK3 [7].
Patients were included if they were aged between 25 and 84 years and had no history of CVD or statin prescribing prior to the index date.
The follow-up of patients in the CPRD cohort started one year after the start of data collection, the patient’s registration date, the date of reaching age 25 years, or 1 January 1998 (whichever came last), and it ended at the end of data collection, the patient leaving the practice, the date of the patient’s death or the CVD outcome (whichever came first).
Patients were censored at the earliest of the first statin prescription, transfer or the end of follow-up [19].
The index date (as the start date for evaluating CVD and the baseline date for assessing a patient’s history) was chosen randomly from the period of follow-up.
The random index date [19] was preferred because it gives a better spread of calendar time and age, and captures time-relevant practice variability (e.g., changes in recording and the secular trend in the CVD incidence rate). This study considered the same risk factors as in QRISK3 [7].
Statistical analysis
The QRISK3 predicted risks were estimated for each patient and were also averaged within each practice. Averaged predicted risks were compared to the observed risks at year 10 which were based on Kaplan Meier life tables.
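As a rough illustration of the comparison described above, the sketch below estimates an observed 10-year risk with a Kaplan-Meier product-limit estimator and sets it against the average predicted risk for a single practice. The function name `kaplan_meier_risk` and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier_risk(times, events, horizon=10.0):
    """Return the observed risk 1 - S(horizon) from a Kaplan-Meier estimate.

    times  : follow-up time in years for each patient
    events : 1 if the patient had a CVD event at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - survival


# Toy practice with six patients (illustrative values only)
times = [2.0, 3.0, 5.0, 8.0, 12.0, 12.0]
events = [1, 0, 1, 0, 0, 1]
predicted = [0.30, 0.25, 0.45, 0.20, 0.35, 0.40]  # hypothetical predicted risks

observed_risk = kaplan_meier_risk(times, events)
mean_predicted = sum(predicted) / len(predicted)
print(f"observed 10-year risk: {observed_risk:.3f}")
print(f"mean predicted risk:   {mean_predicted:.3f}")
```

Events after the 10-year horizon are ignored, and censored patients leave the risk set without changing the survival estimate.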
The observed risks were extrapolated for the 13.5% of practices with less than 10 years of follow-up. It was assumed that the life tables of these practices followed the pattern of the overall population life table.
We calculated each year’s CVD relative risk (RR) by dividing the current year’s CVD proportion by the next year’s CVD proportion.
The extrapolation was verified using practices with 10 years follow-up.
Specifically, we randomly removed records so that these practices had less than 10 years of follow-up, and then compared the extrapolated risk to the observed risk.
We found no evidence [20] that the extrapolated risks differed statistically significantly from the actual observed risks.
A Cox model with a frailty (random effect) term for each practice was fitted to assess the effects of practice heterogeneity [21].
Patient survival time (time until censoring or CVD) was the outcome (dependent variable) and the linear predictor from the QRISK3 model was included as an offset.
Each patient’s linear predictor was calculated using the patient’s risk factors and corresponding QRISK3 coefficients. Each practice’s random effects on individual risk prediction and the standard deviation of all practices’ random effects were extracted from the frailty model.
Patient QRISK3 predictions and their corresponding practice random effects were combined to calculate a random effects model predicted risk.
These were compared with the QRISK3 predicted risks.
The distribution of the differences between the QRISK3 and the random effects model’s predicted risks was plotted.
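One way to picture how a practice random effect shifts an individual prediction is sketched below. It assumes the standard proportional-hazards identity, in which a random effect b on the log-hazard scale turns a predicted risk r into 1 - (1 - r)^exp(b). The text does not spell out the exact formula the authors used, so this is an assumption, and the function names are hypothetical.

```python
import math


def random_effects_risk(qrisk_risk, practice_effect):
    """Shift a 10-year risk by a practice effect on the log-hazard scale.

    Assumes proportional hazards: S_new = S_qrisk ** exp(practice_effect).
    """
    return 1.0 - (1.0 - qrisk_risk) ** math.exp(practice_effect)


def risk_range(qrisk_risk, effect_sd, z=1.96):
    """Approximate 95% range of risks across practices for one QRISK3 risk."""
    low = random_effects_risk(qrisk_risk, -z * effect_sd)
    high = random_effects_risk(qrisk_risk, z * effect_sd)
    return low, high


# A patient with a QRISK3 risk of 10%, with a random effects SD of about 0.17
low, high = risk_range(0.10, 0.17)
print(f"risk range across practices: {low:.1%} to {high:.1%}")

# A practice effect of zero leaves the QRISK3 prediction unchanged
unchanged = random_effects_risk(0.10, 0.0)
```

Under this assumption, a positive practice effect raises the risk and a negative one lowers it, which is how the same patient can fall on either side of the 10% treatment threshold depending on the practice.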
Limited practice size or duration of follow-up could contribute to the unknown variability between risks predicted by QRISK3 and the random effects model.
In order to measure this random error, we simulated data under a null hypothesis of no practice level variability and estimated the distribution of the practice level random effects, and compared this with the distribution of the practice level random effects observed in the CPRD data (i.e. a permutation test).
Specifically, simulations were conducted using 2,000 datasets of the same size and follow-up as the CPRD data.
The CVD outcomes were simulated by assigning a random probability from a uniform distribution (0, 1) to each patient.
The random effects model was then fitted to these simulated data in order to quantify the random variability.
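The idea behind this null simulation can be sketched in a much simplified form. The study fitted the frailty model to each simulated dataset; the sketch below swaps in a cruder stand-in, the log ratio of observed to expected events per practice, and ignores follow-up time and censoring. It also assumes that a patient has an event when their uniform draw falls below their predicted risk, which the text does not state explicitly. All names and sizes are hypothetical.

```python
import math
import random
import statistics


def simulate_null_practice_effects(n_practices=40, patients_per_practice=2000,
                                   predicted_risk=0.10, seed=1):
    """Simulate outcomes with no true practice effect and return, for each
    practice, log(observed events / expected events) as a crude effect."""
    rng = random.Random(seed)
    expected = predicted_risk * patients_per_practice
    effects = []
    for _ in range(n_practices):
        # Assumed rule: event if the uniform(0, 1) draw is below the risk
        observed = sum(rng.random() < predicted_risk
                       for _ in range(patients_per_practice))
        # Guard against log(0) in the unlikely case of no events
        effects.append(math.log(max(observed, 0.5) / expected))
    return effects


null_effects = simulate_null_practice_effects()
null_sd = statistics.stdev(null_effects)
print(f"SD of practice effects under the null: {null_sd:.3f}")
```

Any spread in these simulated effects comes from chance alone, so it gives a baseline against which the spread seen in real data can be judged. The size of that baseline depends strongly on the number of patients per practice.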
The comparison between effects of unknown random variability and effects of practice level variability on individual patients was plotted using one million patients (50% male and 50% female) who had a QRISK3 predicted risk of 10%.
We used classical model performance measurements to compare QRISK3 with the random effects model.
The data from each practice were randomly divided into two parts (70% and 30%), stratified by gender.
The first part was used to develop the random effects model and the second part to test and calculate model performance measurements, including the C-statistic [22], Brier score [23,24] and net benefit [25].
These measurements were calculated using QRISK3 predictions, predictions of random effects model, patient follow-up time and patient status at the time of censoring. Empirical confidence intervals were calculated using 1,000 bootstrap samples.
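For readers unfamiliar with these measures, the sketch below shows simplified binary-outcome versions of the Brier score, the C-statistic and net benefit. The study worked with follow-up time and censoring status, which this sketch deliberately ignores, so it illustrates the definitions and does not reproduce the authors' calculations. Function names and data are hypothetical.

```python
def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / len(risks)


def c_statistic(risks, outcomes):
    """Share of case/non-case pairs in which the case has the higher risk
    (ties count as half)."""
    cases = [p for p, y in zip(risks, outcomes) if y == 1]
    controls = [p for p, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                score += 1.0
            elif p_case == p_control:
                score += 0.5
    return score / (len(cases) * len(controls))


def net_benefit(risks, outcomes, threshold=0.10):
    """True positives per patient minus false positives per patient,
    weighted by the odds of the threshold."""
    n = len(risks)
    tp = sum(1 for p, y in zip(risks, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(risks, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data: five patients, two of whom developed CVD
risks = [0.05, 0.08, 0.12, 0.20, 0.30]
outcomes = [0, 0, 0, 1, 1]

print(f"Brier score: {brier_score(risks, outcomes):.4f}")
print(f"C-statistic: {c_statistic(risks, outcomes):.3f}")
print(f"Net benefit at 10%: {net_benefit(risks, outcomes):.4f}")
```

Here patients at or above the threshold are treated as positive, which is a choice made for this sketch.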
Missing values for ethnicity, BMI, Townsend score, systolic blood pressure (SBP), standard deviation of SBP, cholesterol, High-Density Lipoprotein (HDL) and smoking status (the only variables with missing values) were imputed using the Markov chain Monte Carlo (MCMC) method with monotone style [26].
The QRISK3 and random effects risks were then averaged based on ten imputations. We calculated random effects of CPRD practices and random effects separately for females and males consistent with QRISK3 development. The random effects of practices were calculated independently by both SAS and R with almost identical results.
The random effects model used procedures from SAS 9.4 and the “coxme” package for R 3.4.2. The analyses of the datasets, missing value imputation, extrapolation validation and life tables were produced by SAS. R was used to model the data.
The protocol for this work was approved by the independent scientific advisory committee for Clinical Practice Research Datalink research (protocol No 17_125RMn2). We confirm that all methods were performed in accordance with the relevant guidelines and regulations.
Results
Table 1 shows the patient characteristics and level of data recording across the 392 general practices. The mean age of patients varied between practices (the 5th percentile was 40.0 years and the 95th percentile was 49.8 years).
Presence of CVD risk factors also varied between practices.
The 5–95% range between practices was 1.9 to 16.4 for recorded history of severe mental illness.
The level of data completeness also varied substantially between practices. Ethnicity was not recorded for 19.6% of patients in the 5th percentile of practices compared to 93.9% in the 95th percentile. Life table analyses are shown in eTable 1 in the Supplement.
Table 1 – Characteristics of the general practices included in the study and the distribution of data recording.
Figure 1 shows the variation in CVD incidence rate among practices by plotting the CVD incidence rate per 100 person years against the total follow-up time. A large amount of variation in CVD incidence rate was found between practices.
Figure 2 shows that the random effects model has less variation of differences between observed and predicted risk on practice level than QRISK3.
The random effects model’s Brier score (0.067 (95% CI: 0.0667, 0.0682)) was close to QRISK3’s Brier score (0.067 (95% CI: 0.0666, 0.0680)). The difference in Brier score between the random effects model and QRISK3 was 0.002 (95% CI: 0.00008, 0.0023). The random effects model’s C-statistic (0.852 (95% CI: 0.850, 0.854)) was also close to QRISK3’s C-statistic (0.850 (95% CI: 0.848, 0.852)). The difference in C-statistic between the two models was 0.0017 (95% CI: 0.0015, 0.0020). The net benefit analysis [25] shows that both models could predict three true CVD events without adding a false negative CVD event in every 100 patients at a threshold of 10% (visualised in eFigure 2 in the Supplement). The standard deviations of the CPRD practice random effects for females (0.174) and males (0.177) were close to each other.
Table 2 shows the inconsistencies between the risks predicted for the same group of individual patients by QRISK3 and the random effects model (visualised in eFigure 1 in the Supplement). Patients with a predicted QRISK3 risk between 9.5~10.5% were found to have a much larger range of risks in the random effects model (between about 7.6~13.3%). Table 2 also shows the level of reclassification to below or above the treatment risk threshold of 10% when using the random effects model instead of the QRISK3 predicted risk. It was found that 19.7% of patients with a QRISK3 predicted risk between 8.5~9.5% had a risk above the treatment threshold when using the random effects model. For patients with a QRISK3 predicted risk between 10.5~11.5%, 24.4% were reclassified to below the treatment threshold when using the random effects model.
Table 2 – Inconsistencies between individual CVD risks as predicted by QRISK3 or by the random effects model that incorporated practice variability. The percentile columns give the distribution of the risk predicted by the random effects model, and the last two percentage columns give the share of patients below or above the 10-year CVD risk treatment threshold of 10% under that model. All risks are in %.
QRISK3 predicted CVD risk (over 10 years) | 2.5th~97.5th percentile | 5th percentile | 25th percentile | 75th percentile | 95th percentile | % of patients ≤10 | % of patients >10 | Total number of patients |
---|---|---|---|---|---|---|---|---|
<6.5 | 0.1~6.0 | 0.1 | 0.4 | 2.6 | 5.4 | 100.0 | 0.0 | 2561602 |
6.5~7.5 | 5.3~9.4 | 5.5 | 6.3 | 7.6 | 8.9 | 99.0 | 1.0 | 96981 |
7.5~8.5 | 6.0~10.7 | 6.3 | 7.2 | 8.7 | 10.2 | 94.0 | 6.0 | 82768 |
8.5~9.5 | 6.8~12.0 | 7.1 | 8.2 | 9.7 | 11.4 | 80.3 | 19.7 | 72098 |
9.5~10.5 | 7.6~13.3 | 7.9 | 9.1 | 10.8 | 12.6 | 54.0 | 46.0 | 64477 |
10.5~11.5 | 8.4~14.6 | 8.8 | 10.0 | 11.9 | 13.9 | 24.4 | 75.6 | 56550 |
11.5~12.5 | 9.2~15.8 | 9.6 | 11.0 | 13.0 | 15.1 | 9.1 | 90.9 | 50278 |
12.5~13.5 | 10.0~17.1 | 10.4 | 11.9 | 14.0 | 16.3 | 2.4 | 97.6 | 45126 |
≥13.5 | 12.7~55.4 | 13.5 | 17.8 | 34.7 | 50.2 | 0.1 | 99.9 | 600938 |
Figure 3 plots the distribution of risks predicted with the random effects model for those with a QRISK3 predicted risk of 10%. The effects of random variability (measured by simulation analysis) in the random effects model are also presented in this figure. It was found that the effect of practice variability on predicted risks for patients cannot be fully explained by random variability, as the overall distribution (blue area), with a standard deviation for random effects of about 0.17, was much wider than the distribution due to random variability (green area), with a standard deviation for random effects of about 0.01.
More information: Yan Li et al. Do population-level risk prediction models that use routinely collected health data reliably predict individual risks?, Scientific Reports (2019). DOI: 10.1038/s41598-019-47712-5
Callum Wood
Journal information: Scientific Reports
Provided by University of Manchester