Integrating Polygenic Scores with Clinical, Lifestyle, and Social Risk Factors to Enhance Heart Failure Risk Prediction

Abstract

Heart failure (HF) is a highly prevalent, high-burden disorder whose prevalence is expected to increase. Early detection of HF can reduce morbidity and mortality; therefore, novel early detection methods are needed. Polygenic scores (PGS) can combine common variants across the genome and provide phenotype-specific risk scores. However, there are also many well-known, non-genomic risk factors for HF in the clinical, lifestyle, and social determinant of health (SDOH) domains, and it is not clear how genetic and non-genetic risk factors together contribute to HF risk. To address this question, we assessed whether combining HF PGS with clinical, lifestyle, and SDOH risk factors improves risk prediction. Leveraging data from the All of Us Research Program (n = 22,275), clinical risk factors were aggregated into a clinical risk score (CRS) while lifestyle and SDOH risk factors were aggregated into a polyexposure score (PXS). Feature selection was conducted with LASSO regression and statistical significance thresholding from logistic regression models (p < 0.05). Features were included in the model if they were statistically significant and important in ≥ 95% of 1000 iterations. To assess model performance, logistic regressions with HF case/control status were conducted with each risk score individually, as well as with integrated models. The integrated model (PGS + CRS + PXS) performed better than individual risk scores (AUROC = 0.763, AUPRC = 0.047, F1 score = 0.062, balanced accuracy = 0.683). To assess the validity of the CRS and PXS, an integrated model with the PGS along with clinical and exposure risk factors as independent features was also evaluated. Based on AUPRC and F1 score, this integrated risk model (PGS + CRS risk factors + PXS risk factors) performed better than combining the PGS with the CRS and PXS (AUROC = 0.738, AUPRC = 0.047, F1 score = 0.066, balanced accuracy = 0.657). These findings demonstrate that integration of risk factors across multiple domains can improve HF prediction. Knowing that PGS combined with clinical, lifestyle, and SDOH risk factors is predictive of HF risk provides greater opportunity to identify individuals at risk of HF prior to disease onset, with the goal of prevention or early intervention.

Keywords: Heart Failure, Polygenic Score, Clinical Risk Score, Polyexposure Score, Integrated Risk Model

1. Introduction

1.1. Heart Failure Risk Factors

Heart failure (HF) is a significant burden on the population, as 6.7 million individuals in the United States over age twenty are affected. Due to an aging population and increased survival rates after diagnosis, prevalence is expected to rise by millions each decade¹. In addition, HF mortality rates have been increasing since 2012¹. Early detection of HF may decrease morbidity and mortality through early implementation of guideline-directed medical therapy, the gold-standard treatment for HF^1–3. Major cardiovascular disease (CVD) events can be prevented with early detection, so novel methods must be developed to improve early detection of HF⁴.

To date, the leading risk factors for HF include older age, smoking, atrial fibrillation (AF), hypertension, ischemic heart disease, obesity, and diabetes mellitus¹. Cardiovascular conditions are interrelated and share many clinical, environmental, lifestyle, and social determinant of health (SDOH) risk factors⁴. Hypercholesterolemia and hyperlipidemia can lead to atherosclerosis and development of CVD, so many CVD treatments aim to lower lipid levels⁴. These risk factors are assessed in clinical settings with lipid panels that measure high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, and triglycerides⁴. Other medications aim to lower blood pressure, as hypertension is also a major risk factor for CVD⁴. Another clinical risk factor for CVD is diabetes (type I and type II), which is measured by elevated glucose and hemoglobin A1c (HbA1c) levels^1,4.

In addition to clinical risk factors, there are many lifestyle-related risk factors for CVD, including smoking, lack of physical activity, and poor diet⁴. Modifying lifestyle has a major impact on CVD, as it is known to slow or reverse progression⁴. Sedentary lifestyle and unhealthy diet can lead to obesity, which is measured by body mass index (BMI)⁴. Lifestyle and SDOH risk factors are often interconnected^4,5. For example, low income and education level are associated with nutrition status^4,5. In addition, neighborhood factors such as grocery store availability, park and sidewalk access, poorly kept up housing, vandalism, and graffiti can influence diet and physical activity⁴. Other SDOH factors that are associated with CVD are single-living status, neighborhood deprivation, social isolation, employment status, food insecurity, childhood adversity, living alone, social deprivation index, and census-based income^4–8. There are also environmental risk factors for CVD, such as air pollution and ambient temperatures^1,5,9,10.

Genomics also plays a role in HF development. HF has a heritability of 34% when excluding cardiomyopathies, suggesting some genomic contribution¹¹. Familial/Mendelian traits that are caused by variants in single genes, such as cardiomyopathies and hypercholesterolemia, significantly increase the risk of HF^12–14. Besides these Mendelian genetic risk factors, many common and rare genetic variants have also been associated with non-Mendelian HF¹⁵.

1.2. Heart Failure Risk Prediction

Known risk factors can be leveraged to predict HF risk. For example, genetic testing of variants in single genes that contribute to familial cardiomyopathies and hypercholesterolemia is used for early detection^12–14. However, not all HF is Mendelian; therefore, polygenic scores (PGS) offer an opportunity to aggregate common variants across the genome and provide phenotype-specific risk scores¹⁶. PGS are the cumulative, mathematical aggregation of risk derived from the total contribution of variants across the genome¹⁶. PGS have been shown to be predictive at the population level for complex traits such as coronary artery disease (CAD), AF, type II diabetes (T2D), breast cancer, schizophrenia, and bipolar disorder, among other traits^17–26. Additionally, recent studies have found that PGS is predictive of HF^15,27–29. The clinical utility of PGS is an active area of investigation and discussion in the field, with important considerations such as interpretability, integration with existing clinical risk models, cost-effectiveness, equitable access across diverse populations, and the need for clinician education and infrastructure to support its implementation^26,30–36.

Other risk scores, such as clinical risk scores (CRS) and polyexposure scores (PXS), can be used to integrate non-genomic risk factors, such as clinical, lifestyle, environmental, and SDOH variables, into predictive risk models^6,37,38. CRS is the linear combination of clinical risk factors associated with a disease of interest³⁸. Many CRS have been developed and validated to predict HF, utilizing known HF predictor variables including clinical conditions, lifestyle factors, medications, and other risk factors³⁸. These models have been shown to be predictive of HF (highest area under the receiver operating characteristic curve (AUROC) = 0.87)³⁸. Conversely, PXS linearly integrates lifestyle, environmental, and SDOH risk factors into a single score³⁷. A PXS in one recent study was shown to be predictive of T2D status (C-index = 0.762)³⁷.

These risk scores have been shown to be individually predictive for various CVDs. Integration of PGS with non-genomic risk factors has previously improved predictive ability for CAD, T2D, and aortic stenosis^39,40. Another study found that combining PGS, CRS, and PXS together improved T2D classification accuracy³⁷. Leveraging genomic, electronic health record (EHR), and survey data, this study aims to identify whether integration of risk factors across multiple domains can improve prediction of HF risk.

2. Methods

2.1. Data and Study Participants

The All of Us Research Program (AOU) is a longitudinal cohort study based in the United States⁴¹. Participants provided informed consent for optional data collection, including blood sample collection for whole genome sequencing (WGS), access to electronic health records (EHR) and wearables, and completion of health-related surveys and physical measurements at the time of enrollment⁴¹. Data from version 8 (v8) was utilized in this study⁴¹.

2.2. Genotyping and Quality Control

AOU genome centers extracted DNA from blood samples, which were genotyped with an Illumina NovaSeq 6000 instrument and processed on the Illumina DRAGEN platform⁴¹. Following processing, samples were included with mean coverage ≥ 30x, genome coverage ≥ 90% at 20x, coverage of hereditary disease risk genes ≥ 95% at 20x, aligned Q30 bases ≥ 8 × 10¹⁰, cross-individual contamination < 3%, concordance with independently processed genotype array, and concordance between sex call and self-reported sex at birth⁴¹. During joint calling, additional sample-level quality control (QC) was conducted, including sample hard threshold flagging (number of single nucleotide polymorphisms (SNPs) < 2.4 million or > 5.0 million, number of variants not present in gnomAD 3.1 > 100K, and heterozygous to homozygous ratio (Het/Hom) > 3.3 (for SNPs and insertions and deletions (INDELs) separately)), and sample population outlier flagging (eight median absolute deviations (MAD) away from the median residual in deletion count, insertion count, SNP count, number of variants not in gnomAD 3.1, insertion to deletion ratio, transition to transversion ratio, or SNP or INDEL Het/Hom). Variant-level quality control was also conducted, excluding variants with no high-quality genotype (genotype quality (GQ) ≥ 20, depth of coverage (DP) ≥ 10, and allele balance (AB) ≥ 0.02 for heterozygotes), excess heterozygosity < 54.69, SNP quality (QUAL) score < 60, INDEL QUAL score < 69, > 100 alternate alleles, and variants that are likely artifacts (using the Variant Extract-Train-Score Filtering (VETS) algorithm)⁴¹. Finally, eight well-characterized control samples were included to validate the QC pipeline by calculating sensitivity and precision⁴¹.

2.3. Polygenic Score Calculation

PGS weights were generated using the largest HF genome-wide association study (GWAS) to date (n-individuals = 2,322,691, n-variants = 1,274,692)¹⁵. Weights were extracted from the PGS Catalog (PGS005097) and applied to the AOU cohort using the PGSC-CALC pipeline, which adjusts for the confounding effects of genetically inferred ancestry by normalizing based on differences in population means and standard deviations^42–44.
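As a rough illustration of what a polygenic score is, the sketch below computes a raw score as a weighted sum of effect-allele dosages and then standardizes it against group-level means and standard deviations. This is a simplified stand-in, not the PGSC-CALC implementation: the function names, the variant IDs, and the group-wise z-scoring are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Raw PGS: sum of effect-allele dosage (0-2) times per-variant weight.

    Variants missing from the weight file are skipped (an assumption).
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def standardize_by_group(scores, groups):
    """Z-score each raw score against the mean and SD of its own group.

    Hypothetical simplification of ancestry adjustment; the real pipeline
    may use a different procedure.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    adjusted = []
    for score, group in zip(scores, groups):
        mu, sd = stats[group]
        adjusted.append((score - mu) / sd if sd > 0 else 0.0)
    return adjusted


# Toy example with made-up variant IDs and weights.
weights = {"rs1": 0.1, "rs2": -0.2}
raw = polygenic_score({"rs1": 2, "rs2": 1}, weights)
z = standardize_by_group([1.0, 3.0, 10.0, 14.0], ["A", "A", "B", "B"])
```

After standardization, scores from groups with very different raw distributions sit on a comparable scale, which is the intent of the normalization described above.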

2.4. Phenotyping

Individuals were classified as cases and controls for ICD-based phenotypes based on ICD-9 and ICD-10 mappings to PhecodeX, which represent meaningful phenotypes in statistical genetics⁴⁵. The outcome, HF status, was defined by PhecodeX CV_424. T2D status, a predictor included in the CRS, was defined by PhecodeX EM_202.2 (Table 1). For both phenotypes, individuals had to have at least two instances of mapped ICD codes to be a case (rule of two), and zero instances to be a control⁴⁶. Age at first diagnosis, represented by age at first HF ICD code, was computed for HF cases, while age at last data release (October 1st, 2023) was computed for HF controls. Reported sex at birth was encoded numerically, including only males and females.
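The rule-of-two logic can be sketched in a few lines. The function names and the treatment of individuals with exactly one mapped code (returned as None, meaning excluded from both groups) are illustrative assumptions that follow from the case and control definitions given here.

```python
def classify_rule_of_two(code_count):
    """Classify a participant from the number of mapped ICD code instances.

    >= 2 instances -> "case"; 0 instances -> "control";
    exactly 1 instance -> None (neither case nor control).
    """
    if code_count >= 2:
        return "case"
    if code_count == 0:
        return "control"
    return None


def index_age(status, age_first_code, age_at_release):
    """Age used for the analysis: first HF code for cases, data release for controls."""
    if status == "case":
        return age_first_code
    if status == "control":
        return age_at_release
    return None


# Toy example: counts of HF-mapped ICD codes for three hypothetical participants.
statuses = [classify_rule_of_two(n) for n in (0, 1, 4)]
```

Dropping single-code individuals trades sample size for cleaner labels, since one isolated billing code is weak evidence of a diagnosis.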

Table 1:

Known risk factors for HF to be included in integrated risk scores.

Clinical Risk Score	Polyexposure Score

Clinical Risk Factors	Lifestyle Risk Factors	Social Determinant of Health Risk Factors

Diastolic Blood Pressure^†	BMI	Census-based income
Glucose^‡	Nutrition status^*	Education level
HbA1c^‡	Physical activity	Income level
HDL cholesterol	Smoking status	Neighborhood^†
LDL cholesterol		Single-living status^*
Systolic Blood Pressure^†		Deprivation index^†‡
T2D status
Triglycerides^†‡

Lab values were derived from the electronic health record (EHR) and were obtained from serum or plasma. Triglycerides, HDL cholesterol, LDL cholesterol, non-fasting glucose, HbA1c, systolic blood pressure (SBP), and diastolic blood pressure (DBP) were included in the CRS while BMI was included in the PXS (Table 1). Values were first filtered by measurement name. Many values were extremely abnormal, so values ≥ 5 MAD away from the median were excluded, as well as values ≤ 0. Many individuals had multiple measurements, so the closest value before their calculated age (age at first diagnosis for cases, age at data release for controls) was retained. Individuals who did not have lab measurements before their coded age were excluded.
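A minimal sketch of the two cleaning steps described here, outlier removal by median absolute deviation and selection of the closest measurement before the index age, might look as follows. The function names are hypothetical, and two details are assumptions: the median and MAD are computed over all values of a measurement before any exclusion, and "before" is treated as strictly earlier than the index age.

```python
from statistics import median


def mad_filter(values, n_mads=5.0):
    """Drop values <= 0 and values n_mads or more MADs away from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case (at least half the values are identical): only drop non-positives.
        return [v for v in values if v > 0]
    return [v for v in values if v > 0 and abs(v - med) < n_mads * mad]


def closest_prior_value(measurements, index_age):
    """Return the value of the latest measurement taken before index_age.

    measurements: list of (age_at_measurement, value) pairs.
    Returns None when no prior measurement exists (participant excluded).
    """
    prior = [(age, value) for age, value in measurements if age < index_age]
    if not prior:
        return None
    return max(prior, key=lambda pair: pair[0])[1]


# Toy example with made-up lab values.
kept = mad_filter([100, 102, 98, 101, 99, 1000, -5])
lab = closest_prior_value([(50.0, 120), (55.5, 130), (61.0, 140)], index_age=60.0)
```

Restricting to measurements before the index age keeps predictors from reflecting changes that occur after an HF diagnosis.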

Lifestyle risk factors (smoking status, physical activity, and nutrition status), as well as SDOH risk factors (income level, highest achieved education level, neighborhood, and single-living status), were extracted to be included in the PXS (Table 1). This data was derived from surveys, which were taken once at recruitment. Income, education level, and physical activity were each derived from one question. Conversely, twenty-four neighborhood questions were extracted and kept as separate features. Smoking status was derived from nine questions which were integrated into one variable, classifying individuals as nonsmokers (0), former smokers (1), or current smokers (2)⁴⁷. Individuals had to answer as a non-smoker in all questions to be classified as a non-smoker. Individuals were classified as former smokers if they identified as a former smoker or the age they completely quit smoking was less than their coded age. Individuals who responded as a smoker in any question were classified as a current smoker. Answers for all questions were coded numerically, such that each variable reflected a positive association with HF. The questions, answers, and answer encodings for each variable are described in the supplementary material (Supplementary Table 1). Census-based income and social deprivation index (SDI), which were calculated by AOU based on three-digit zip codes, were also included in the PXS (Table 1). Physical activity may change after a HF diagnosis, so the closest value prior to computed age was utilized. Individuals who did not have a physical activity variable prior to their coded age were excluded.

All continuous variables in the CRS and PXS, including labs, measurements, census-based income, and SDI, were normalized with inverse-normal transformation. Prior to integration, all variables were put on the same scale. Variables were downscaled to match the variable with the lowest number of categories in their risk score group. CRS variables were scaled to 0–1 to align with T2D status and PXS variables were scaled to 0–2 to align with smoking status. Individuals with missing data in any variable were removed.
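The normalization and rescaling steps can be sketched as below. The text does not specify which inverse-normal variant or scaling formula was used, so the rank-based transform with a Blom offset (c = 3/8), average ranks for ties, and min-max rescaling are assumptions chosen for illustration; the function names are hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse-normal transform (Blom offset by default).

    Ties receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def min_max_scale(values, upper=1.0):
    """Linearly rescale values to the interval [0, upper]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * upper for v in values]


# Toy example: transform a lab value, then scale to 0-1 (CRS) or 0-2 (PXS).
z = inverse_normal_transform([3, 1, 2])
crs_scaled = min_max_scale(z, upper=1.0)
pxs_scaled = min_max_scale([-1.0, 0.0, 1.0], upper=2.0)
```

Putting every variable in a score on the same range keeps a summed score from being dominated by whichever variable happens to have the largest units.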

2.5. Clinical Risk Score and Polyexposure Score Construction

Data was split into 70% train and 30% test splits. The training portion was used for feature selection and weight generation. It was split in half, so 35% of the dataset was used for logistic regressions and 35% was used for LASSO regression. Logistic regression was used to identify risk factors that were significantly associated with HF (p < 0.05), while LASSO regression was used to identify important risk factors in HF prediction. In these training splits, the synthetic minority oversampling technique (SMOTE) was used to combat case/control imbalance and increase sample size⁴⁸. In both regression models, the outcome was HF case/control status, and age and sex were used as covariates in the logistic regression. 1000 iterations of these splits were conducted (for both the 70/30 training/testing split and the 35/35 training split), using a different random seed each time. Sampling was conducted without replacement. Only risk factors that were significant and important in ≥ 95% of iterations (95% confidence) were included in CRS and PXS generation. In the testing set, risk factors were combined into cohesive scores using a weighted sum, using effect sizes (betas) from logistic regressions as the weights. As a comparison, an unweighted sum was assessed as well.
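The selection rule and the two scoring schemes can be sketched as follows, taking per-iteration regression outputs as given rather than refitting models. The function names and toy numbers are hypothetical, and treating a nonzero LASSO coefficient as "important" is an assumption, since the criterion is not spelled out here.

```python
def stable_features(p_values, lasso_coefs, alpha=0.05, min_fraction=0.95):
    """Keep features that are significant AND important in >= min_fraction of iterations.

    p_values, lasso_coefs: dicts mapping feature name -> list of per-iteration values.
    "Important" is taken to mean a nonzero LASSO coefficient (an assumption).
    """
    selected = []
    for feature, ps in p_values.items():
        coefs = lasso_coefs[feature]
        frac_significant = sum(p < alpha for p in ps) / len(ps)
        frac_important = sum(c != 0 for c in coefs) / len(coefs)
        if frac_significant >= min_fraction and frac_important >= min_fraction:
            selected.append(feature)
    return selected


def weighted_score(row, betas):
    """Risk score as a beta-weighted sum of a participant's scaled risk factors."""
    return sum(betas[f] * row[f] for f in betas)


def unweighted_score(row, features):
    """Comparison score: plain sum of the same scaled risk factors."""
    return sum(row[f] for f in features)


# Toy example with made-up per-iteration results (100 iterations).
p_values = {
    "t2d": [0.001] * 100,
    "sbp": [0.001] * 70 + [0.2] * 30,   # significant in only 70% of iterations
    "glucose": [0.01] * 100,
}
lasso_coefs = {
    "t2d": [0.9] * 100,
    "sbp": [0.5] * 100,
    "glucose": [0.3] * 90 + [0.0] * 10,  # important in only 90% of iterations
}
kept = stable_features(p_values, lasso_coefs)
crs = weighted_score({"t2d": 1.0, "hdl": 0.5}, {"t2d": 1.0, "hdl": 2.0})
```

Requiring agreement across repeated random splits guards against keeping a predictor that looks strong in one particular partition of the data.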

2.6. Model Evaluation

Model evaluation was conducted in the testing split using logistic regressions. The testing set was split in half, so 15% of the dataset was used for model training and 15% was used for model testing. As before, 1000 iterations of these splits were conducted. SMOTE was again used in the training set⁴⁸. Each risk score was tested individually as well as every possible grouping of scores. Models with individual risk factors were tested as a comparison to the CRS and PXS. This resulted in seventeen distinct models (Table 3). In each model, age and sex were utilized as covariates. Performance metrics used were AUROC, area under the precision-recall curve (AUPRC), F1 score, and balanced accuracy. Mean metrics across the 1000 iterations were computed.
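To make the metrics concrete, the sketch below computes AUROC, F1 score, and balanced accuracy from scratch on a toy example. The function names, the 0.5 decision threshold, and the data are illustrative assumptions; a library implementation would normally be used instead. When cases are rare in an unbalanced test set, precision-based metrics such as AUPRC and F1 tend to be numerically low even for a model with reasonable AUROC, because a random classifier's AUPRC is roughly the case prevalence.

```python
def auroc(y_true, y_score):
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_and_balanced_accuracy(y_true, y_pred):
    """Return (F1 score, balanced accuracy) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (recall + specificity) / 2


# Toy example: five participants, two cases, predicted probabilities from some model.
y_true = [0, 0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 threshold is an assumption
auc = auroc(y_true, y_score)
f1, bal_acc = f1_and_balanced_accuracy(y_true, y_pred)
```

Balanced accuracy averages sensitivity and specificity, so unlike plain accuracy it is not inflated by simply predicting the majority class.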

Table 3:

Overview of evaluation models.

Model Number	Model	Number of Features
1	PGS	3
2	CRS_SUM	3
3	CRS_WEIGHTED_SUM	3
4	PXS_SUM	3
5	PXS_WEIGHTED_SUM	3
6	PGS + CRS_SUM	4
7	PGS + CRS_WEIGHTED_SUM	4
8	PGS + PXS_SUM	4
9	PGS + PXS_WEIGHTED_SUM	4
10	CRS_SUM + PXS_SUM	4
11	CRS_WEIGHTED_SUM + PXS_WEIGHTED_SUM	4
12	PGS + CRS_SUM + PXS_SUM	5
13	PGS + CRS_WEIGHTED_SUM + PXS_WEIGHTED_SUM	5
14	CRS Risk Factors	7
15	PXS Risk Factors	12
16	CRS Risk Factors + PXS Risk Factors	17
17	PGS + CRS Risk Factors + PXS Risk Factors	18

3. Results

Eighteen risk factors were initially selected for inclusion in the study (Table 1). Eight clinical risk factors were selected for the CRS, while four lifestyle risk factors and six SDOH risk factors were selected for the PXS (Table 1, Supplementary Table 1). Nutrition and single-living status were excluded due to high data missingness (Table 1). DBP, SBP, and seven neighborhood variables were excluded because they were not significant in ≥ 95% of logistic regression iterations (Table 1, Table 2, Supplementary Table 1). Glucose and HbA1c were excluded because they were not important in ≥ 95% of LASSO regression iterations (Table 1, Table 2). Additionally, triglyceride levels were excluded because they were neither significant nor important (Table 1, Table 2). After filtering, ten risk factors remained, including three clinical risk factors (HDL cholesterol, LDL cholesterol, and T2D) in the CRS, and three lifestyle risk factors (physical activity, smoking status, and BMI) and four SDOH risk factors (income level, education level, neighborhood, and census-based income) in the PXS, including seventeen neighborhood variables (Table 1, Table 2, Supplementary Table 1).

Table 2:

Significance and importance of variables across 1000 iterations.

Category (Risk Score)	Risk Factor	Mean P-Value	Percent Significant	Mean Absolute Coefficient	Percent Important
Clinical (CRS)	Diastolic Blood Pressure	0.048	88.30%	1.322	96.30%
Clinical (CRS)	Glucose	0.007	97.80%	0.819	93.80%
Clinical (CRS)	HbA1c	0.001	99.70%	0.914	91.90%
Clinical (CRS)	HDL cholesterol	0.008	98.00%	2.014	97.70%
Clinical (CRS)	LDL cholesterol	0.000	99.90%	1.403	95.00%
Clinical (CRS)	Systolic Blood Pressure	0.112	73.50%	0.946	92.60%
Clinical (CRS)	T2D	9.50E-81	100.00%	0.980	100.00%
Clinical (CRS)	Triglycerides	0.120	71.40%	0.924	92.60%
Lifestyle (PXS)	BMI	3.82E-20	100.00%	2.511	100.00%
Lifestyle (PXS)	Everyday Physical Activity	4.40E-48	100.00%	0.847	100.00%
Lifestyle (PXS)	Smoking	0.004	99.10%	0.242	99.70%
SDOH (PXS)	Annual Income	1.16E-54	100.00%	0.397	99.40%
SDOH (PXS)	Census Median Income	3.40E-18	100.00%	0.914	96.90%
SDOH (PXS)	Highest Education	6.23E-08	100.00%	0.385	97.30%
SDOH (PXS)	Neighborhood- Abandoned Buildings	0.002	99.40%	0.293	97.10%
SDOH (PXS)	Neighborhood- Alcohol Use	0.023	94.30%	0.667	99.40%
SDOH (PXS)	Neighborhood- Lots of Crime	0.005	99.00%	0.365	98.00%
SDOH (PXS)	Neighborhood- Cleanliness	0.016	95.30%	0.288	97.90%
SDOH (PXS)	Neighborhood- Crime Rate Makes It Unsafe to Walk at Night	0.001	99.90%	0.343	99.00%
SDOH (PXS)	Neighborhood- Crime Rate Makes It Unsafe to Walk During the Day	1.95E-13	100.00%	0.544	99.00%
SDOH (PXS)	Neighborhood- Drug Use	0.003	99.40%	0.484	97.20%
SDOH (PXS)	Neighborhood- Facilities to Bike	2.47E-07	100.00%	0.324	99.40%
SDOH (PXS)	Neighborhood- Free/Low-Cost Recreation Facilities	0.001	99.20%	0.243	99.30%
SDOH (PXS)	Neighborhood- Get Along with Neighbors	0.113	73.00%	0.835	99.20%
SDOH (PXS)	Neighborhood- Graffiti	0.014	96.20%	0.301	97.90%
SDOH (PXS)	Neighborhood- Main Type of Housing	0.075	80.80%	0.256	98.30%
SDOH (PXS)	Neighborhood- Neighbors Can Be Trusted	0.001	99.90%	0.451	96.70%
SDOH (PXS)	Neighborhood- Neighbors Take Good Care of Their Homes	0.006	98.80%	0.256	97.80%
SDOH (PXS)	Neighborhood- Neighbors Watch Out for Each Other	0.046	88.00%	0.289	98.40%
SDOH (PXS)	Neighborhood- Noise	0.099	77.40%	0.255	97.70%
SDOH (PXS)	Neighborhood- People Share the Same Values	0.046	88.30%	0.293	97.90%
SDOH (PXS)	Neighborhood- Safe from Crime	0.002	99.40%	0.303	97.80%
SDOH (PXS)	Neighborhood- Shops, Stores, Markets or Other Places to Buy Things are Within Walking Distance	0.000	99.90%	0.309	99.40%
SDOH (PXS)	Neighborhood- Sidewalks on Most Streets	0.008	97.90%	0.159	98.70%
SDOH (PXS)	Neighborhood- Too Many People Hanging Around on the Streets Near Home	0.014	96.40%	0.345	96.60%
SDOH (PXS)	Neighborhood- Transit Stop Within Walking Distance	0.019	95.20%	0.152	98.70%
SDOH (PXS)	Neighborhood- Trouble with Neighbors	0.002	99.50%	0.347	97.70%
SDOH (PXS)	Neighborhood- Vandalism	0.006	98.50%	0.304	97.20%
SDOH (PXS)	Social Deprivation Index	0.090	77.60%	0.697	93.40%

Prevalence of HF was ~5.1% in AOU. Removing missing data decreased the sample size considerably. The sample size decreased from 406,513 (n-controls = 386,518, n-cases = 19,995) to 22,594 (n-controls = 22,275, n-cases = 319). Lab values and neighborhood variables had the lowest percentage of non-missingness, which contributed the most to the sample size decrease (Supplementary Table 2). The case count in particular dropped because individuals had to have lab values and physical activity survey answers before their first HF diagnosis to be included. However, SMOTE was used to increase the case count to match the number of controls in all splits except the final testing set.

Model 13 (PGS + weighted CRS + weighted PXS) performed best based on AUROC and balanced accuracy, while model 17 (PGS + CRS risk factors + PXS risk factors) performed best based on AUPRC and F1 score (Table 3, Table 4, Figure 2). Based on AUROC and balanced accuracy, the next best performing model was model 11 (weighted CRS + weighted PXS) (Table 3, Table 4, Figure 2). Based on F1 score and AUPRC, the next best performing models were model 13 (PGS + weighted CRS + weighted PXS) and model 16 (CRS risk factors + PXS risk factors) (Table 3, Table 4, Figure 2). After this, the performance ranking of the models varied by metric (Table 3, Table 4, Figure 2). However, based on AUROC, F1 score, and balanced accuracy, the worst performing models were model 4 (unweighted PXS), model 6 (PGS + unweighted CRS), model 2 (unweighted CRS), and model 1 (PGS) (Table 3, Table 4, Figure 2). Model 4 (unweighted PXS), model 2 (unweighted CRS), and model 1 (PGS) were also among the worst performing models based on AUPRC, with the addition of model 5 (weighted PXS) (Table 3, Table 4, Figure 2).

Table 4:

Model evaluation metrics.

Model Number	Risk Factor	AUROC	AUPRC	F1 Score	Balanced Accuracy
1	PGS	0.669	0.034	0.044	0.607
2	CRS_SUM	0.676	0.033	0.042	0.598
3	CRS_WEIGHTED_SUM	0.736	0.042	0.055	0.655
4	PXS_SUM	0.709	0.032	0.052	0.646
5	PXS_WEIGHTED_SUM	0.730	0.034	0.056	0.660
6	PGS + CRS_SUM	0.678	0.036	0.043	0.603
7	PGS + CRS_WEIGHTED_SUM	0.737	0.046	0.057	0.662
8	PGS + PXS_SUM	0.715	0.036	0.054	0.652
9	PGS + PXS_WEIGHTED_SUM	0.734	0.038	0.057	0.665
10	CRS_SUM + PXS_SUM	0.732	0.039	0.055	0.652
11	CRS_WEIGHTED_SUM + PXS_WEIGHTED_SUM	0.762	0.043	0.061	0.678
12	PGS + CRS_SUM + PXS_SUM	0.734	0.041	0.057	0.658
13	PGS + CRS_WEIGHTED_SUM + PXS_WEIGHTED_SUM	0.763	0.047	0.062	0.682
14	CRS Risk Factors	0.729	0.040	0.055	0.655
15	PXS Risk Factors	0.723	0.041	0.060	0.647
16	CRS Risk Factors + PXS Risk Factors	0.738	0.046	0.065	0.658
17	PGS + CRS Risk Factors + PXS Risk Factors	0.738	0.047	0.065	0.657

Figure 2: AOU receiver operating characteristic (ROC) curves (Panel A) and precision-recall (PRC) curves (Panel B).

4. Discussion

This study showed that integration of genetic, clinical, lifestyle, and SDOH risk factors improved predictive performance of heart failure risk in comparison to the separate risk scores alone. Based on all metrics, the integrated risk models containing risk factors across all domains (model 13, PGS + weighted CRS and PXS, and model 17, PGS + CRS and PXS risk factors) were the top performing models (Table 3, Table 4, Figure 2). Based on AUPRC and F1 score, model 17 (PGS + CRS and PXS risk factors) performed best, but model 13 (PGS + weighted CRS and PXS) performed best based on AUROC and balanced accuracy (Table 3, Table 4, Figure 2). Thus, it is unclear which integration method is best, but models containing risk factors across multiple domains were always the top performing models. This finding is consistent with previous literature^37,39,40.

The PGS appeared to contribute the least to model performance, as model 1 (PGS) performed worse than model 3 (weighted CRS), model 14 (CRS risk factors), model 5 (weighted PXS), and model 15 (PXS risk factors) based on all metrics, which is consistent with previous literature (Table 3, Table 4, Figure 2)³⁷. Additionally, model 11 (weighted CRS + weighted PXS) and model 16 (CRS risk factors + PXS risk factors) were among the top performing models. It is unclear whether clinical risk factors or lifestyle and SDOH risk factors contributed more, as performance ranking varied by metric, and quantitative differences were very small (Table 3, Table 4, Figure 2). In addition, models 3 (weighted CRS) and 5 (weighted PXS) performed better than models 2 (unweighted CRS) and 4 (unweighted PXS), demonstrating that integrating risk factor effect sizes enhances predictive performance.

Model 1 (PGS) yielded a lower AUROC compared to the study in which the weights were derived, possibly due to differences in the test dataset, sample size, or outcome phenotyping (Table 3, Table 4, Figure 2)¹⁵. Additionally, the PGSC-CALC pipeline normalized the PGS based on differences in population means and standard deviations, while the previous study did not normalize the PGS and instead scaled/centered the PGS within individual ancestry groups and utilized principal components as covariates in the regression models^15,42–44. Models 2 (unweighted CRS) and 3 (weighted CRS) also had a lower AUROC than seen in previous literature (Table 3, Table 4, Figure 2)³⁸. However, these studies included other clinical variables such as various clinical conditions, lab measurements, and medications, so it is possible that these variables, in addition to different sample sizes, datasets, or combination methods, may have led to AUROC differences³⁸. Model 5 (weighted PXS) has only been evaluated in one prior study, which did not use the same performance metrics, preventing direct pairwise comparison³⁷.

Model training yielded variables that were considered most predictive. Based on logistic regressions, the ten risk factors with the lowest mean p-values were T2D, income, physical activity, BMI, census-based income, education, four neighborhood variables, and LDL cholesterol (Table 2). T2D, LDL, BMI, physical activity, and census-based income overlapped with the ten risk factors with the highest mean absolute coefficients based on LASSO regressions (Table 2). Other variables deemed more important by LASSO regressions were HDL cholesterol, DBP, SBP, triglycerides, and HbA1c (Table 2). SDOH risk factors appeared more significant in logistic regressions, while clinical risk factors appeared more important in LASSO regressions (Table 2). Thus, there is some variability based on the feature selection method.

This study had several limitations. Sample size was low due to removal of missing values, and performance may have improved if it had been higher, particularly in the training sets. It is also possible that more predictors would have appeared significant and important in ≥ 95% of iterations, which could have changed performance. This study also did not include a validation dataset, and replication would have strengthened these findings. However, these findings are replicated by similar studies in the literature³⁷,³⁸,⁴⁰. Several other factors may have improved performance. For example, performance may have improved with the inclusion of more clinical, lifestyle, and SDOH risk factors associated with heart failure, as not all known HF/CVD risk factors were included in this study. CVDs have some overlapping genetic architecture, so correcting for this may have improved performance⁴⁹. PGS performance was worse than that of risk factors in other domains, and it is possible that restricting the sample to high-risk individuals classified by other CVD risk factors may improve its performance²⁶. In addition, performance may have been enhanced with ancestry-stratified analyses, but the sample size was too low to further subset the cohort. As AOU continues to grow, we anticipate that sample sizes will increase and many of these risk factors will become populated in these datasets in the future. There are also some limitations regarding the phenotyping method. Rule of two phenotyping accounts for some error in ICD-based phenotyping, but some errors may remain because ICD codes are used for billing purposes and not diagnosis documentation⁴⁶. In addition, many of the lifestyle and SDOH risk factors were based on self-reported survey data, which is susceptible to bias⁵⁰.
Last, prevalence of HF in the general population is estimated to be 1.9–2.8%, which is lower than prevalence in these datasets¹. This, in addition to participation bias of individuals who opt in to biobank studies, may impact the generalizability of this study's findings to the general population⁵¹.
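As a rough illustration of rule of two phenotyping, the sketch below assumes one common reading of the rule: a participant is a case if a qualifying ICD code appears on at least two distinct dates, a control if no qualifying code appears, and excluded if a code appears on only one date. The exact definition used in this study is given in its methods, so the function name and thresholds here are assumptions.

```python
from datetime import date

def rule_of_two_status(code_dates):
    """Classify a participant from the dates of qualifying ICD codes.

    Assumed definition (not confirmed by this section of the paper):
      >= 2 distinct dates -> "case"
      0 dates             -> "control"
      exactly 1 date      -> "excluded" (ambiguous, possibly a billing artifact)
    """
    distinct_days = len(set(code_dates))
    if distinct_days >= 2:
        return "case"
    if distinct_days == 0:
        return "control"
    return "excluded"

# Hypothetical participants for illustration only.
print(rule_of_two_status([date(2021, 3, 1), date(2021, 9, 15)]))  # two distinct dates
print(rule_of_two_status([date(2021, 3, 1), date(2021, 3, 1)]))   # same date twice
print(rule_of_two_status([]))                                     # no qualifying codes
```

Requiring two distinct dates filters out single stray codes, but it cannot correct codes that are repeated for billing reasons, which is the residual error noted above.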

Despite these limitations, this study demonstrated that integrating risk factors across multiple domains improves predictive performance for HF risk, highlighting the need to consider these risk factors in clinical settings. Integrating polygenic scores with clinical, lifestyle, and SDOH risk factors may improve early detection of HF and thereby reduce its adverse consequences. Such multi-domain risk models may facilitate more precise risk stratification in clinical settings, enabling earlier interventions and improved management for individuals at risk for HF.

Supplementary Material

All supplemental material can be found at: https://ritchielab.org/publications/supplementary-data/psb-2026/hf-irm. All code can be found at: https://github.com/RitchieLab/HF_IRM.git

5. Acknowledgements

We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health's All of Us Research Program for making available the participant data examined in this study. Funding was provided by HL169458.

References

1. Bozkurt B et al. HF STATS 2024: Heart Failure Epidemiology and Outcomes Statistics An Updated 2024 Report from the Heart Failure Society of America. J. Card. Fail. 31, 66–116 (2025).
2. Wang H et al. Importance of early diagnosis and treatment of heart failure across the spectrum of ejection fraction. Eur. Heart J. 44, ehad655.892 (2023).
3. Kittleson MM et al. 2024 Update to the 2020 ACC/AHA Clinical Performance and Quality Measures for Adults With Heart Failure: A Report of the American Heart Association/American College of Cardiology Joint Committee on Performance Measures. Circ. Cardiovasc. Qual. Outcomes 17, e000132 (2024).
4. 2025 Heart Disease and Stroke Statistics: A Report of US and Global Data From the American Heart Association | Circulation. https://www.ahajournals.org/doi/10.1161/CIR.0000000000001303.
5. Bazoukis G et al. Impact of Social Determinants of Health on Cardiovascular Disease. J. Am. Heart Assoc. 14, e039031 (2025).
6. Ana Palacio MD et al. Social Determinants of Health Score: Does It Help Identify Those at Higher Cardiovascular Risk? 26, (2020).
7. Jilani MH et al. Social Determinants of Health and Cardiovascular Disease: Current State and Future Directions Towards Healthcare Equity. Curr. Atheroscler. Rep. 23, 55 (2021).
8. Bevan GH, Nasir K, Rajagopalan S & Al-Kindi S. Socioeconomic Deprivation and Premature Cardiovascular Mortality in the United States. Mayo Clin. Proc. 97, 1108–1113 (2022).
9. Jia Y et al. Effect of Air Pollution on Heart Failure: Systematic Review and Meta-Analysis. Environ. Health Perspect. 131, 76001 (2023).
10. Feng J, Zhang Y & Zhang J. Epidemiology and Burden of Heart Failure in Asia. JACC Asia 4, 249–264 (2024).
11. Lindgren MP et al. A Swedish Nationwide Adoption Study of the Heritability of Heart Failure. JAMA Cardiol. 3, 703–710 (2018).
12. Miller DT et al. ACMG SF v3.2 list for reporting of secondary findings in clinical exome and genome sequencing: A policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet. Med. Off. J. Am. Coll. Med. Genet. 25, 100866 (2023).
13. Brownrigg JR et al. Epidemiology of cardiomyopathies and incident heart failure in a population-based cohort study. Heart Br. Card. Soc. 108, 1383–1391 (2022).
14. Tada H et al. Familial hypercholesterolemia is related to cardiovascular disease, heart failure and atrial fibrillation. Results from a population-based study. Eur. J. Clin. Invest. 54, e14119 (2024).
15. Lee DSM et al. Common-variant and rare-variant genetic architecture of heart failure across the allele-frequency spectrum. Nat. Genet. 57, 829–838 (2025).
16. Choi SW, Mak TS-H & O’Reilly PF. Tutorial: a guide to performing polygenic risk score analyses. Nat. Protoc. 15, 2759–2772 (2020).
17. Ratman D et al. Polygenic risk scores improve CAD risk prediction in individuals at borderline and intermediate clinical risk. Npj Cardiovasc. Health 2, 13 (2025).
18. Patel AP et al. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat. Med. 29, 1793–1803 (2023).
19. Gibson JT & Rudd JHF. Polygenic risk scores in atrial fibrillation: Associations and clinical utility in disease prediction. Heart Rhythm 21, 913–918 (2024).
20. Khera AV et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).
21. Ge T et al. Development and validation of a trans-ancestry polygenic risk score for type 2 diabetes in diverse populations. Genome Med. 14, 70 (2022).
22. Roberts E, Howell S & Evans DG. Polygenic risk scores and breast cancer risk prediction. Breast Edinb. Scotl. 67, 71–77 (2023).
23. Mars N et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat. Med. 26, 549–557 (2020).
24. Duncan L et al. Polygenic scores for psychiatric disorders in a diverse postmortem brain tissue cohort. Neuropsychopharmacology 48, 764–772 (2023).
25. Liu H, Wang L, Yu H, Chen J & Sun P. Polygenic Risk Scores for Bipolar Disorder: Progress and Perspectives. Neuropsychiatr. Dis. Treat. 19, 2617–2626 (2023).
26. O’Sullivan JW et al. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation 146, e93–e118 (2022).
27. Soh CH, Xiang R, Takeuchi F & Marwick TH. Use of Polygenic Risk Score for Prediction of Heart Failure in Cancer Survivors. JACC CardioOncology 6, 714–727 (2024).
28. Han Y et al. A novel polygenic risk score improves prognostic prediction of heart failure with preserved ejection fraction in the Chinese Han population. Eur. J. Prev. Cardiol. 30, 1382–1390 (2023).
29. Ahn H-J et al. Polygenic risk-based prediction of heart failure in young patients with atrial fibrillation: an analysis from UK Biobank. EP Eur. 27, euaf104 (2025).
30. Martin AR et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
31. Abramowitz SA et al. Evaluating Performance and Agreement of Coronary Heart Disease Polygenic Risk Scores. JAMA 333, 60–70 (2025).
32. Kumuthini J et al. The clinical utility of polygenic risk scores in genomic medicine practices: a systematic review. Hum. Genet. 141, 1697–1704 (2022).
33. Xiang R et al. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med. 16, 33 (2024).
34. Lewis CM & Vassos E. Prospects for using risk scores in polygenic medicine. Genome Med. 9, 96 (2017).
35. Torkamani A, Wineinger NE & Topol EJ. The personal and clinical utility of polygenic risk scores. Nat. Rev. Genet. 19, 581–590 (2018).
36. Lennon NJ et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. 30, 480–487 (2024).
37. He Y et al. Comparisons of Polyexposure, Polygenic, and Clinical Risk Scores in Risk Prediction of Type 2 Diabetes. Diabetes Care 44, 935–943 (2021).
38. Sinha A et al. Risk-Based Approach for the Prediction and Prevention of Heart Failure. Circ. Heart Fail. 14, e007761 (2021).
39. Small AM et al. Novel Polygenic Risk Score and Established Clinical Risk Factors for Risk Estimation of Aortic Stenosis. JAMA Cardiol. 9, 357–366 (2024).
40. van Dam S et al. The necessity of incorporating non-genetic risk factors into polygenic risk score models. Sci. Rep. 13, 1351 (2023).
41. Bick AG et al. Genomic data in the All of Us Research Program. Nature 627, 340–346 (2024).
42. Lambert SA et al. Enhancing the Polygenic Score Catalog with tools for score calculation and ancestry normalization. Nat. Genet. 56, 1989–1994 (2024).
43. Khera AV et al. Whole-Genome Sequencing to Characterize Monogenic and Polygenic Contributions in Patients Hospitalized With Early-Onset Myocardial Infarction. Circulation 139, 1593–1602 (2019).
44. Khan A et al. Genome-wide polygenic score to predict chronic kidney disease across ancestries. Nat. Med. 28, 1412–1420 (2022).
45. Shuey MM et al. Next-generation phenotyping: introducing phecodeX for enhanced discovery research in medical phenomics. Bioinformatics 39, btad655 (2023).
46. Schrodi SJ. The Impact of Diagnostic Code Misclassification on Optimizing the Experimental Design of Genetic Association Studies. J. Healthc. Eng. 2017, 7653071 (2017).
47. Tindle HA et al. Lifetime Smoking History and Risk of Lung Cancer: Results From the Framingham Heart Study. J. Natl. Cancer Inst. 110, 1201–1207 (2018).
48. Chawla NV, Bowyer KW, Hall LO & Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Int Res 16, 321–357 (2002).
49. Qiao J et al. Shared genetic architecture contributes to risk of major cardiovascular diseases. Nat. Commun. 16, 8368 (2025).
50. Choi BCK & Pak AWP. A Catalog of Biases in Questionnaires. Prev. Chronic. Dis. 2, A13 (2004).
51. Ridgeway JL et al. Potential Bias in the Bank: What Distinguishes Refusers, Non-responders and Participants in a Clinic-based Biobank? Public Health Genomics 16, 10.1159/000349924 (2013).
