1. Background
Breast cancer remains one of the most common cancers among women worldwide and represents a substantial global health burden (1-3). Data from the Global Cancer Statistics 2020 indicate that breast cancer accounted for the largest proportion of cancer diagnoses and cancer-related mortality among women worldwide (4). The report documented approximately 2.26 million new cases, accounting for 11.7% of all cancers, and 684,996 related deaths, accounting for 6.9% of all cancer-related deaths worldwide in 2020 (4). Despite advances in early detection and therapeutic strategies, identifying modifiable and biological factors associated with breast cancer risk remains a major research priority.
In clinical practice, liver abnormalities and metabolic liver diseases are frequently observed among patients with breast cancer. Several observational studies have reported an increased prevalence of nonalcoholic fatty liver disease (NAFLD) in women with breast cancer, as well as a higher incidence of liver-related complications during endocrine therapy or chemotherapy. For example, Taroeno-Hariadi et al. (5) documented that 35.0% of patients diagnosed with hormone receptor-positive, HER2-negative breast cancer also had a concurrent diagnosis of NAFLD. Experimental and clinical evidence further suggests that commonly used treatments, such as tamoxifen, may promote hepatic steatosis through pathways involving MAPK8/FoxO signaling (6). In addition, hepatitis B virus reactivation during or after chemotherapy has been widely reported in patients with breast cancer, underscoring the clinical relevance of liver-related comorbidities in this population (7).
Several previous studies have suggested potential biological links between liver diseases and breast cancer. One review (8) emphasized that NAFLD is the most typical metabolic abnormality, which not only promotes liver cancer but may also increase the risk of extrahepatic tumors, such as breast cancer, prostate cancer, thyroid cancer, gynecological cancer, kidney cancer, and lung cancer. However, most existing evidence linking liver diseases to breast cancer is derived from observational studies, which are inherently susceptible to confounding by age, sex, metabolic status, menopausal status, and treatment-related factors. Consequently, it remains unclear whether these associations reflect shared risk factors, disease-related metabolic alterations, or underlying biological links.
Cross-sectional epidemiological data were obtained from the National Health and Nutrition Examination Survey (NHANES), a nationally representative US dataset incorporating interview, examination, and laboratory components (9-11). To further evaluate the relationship between liver diseases and breast cancer from a genetic perspective, we performed two-sample Mendelian randomization (MR) analyses using publicly available genome-wide association study (GWAS) summary statistics (12-14). Single-nucleotide polymorphisms (SNPs) associated with liver disease phenotypes were selected as instrumental variables according to established analytical procedures (15-17).
2. Objectives
By integrating descriptive epidemiological data with genetic analyses, this study aimed to explore potential associations between liver diseases and breast cancer and to generate hypotheses regarding possible biological links, thereby informing future prospective and mechanistic investigations.
3. Methods
3.1. National Health and Nutrition Examination Survey Data Source and Statistical Analysis
Data were obtained from NHANES, a population-based program administered by the National Center for Health Statistics. A total of 8,727 participants surveyed between August 2021 and August 2023 were included in the analysis. Baseline demographic variables, including sex, height, and weight, were extracted from the Demographic Variables and Sample Weights files. Information on breast cancer history was obtained from the Medical Conditions Questionnaire items MCQ230A - C, and liver disease history was extracted from items MCQ510A - F.
Statistical analyses were performed using R software version 4.3.2. Continuous variables were evaluated for distributional characteristics and, given their nonnormality, are presented as medians with interquartile ranges. Comparisons between individuals with a history of breast cancer and cancer-free participants were performed using the Wilcoxon rank-sum test for continuous variables and the chi-square test for categorical variables. Because of the limited number of breast cancer cases, only descriptive comparisons were performed without multivariable adjustment. These analyses were exploratory in nature. All statistical tests were 2-sided, with the significance threshold set at P < 0.05.
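The rank-sum comparison described above can be illustrated with a minimal Python sketch. The study itself used R 4.3.2; the function names `midranks` and `rank_sum_w` and the toy values are illustrative assumptions, not study code or NHANES data. The sketch follows the convention in which W is the rank sum of the first group minus n(n + 1)/2, with tied values given average ranks.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_w(group_x, group_y):
    """Wilcoxon rank-sum W for group_x: rank sum minus n_x * (n_x + 1) / 2."""
    ranks = midranks(list(group_x) + list(group_y))
    n_x = len(group_x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


# Hypothetical ages for illustration only (not NHANES data).
toy_controls = [37, 53, 65, 41, 58]
toy_cases = [64, 69, 77]
toy_w = rank_sum_w(toy_controls, toy_cases)
```

Under this convention, W counts the pairs in which a value from the first group exceeds a value from the second group, with ties counted as one half.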
3.2. Outcome Data for Breast Cancer
Summary-level GWAS data for breast cancer were retrieved from the IEU OpenGWAS project using relevant search terms (18, 19). The selected dataset (GWAS ID: ebi-a-GCST007236) included 89,677 individuals of European ancestry, comprising 46,785 breast cancer cases and 42,892 controls, with a total of 13,742,658 SNPs available for analysis (Table 1) (20).
| GWAS ID | Trait | Population | Sample Size | Case Size | Control Size | SNPs |
|---|---|---|---|---|---|---|
| ebi-a-GCST007236 | Breast cancer | European | 89677 | 46785 | 42892 | 13742658 |
| finn-b-NAFLD | NAFLD | European | 218792 | 894 | 217898 | 16380466 |
| finn-b-CIRRHOSIS_BROAD | Cirrhosis, broad definition used in the article https://doi.org/10.1101/594523 | European | 218792 | 1931 | 216861 | 16380466 |
| finn-b-AB1_VIRAL_HEPATITIS | Viral hepatitis | European | 218792 | 1226 | 217566 | 16380466 |
| ebi-a-GCST90025969 | Autoimmune hepatitis | European | 166614 | 85 | 166529 | 12452657 |
Abbreviations: GWAS, genome-wide association study; SNP, single-nucleotide polymorphism; NAFLD, nonalcoholic fatty liver disease.
3.3. Selection of Instrumental Variables
Given the focus on liver diseases, GWAS summary statistics for 4 liver-related traits were obtained from the IEU OpenGWAS and FinnGen databases: NAFLD (GWAS ID: finn-b-NAFLD), cirrhosis (GWAS ID: finn-b-CIRRHOSIS_BROAD) (21, 22), autoimmune hepatitis (GWAS ID: ebi-a-GCST90025969) (23), and viral hepatitis (GWAS ID: finn-b-AB1_VIRAL_HEPATITIS) (Table 1).
Single-nucleotide polymorphisms associated with each exposure were selected using a stepwise genome-wide significance threshold strategy (P < 5 × 10⁻⁸, P < 1 × 10⁻⁷, P < 1 × 10⁻⁶, and P < 1 × 10⁻⁵), depending on the availability of variants for each phenotype. This approach was applied to evaluate the consistency of the findings across different instrument selection criteria and to ensure sufficient instrument strength for subsequent analyses. To minimize bias arising from linkage disequilibrium, independent SNPs were identified using PLINK-based clumping procedures, applying an r² cutoff of 0.001 within a 10,000-kb window. In subsequent analyses, only independent SNPs were retained.
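One way such a stepwise threshold strategy could work is sketched below in Python. This is an illustration under stated assumptions rather than the authors' procedure: the function name `select_instruments`, the minimum instrument count `min_snps`, and the toy P values are all hypothetical, and the text does not specify how many variants were considered sufficient. Linkage disequilibrium clumping would follow as a separate step and is not shown.

```python
# Thresholds listed in the text, strictest first.
THRESHOLDS = [5e-8, 1e-7, 1e-6, 1e-5]


def select_instruments(snp_pvalues, min_snps=10):
    """Return (threshold, rsids) for the strictest threshold giving >= min_snps SNPs.

    min_snps is a hypothetical cutoff; the source does not state one.
    Falls back to the most lenient threshold if none reaches min_snps.
    """
    selected = []
    threshold = THRESHOLDS[-1]
    for threshold in THRESHOLDS:
        selected = [rsid for rsid, p in snp_pvalues.items() if p < threshold]
        if len(selected) >= min_snps:
            break
    return threshold, sorted(selected)


# Hypothetical exposure P values for illustration only.
toy_pvalues = {"rs1": 1e-9, "rs2": 3e-8, "rs3": 6e-8, "rs4": 4e-7, "rs5": 2e-6}
toy_threshold, toy_snps = select_instruments(toy_pvalues, min_snps=3)
```

In this toy example, only 2 variants pass P < 5 × 10⁻⁸, so the selection relaxes to P < 1 × 10⁻⁷, where 3 variants qualify.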
3.4. Two-Way Mendelian Randomization Analysis
Two-sample MR analyses were performed to examine the association between genetically predicted liver diseases and breast cancer risk. The primary MR methods included inverse variance weighted (IVW), MR-Egger regression, weighted median, simple mode, and MR-PRESSO approaches. A reverse MR framework was additionally applied to explore the potential association in the opposite direction. Heterogeneity among instrumental variables was evaluated using Cochran Q statistics, and horizontal pleiotropy was assessed using the MR-Egger intercept test. The MR-PRESSO procedure was used to identify potential outliers.
All MR analyses were performed using the TwoSampleMR package version 0.5.8 and the MRPRESSO package version 1.0 in R software version 4.3.2. Leave-one-out analyses were performed to assess the stability of the MR estimates.
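For readers less familiar with the estimators named above, the following Python sketch shows how an IVW estimate, Cochran Q, and a multiplicative random-effects standard error can be computed from per-SNP summary statistics. It is not the TwoSampleMR implementation used in this study; the function name `ivw` and the toy inputs are illustrative assumptions. It uses first-order weights and bounds the random-effects scale factor below at 1, which is one common convention; software packages differ on this point.

```python
import math


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate from per-SNP summary statistics.

    Returns (estimate, fixed_effect_se, random_effects_se, cochran_q).
    """
    # Per-SNP Wald ratios and first-order inverse-variance weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_fixed = math.sqrt(1 / total_weight)
    # Cochran Q: weighted squared deviations of the ratios from the pooled estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    # Multiplicative random effects: inflate the SE when Q exceeds its degrees of freedom.
    dof = len(ratios) - 1
    scale = math.sqrt(max(1.0, q / dof)) if dof > 0 else 1.0
    return estimate, se_fixed, se_fixed * scale, q


# Hypothetical summary statistics: two concordant SNPs and one outlying SNP.
toy_estimate, toy_se_fixed, toy_se_random, toy_q = ivw(
    [0.1, 0.2, 0.1], [0.05, 0.10, 0.20], [0.02, 0.02, 0.02]
)
```

When the per-SNP ratios disagree, Q rises above its degrees of freedom and the random-effects standard error widens accordingly, while the point estimate is unchanged.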
4. Results
4.1. Baseline Characteristics of Participants with and Without a History of Breast Cancer in the National Health and Nutrition Examination Survey
Based on questionnaire data extracted from the NHANES database, a total of 8,727 participants were included in the present analysis. Among them, 161 individuals reported a history of breast cancer, 5,148 participants reported no history of any malignancy and were defined as controls, and 3,418 participants with other cancer types or missing information were excluded from subsequent analyses. Demographic characteristics of participants with a history of breast cancer, including age, sex, height, and body weight, were compared with those of the control group with no history of cancer. The Shapiro-Wilk test indicated that age, height, and weight were not normally distributed (P < 0.01). Consequently, nonparametric tests were used.
Participants with a history of breast cancer were older than controls (median age, 69.0 vs. 53.0 years). All individuals reporting breast cancer were female. Among female participants, body weight and body mass index (BMI) were comparable between groups, whereas height was slightly lower in the breast cancer group. Differences in racial distribution were also observed, with a higher proportion of non-Hispanic White participants in the breast cancer group (Table 2).
| Variables | No History of Cancer (N = 5148) | History of Breast Cancer (N = 161) | χ²/W Statistic | P-Value (Chi-square or Wilcoxon Rank-Sum Test) |
|---|---|---|---|---|
| Age, y | 53.00 [37.00 - 65.00] | 69.00 [64.00 - 77.00] | 666685.00 | < 0.0001 |
| Gender | - | - | | |
| Male | 2317 (45.01) | 0 (0.00) | | |
| Female | 2831 (54.99) | 161 (100.00) | | |
| Female weight, kg | 74.20 [63.08 - 90.10] | 75.10 [61.30 - 91.40] | 226209.00 | 0.8659 |
| Female height, cm | 160.90 [155.90 - 165.70] | 159.10 [153.30 - 174.30] | 222807.00 | 0.0334 |
| Female BMI, kg/m² | 30.33 [24.50 - 34.80] | 29.60 [25.00 - 35.00] | 232261.00 | 0.4326 |
| Female race | | | 24.40 | 0.0002 |
| Mexican American | 206 (7.27) | 2 (1.24) | | |
| Other Hispanic | 321 (11.33) | 11 (6.83) | | |
| Non-Hispanic White | 1576 (55.67) | 116 (72.05) | | |
| Non-Hispanic Black | 385 (13.59) | 12 (7.45) | | |
| Non-Hispanic Asian | 176 (6.22) | 7 (4.35) | | |
| Other race, including multiracial | 167 (5.89) | 13 (8.07) | | |
Abbreviation: BMI, body mass index.
a Values are expressed as No. (%) or median [interquartile range].
4.2. Distribution of Liver Diseases Among Participants with and Without a History of Breast Cancer
Liver disease status was assessed using Medical Conditions Questionnaire items MCQ510A - F, covering fatty liver disease, liver fibrosis, cirrhosis, viral hepatitis, autoimmune hepatitis, and other liver conditions.
Fatty liver disease was reported by 168 participants in total, including 147 individuals (2.85%) in the cancer-free control group and 21 individuals (13.04%) in the breast cancer group. The prevalence of fatty liver disease was significantly higher among participants with a history of breast cancer than among controls (χ² test, P < 0.0001).
No cases of liver fibrosis or cirrhosis were reported among participants with a history of breast cancer. The prevalence of viral hepatitis and autoimmune hepatitis did not differ significantly between the 2 groups. Other liver diseases were reported more frequently among participants with a history of breast cancer than among cancer-free controls (3.11% vs. 0.87%; P = 0.0134) (Table 3).
| Variables | No History of Cancer (N = 5148), No / Yes | History of Breast Cancer (N = 161), No / Yes | χ² Statistic | Chi-square Test P-Value |
|---|---|---|---|---|
| Fatty liver (n = 168) | 5001 (97.15) / 147 (2.85) | 140 (86.96) / 21 (13.04) | 49.608 | < 0.0001 |
| Liver fibrosis (n = 4) | 5144 (99.93) / 4 (0.07) | 161 (100.00) / 0 (0.00) | - | - |
| Liver cirrhosis (n = 28) | 5120 (99.46) / 28 (0.54) | 161 (100.00) / 0 (0.00) | - | - |
| Viral hepatitis (n = 41) | 5109 (99.24) / 39 (0.76) | 159 (98.76) / 2 (1.24) | 0.0551 | 0.8517 |
| Autoimmune hepatitis (n = 9) | 5140 (99.86) / 8 (0.14) | 160 (99.38) / 1 (0.62) | 0.19515 | 0.6587 |
| Other liver disease (n = 50) | 5103 (99.13) / 45 (0.87) | 156 (96.89) / 5 (3.11) | 6.1124 | 0.0134 |
a Values are expressed as No. (%) of participants without / with each condition.
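The chi-square statistics in Table 3 can be checked directly from the reported counts. The Python sketch below is an illustration rather than the authors' R code; the function name `chi_square_2x2` is hypothetical. It assumes a Yates continuity correction, which the Methods do not state but which is the default for 2 × 2 tables in R's `chisq.test`. Under that assumption, the fatty liver and other liver disease counts give statistics of approximately 49.61 and 6.11, in line with the reported values.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction (assumed)
            statistic += diff ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts from Table 3, ordered as (controls yes, controls no, cases yes, cases no).
fatty_stat, fatty_p = chi_square_2x2(147, 5001, 21, 140)
other_stat, other_p = chi_square_2x2(45, 5103, 5, 156)
```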
4.3. Mendelian Randomization Assumptions and Study Design
To explore potential genetic associations between liver diseases and breast cancer susceptibility, we applied a two-sample MR design grounded in 3 fundamental assumptions: (1) the genetic variants selected as instrumental variables are strongly associated with liver disease phenotypes; (2) these variants are not correlated with confounding factors; and (3) their effects on breast cancer risk operate exclusively through liver disease-related pathways rather than alternative biological mechanisms (Figure 1).
Genome-wide association study summary statistics for NAFLD, liver cirrhosis, viral hepatitis, and autoimmune hepatitis were retrieved from the IEU OpenGWAS database and treated as exposure variables, while breast cancer GWAS data were used as the outcome (Figure 1). In addition, breast cancer as an exposure factor was explored using reverse MR.
4.4. Selection and Validation of Instrumental Variables
For each selected SNP, the proportion of explained variance (R²) and F statistics were calculated to evaluate instrument strength. All retained variants demonstrated F statistics greater than 10. Using this approach, 19 SNPs were identified for NAFLD, 31 for liver cirrhosis, 22 for viral hepatitis, 12 for autoimmune hepatitis, and 424 for breast cancer (Supplementary Table 1).
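The exact formulas used for R² and the F statistic are not given in the text. The Python sketch below shows one commonly used single-SNP approximation, offered as an assumption rather than as the authors' calculation: R² = 2 × EAF × (1 − EAF) × β², and F = R²(N − 2)/(1 − R²), where EAF is the effect allele frequency, β is the SNP-exposure effect estimate, and N is the exposure sample size. The function name `instrument_strength` and the toy effect size and allele frequency are hypothetical; the sample size is the FinnGen total from Table 1, used only for illustration.

```python
def instrument_strength(beta, eaf, sample_size):
    """Approximate per-SNP variance explained (R^2) and F statistic.

    Assumes R^2 = 2 * EAF * (1 - EAF) * beta^2 and
    F = R^2 * (N - 2) / (1 - R^2); the source does not state its formulas.
    """
    r2 = 2 * eaf * (1 - eaf) * beta ** 2
    f_stat = r2 * (sample_size - 2) / (1 - r2)
    return r2, f_stat


# Hypothetical effect size and allele frequency; N taken from Table 1 (FinnGen).
toy_r2, toy_f = instrument_strength(beta=0.1, eaf=0.3, sample_size=218792)
is_strong = toy_f > 10  # conventional weak-instrument cutoff used in the text
```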
4.5. Associations Between Liver Diseases and Breast Cancer Risk
For NAFLD, 19 SNPs were successfully harmonized with the breast cancer GWAS dataset and retained for MR analysis after quality control (Supplementary Table 2; Supplementary Figure 1A). In 2-sample MR analyses, the multiplicative random-effects IVW method indicated a positive association between genetically predicted NAFLD and breast cancer risk (odds ratio = 2.2789; 95% confidence interval, 1.2851 - 4.0413). Consistent directions of effect were observed across several complementary MR approaches (Supplementary Table 2; Figure 2; Supplementary Figure 2A).
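Because MR estimates for binary outcomes are obtained on the log-odds scale, the reported odds ratio and confidence interval correspond to exp(β) and exp(β ± 1.96 × SE). The Python sketch below back-calculates β and SE from the reported NAFLD interval as a consistency check; the function names are illustrative assumptions, and the recovered values are approximations derived from rounded figures, not estimates reported by the study.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se):
    """Convert a log-odds estimate and its SE to (OR, lower, upper)."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def log_odds_from_ci(lower, upper):
    """Recover (beta, se) from a reported 95% CI on the odds-ratio scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


# Reported NAFLD estimate: OR 2.2789 (95% CI, 1.2851 - 4.0413).
nafld_beta, nafld_se = log_odds_from_ci(1.2851, 4.0413)
nafld_or, nafld_lower, nafld_upper = odds_ratio_ci(nafld_beta, nafld_se)
```

The point estimate recovered from the interval bounds agrees with the reported odds ratio to about 4 decimal places, indicating that the interval is symmetric on the log scale, as expected.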
Similarly, for liver cirrhosis, 30 SNPs were retained as instrumental variables (Supplementary Table 2; Supplementary Figure 1B). Positive associations with breast cancer risk were observed in multiple MR methods, although effect estimates varied across analytical approaches (Supplementary Table 2; Figure 2; Supplementary Figure 2B).
In addition, 20 SNPs for viral hepatitis and 10 SNPs for autoimmune hepatitis were retained after quality control (Supplementary Table 2; Supplementary Figures 1C and D). Across different MR models, genetically predicted viral hepatitis and autoimmune hepatitis also demonstrated positive associations with breast cancer risk, with some variability in effect magnitude (Supplementary Table 2; Figure 2; Supplementary Figures 2C and D).
Reverse MR analyses did not demonstrate consistent evidence supporting an association between genetically predicted breast cancer liability and the risk of liver diseases (Supplementary Table 2).
4.6. Sensitivity Analyses
Assessment of between-instrument heterogeneity using Cochran Q statistics revealed significant variability among instrumental variables in the IVW and MR-Egger models (all P < 0.05). Accordingly, multiplicative random-effects IVW models were applied to account for between-instrument variability (Supplementary Table 3). The MR-Egger intercept tests did not detect statistically significant evidence of directional horizontal pleiotropy (all P > 0.05). The MR-PRESSO analysis was further conducted to identify potential outliers, and, where applicable, outlier-corrected estimates were reported. Leave-one-out analyses did not reveal substantial changes in the overall effect estimates after the sequential removal of individual variants (Supplementary Figure 1).
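The leave-one-out procedure can be summarized as re-estimating the pooled effect after dropping each variant in turn. The Python sketch below illustrates the idea with an IVW estimator; it is not the TwoSampleMR routine used in this study, and the function names, rsIDs, and summary statistics are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted estimate from per-SNP summary statistics."""
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(snps):
    """Re-estimate IVW after dropping each SNP in turn.

    snps maps an rsID to (beta_exposure, beta_outcome, se_outcome).
    """
    results = {}
    for left_out in snps:
        kept = [stats for rsid, stats in snps.items() if rsid != left_out]
        bx, by, se = zip(*kept)
        results[left_out] = ivw_estimate(bx, by, se)
    return results


# Hypothetical instruments: rsC has a much larger ratio than rsA and rsB.
toy_instruments = {
    "rsA": (0.10, 0.05, 0.02),
    "rsB": (0.20, 0.10, 0.02),
    "rsC": (0.10, 0.20, 0.02),
}
toy_loo = leave_one_out(toy_instruments)
```

A large shift in the estimate when a single variant is removed, as happens for the hypothetical `rsC` here, would indicate that the pooled result is driven by that variant.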
5. Discussion
By integrating cross-sectional epidemiological observations from NHANES with genetic evidence derived from 2-sample MR analyses, this study provides an exploratory evaluation of the potential relationship between liver diseases and breast cancer. The NHANES-based analysis revealed a higher prevalence of self-reported fatty liver disease among individuals with a history of breast cancer, suggesting a possible epidemiological link between metabolic liver disorders and breast cancer. However, given the cross-sectional design and reliance on self-reported diagnoses, these findings should be interpreted as descriptive rather than causal. In the MR analyses, positive associations were observed between genetically predicted liver diseases and breast cancer risk across several analytical approaches. Notably, the multiplicative random-effects IVW estimator indicated a positive association for NAFLD, whereas effect estimates for other liver disease phenotypes varied across MR methods. Heterogeneity among instrumental variables and variability between estimators underscore the complexity of the underlying biological relationships and warrant cautious interpretation.
Breast cancer remains one of the most prevalent malignancies worldwide and a leading cause of cancer-related mortality among women (24-26). In our NHANES sample, participants with a history of breast cancer were older and exclusively female, whereas no clinically meaningful differences in body weight were observed between breast cancer cases and cancer-free controls. These findings are consistent with previous large-scale clinical and epidemiological studies reporting that BMI alone may not fully capture breast cancer risk, particularly across menopausal strata (27, 28). Differences in racial distribution were also observed. However, the NHANES-based comparison was purely descriptive and cannot establish an independent association because of substantial age and metabolic confounding.
Accumulating observational evidence suggests a potential link between metabolic liver disorders and breast cancer. For instance, a large historical cohort study (29) reported that NAFLD was significantly associated with increased incidence of hepatocellular carcinoma, colorectal cancer in men, and breast cancer in women, even after adjustment for metabolic confounders. In that study, NAFLD was associated with a higher overall cancer incidence rate compared with non-NAFLD status, providing further clinical epidemiological support for a potential relationship between hepatic steatosis and extrahepatic malignancies. In addition, Cuzick et al. (30) reported a significantly higher prevalence of NAFLD among patients with breast cancer than among controls (45.2% vs. 16.4%; P = 0.002), and multivariable analysis demonstrated that NAFLD was independently associated with breast cancer risk (odds ratio = 2.82; 95% confidence interval, 1.2 - 5.5; P = 0.016). Similarly, Neuhouser et al. showed that the Fatty Liver Index, a surrogate marker of NAFLD, was associated with an increased risk of breast cancer among postmenopausal women, with hazard ratios of 1.07 (95% confidence interval, 1.04 - 1.11) for a Fatty Liver Index of 30 - 60 and 1.11 (95% confidence interval, 1.05 - 1.17) for a Fatty Liver Index of 60 or greater (31).
Beyond epidemiological associations, several biologically plausible mechanisms may underlie the relationship between liver diseases and breast cancer. Chronic hepatic inflammation, altered estrogen metabolism, insulin resistance, and systemic inflammatory signaling represent shared metabolic disturbances that may influence both liver pathology and breast carcinogenesis. In advanced liver diseases such as cirrhosis, impaired hepatic estrogen clearance and altered hormonal regulation may increase systemic estrogen exposure. Autoimmune hepatitis, characterized by sustained immune dysregulation, may further contribute to systemic inflammatory activation and potential alterations in tumor immune surveillance. At a more specific level, emerging studies have explored liver-breast communication pathways. For example, hepatic fibroblast growth factor 21 has been implicated in tumor-promoting metabolic reprogramming, suggesting that hepatokines could contribute to systemic tumor-related signaling (32). Experimental evidence from murine models further indicates that exosomes derived from fatty liver may accumulate within mammary adipose tissue, potentially shaping a pro-tumorigenic microenvironment (33). Importantly, these findings do not necessarily imply a direct unidirectional causal pathway from liver disease to breast cancer; rather, they support the possibility of shared inflammatory and metabolic networks that may underlie the observed associations.
Despite these supportive lines of evidence, our MR findings did not demonstrate fully consistent associations across all estimators. Notably, a comprehensive 2-sample MR study evaluated genetically predicted NAFLD and 22 extrahepatic cancer outcomes, reporting significant associations for several tumor types, including female breast cancer, cervical cancer, laryngeal cancer, and lung cancer. However, similar to our analysis, the authors acknowledged potential pleiotropic influences and shared metabolic pathways that may complicate causal interpretation. Although certain MR approaches in our study suggested statistically significant associations for multiple liver disease phenotypes, effect estimates varied across methods. Such variability may reflect complex genetic architecture, pleiotropic pathways, or phenotype heterogeneity rather than a single direct causal mechanism. Therefore, the MR results should be interpreted as suggestive of broader systemic mechanisms rather than as definitive evidence of a direct causal effect (34, 35).
This study has several limitations. Although the MR design reduces confounding and reverse causation, NAFLD and related liver diseases are complex metabolic phenotypes that share genetic determinants with obesity, insulin resistance, systemic inflammation, and hormonal regulation, which are well-established risk factors for breast cancer. Therefore, some instrumental variants may influence breast cancer risk through shared metabolic pathways rather than exclusively through liver-specific mechanisms, and violation of the exclusion restriction assumption cannot be completely excluded. In addition, heterogeneity across MR estimators suggests potential biological complexity or phenotype heterogeneity. These methodological considerations warrant cautious interpretation of the observed associations. Furthermore, the NHANES-based analysis relied on self-reported liver disease diagnoses without imaging or histological confirmation. Therefore, misclassification and limited clinical accuracy cannot be excluded, and the reported prevalence should not be interpreted as definitive population estimates.
5.1. Conclusions
In summary, by integrating cross-sectional population data with genetic epidemiological analyses, this study provides a systematic assessment of the relationship between liver diseases and breast cancer. Positive associations were observed in both epidemiological and genetic analyses, and emerging experimental studies offer biologically plausible mechanisms linking liver dysfunction to breast tumor progression. However, variability across MR estimators and the presence of heterogeneity limit definitive causal inference. These findings should therefore be considered hypothesis-generating and highlight the need for future longitudinal studies and mechanistic investigations to clarify the biological pathways connecting liver disease and breast cancer.

