Identifying factors that accurately predict a patient's treatment response or disease progression is paramount in clinical studies. Such factors allow physicians to prescribe treatments with more favorable effects and to manage various disorders more flexibly. Disease progression can also be halted by managing modifiable risk factors. Classical statistical analysis is frequently used to identify such risk factors; however, its use may be limited by incomplete data or an insufficient sample size. Machine learning techniques offer a novel approach to these issues. This study identified the factors influencing breast cancer metastasis to the lymph nodes by fitting the best-performing random forest model to the data. The final random forest, after tuning, achieved an accuracy of 72.2%.
The factors influencing lymph node metastasis in breast cancer were identified using two importance indexes: mean decrease in accuracy (MDA) and mean decrease in Gini (MDG). According to MDA, the top ten influential factors are grade, tubule formation, skin involvement, p53, peripheral involvement, nuclear pleomorphism, Ki67, tumor location, ER, and PR. According to MDG, the top ten factors are age, grade, tubule formation, tumor size, nuclear pleomorphism, level of disease, mitosis, skin involvement, tumor location, and margin involvement. The factors identified in this article are also supported by the results of other studies, which are summarized below.
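To make the two indexes concrete, the following minimal Python sketch (standard library only) computes the two quantities they aggregate. It is an illustration, not the code used in this study: the helper names `gini`, `gini_decrease`, `accuracy`, `permutation_drop`, and `predict`, as well as the toy data, are hypothetical. MDG sums the Gini impurity decrease of every split on a variable and averages it over the trees (implementations differ in how they weight each node by its size), whereas MDA measures how much accuracy drops when a variable's values are randomly permuted, computed per tree on its out-of-bag samples in the original random forest formulation.

```python
import random
from collections import Counter
from typing import Callable, Sequence

Row = Sequence[float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[int], left: Sequence[int], right: Sequence[int]) -> float:
    """Impurity reduction achieved by one split (the quantity MDG accumulates)."""
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - children


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_drop(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int],
                     col: int, n_repeats: int = 10, seed: int = 0) -> float:
    """Mean accuracy lost when column `col` is shuffled (the idea behind MDA)."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        values = [row[col] for row in X]
        rng.shuffle(values)
        # Copy rows so the original data are left untouched.
        X_perm = [list(row) for row in X]
        for row, value in zip(X_perm, values):
            row[col] = value
        drops.append(baseline - accuracy(predict, X_perm, y))
    return sum(drops) / n_repeats


# Hypothetical toy data: column 0 determines the label, column 1 is noise.
X = [[0, 5], [0, 3], [1, 5], [1, 3]] * 5
y = [row[0] for row in X]


def predict(row: Row) -> int:
    """Stand-in for a fitted forest: it uses only column 0."""
    return int(row[0])


print(gini_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # 0.5: a perfect split
print(permutation_drop(predict, X, y, col=0) > 0)   # True: informative variable
print(permutation_drop(predict, X, y, col=1))       # 0.0: noise variable
```

Because MDA and MDG measure different things (loss of predictive accuracy versus node purity gained during training), their rankings need not agree, which is consistent with the two partly overlapping lists reported above.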
In a study by Jiang et al., machine learning with Shapley-value analysis was used to analyze a cohort of 1 405 breast cancer patients. Tumor size, age, and the Her2, ER, and PR markers were found to be significant factors influencing breast cancer metastasis to the lymph nodes. In the present study, the MDG index likewise identified five specific factors as influential in lymph node metastasis in breast cancer (31). In a 2021 study, Purushotham and Venkatesh examined 100 breast cancer patients and found a significant correlation between tumor size, grade, and stage and the occurrence of lymph node metastasis; specifically, an increase in each of these three factors was associated with a higher likelihood of lymph node metastasis (32). In a 2021 cross-sectional study, Hermansyah et al. analyzed the medical records of 51 breast cancer patients and, using the chi-square test, found a significant relationship between grade and lymph node metastasis. In the present study, grade is likewise one of the ten factors influencing breast cancer metastasis to the lymph nodes (33). Li et al. analyzed the medical records of 1 131 patients diagnosed with breast cancer and found a significant association between Ki67 expression and several factors, including grade, PR, ER, Her2, and p53. In our study, Ki67 expression, grade, ER, PR, and p53 are among the ten factors influencing lymph node metastasis in breast cancer (34).
Sujarittanakaren et al. found a notable correlation between the PR, ER, and Her2 markers and lymph node metastasis. Consequently, when the PR, ER, and Her2 status of the primary tumor cannot be assessed, evaluating lymph node metastasis status may serve as an alternative. In the present article, ER and PR were identified as two of the ten factors influencing breast cancer metastasis to the lymph nodes (35). Chand et al. examined 50 patients diagnosed with breast cancer and found a significant association between tumor location and lymph node metastasis. In the current study, tumor location was likewise identified as one of the ten factors influencing breast cancer metastasis to the lymph nodes (36). In a 2023 cohort study, Zarean Shahraki et al. used the random forest algorithm to analyze a sample of 3 580 female breast cancer patients. They identified tumor status, age at diagnosis, lymph node status, type of surgery, tumor stage, and duration of breastfeeding as the most influential variables for predicting the probability of breast cancer survival. In the present study, age and tumor stage were identified as factors affecting breast cancer metastasis to the lymph nodes. It is therefore plausible that the factors affecting the survival of breast cancer patients also influence the occurrence of lymph node metastasis (37). According to Keyhanian et al., breast cancer is the most common cancer among women and a major cause of cancer-related mortality worldwide. Their study identified tumor size and type, histological grade, and estrogen and progesterone receptor status as significant determinants of lymph node involvement, and found no significant association between lymph node involvement and either age or the combined estrogen and progesterone receptor status. The present study has also identified these factors as influential in breast cancer lymph node metastasis (38).
Dolatkhahi et al. examined the medical records of 5 208 patients at the Cancer Research Center of Shahid Beheshti University of Medical Sciences and Health Services. Using decision trees, random forests, and support vector machines, they found that the random forest achieved the best performance, with an accuracy of 94.75% and a reliability of 97.26%, outperforming the other two methods (39). Kabir Ahmad and Yusoff analyzed a dataset of 699 samples, 458 benign and 241 malignant, with the aim of using random forest to classify breast lesions sampled by fine needle aspiration (FNA). They found that the random forest, with a precision of 72%, could effectively distinguish between malignant and benign lesions, suggesting its potential as a valuable tool for early cancer detection (40). Olivotto et al. found that tumor size, margin involvement, tumor grade, and patient age affect breast cancer metastasis to the lymph nodes. All four of these factors are among the ten factors identified in the current study as influencing lymph node metastasis (41).
4.1. Conclusions
The random forest algorithm showed satisfactory accuracy in discriminating between the outcome categories. Because the study data contained missing values, the algorithm's ability to handle missing data makes it a viable approach. By training each constituent tree on a bootstrap sample of the observations and considering a random subset of variables at each split, the random forest also mitigates the problem of a small sample size. As a result, it yields accurate results that are acceptable from a clinical perspective, as in similar studies. We recommend that medical professionals use the random forest model developed in the present study.
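The resampling that the random forest relies on can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the study's implementation: the function names `bootstrap_split` and `candidate_features` and the sample sizes are hypothetical, and the default of roughly √p candidate variables per split is a common convention for classification rather than a setting reported in this study. Each tree is trained on a bootstrap sample drawn with replacement, and the rows it never draws (out-of-bag rows, about 1/e ≈ 37% on average) can be used to estimate its error without a separate test set.

```python
import math
import random
from typing import List, Optional, Tuple


def bootstrap_split(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample row indices with replacement for one tree; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    oob = [i for i in range(n_samples) if i not in drawn]
    return in_bag, oob


def candidate_features(n_features: int, rng: random.Random, m: Optional[int] = None) -> List[int]:
    """Random subset of variables that a single split may choose from."""
    if m is None:
        m = max(1, int(math.sqrt(n_features)))  # assumed default: about sqrt(p)
    return sorted(rng.sample(range(n_features), m))


# Hypothetical sizes: 100 patients, 20 candidate predictors, 3 trees.
rng = random.Random(42)
for tree in range(3):
    in_bag, oob = bootstrap_split(100, rng)
    features = candidate_features(20, rng)
    print(f"tree {tree}: {len(set(in_bag))} distinct rows, {len(oob)} out-of-bag, "
          f"first-split candidates {features}")
```

Because every tree sees a different resample and a different set of candidate variables, the trees are less correlated with one another, which is what allows averaging their votes to reduce variance on a modest sample.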