This study presents a retrospective analysis of patient data to predict the mortality of COVID-19 patients hospitalized in referral hospitals between 2020 and 2021. Machine learning algorithms were applied to predict disease outcomes based on clinical data from hospitalized patients.
Lai et al. used the Adaptive Boosting algorithm to identify the most effective variables for predicting mortality in COVID-19 patients. Their findings revealed that lymphocyte counts were significantly lower in patients with severe COVID-19 compared to those with mild cases (
18).
Lymphocyte count and CRP are two important variables in predicting the risk of death in patients with COVID-19. Several studies have shown that lymphocyte count serves as a universal predictor of health outcomes in COVID-19 patients (
19).
Windradi et al. have indicated that CRP, as an acute-phase protein, is an effective marker for predicting severe COVID-19 (
20). A meta-analysis demonstrated that CRP is a significant variable in distinguishing between severe and mild cases of COVID-19 (
21).
In the present study, we found that RBC was an effective variable for predicting the risk of death in COVID-19 patients. Hemoglobin in RBCs is considered an important biomarker, reflecting oxygen levels in the blood and serving as a significant variable in predicting COVID-19 mortality (
22). Thomas et al. showed that RBC counts were significantly higher in COVID-19 patients compared to healthy individuals (
23).
Additionally, age has been identified as a crucial variable for predicting COVID-19 mortality (
24,
25). Bonanad et al. conducted a meta-analysis of 611,583 COVID-19 patients across five continents to investigate mortality rates among different age groups. They found that the mortality rate for individuals under 50 years old was 1.1%, and this rate increased with age, peaking in individuals aged 80 years or older (
26). Another study found that individuals aged 55-64 years had an 8.1-fold higher COVID-19 mortality rate than those under 55 years of age (
27). These findings suggest that age is a significant predictor of COVID-19 mortality. As age increases, the mortality rate also rises, with the highest mortality rates observed in patients aged 80 years and above (
24).
Lyu et al. evaluated the severity of COVID-19 based on HRCT images. They found that the mean lung density, measured on the Hounsfield unit (HU) scale, was higher in patients with severe COVID-19 compared to healthy individuals (
28). In our study, the mean lung density in deceased individuals was also found to be higher than in those who recovered. Notably, the diagnostic value of CT scanning in assessing lung density has already been well-established and is considered preferable to other subjective visual examinations (
29).
The data suggest that lung density is a potential imaging tool for assessing the severity of COVID-19, and its results can be valuable for identifying patients at risk of severe disease progression (
30). However, further studies are necessary to validate the clinical utility of lung density analysis in managing COVID-19.
Additionally, we observed that the average D-dimer level was significantly lower in recovered individuals compared to deceased patients (P-value = 0.001). D-dimer is a blood biomarker that plays a critical role in predicting outcomes for patients with COVID-19 (
31). One study indicated that the mean D-dimer level in patients with mild COVID-19 was approximately one-sixth of that in patients with severe disease (
32).
It has also been demonstrated that patients with malignancies are at a higher risk of COVID-19 infection and severe complications due to their immunocompromised state (
33). Similarly, other studies have reported an increased rate of COVID-19-associated mortality among cancer patients (
34,
35).
The risk of severe COVID-19 outcomes increases with age, and patients with malignant tumors are at a higher risk for severe illness due to their underlying medical conditions (
36). During the COVID-19 pandemic, cancer patients have had limited access to medical facilities and services, which has increased the likelihood and severity of their conditions (
37). In our study, the proportion of cancer patients differed significantly between the deceased and recovered groups (P-value = 0.02), consistent with the elevated risk of severe complications and mortality reported for patients with malignancies. Vaccination has been shown to help reduce deaths and severe illness from COVID-19, as well as to decrease transmission in these patients (
38).
In recent studies, predicting the severity and mortality of COVID-19 has been a major focus. Several studies have explored the relationship between COVID-19 and mortality, including excess mortality due to COVID-19, as well as machine learning models to predict mortality and critical events in COVID-19 patients. In a study by Akhtar et al., 10 machine learning algorithms were used to predict COVID-19 infection based on CBC results (
39). According to their results, the highest accuracy (100%) in predicting infection was achieved by three algorithms: Random Forest, K Nearest Neighbor (KNN), and kStar. These findings suggest that machine learning algorithms can be useful in predicting COVID-19 infection based on CBC results. Further research is needed to establish the clinical utility of these algorithms in managing COVID-19. Moulaei et al. conducted a study on 1500 COVID-19 patients to predict mortality using various machine learning models. Their results showed that the ML and RF methods had the highest accuracy (> 80%) (
1). In another study, Zakariaee et al. assessed the performance of four machine learning algorithms (LR, RF, SVM, and XGBoost) and found that XGBoost had the best performance in terms of AUC (
40).
Schiaffino et al. conducted a study on 897 hospitalized COVID-19 patients to predict in-hospital mortality using HRCT scans. The algorithms used in this study were Support Vector Machine (SVM) and multi-layer perceptron (MLP). The area under the ROC curve for the SVM and MLP models was 0.74 and 0.84, respectively (
41). Nuthalapati et al. used deep learning methods to predict mortality or hospitalization in the intensive care unit (ICU) for COVID-19 patients. Other variables, such as HRCT images and electronic health record (EHR) data, were used in this study. They found that the normal lung volume, normal lung percentage (NLperc), muscle volume, fat volume, muscle-fat ratio, age, sex, and lesion percentage were the most important variables for predicting mortality and ICU hospitalization. The area under the ROC curve was approximately 0.77 (
42). Other studies have also explored the use of deep learning algorithms in analyzing body composition on CT scans to predict outcomes in COVID-19 patients. In this context, Zhang et al. (as cited by Nachit et al.) used a deep learning algorithm to analyze body composition on CT scans and found that myosteatosis was a key predictor of mortality in asymptomatic adults (
43). These findings suggest that deep learning algorithms can be useful in predicting outcomes in COVID-19 patients based on body composition analysis. Further research is needed to establish the clinical utility of these algorithms in COVID-19 management.
Machine learning algorithms have been used in many studies to predict COVID-19 mortality. Some studies have used only clinical features, while others have incorporated radiological features as well. The selection of ML algorithms was based on related studies in the field and the quality of the selected dataset. The most commonly used algorithms were SVM, MLP, RF, KNN, and kStar. The performance of the models was evaluated using the AUC and metrics derived from the confusion matrix, such as MCC. Important predictors for COVID-19 patient mortality included lymphocyte count, CRP, age, mean lung density, lung tissue percentage, RBC, D-Dimer, and emphysema. The AUC of the models ranged from 0.74 to 0.96. Some studies also used deep learning techniques and EHR data to predict mortality or hospitalization in COVID-19 patients.
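To make the AUC values quoted above easier to interpret, the following minimal sketch computes the AUC as the fraction of (deceased, recovered) patient pairs in which the deceased patient receives the higher predicted risk. The function name `roc_auc`, the labels, and the scores are hypothetical illustrations and are not taken from the study data or from any of the cited models.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 = deceased (positive class, an assumption of this sketch), 0 = recovered.
    scores: predicted risk of death; tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks for 3 deceased and 4 recovered patients.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.74 means that roughly three out of four such pairs are ranked correctly, without saying anything about how many deaths are missed at a particular decision threshold.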
In most studies, only the ROC curve (or its AUC), which summarizes how well the predicted scores separate the two outcomes across decision thresholds, is reported, and it typically yields good results. In the present study, however, agreement across all four cells of the contingency table was also quantified using the Matthews correlation coefficient (MCC). This showed that, although a model may perform well in predicting recovery, it may not perform as well in predicting mortality, which is the primary concern. For example, the Gong study showed that many confusion-matrix indices reflect only part of the table, whereas MCC takes into account true positives, true negatives, false positives, and false negatives together (
44).
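The contrast between accuracy and MCC can be illustrated with a small worked example. The sketch below is not the analysis code of this study: the function name `confusion_metrics` and the counts are hypothetical, and death is assumed to be the positive class. It shows how a model that identifies most recovered patients but misses most deaths can still reach a high accuracy while its MCC remains low.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity and MCC from the four cells of a 2x2 table."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC uses all four cells; by convention it is set to 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical cohort: 100 patients, 10 deaths, of which the model detects only 2.
m = confusion_metrics(tp=2, fn=8, fp=1, tn=89)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical cohort the accuracy is 0.91 because recovered patients dominate, yet only 2 of 10 deaths are detected and the MCC is about 0.33, which is the kind of gap that an accuracy-based summary alone would hide.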
5.1. Conclusions
The main limitations of our study include the possibility that our analysis may not fully account for confounding factors that could influence patient outcomes, such as variations in treatment protocols or differences in healthcare access across different populations. We suggest that simulation studies should be used to enhance understanding and create appropriate indices for machine learning methods, which can be selected based on the type of data. The three variables with the greatest impact on predicting mortality in COVID-19 patients were related to laboratory results, with age being the next most significant variable. Therefore, we recommend that, due to cost, HRCT should only be performed if risk factors are observed in laboratory results, and if necessary, HRCT should be performed promptly.