Early prediction of death from COVID-19 using novel, objective, and scientific techniques is believed to help ease the burden of the disease on healthcare systems by enabling effective triage of critically ill patients and optimal management of scarce hospital resources (15, 27). Therefore, the purpose of the present study was to retrospectively develop and validate two ML models, one statistical and one computational, based on the most relevant determinants of COVID-19 mortality. To this end, we applied a statistical model (BLR) and a computational model (ANN) to predict mortality among hospitalized patients with confirmed COVID-19, using the clinical data available in the registry database of Ilam University of Medical Sciences. Candidate predictors were screened using the Phi coefficient at P < 0.01, which identified sixteen final variables as the most important predictors.
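The screening step described above can be illustrated with a minimal sketch. It assumes that each candidate predictor and the outcome are coded as binary (0/1) values, and that significance is assessed through the identity chi-square = n × phi² with one degree of freedom. The function names and the toy data are hypothetical and are not taken from the study's analysis.

```python
import math


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length binary (0/1) sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One of the variables is constant, so no association can be measured.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def phi_p_value(phi, n):
    """Two-sided P value from chi-square = n * phi^2 with 1 degree of freedom."""
    chi2 = n * phi * phi
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def select_features(features, outcome, alpha=0.01):
    """Keep predictors whose association with the outcome has P < alpha."""
    selected = []
    for name, values in features.items():
        phi = phi_coefficient(values, outcome)
        p = phi_p_value(phi, len(outcome))
        if p < alpha:
            selected.append((name, phi, p))
    # Strongest absolute association first.
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: 20 patients, outcome 1 = died, 0 = survived.
outcome = [1, 1, 0, 0] * 5
toy_features = {
    "icu_admission": [1, 1, 0, 0] * 5,  # perfectly associated in this toy set
    "unrelated_flag": [1, 0] * 10,      # no association in this toy set
}
ranked = select_features(toy_features, outcome)
```

In this toy example only `icu_admission` passes the threshold. Continuous predictors would first need to be dichotomized, or a different correlation measure would be required.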
Evaluation of the BLR model using the confusion matrix and log-likelihood showed that the model performed better at the 5th step than at the 1st step, with TP = 133, TN = 252, FP = 55, FN = 43, and an average log-likelihood of -61.6. The best performance of the ANN model was obtained with the 16-20-10-1 structure, with an MSE of 0.037122, validated at the 22nd training iteration. At this point, the error for a large number of samples was near zero, as observed in the error histogram, with TP = 281, TN = 167, FN = 9, and FP = 25 in the total confusion matrix. Comparison of the two selected algorithms on the confusion matrix (as a common performance criterion) showed that the 16-20-10-1 ANN structure outperformed the 5th step of BLR.
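As an illustration of how confusion-matrix counts such as those above translate into the usual performance criteria, the following minimal sketch computes sensitivity, specificity, PPV, NPV, and accuracy from TP, TN, FP, and FN. The function name is hypothetical, and which outcome class is treated as positive is an assumption here, so the values it produces need not coincide with the figures reported elsewhere in this paper.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard classification metrics from the four confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a class is absent.
        return numerator / denominator if denominator else float("nan")

    total = tp + tn + fp + fn
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, total),
    }


# Counts as reported in the text for the two selected models.
blr_metrics = confusion_metrics(tp=133, tn=252, fp=55, fn=43)
ann_metrics = confusion_metrics(tp=281, tn=167, fp=25, fn=9)

for label, metrics in (("BLR, 5th step", blr_metrics), ("ANN 16-20-10-1", ann_metrics)):
    summary = ", ".join(f"{key} = {value:.3f}" for key, value in metrics.items())
    print(f"{label}: {summary}")
```

Computing all five criteria from the same four counts keeps the comparison between the two models on a common footing.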
Several studies have evaluated the applicability of ML techniques in predicting mortality in patients with COVID-19. For instance, Karthikeyan et al. retrospectively studied the clinical data of 2779 confirmed or suspected COVID-19 patients to construct an intelligent prediction model using selected ML algorithms; the ANN model attained the best performance, with an accuracy of 96% (39). Das et al. conducted a retrospective analysis of chest X-ray data from 3299 COVID-19 subjects and showed that the ANN model, with an accuracy of 0.981 and an AUC-ROC of 0.886, had the best predictive ability (40). Yadaw et al. assessed the performance of four ML algorithms, including LR, RF, SVM, and eXtreme Gradient Boosting (XGBoost), using a dataset of 3841 patients to predict COVID-19 mortality. In that study, XGBoost, with an AUC of 91%, and LR, with an AUC of 78%, showed the best and worst performance, respectively, among the models developed (24). Gao (2020) conducted a retrospective analysis of data from 2520 hospitalized COVID-19 patients. The results showed that the ANN model yielded the best performance, with an AUC-ROC of 0.9760, in predicting physiological deterioration and death among COVID-19 patients, compared with models developed using LR, support vector machine (SVM), and gradient boosted decision tree (15). Vaid et al. (26) also compared the efficiency of logistic regression with L1 regularization (LASSO) and ANN-MLP models in predicting mortality among 4029 confirmed COVID-19 patients. The best performance was reported for the modified ANN-MLP model, with a sensitivity of 90.7%, specificity of 91.4%, and AUC-ROC of 0.963 (41). An et al. achieved the best COVID-19-related mortality predictive performance using the ANN technique, with an RMSE of 5.9451 and an MAE of 4.6354 (41). Similarly, we observed the best performance for the ANN model, with a PPV of 0.96, NPV of 0.86, sensitivity of 0.94, specificity of 0.94, and accuracy of 0.93.
Studies have also reported important clinical variables (predictors) of mortality in COVID-19 patients identified through feature selection analysis. The selected features serve as inputs for developing ML-based models that predict mortality among COVID-19 patients. The strongest predictive variables reported include age (7, 19, 24, 40, 42-45), ICU hospitalization (15, 27, 40, 42), low oxygen saturation (decreased SpO2) (6, 7, 10, 19, 22, 27, 45), dyspnea (7, 10, 40, 43), loss of taste/smell, hypertension (19, 24, 27, 40, 44, 45), cardiovascular diseases (24, 27, 40, 42, 44, 46), raised ALT and/or AST (22, 24, 40, 46, 47), elevated LDH (21, 22, 40, 43), and raised leukocyte/neutrophil count (15, 22, 24, 40, 42, 44, 48). In the present study, the most important variables (COVID-19 mortality predictors) were identified based on correlation coefficients at the level of P < 0.01 (ie, feature selection). These variables included ICU hospitalization, activated partial thromboplastin time, length of hospitalization, pleural fluid, and absolute lymphocyte count. In general, these variables were broadly in line with the results of previous studies that have categorized and prioritized these parameters.
The selected computational ML algorithm (ie, the ANN model) could predict mortality in COVID-19 patients with acceptable performance. This, in turn, can help optimize the use of limited hospital resources for treating patients in more critical condition and assist professionals in providing higher-quality care, thereby reducing medical errors. The models proposed in this study may facilitate the early detection and effective management of COVID-19 patients and help lower their death rate. Developing a valid predictive model may also enhance the quality of care and increase the survival rate of COVID-19 patients. Mortality prediction and risk analysis models can therefore contribute to identifying high-risk patients, followed by the adoption of the most effective and reliable support and treatment plans. In addition, quantitative, objective, and evidence-based models for risk stratification, mortality prediction, and care plan development can reduce uncertainties and ambiguities. These models also offer clinicians a better strategy to lessen disease complications and improve the likelihood of patient survival.
Although the model identified high-risk cases in a timely and accurate manner, the present study has several limitations that may have introduced classification bias. First, we used a retrospective dataset that may contain missing and imbalanced data fields. Second, the study was conducted at a single center on only 482 records, so the generalizability of the proposed model is limited. Third, we used only two ML algorithms for clinical-based prediction analyses. Finally, the selected dataset lacked some important clinical variables, such as radiological parameters. In summary, the performance of our model could be enhanced if more classification techniques are tested on larger, multicenter, and prospective datasets in the future.
5.1. Conclusions
In this study, we developed and evaluated two ML-based prediction models for in-hospital mortality in COVID-19 patients using the most important clinical characteristics (16 predictors). The ANN model achieved better classification accuracy than the BLR algorithm. The proposed model can be used to anticipate the mortality risk of hospitalized COVID-19 patients, allowing for the optimal allocation of limited hospital resources. This model could also automatically identify high-risk patients as soon as they are admitted and hospitalized. To conclude, ML algorithms coupled with high-quality and comprehensive hospital databases can be beneficial in the timely and accurate mortality risk stratification of COVID-19 patients.