Given the heterogeneity of the clinical manifestations of COVID-19, it is critical to develop ML-based models that predict the likelihood of ICU admission. This study built five ML-based models using the variables most relevant to the risk of ICU admission, as identified by a correlation coefficient analysis. The techniques used were ANN, DT, KNN, SVM, and RF, which were trained on the most significant predictors recorded at the time of admission for 1225 laboratory-confirmed COVID-19 patients. Based on our analysis of the selected algorithms, RF, with a mean accuracy of 99.5%, a mean specificity of 99.7%, and a mean sensitivity of 99.4%, performed better than the other ML algorithms in predicting the probability of ICU transfer after hospital admission.
In the ICU, informed decision-making is critical, especially in crisis circumstances such as the COVID-19 pandemic, in which healthcare systems faced a surge of patients and a severe shortage of hospital resources (21, 22). The models developed in this study could readily be computerized as an alternative to manual and subjective clinical assessment methods. To correctly extract clinical predictors for estimating the potential need for ICU services, we evaluated clinical features at the time of admission rather than during the progressive or severe course of the disease. In addition, patients who were already critical at admission were excluded from the analysis. Thus, if validated, these features could be used to predict the likelihood of ICU admission at first hospitalization. For this purpose, feature selection is an important step in preparing the data before they enter the model (23). Hence, we identified the most important variables (n = 11) through correlation coefficients at P-value < 0.05. The most significant predictors of ICU admission were older age, high creatinine, leukocytosis, increased BUN, elevated AST/ALT, elevated LDH, dry cough, hypertension, cardiovascular disorders, diabetes, dyspnea, decreased SpO2, pneumonia, and high C-reactive protein.
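The correlation-based filtering described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names (`pearson_r`, `approx_p_value`, `select_features`) and the toy data are hypothetical, Pearson correlation is assumed as the coefficient, and the P-value is approximated with a normal distribution in place of the exact t distribution, which is only reasonable for large samples.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Two-sided P-value for H0: r = 0.

    Assumption: the t statistic is compared against a standard normal
    distribution, which approximates the t distribution for large n.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def select_features(features, outcome, alpha=0.05):
    """Keep features whose correlation with the outcome has P < alpha."""
    n = len(outcome)
    selected = []
    for name, values in features.items():
        r = pearson_r(values, outcome)
        p = approx_p_value(r, n)
        if p < alpha:
            selected.append((name, r, p))
    # Strongest absolute correlation first
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data (not from the study): 1 = ICU admission, 0 = no ICU.
toy_outcome = [0] * 20 + [1] * 20
toy_features = {
    "age": [40 + i for i in range(20)] + [60 + i for i in range(20)],
    "noise": [i % 2 for i in range(40)],  # unrelated to the outcome
}

for name, r, p in select_features(toy_features, toy_outcome):
    print(f"{name}: r = {r:.3f}, P = {p:.4f}")
```

In this toy example only the age column passes the filter; with real data, an exact t-based P-value from a statistics library would be preferable to the normal approximation.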
Many studies have focused on determining the key risk factors for ICU admission (8, 9, 11, 13, 24, 25). The top ten clinical variables predicting ICU risk in the reviewed studies were age (older), body temperature (high), oxygen saturation (decreased), neutrophil count and lymphocyte count (raised), C-reactive protein (elevated), D-dimer (increased), ALT and/or AST (elevated), LDH (elevated), loss of consciousness, and hypertension/cardiovascular diseases. In general, the results of the reviewed studies agreed closely with the most common variables in the current study.
In general, the ML algorithms developed in this study, similar to those reported in previous studies (26), achieved good results, with an accuracy range of 86.4% - 90.37%. In particular, the experimental findings showed that RF performed best compared to the other four ML techniques, with a mean accuracy of 99.5%, mean specificity of 99.7%, mean sensitivity of 99.4%, Kappa metric of 95.7%, and RMSE of 0.015. According to previous studies, the ANN and RF techniques show the strongest performance in predicting COVID-19 outcomes, which is consistent with the present study.
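For readers unfamiliar with the metrics reported above, the following sketch shows how accuracy, sensitivity, specificity, Cohen's kappa, and RMSE can be derived from a binary classifier's output. It is an illustration under stated assumptions, not the evaluation code of this study: the function name `evaluate`, the 0.5 decision threshold, and the toy predictions are hypothetical, and RMSE is assumed to be computed on predicted probabilities rather than on hard class labels.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics.

    y_true: list of 0/1 labels (1 = ICU admission).
    y_prob: predicted probability of class 1 for each patient.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1.0 - p_e) if p_e != 1.0 else 0.0

    # Assumption: RMSE is taken between labels and predicted probabilities
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "rmse": rmse,
    }


# Hypothetical toy predictions (not from the study)
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_prob = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6, 0.1, 0.2]

for metric, value in evaluate(toy_true, toy_prob).items():
    print(f"{metric}: {value:.3f}")
```

Reporting kappa alongside accuracy is useful when ICU and non-ICU patients are unevenly represented, because kappa discounts the agreement expected by chance.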
As a screening instrument for the development of severe disease, the models developed in our study have several potential clinical uses. These models reduce the uncertainty and ambiguity in COVID-19 clinical practice by offering measurable, objective, and evidence-based approaches (12, 18). Accurate ICU admission prediction can support the allocation of limited hospital resources and improve the quality of care and patients' chance of survival (12). The timely identification of at-risk patients could reduce the need for urgent ICU beds and invasive mechanical ventilators. Moreover, using the model proposed in the present study could increase the rate of timely ICU transfers, resulting in reduced mortality and shorter ICU stays. A scientifically sound and valid ML-based prediction model would assist in early detection and effective supportive intervention, improving patient outcomes and the quality of care and ultimately reducing the mortality rate of COVID-19 patients. Ambiguity declines when quantitative, objective, and evidence-based models are available for risk stratification, prediction, and care planning (9, 10).
This study had several limitations. First, we retrospectively analyzed a dataset without control over the data fields or incomplete data. Second, the dataset was extracted from a single hospital with a small sample size of 1225, which limits the generalizability of the results. Third, this study included only 11 clinical features recorded at hospital admission. This does not mean that these should be the only criteria for determining ICU admission; longitudinal changes in these clinical features need to be investigated. Moreover, we used only five ML algorithms for the prediction analyses. Finally, the selected dataset lacked some critical clinical variables, such as radiological indicators. In the future, the accuracy and generalizability of our model could be improved by testing more ML techniques on larger, multicenter, prospective datasets with higher-quality, validated data.
5.1. Conclusions
We trained and validated different ML algorithms to predict the need for ICU transfer in hospitalized COVID-19 patients based on data collected easily and routinely at the time of hospital admission. This study first identified the highly ranked clinical predictors that can predict the likelihood of ICU admission more precisely. Second, we developed and compared five ML-driven prediction models based on these selected predictors. The RF model achieved the best classification accuracy among the ML algorithms. This method has the potential to provide frontline clinicians with an objective instrument for managing COVID-19 patients more efficiently in time-sensitive, stressful, and resource-constrained situations. Finally, the results of comparing the prediction models in this study were satisfactory to some extent, and we believe that further investigations are needed to validate our model on a larger, multicenter, and higher-quality dataset.