AKI is a serious complication of cardiac surgery, with a reported incidence of 1 to 30%. AKI requiring kidney replacement therapy has an incidence of about 1 to 5% (22). Perioperative AKI is independently associated with increased short-term morbidity, treatment costs, and long-term mortality (23). In cardiac surgery patients, postoperative AKI is associated with more frequent ICU admission and a longer hospital stay. The development of kidney disease is also accompanied by high rates of gastrointestinal bleeding, respiratory infection, and sepsis. In patients undergoing CABG on cardiopulmonary bypass, the incidence of renal failure is between 1 and 15%, with a mortality rate of 19%. The incidence of AKI requiring dialysis after CABG is about 2%, with a mortality rate of 23 to 88% (24).
Kidney dysfunction in cardiac surgery patients is usually multifactorial. The most common cause is acute tubular necrosis, which results from hypoxic damage to nephrons in the medullary region of the kidney due to hypotension, hypovolemia, or dehydration. Other common risk factors include preoperative renal disease with an elevated creatinine level, type 1 diabetes mellitus, age over 65 years, major vascular surgery, cardiopulmonary bypass lasting more than 3 hours, and recent exposure to nephrotoxic agents such as radiocontrast dyes, bile pigments, aminoglycoside antibiotics, and nonsteroidal anti-inflammatory drugs (NSAIDs) (24).
Early detection of patients at high risk for AKI after cardiac surgery using risk scores can enable the anesthesiologist to apply protective and therapeutic strategies early to reduce AKI risk. Numerous risk scores have been developed to predict AKI, but no guideline yet recommends a particular predictive model (23). This study applied ML techniques to predict AKI after cardiac surgery. The methods were evaluated for two labels, corresponding to the first and seventh days after surgery, and the AUC of each method is reported in Tables 3 and 4. Based on the results, the best ML methods for classifying the data are RF and XGBoost, with an AUC of around 0.8. RF and XGBoost are ensemble tree-based methods that usually perform well in classification problems. Multiple imputation as a method of handling missing values had a more significant impact on the output of the other ML methods; however, it had little effect on the RF and XGBoost results because of the ability of these methods to cope with missing values. Also, combining SMOTE oversampling with class weighting to handle class imbalance gave the best results. In a study by Lee et al. (10), a similar attempt was made to evaluate machine learning methods for predicting AKI in a cohort of 2010 patients. In that study, the XGBoost method showed the highest prediction performance.
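As a rough illustration of the class-imbalance handling mentioned above, the following pure-Python sketch shows the two ideas separately: SMOTE-style interpolation between minority-class neighbours, and "balanced" class weights that are inversely proportional to class frequency. The function names, the toy points, and the parameters `k` and `n_new` are assumptions made for this example; the settings actually used in the study are not given in this section.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority point and one of its k nearest minority
    neighbours (the core idea of SMOTE, without any library support)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical toy data: 90 patients without AKI (0), 10 with AKI (1).
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# Hypothetical minority-class points in a two-feature space.
aki_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
extra_points = smote_like(aki_points, n_new=6, k=2, seed=1)
```

One plausible way to combine the two, again only as an assumption, is to oversample the minority class partway toward balance and let the class weights compensate for the imbalance that remains.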
Examining the interpretability of machine learning models is essential to ensure that they behave as intended. In medical applications such as this study, the reliability of the model output is more critical than in other applications. Interpretability concerns how each feature contributes to the prediction, and it can be described in global and local terms. Global interpretability explains the model based on the average over all samples in the dataset. We examined this in Figure 2 for both the XGBoost and RF models. Based on this analysis, the Cr (creatinine), CPB time (cardiopulmonary bypass time), BS (blood sugar), and Alb (albumin) features have the most significant impact on the predictions, in that order. Local interpretability, by contrast, examines how each feature affects the prediction for a given sample. Therefore, to investigate local interpretability for black-box models such as XGBoost and RF, which showed the best prediction performance, the LIME (20) and Shapley (21) methods were used. The results of these methods show which features played a crucial role in the prediction for a particular patient predicted to be at risk of postoperative AKI.
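The relation between local and global interpretation can be sketched with a deliberately simple stand-in model. The example below assumes a linear risk score on standardized features, for which the local attribution of feature j is w_j (x_j − mean_j); for a linear model with independent features this coincides with the Shapley value. Averaging the absolute local attributions over all samples then gives a global ranking. The weights and the five samples are hypothetical values chosen for illustration, not results from the study, and the method behind Figure 2 may differ.

```python
FEATURES = ["Cr", "CPB", "BS", "Alb"]
# Hypothetical weights of a linear risk score (standardized features).
WEIGHTS = [1.2, 0.8, 0.7, -0.4]
# Hypothetical standardized samples, one row per patient.
SAMPLES = [
    [1.0, 0.5, -0.2, 0.3],
    [-0.5, 1.0, 0.4, -0.6],
    [0.8, -0.7, 0.1, 0.2],
    [-1.2, -0.4, -0.5, 0.5],
    [-0.1, -0.4, 0.2, -0.4],
]


def local_attributions(sample, means):
    """Local view: contribution of each feature for one patient."""
    return [w * (x - m) for w, x, m in zip(WEIGHTS, sample, means)]


def global_importance(samples):
    """Global view: mean absolute local attribution over all patients."""
    n = len(samples)
    means = [sum(col) / n for col in zip(*samples)]
    totals = [0.0] * len(FEATURES)
    for sample in samples:
        for j, attribution in enumerate(local_attributions(sample, means)):
            totals[j] += abs(attribution)
    return {name: total / n for name, total in zip(FEATURES, totals)}


importance = global_importance(SAMPLES)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```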
Interpretation by the LIME method for a patient predicted to be at risk of postoperative AKI shows that the Cr (creatinine) feature has the most significant positive effect on this prediction (see Figure 3A). Figure 4 compares the predicted values of the LIME local model and the main machine learning models for this patient. For the LIME interpretation to be trusted, the value predicted by each primary model must be close to the corresponding value predicted by the LIME local model. In this plot, the two predicted values are very close to each other for both the RF and XGBoost models, so the interpretation obtained from the LIME method can be considered reliable for this patient.
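The fidelity check described above can be sketched as follows. This is a LIME-style procedure written from scratch, not the LIME library itself: it perturbs a patient's features, weights the perturbed points by proximity, fits a weighted linear surrogate, and compares the surrogate's prediction with that of the original model. The `black_box` function is a hypothetical stand-in for a trained RF or XGBoost model, and the patient values, `sigma`, and `kernel_width` are assumptions.

```python
import math
import random


def black_box(cr, cpb):
    """Hypothetical stand-in for a trained model (standardized inputs)."""
    return 1.0 / (1.0 + math.exp(-(1.5 * cr + 0.8 * cpb - 2.0)))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lime_sketch(x0, n_samples=500, sigma=0.3, kernel_width=0.5, seed=0):
    """Fit a proximity-weighted linear surrogate around x0.

    Returns (surrogate prediction at x0, coefficient of Cr, coefficient of CPB).
    """
    rng = random.Random(seed)
    xtwx = [[0.0] * 3 for _ in range(3)]  # accumulates X^T W X
    xtwy = [0.0] * 3                      # accumulates X^T W y
    for _ in range(n_samples):
        z = [x0[0] + rng.gauss(0, sigma), x0[1] + rng.gauss(0, sigma)]
        dist2 = (z[0] - x0[0]) ** 2 + (z[1] - x0[1]) ** 2
        w = math.exp(-dist2 / kernel_width ** 2)  # closer points weigh more
        row = [1.0, z[0], z[1]]
        y = black_box(*z)
        for i in range(3):
            xtwy[i] += w * row[i] * y
            for j in range(3):
                xtwx[i][j] += w * row[i] * row[j]
    intercept, coef_cr, coef_cpb = solve(xtwx, xtwy)
    local_pred = intercept + coef_cr * x0[0] + coef_cpb * x0[1]
    return local_pred, coef_cr, coef_cpb


patient = (1.2, 0.5)  # hypothetical standardized Cr and CPB time
local_pred, coef_cr, coef_cpb = lime_sketch(patient)
print("model:", black_box(*patient), "surrogate:", local_pred)
```

A small gap between the two printed values suggests that the surrogate's coefficients are a fair local description of the model for this patient; a large gap would mean the explanation should not be trusted.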
The Shapley method can also be used to interpret the machine learning models. Like LIME, this method examines the models' interpretability based on individual samples. As shown in Figure 3B, for the same patient, the dominant feature with a positive contribution to the prediction of both RF and XGBoost is Cr (creatinine). In the force plot of Figure 5, the base value is 0.25, which is the mean prediction over the test data. Features that push the prediction in the positive direction are displayed in red, and those that push it in the negative direction are shown in blue. Thus, the Cr (creatinine) feature is largely responsible for the positive prediction.
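The additive reading of a force plot (base value plus feature contributions equals the prediction) can be checked on a toy example. The sketch below computes exact Shapley values by enumerating feature subsets, replacing "absent" features with a single baseline vector. This is a simplification: the base value here is the model output at the baseline rather than a mean prediction over a dataset, and the `model` function, patient values, and baseline are hypothetical.

```python
from itertools import combinations
from math import exp, factorial


def model(cr, cpb, alb):
    """Hypothetical stand-in for a trained model (standardized inputs)."""
    return 1.0 / (1.0 + exp(-(1.2 * cr + 0.6 * cpb - 0.9 * alb - 1.0)))


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    Returns (base value, list of per-feature contributions).
    """
    n = len(x)

    def value(subset):
        # Features in `subset` take the patient's value, others the baseline.
        args = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(*args)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return value(set()), phi


patient = (2.0, 0.5, -0.3)   # hypothetical: high Cr, longer CPB, low Alb
baseline = (0.0, 0.0, 0.0)   # hypothetical "average" patient
base_value, phi = shapley_values(model, patient, baseline)

# Additivity: base value + sum of contributions reproduces the prediction.
print(base_value + sum(phi), model(*patient))
```

In a force plot, positive entries of `phi` would be drawn in red and negative entries in blue; in this toy setup the Cr contribution is the largest positive one.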
Hence, the treatment team can first predict AKI incidence after cardiac surgery using patient information and then assess that prediction using the model's interpretation for that patient. Based on the importance of the determinant features, the treatment team can judge the validity of the prediction.
One of the contributions of this study is the use of data from three different academic centers, which helps increase the validity of the results. The combined use of retrospective and prospective data also improved data quality, providing high-quality data in sufficient quantity for machine learning models. Most importantly, the use of interpretable machine learning methods makes it possible to assess the reliability of the predictions. The limitations of this study include inconsistent patient records, which increased the number of missing values. Also, the low incidence of stage 2 and 3 postoperative AKI among our patients led us to predict AKI regardless of its stage.
5.1. Conclusions
It can be concluded that machine learning methods such as RF and XGBoost can predict AKI after cardiac surgery with promising performance. Interpretability of the models can also help the treatment team verify the validity of predictions. A reliable prediction of AKI incidence can help the treatment team develop strategies to prevent postoperative AKI. Preventing AKI can reduce treatment costs, length of hospital stay, and risk of death. In future work, we aim to optimize intraoperative parameters, in particular the anesthesia parameters, so as to reduce the patient's risk of AKI.