In the current study, the predictive ability of the PRISM III, PIM3, and PELOD-2 models was evaluated in the MICUs/SICUs. The mean PRISM III, PIM3, and PELOD-2 scores were significantly higher in nonsurvivors than in survivors (P < 0.001 for all three). Analysis of the AUCs showed that both PIM3 and PELOD-2 discriminated well between survivors and nonsurvivors (AUC = 0.824, P < 0.001 and AUC = 0.803, P < 0.001, respectively), whereas the discrimination of PRISM III was moderate (AUC = 0.773, P = 0.001). The Hosmer-Lemeshow chi-square test showed good calibration for the PRISM III, PIM3, and PELOD-2 scores (χ2 = 4.73, P = 0.79; χ2 = 3.08, P = 0.93; and χ2 = 5.01, P = 0.66, respectively). These results indicate that all three scoring models performed well in this pediatric population. In addition, survivors were significantly older than nonsurvivors (P < 0.001).
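The Hosmer-Lemeshow statistic compares observed and expected deaths within groups of patients ranked by predicted risk. The sketch below is a minimal illustration that assumes the conventional ten risk groups and g - 2 degrees of freedom; the grouping actually used in this study is not stated here, and the function names `chi2_sf` and `hosmer_lemeshow` are hypothetical. It uses only the Python standard library, computing the chi-square tail probability with the standard two-step recurrence instead of a statistics package.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df >= 1."""
    if df < 1:
        raise ValueError("df must be at least 1")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start at df = 2 (even) or df = 1 (odd), then climb two degrees of freedom
    # at a time: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, df, p_value) for predicted risks and 0/1 outcomes."""
    if groups < 3:
        raise ValueError("need at least 3 groups so that df = groups - 2 >= 1")
    # Rank patients by predicted risk, then cut into roughly equal-sized groups.
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected deaths E_g
        observed = sum(y for _, y in chunk)  # observed deaths O_g
        # (O - E)^2 / [E * (1 - E / n_g)] equals the combined Pearson contribution
        # of the death and survival cells; skip groups with zero variance.
        denom = expected * (1 - expected / len(chunk))
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    df = groups - 2
    return stat, df, chi2_sf(stat, df)
```

For example, `chi2_sf(4.73, 8)` is about 0.786 and `chi2_sf(3.08, 8)` is about 0.929, in line with the P values reported above for PRISM III and PIM3 under the ten-group assumption.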
The optimal cutoff points based on the Youden index were 11.5, 2.71, and 10.5 for PRISM III, PIM3, and PELOD-2, respectively. Four studies reported optimal PRISM III cutoff points of 5.5, 7, 7.5, and 8, with sensitivities of 68% - 83% and specificities of 82% - 87.5% (19-22). The cutoff points reported in these studies were lower than that of the current study, but their sensitivities and specificities were similar to those of the present study. Optimal cutoff points for PIM3 and PELOD-2 were not reported in other studies.
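The Youden index, J = sensitivity + specificity - 1, is maximized over candidate cutoffs on the ROC curve. The sketch below is illustrative only: it assumes a patient is classified as high risk when score >= cutoff and uses midpoints between adjacent observed scores as candidates, which is one common way cutoffs such as 11.5 or 10.5 arise for integer-valued scores. It also computes the AUC as the Mann-Whitney probability that a nonsurvivor scores higher than a survivor, with ties counted as one half. The function names and the toy data are hypothetical.

```python
def auc_mann_whitney(scores_dead, scores_alive):
    """AUC = P(nonsurvivor score > survivor score); ties count as 1/2."""
    wins = 0.0
    for d in scores_dead:
        for a in scores_alive:
            if d > a:
                wins += 1.0
            elif d == a:
                wins += 0.5
    return wins / (len(scores_dead) * len(scores_alive))


def youden_cutoff(scores_dead, scores_alive):
    """Return (cutoff, sensitivity, specificity) maximizing J = sens + spec - 1.

    A patient is classified as high risk when score >= cutoff.
    Candidate cutoffs are midpoints between adjacent distinct observed scores.
    """
    values = sorted(set(scores_dead) | set(scores_alive))
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")
    best = None
    for c in candidates:
        sens = sum(s >= c for s in scores_dead) / len(scores_dead)
        spec = sum(s < c for s in scores_alive) / len(scores_alive)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    _, cutoff, sens, spec = best
    return cutoff, sens, spec


# Hypothetical toy scores for illustration only (not study data).
dead = [8, 12, 14, 15, 20]
alive = [2, 4, 5, 7, 9, 11, 13]
print(auc_mann_whitney(dead, alive))  # 31/35, about 0.886
print(youden_cutoff(dead, alive))     # cutoff 11.5, sensitivity 0.8, specificity 6/7
```

Scanning midpoints keeps the reported cutoff between two observed values, so the classification of every observed patient is unambiguous.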
Consistent with the current study, several studies found that higher PRISM III, PIM3, and PELOD-2 scores were significantly associated with a higher mortality rate (23-26).
Consistent with the current study findings, several studies have reported good discrimination and calibration for these predictive models. Goncalves et al. (11) compared PRISM III and PELOD-2 for predicting mortality in a Portuguese PICU. Their study enrolled 556 consecutively admitted patients with a mean age of 65 months (range: one month to 17 years), a male to female ratio of 1:5, and a median PICU LOS of three days. Both models discriminated well, with AUCs of 0.92 for PRISM III and 0.94 for PELOD-2. However, calibration was good only for PRISM III (PRISM III: χ2 = 3.820, P = 0.282; PELOD-2: χ2 = 9.576, P = 0.022). Unlike the current study, they concluded that PELOD-2 needs recalibration to be a more reliable predictor.
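Recalibration can take several forms. One of the simplest, sketched below purely as an illustration and not as the procedure Goncalves et al. used, is recalibration-in-the-large: every predicted log-odds is shifted by a common constant so that the expected number of deaths equals the number observed locally. Fuller logistic recalibration also re-estimates a slope. The helper names `logit`, `expit`, and `recalibrate_in_the_large`, the bisection bounds, and the example data are assumptions.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(z: float) -> float:
    return 1 / (1 + math.exp(-z))


def recalibrate_in_the_large(probs, deaths, lo=-10.0, hi=10.0, tol=1e-10):
    """Find a common log-odds shift a with sum(expit(logit(p) + a)) == deaths.

    Assumes 0 < p < 1 for every prediction and that the target is reachable
    for some a in [lo, hi]. Expected deaths rise monotonically with a, so
    bisection converges.
    """
    z = [logit(p) for p in probs]

    def expected_deaths(a):
        return sum(expit(zi + a) for zi in z)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_deaths(mid) < deaths:
            lo = mid  # still under-predicting deaths: shift risks upward
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical example: 10 patients each predicted at 10% risk, 5 observed deaths.
probs = [0.1] * 10
shift = recalibrate_in_the_large(probs, deaths=5)
recalibrated = [expit(logit(p) + shift) for p in probs]
print(round(shift, 4), round(recalibrated[0], 4))  # 2.1972 0.5
```

Here the shift equals log(9), which moves every prediction from 10% to 50% so that expected and observed deaths agree.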
Straney et al. (15), in an international, multicenter, prospective cohort study, evaluated the ability of PIM3 to predict mortality risk among children admitted to ICUs. Sixty ICUs admitting children in Australia, New Zealand, Ireland, and the United Kingdom were included, and a total of 53,112 children under 18 years old admitted between 2010 and 2011 were enrolled. Children transferred to another ICU were excluded. PIM3 discriminated well (AUC 0.88, 95% CI: 0.88 - 0.89); however, its performance in Australia and New Zealand was superior to that in the United Kingdom and Ireland (AUC 0.91, 95% CI: 0.90 - 0.93 vs. 0.85, 95% CI: 0.84 - 0.86, respectively).
In the current study, age was a predictor of mortality, as survivors were significantly older than nonsurvivors (P < 0.001). Similarly, the retrospective study by Campbell et al. (23) of 83 children aged 1 to 18 years found that younger age at the time of injury, a higher PRISM III score, and a lower GCS score were important predictors of mortality. The overall mortality rate in the present study was 17.8%, compared with 3.9% and 28.7% in the studies by Wolfler et al. (27) and Qureshi et al. (12), respectively.
It is hoped that expanding the use of scoring models through education, standardizing assessments across different ICUs, and customizing appropriate models will sustain the role of predictive models in clinical practice and research (28, 29). The current study has several limitations that should be addressed in further research. First, sample size has a substantial influence on model calibration. Second, differences in context (case mix), quality of care, and policies may have introduced bias. Ethical considerations were observed throughout the study.
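The dependence of the Hosmer-Lemeshow test on sample size can be made concrete. Writing the statistic over g risk groups, where group k has n_k patients, O_k observed deaths, and E_k expected deaths, and enlarging every group by the same factor c while the observed and expected proportions stay fixed (so E_k/n_k is unchanged), the statistic grows in proportion to c:

$$
\hat{C} = \sum_{k=1}^{g} \frac{(O_k - E_k)^2}{E_k\,(1 - E_k/n_k)},
\qquad
\hat{C}(cO, cE, cn) = \sum_{k=1}^{g} \frac{c^2\,(O_k - E_k)^2}{c\,E_k\,(1 - E_k/n_k)} = c\,\hat{C}.
$$

Because the reference distribution (chi-square with g - 2 degrees of freedom) does not change with c, the same degree of miscalibration can give a nonsignificant P value in a small sample and a significant one in a large sample.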
5.1. Conclusions
The performance of the three predictive models was good. In terms of discrimination, PIM3 and PELOD-2 performed slightly better than PRISM III. Further recalibration of predictive models in different contexts, through multicenter studies with larger sample sizes, would improve the generalizability of the most widely validated ICU scoring systems.