1. Background
Liver transplantation is a standard treatment with a high survival rate for patients with advanced and irreversible liver failure (1-3). The best timing for liver transplantation and the independent prognostic factors for candidates' survival remain debated in this field (4). In Iran, the first liver transplantation was performed in Namazee Hospital, Shiraz, in 1993 (5). Nowadays, liver transplantation is commonly performed worldwide. Analyzing the survival of patients with liver transplant may help clinicians reach the best possible, objective decisions in pre- and post-operative care (5-10). There are various methods to analyze survival. Classical statistical methods rely on underlying assumptions about the data structure. For instance, Cox proportional hazards (Cox PH) regression assumes that the effect of each predictor on the hazard is the same over time, so that the hazard ratio does not depend on time (11). Such assumptions are often not consistent with the structure of the data, which may limit the application of these methods. In recent years, soft computing methods have complemented classical methods in modeling large unstructured data sets (12-19). These methods make no theoretical assumptions about the data structure (20), and since they model the relations among the data by learning their patterns, they work better on larger data sets. For instance, an artificial neural network (ANN) is a box-like method that consists of layers, and the small units in each layer are called neurons. Neurons are connected through weights that are given initial values. Updating the weights among the neurons of a layer and on the connections between adjacent layers is handled in the learning process, through which the network learns the relations between input and output data (20).
Survival modeling of clinical phenomena using ANNs has been applied in previous research on different diseases (21-28) and on liver disease (6-10, 29, 30). However, the definitions of inputs and outputs, the network structures, the learning algorithms and the study populations differ among liver disease studies. Furthermore, few previous studies have compared survival modeling methods on liver transplant data (5-7, 10, 29, 30).
2. Objectives
The current study aimed to model the survival of patients over two years old after liver transplantation surgery using Cox PH regression and ANN models. The order of the factors important for survival after transplantation was derived for both models. Furthermore, the one- to five-year survival probabilities of the patients were estimated by the Kaplan-Meier method. To compare the performance of the models, the area under the ROC curve, accuracy rate, sensitivity and specificity values, positive and negative predictive values and the Youden index J (31) were used as comparison criteria.
3. Materials and Methods
In a historical cohort study, a census was used to collect the clinical findings of 1361 patients over two years old who underwent liver transplant surgery from March 2008 to March 2013 at Shiraz Namazee Hospital Organ Transplantation Center, Shiraz, Southern Iran. Most patients across the country are referred to this center for liver transplantation. Patients were excluded if they had been transplanted more than once, had survived less than one day or had important data missing. Accordingly, the total sample size decreased to 1168 subjects. Thirty-seven features including recipient age, recipient weight, lung complication after transplantation, comorbidity disease, exploration after transplantation, cytomegalovirus (CMV) infection, diabetes after transplantation, duration of hospital stay, vascular complication after transplantation, packed cells (PC), etc. were considered as the input features (variables) for each patient.
3.1. Statistical Analysis
The current study mainly aimed to determine the factors affecting patients’ survival after liver transplantation using two different methods. First, the probability of survival was calculated regardless of the input variables using the Kaplan-Meier method. This is a well-known non-parametric method to estimate the survival rate after treatment; however, it is affected by censored data (32). The Kaplan-Meier method, also called the product-limit estimate, involves computing the probability of the event occurring at a certain point in time (33). The survival probability at any particular time was computed as follows (Equation 1):

$$S_t = \frac{n_t - d_t}{n_t} \qquad (1)$$

where $n_t$ is the number of patients alive and at risk at the start of the interval and $d_t$ is the number who died during it. Accordingly, the total probability of survival from the start time until a specified time was calculated by multiplying the probabilities of survival over all time intervals preceding that time (33).
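The product-limit calculation described above can be illustrated with a minimal Python sketch. The function name `kaplan_meier` and the toy follow-up times are hypothetical and are not taken from the study data; deaths are assumed to occur before censoring when both fall at the same time, which is the usual convention.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # conditional survival for this interval, multiplied into the running product
            survival *= (at_risk - d) / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # remove deaths and censored patients
    return curve

# Hypothetical toy data (months); NOT the study data
times = [2, 3, 3, 5, 8, 8, 12, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is how the estimate accommodates incomplete follow-up.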
Second, ANN models were used to model the survival data. A supervised feed-forward neural network was trained with the back-propagation (BP) algorithm. The learning steps in this algorithm were as follows (20):
Inputs were fed into the network and propagated forward through the layers until the output layer was reached. The output was then predicted from the initial values of the parameters (weights and biases).
The network error was defined as the difference between the predicted output and the target output.
The error was then propagated backward and the weights were adjusted to decrease it; in this way, the mean squared deviation between the predicted and target outputs was minimized.
These steps were repeated iteratively until the error between the predicted and actual outputs was minimized.
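The learning steps above can be sketched in plain Python for a network with one sigmoid hidden layer and one sigmoid output node. This is a simplified illustration under stated assumptions, not the MATLAB implementation used in the study: the function names, the per-sample update scheme, the default hyperparameters and the toy data are all hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, n_hidden=5, lr=0.1, momentum=0.8, epochs=2000, seed=0):
    """Back-propagation with momentum; the last weight in each row is the bias."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    V1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]   # momentum terms
    V2 = [0.0] * (n_hidden + 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            # Step 1: forward pass through hidden and output layers
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(W2, hb)))
            # Step 2: error between predicted and target output
            # (gradient of 0.5 * (out - target)^2 through the sigmoid)
            delta_out = (out - target) * out * (1.0 - out)
            # Step 3: propagate the error backward and adjust the weights
            delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                V2[j] = momentum * V2[j] - lr * delta_out * hb[j]
                W2[j] += V2[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    V1[j][i] = momentum * V1[j][i] - lr * delta_h[j] * xb[i]
                    W1[j][i] += V1[j][i]
    return W1, W2   # Step 4 is the repetition over epochs above

def predict(W1, W2, x):
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(W2, hb)))

def mse(W1, W2, X, y):
    return sum((predict(W1, W2, x) - t) ** 2 for x, t in zip(X, y)) / len(X)

# Hypothetical toy problem (logical OR); NOT the study data
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 1]
mse_before = mse(*train(X, y, epochs=0), X, y)   # untrained (initial) weights
mse_after = mse(*train(X, y), X, y)
print(mse_before, mse_after)
```

Comparing the mean squared error before and after training shows the effect of the repeated weight adjustments; a fixed number of epochs is used here in place of a convergence criterion.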
To model the process, the data were randomly divided into a training set (80%, n = 934) and a test set (20%, n = 234). Neural networks with different architectures were then fitted, varying the number of hidden layer neurons (5 to 20), the momentum (0.8 to 0.95) and the learning rate (0.01 to 0.4). Finally, the best ANN was selected according to the maximum accuracy and specificity. It was a three-layer ANN whose first layer consisted of thirty-eight nodes, each representing one input variable; time was also entered as an input variable in the ANN. The second (hidden) layer included 5 - 20 nodes with the sigmoid function as the activation function. The output of the ANN was a binary variable (died due to complications from liver transplantation or censored). Therefore, a single output node with a sigmoid activation function was used for the third layer.
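As a rough illustration of the set-up described above, the sketch below draws a random 80/20 split of 1168 subjects and enumerates a grid of candidate architectures. The function name, the seed and the specific momentum and learning-rate values are hypothetical; the text reports only the ranges that were explored, not the individual grid points.

```python
import itertools
import random

def split_indices(n, train_fraction=0.8, seed=42):
    """Shuffle subject indices and split them into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * train_fraction)
    return idx[:n_train], idx[n_train:]

train_idx, test_idx = split_indices(1168)
print(len(train_idx), len(test_idx))

# Candidate settings within the reported ranges (grid points are assumed)
hidden_sizes = range(5, 21)               # 5 to 20 hidden neurons
momenta = (0.80, 0.85, 0.90, 0.95)        # hypothetical steps
learning_rates = (0.01, 0.1, 0.2, 0.4)    # hypothetical steps
grid = list(itertools.product(hidden_sizes, momenta, learning_rates))
print(len(grid), "candidate architectures in this illustrative grid")
```

Each tuple in `grid` would correspond to one network to train on the training indices and evaluate on the test indices.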
Third, a Cox PH regression model was fitted to the test data for comparison. The Cox PH model is a semi-parametric model used to analyze data in which the predictors do not depend on time and the hazard ratio is constant over time (11). It models the hazard function in terms of the predictors through the following formula (Equation 2):

$$h(t) = h_0(t)\,\exp\left(\sum_{i} \beta_i X_i\right) \qquad (2)$$

in which $h_0(t)$ is the baseline hazard function, and the $X_i$s and $\beta_i$s are the predictors and model parameters, respectively.
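To make the role of the coefficients concrete, the following sketch evaluates the exponential part of the Cox PH model for two hypothetical patients. The function name and all coefficient and covariate values are illustrative assumptions, not results from the study.

```python
import math

def relative_hazard(betas, x):
    """exp(sum_i beta_i * x_i): the hazard relative to the baseline h0(t)."""
    return math.exp(sum(b * v for b, v in zip(betas, x)))

# Hypothetical coefficients: a binary complication and hospital stay in days
betas = [0.7, -0.03]
patient_a = [1, 10]   # complication present
patient_b = [0, 10]   # complication absent, same hospital stay

# The baseline hazard h0(t) cancels in the ratio, so the result holds at any time t
ratio = relative_hazard(betas, patient_a) / relative_hazard(betas, patient_b)
print(round(ratio, 3), round(math.exp(betas[0]), 3))
```

Because the two patients differ only in the first covariate, the ratio reduces to the exponential of that coefficient, which is the hazard ratio usually reported alongside a Cox PH model.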
Finally, the predictive power of the two models was compared according to the area under the ROC curve, the accuracy rate based on the Youden index cut point, sensitivity, specificity, positive and negative predictive values and the Youden index J (31). For data analysis, Stata, SPSS, MedCalc and MATLAB software were used.
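The comparison criteria listed above can be sketched as follows: the cut point is chosen to maximize the Youden index J = sensitivity + specificity - 1, and the remaining criteria are then computed at that cut point. The function names, scores and labels are hypothetical toy values, and the sketch assumes that both outcome classes are present.

```python
def confusion(scores, labels, cut):
    """Counts when a score >= cut is classified as the event (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "youden_j": sens + spec - 1.0,
    }

def youden_cut(scores, labels):
    """Return the observed score that maximizes the Youden index J."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: metrics(*confusion(scores, labels, c))["youden_j"])

# Hypothetical predicted probabilities and outcomes; NOT the study data
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
cut = youden_cut(scores, labels)
result = metrics(*confusion(scores, labels, cut))
print(cut, result)
```

When several cut points give the same J, this sketch keeps the smallest one; other tie-breaking rules are equally defensible.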
4. Results
The current study included 1168 patients with a mean ± SD age of 32.4 ± 17.6 years, of whom 734 (62.8%) were male and 434 (37.2%) were female. Patients were followed up for five years after liver transplantation to observe the event of interest, death due to complications from liver transplantation. Among them, 129 (11%) patients died due to complications from liver transplantation and 1039 (89%) were alive or lost to follow-up (censored data). Tables 1 and 2 describe the patients' qualitative and quantitative features. The mean survival time for these patients was 52 months and 18 days with a standard deviation of 18 days. The estimated one- to five-year survival probabilities of the patients were 91%, 89%, 85%, 84% and 83%, respectively, by the Kaplan-Meier method (Table 3).
Variable Name | No. (%) |
---|---|
Recipient gender | |
Male | 734 (62.8) |
Female | 434 (37.2) |
Recipient diagnosis | |
Metabolic | 180 (15.4) |
Cholestatic | 233 (19.9) |
Hepatitis | 455 (39.0) |
Tumors | 16 (1.4) |
Cryptogenic | 71 (6.1) |
Other causes | 213 (18.2) |
Comorbidity disease b | |
No | 1011 (86.6) |
Yes | 157 (13.4) |
MELD and PELD score | |
< 20 | 619 (53.0) |
> 20 | 549 (47.0) |
CHILD class | |
A | 130 (11.1) |
B | 506 (43.3) |
C | 532 (45.5) |
Previous abdominal surgery | |
No | 1022 (87.5) |
Yes | 146 (12.5) |
Renal failure before transplantation | |
No | 1119 (95.8) |
Yes | 49 (4.2) |
Type liver | |
Whole organ | 980 (83.9) |
Split | 63 (5.4) |
Living | 125 (10.7) |
Primary non-function | |
No | 1156 (99.0) |
Yes | 12 (1.0) |
CMV infection after transplantation | |
No | 1099 (94.1) |
Yes | 69 (5.9) |
Diabetes after transplantation | |
No | 918 (78.6) |
Yes | 250 (21.4) |
Vascular complication after transplantation c | |
No | 1104 (94.5) |
Yes | 64 (5.5) |
Renal failure after transplantation | |
No | 1065 (91.2) |
Yes | 103 (8.8) |
Lung complication after transplantation d | |
No | 1127 (96.5) |
Yes | 41 (3.5) |
Bile duct complication after transplantation | |
No | 1099 (94.1) |
Yes | 69 (5.9) |
Exploration after transplantation | |
No | 958 (82.0) |
Yes | 210 (18.0) |
Acute rejection | |
No | 693 (59.3) |
Yes | 475 (40.7) |
Chronic rejection | |
No | 1132 (96.9) |
Yes | 36 (3.1) |
Relapse e | |
No | 1140 (97.6) |
Yes | 28 (2.4) |
PTLD | |
No | 1151 (98.5) |
Yes | 17 (1.5) |
Donor gender | |
Male | 786 (67.3) |
Female | 382 (32.7) |
Donor cause of death | |
Living | 127 (10.9) |
Trauma | 651 (55.7) |
CVA | 292 (25.0) |
Other | 98 (8.4) |
Table 1. Description of Qualitative Input Features for All Patients in the Study a
Variable Name | Mean ± SD |
---|---|
Recipient age, yr | 32.39 ± 17.59 |
Recipient weight, kg | 59.12 ± 22.30 |
CHILD score | 9.16 ± 2.20 |
Waiting list time, d | 162.27 ± 222.61 |
Creatinine, mg/dL | 0.86 ± 0.58 |
INR, IU b | 2.03 ± 1.28 |
Total bilirubin, mg/dL | 8.27 ± 10.27 |
Cold ischemia time, h | 6.69 ± 3.50 |
Warm ischemia time, h | 0.81 ± 0.22 |
Packed cells, bag | 2.40 ± 2.89 |
Fresh frozen plasma, bag | 3.10 ± 3.97 |
Total bleeding, mL | 1700 ± 1679.38 |
Duration of operation, h | 6.01 ± 1.27 |
Duration of hospital stay, d | 13.81 ± 7.86 |
Donor age, yr | 30.76 ± 14.19 |
Table 2. Description of Quantitative Input Features for All Patients in the Study
Follow Up Period, yr | Total Data (n = 1168) a |
---|---|
First | 0.909 ± 0.009 |
Second | 0.888 ± 0.010 |
Third | 0.854 ± 0.014 |
Fourth | 0.843 ± 0.016 |
Fifth | 0.833 ± 0.018 |
Table 3. One- to Five-Year Survival Probabilities of the Patients With Transplant
First, the data were randomly divided into training and testing subsets (934 and 234 patients, respectively). The log-rank test revealed no significant difference between the survival curves of the training and testing subsets (P value = 0.411). Second, the ANN was trained and evaluated using these two subsets. The learning procedure, including training and testing, was conducted for 576 different ANN architectures, and the most appropriate ANN was selected based on the area under the ROC curve, which was 0.864 (P < 0.0001) for the selected model (Table 4). Based on the Youden criterion cut point, the prediction accuracy of this method was 90.2% (Table 4). The factors affecting survival after liver transplantation were selected based on the absolute values of the mean weights for each variable. Tables 5 and 6 show the order of variables in the selected ANN model (sixteen variables). Finally, the assumptions of the Cox PH model were checked and the model was fitted to the test data. The best model was selected using the backward conditional method. The area under the ROC curve for this model was 0.806, which was statistically significant (P < 0.0001) (Table 4). In addition, based on the Youden criterion cut point, the prediction accuracy of the Cox PH model was 90.2% (Table 4). Other criteria such as sensitivity, specificity, negative and positive predictive values and the Youden index J were also calculated to allow further comparison of the models (Table 4). Some of these criteria favored the ANN and others the Cox PH model. Furthermore, based on the results of the Cox PH model on all 1168 patients, 16 of the 37 variables were effective in predicting survival after liver transplantation; the coefficients of these variables were significant at the 0.1 level. Tables 5 and 6 show the order of variables in the Cox PH model compared with that of the ANN model.
Comparison Criteria | Artificial Neural Network | Cox Proportional Hazards |
---|---|---|
AUC (standard error) | 0.864 (0.043) b | 0.806 (0.067) b |
Youden index J | 0.588 | 0.576 |
Sensitivity c | 78.3 | 65.2 |
Specificity c | 80.6 | 92.4 |
Positive predictive value c | 30.5 | 48.4 |
Negative predictive value c | 97.1 | 96.1 |
True prediction d | | |
Survival | 170 (97.1) | 195 (96.1) |
Dead | 41 (69.5) | 16 (51.6) |
Total | 211 (90.2) | 211 (90.2) |
Table 4. Comparison Criteria for Both Models on Testing Set a
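As a quick arithmetic check, the Youden index J can be recomputed from the sensitivity and specificity reported in Table 4, assuming the standard definition J = sensitivity + specificity - 1 with both quantities expressed as proportions:

$$J_{\text{ANN}} = 0.783 + 0.806 - 1 = 0.589, \qquad J_{\text{Cox PH}} = 0.652 + 0.924 - 1 = 0.576$$

The Cox PH value matches the table, and the ANN value agrees with the reported 0.588 up to rounding of the percentages.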
Variables | Beta | SE | P Value |
---|---|---|---|
Type liver b | | | < 0.001 |
Whole organ | -3.563 | 1.141 | 0.002 |
Split | -2.112 | 1.095 | 0.054 |
Living | | | |
PNF b | 2.911 | 0.439 | < 0.001 |
Chronic rejection b | 1.822 | 0.299 | < 0.001 |
Renal failure after transplantation b | 1.602 | 0.232 | < 0.001 |
Lung complication after transplantation b | 1.599 | 0.239 | < 0.001 |
Exploration after transplantation b | 0.903 | 0.222 | < 0.001 |
Acute rejection b | -0.646 | 0.204 | 0.002 |
Diabetes after transplantation b | -0.797 | 0.283 | 0.005 |
Duration of hospital stay b | -0.033 | 0.012 | 0.006 |
Vascular complication after transplantation b | 0.698 | 0.283 | 0.013 |
PC b | 0.698 | 0.283 | 0.013 |
MELD and PELD score b | 0.414 | 0.197 | 0.035 |
Donor cause of death b | | | 0.036 |
Trauma | 2.390 | 1.095 | 0.029 |
CVA | 2.666 | 1.106 | 0.016 |
Other | 2.726 | 1.166 | 0.019 |
Living | | | |
Recipient age b | 0.019 | 0.009 | 0.046 |
Donor age c | 0.015 | 0.008 | 0.053 |
Recipient weight c | -0.012 | 0.008 | 0.101 |
Table 5. The Order of Important Variables Based on the Cox Proportional Hazards Regression Model a
Variables | Mean Weights (ANN Model) |
---|---|
Recipient age | -0.214 |
Relapse | -0.171 |
CIT | 0.154 |
Previous abdominal surgery before transplantation | 0.152 |
FFP | 0.144 |
Lung complication after transplantation | -0.133 |
Comorbidity disease | -0.122 |
Exploration after transplantation | 0.112 |
CMV infection | 0.107 |
Renal failure after transplantation | -0.105 |
Acute rejection | 0.103 |
CHILD score | -0.103 |
Duration of operation | 0.096 |
Recipient diagnosis | -0.095 |
Chronic rejection | -0.094 |
PNF | -0.093 |
Table 6. The Order of Important Variables Based on the ANN Model a
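The ordering by absolute mean weight can be sketched as below. The averaging step assumes that each input's weights are averaged over the hidden neurons, which the text does not spell out; the function names and the small weight matrix are hypothetical, while the four mean weights in the ranking example are copied from the table above.

```python
def mean_input_weights(W1, names):
    """Average each input's weights over the hidden neurons (assumed scheme).

    W1 is a list of rows, one per hidden neuron, one column per input variable.
    """
    n_hidden = len(W1)
    return {name: sum(row[i] for row in W1) / n_hidden
            for i, name in enumerate(names)}

def rank_by_abs_weight(mean_weights):
    """Sort variables by the absolute value of their mean weight, largest first."""
    return sorted(mean_weights.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical 2 x 3 input-to-hidden weight matrix, for illustration only
demo = mean_input_weights([[0.2, -0.5, 0.1], [0.4, -0.1, -0.1]], ["x1", "x2", "x3"])

# Four mean weights taken from the table above, entered in arbitrary order
table_weights = {"CIT": 0.154, "FFP": 0.144, "Recipient age": -0.214, "Relapse": -0.171}
ranking = rank_by_abs_weight(table_weights)
for name, w in ranking:
    print(f"{name:15s} {w:+.3f}")
```

Sorting on the absolute value means the sign, which indicates the direction of the averaged weight, does not affect a variable's position in the list.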
5. Discussion
Liver transplantation is the only treatment for patients with liver failure (5). Without liver transplantation, these patients have no chance of prolonged survival. During the last two decades, the five-year survival of liver transplant patients has increased (34). The liver transplant program at Shiraz Namazee Hospital was established 20 years ago and is now well developed; more than six hundred liver transplants are performed annually. With the increasing number of patients with liver transplant in Iran, follow-up is very important. Indeed, identifying the important features that affect patients' survival after surgery is valuable for pre- and post-operative care. So far, however, few rigorous studies have been conducted on survival in patients with liver transplant (5-10, 29, 30).
Since this phenomenon is affected by regional conditions, diet and cultural traditions, a comprehensive study encompassing all age groups is needed for the Iranian population. Some previous studies on the survival of patients with liver transplantation in Iran used classical methods of analysis (5, 10), but with a limited age range.
The current study aimed to model the survival of patients with liver transplant across a wide age range (two years old and above) using ANN and Cox PH regression models, in order to compare the performance of these two methods in predicting death due to complications of liver transplantation. Accordingly, the variables assumed to influence the survival of patients with liver transplant were selected based on the few studies conducted on the survival of such patients and on the experience of more than two thousand liver transplant surgeries in the Shiraz transplant center.
The results of the current study revealed that the ANN was better than the Cox PH regression model at predicting survival in patients with liver transplant based on the area under the ROC curve (Table 4). However, both areas were large enough to be statistically significant (P < 0.0001). In addition, the prediction accuracy was similar in both models (Table 4). Furthermore, the Youden index J, sensitivity and negative predictive value favored the ANN, while specificity and positive predictive value were higher in the Cox PH model. However, the clinical significance of the order of the input variables should also be considered (Tables 5 and 6). Based on clinical experience, the order of variables in the Cox PH model may be more consistent with clinical findings. Although recipient age can be an important variable, primary non-function (PNF) and lung, kidney and vascular complications have a more important effect on patients’ survival. According to the ANN results in Tables 5 and 6, some variables such as cold ischemia time (CIT), previous abdominal surgery and transfusion of fresh frozen plasma (FFP) are at the top of the list, while clinically they do not seem to be as important as PNF or vascular complications. Nevertheless, many studies worldwide have compared these methods for survival analysis in various diseases (21, 27), and all of them reported the superiority of ANN over Cox PH regression in real clinical datasets. In addition, a comparison of the two methods on a simulated dataset confirmed the high ability of the ANN method to model complex relations compared with the Cox PH regression model, especially for a dataset with high censorship (35).
Generally, when comparing these two models, the Cox PH model needs to fulfill some theoretical assumptions about the data structure. In addition, it uses a subset of variables in the final model (the significant ones). Therefore, its results are easy to interpret, and the hazard ratios and related confidence intervals can be calculated. In comparison, the ANN requires a large data set to learn the relations. In addition, it uses all input variables in the modeling process, and the absolute values of their weights indicate their importance. Therefore, it cannot distinguish the confounding variables (inconsequential ones), but it is a powerful tool for finding complex patterns among the inputs without any assumptions about the data structure.
The strength of this study was the possibility of comparing the two methods in the survival analysis of liver transplantation data. In addition, investigating the role of many different factors in the survival of patients with liver transplantation simultaneously, across a wide age range and in a large data set, was another advantage of the present research. However, the current study had some potential limitations. Incomplete registration of information in the patients' hospital records was problematic. Furthermore, the final status (dead/alive) of some patients could not be recorded because they were not available, which led to the loss of some subjects. Therefore, improvement of hospital registration information may be necessary. Moreover, examining other learning algorithms for the ANN method or using hybrid methods that combine ANN with genetic algorithms is suggested for future studies.