1. Background
Colorectal cancer (CRC) is the third most prevalent cancer and the third leading cause of cancer-related death worldwide (1). According to GLOBOCAN, CRC accounted for 1,361,000 new cases and 694,000 deaths in 2012 (2). The burden of CRC is also predicted to increase by 66%, to 2.2 million new cases and 1.1 million deaths, by 2030 (3).
Most CRC cases occur in industrialized countries; however, the incidence rate is growing in less-developed regions owing to the adoption of a Western lifestyle (4). The lowest and highest incidence rates are observed in Western Africa and Australia, respectively (2). Among Asian countries, Japan has the highest incidence rate, particularly among men, but its mortality is lower than in Europe because of a screening program in place since 1992. After Japan, Europe has the highest incidence and mortality rates (5). In Europe, Slovakia, Hungary, and the Czech Republic have the highest rates among men, while Norway, Denmark, and the Netherlands have the highest rates among women (6, 7). In Iran, CRC is the fifth most common cancer among men and the third among women (8).
The main risk factors for this disease are excessive consumption of red meat, alcohol intake, a sedentary lifestyle, tobacco smoking, overweight, a diet low in fruit and vegetables, family history, and age over 50 years (5, 9). Numerous studies have shown that smoking increases the risk of CRC by up to 30% and that heredity is estimated to account for 7% - 10% of cases (10-13). Obese men and women have also been found to be at higher risk of colon and rectal cancer than others (14). In contrast, fruit and vegetable consumption plays a protective role against CRC because these foods are rich in antioxidants, fiber, folic acid, and vitamins. Fiber shortens stool transit time and thereby reduces the exposure of the bowel to potential carcinogens (15). In addition, an estimated 66% - 75% of cases could be prevented by adopting a healthy lifestyle (16).
The first treatment of CRC depends on the location and size of the tumor and on the patient’s health (17). When the disease is diagnosed early, surgery is the primary treatment, but it is not effective in metastatic cases (18). Since the 1990s, the 5-year survival rate has improved owing to detection of the disease at early stages, successful treatment of stages II and III, and a considerable reduction in postoperative mortality (5). The 5-year survival rate of CRC patients is approximately 50% - 60% and is higher in the early stages (19, 20).
Different statistical methods are available for analyzing survival data. Artificial neural networks (ANN) and traditional predictive tools have been used in various studies to predict patients’ survival and to determine the related risk factors. Wang et al. showed that ANN performed well in predicting the survival of breast cancer patients (21). Oermann et al. (22) compared the efficacy of ANN and logistic regression for predicting 1-year survival of patients with brain metastasis, and the results indicated better performance for the ANN model. Furthermore, studies of patients with CRC and gastric cancer introduced ANN as a powerful tool for survival prediction in comparison with the Cox regression model (23, 24). Numerous studies have addressed CRC survival, but they differ in their statistical methods and results.
2. Objectives
In this paper, we applied ANN and Cox regression models to determine the factors related to survival in CRC patients.
3. Methods
In this historical cohort study, data were collected on patients diagnosed with CRC at Omid Hospital in Mashhad. A total of 157 subjects were investigated from 2006 to 2011 and were followed up until 2016. Demographic and clinical information was gathered from the patients’ medical records.
The information obtained for each patient included gender, age at diagnosis, BMI, family history, tobacco smoking, opium or drug use, tumor location (colon or rectum), tumor stage (I, II, III, and IV) (25), tumor grade (well differentiated, moderately differentiated, and poorly differentiated), first treatment, and relapse. Survival time was calculated in years from the date of first diagnosis, and death from CRC was defined as the event; patients who survived were considered censored. Information from patients’ regular checkups was available in their medical records. For patients who had not visited the hospital for more than six months, we made phone calls to ascertain their survival status (death or censored).
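The coding of survival time and event status described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function name `survival_record`, the use of 365.25 days per year, and the example dates are all assumptions introduced here.

```python
from datetime import date

def survival_record(diagnosis, last_contact, died_of_crc):
    """Return (time_in_years, event) for one patient.

    event is 1 if the patient died from CRC and 0 if the patient is censored.
    Dividing by 365.25 is an assumed convention for converting days to years.
    """
    days = (last_contact - diagnosis).days
    if days < 0:
        raise ValueError("last contact precedes diagnosis")
    return days / 365.25, int(died_of_crc)

# Hypothetical patients (illustrative dates, not study data):
# (date of first diagnosis, date of death or last contact, died of CRC?)
patients = [
    (date(2007, 3, 1), date(2010, 3, 1), True),     # death, counted as an event
    (date(2009, 6, 15), date(2016, 6, 15), False),  # alive at last contact, censored
]
records = [survival_record(*p) for p in patients]
```

Each record pairs a follow-up time with a 0/1 indicator, which is the input format that Kaplan-Meier and Cox regression routines generally expect.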
The Kaplan-Meier method and the log-rank test were used for preliminary analyses. Before the Cox regression model was fitted, the proportional hazards assumption was tested with the log-minus-log plot. We then applied the backward conditional method with criteria of 0.10 for entry and 0.15 for removal.
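The log-minus-log check rests on a standard property of the Cox model. It is written here in generic notation; the symbols $h_0$, $S_0$, $\beta$, and $x$ are introduced for illustration and do not appear in the original text.

```latex
h(t \mid x) = h_0(t)\,\exp(\beta^{\top} x)
\;\Longrightarrow\;
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}
\;\Longrightarrow\;
\log\bigl(-\log S(t \mid x)\bigr) = \log\bigl(-\log S_0(t)\bigr) + \beta^{\top} x
```

Under proportional hazards, the log-minus-log survival curves of two covariate groups therefore differ only by a constant vertical shift, so approximately parallel curves support the assumption.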
For ANN modeling, we randomly divided the data into a training subset (70%) and a testing subset (30%). To avoid complexity, only one hidden layer was used; we therefore fitted a 3-layer multilayer perceptron (MLP) with 11 nodes in the input layer, 5 to 15 nodes in the hidden layer, and 1 node in the output layer. The response was defined as a binary status variable; therefore, the logistic transfer function was applied to the output layer. A feedforward network was trained on the training data with decay values of 0.1 to 0.5. To determine the important risk factors, the importance of each variable was calculated and ordered for the chosen ANN model. In addition, the concordance index and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated to compare the predictive power of the ANN and Cox models. SPSS version 20.0 and R version 2.14.0 were used for statistical analysis, and the significance level was 0.05.
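To make the network architecture concrete, the following is a minimal sketch of a 3-layer MLP with a logistic output and a weight-decay penalty, together with a 70/30 random split. It is not the software used in the study. Several details are assumptions made here: logistic activation in the hidden layer, cross-entropy as the fitting criterion, the penalty applied to biases as well as weights, and randomly generated placeholder data. Training (minimizing the penalized loss) is omitted.

```python
import math
import random

def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + math.exp(-z))

def init_mlp(n_in, n_hidden, rng):
    """Small random weights; each weight vector starts with its bias term."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output

def predict(x, hidden, output):
    """Forward pass: input layer -> one hidden layer -> one logistic output."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in hidden]
    return sigmoid(output[0] + sum(wo * hi for wo, hi in zip(output[1:], h)))

def penalized_loss(X, y, hidden, output, decay):
    """Cross-entropy plus decay * (sum of squared weights)."""
    eps = 1e-12  # guards against log(0)
    cross_entropy = 0.0
    for xi, yi in zip(X, y):
        p = predict(xi, hidden, output)
        cross_entropy -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    penalty = sum(w * w for row in hidden for w in row) + sum(w * w for w in output)
    return cross_entropy + decay * penalty

# Placeholder data with the study's dimensions (157 patients, 11 inputs);
# the values are random, not study data.
rng = random.Random(0)
X = [[rng.random() for _ in range(11)] for _ in range(157)]
y = [rng.randint(0, 1) for _ in range(157)]

# Random 70/30 split into training and testing subsets.
indices = list(range(len(X)))
rng.shuffle(indices)
n_train = int(0.7 * len(indices))
train_idx, test_idx = indices[:n_train], indices[n_train:]

hidden, output = init_mlp(n_in=11, n_hidden=8, rng=rng)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
loss = penalized_loss(X_train, y_train, hidden, output, decay=0.2)
```

A larger decay value penalizes large weights more heavily, which is one way to limit overfitting when the sample is small relative to the number of weights.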
The protocol was approved by the ethics committee of Mashhad University of Medical Sciences (code number: 941205).
4. Results
The study included 91 (58%) men and 66 (42%) women. The mean ± SD age was 56.4 ± 14.6 years. According to the independent-samples t-test, the mean age at diagnosis differed significantly between men (60.1 ± 14.3) and women (55.2 ± 13.6) (P = 0.03). Survival status was followed for 10 years, during which 55 (35%) patients died and 102 (65%) were censored. Table 1 shows the characteristics of the CRC patients by subgroup of the investigated variables. Most patients were diagnosed with CRC in stage II or III, and 73.2% were 50 years or older. Surgery was the first treatment for 97 (61.8%) patients, and 24.2% of patients had a family history of cancer.
Variables | No. (%) |
---|---|
Gender | |
Male | 91 (58) |
Female | 66 (42) |
Age | |
< 50 | 42 (26.8) |
≥ 50 | 115 (73.2) |
BMI | |
< 18.5 | 26 (16.6) |
18.5 - 25 | 90 (57.3) |
25 - 30 | 28 (17.8) |
> 30 | 13 (8.3) |
Tobacco smoking | |
Yes | 29 (18.5) |
No | 128 (81.5) |
Opium or drug user | |
Yes | 19 (12.1) |
No | 138 (87.9) |
Family history | |
Yes | 38 (24.2) |
No | 119 (75.8) |
Tumor location | |
Colon | 79 (50.3) |
Rectum | 78 (49.7) |
First treatment | |
Surgery | 97 (61.8) |
Radiotherapy | 60 (38.2) |
Tumor grade | |
WD | 93 (59.2) |
MD | 61 (38.9) |
PD | 3 (1.9) |
Tumor stage | |
I | 11 (7) |
II | 65 (41.4) |
III | 55 (35) |
IV | 26 (16.6) |
Relapse | |
Yes | 37 (23.6) |
No | 120 (76.4) |
Table 1. Characteristics of CRC Patients
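The age comparison between men and women reported above can be checked from the summary statistics alone. The sketch below assumes a pooled-variance (equal-variance) t-test, which the text does not specify, and approximates the two-sided P value with the normal distribution, which is reasonable at 155 degrees of freedom. The function name `pooled_t` is introduced here.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled variance, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Men: 60.1 ± 14.3 (n = 91); women: 55.2 ± 13.6 (n = 66), as reported in the text.
t_stat, df = pooled_t(60.1, 14.3, 91, 55.2, 13.6, 66)

# Two-sided P value under a normal approximation to the t distribution.
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
```

With these inputs the statistic is about 2.16 on 155 degrees of freedom, and the approximate P value rounds to 0.03, in line with the reported result.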
The mean ± SD survival time was 6.5 ± 4.3 years. The three-, five-, and seven-year survival rates were 0.67, 0.62, and 0.58, respectively. The 5-year survival rate by stage was 0.87 for stage I, 0.75 for stage II, 0.59 for stage III, and 0.24 for stage IV. Thus, the lowest survival rate was observed in patients with stage IV tumors and the highest in patients diagnosed at stage I.
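Survival rates at fixed time points such as these are typically read off a Kaplan-Meier curve, the method named in the Methods section. The following is a minimal sketch of the product-limit estimator; the function names and the toy data are illustrative assumptions, not the study's data or code.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up times; events: 1 for death, 0 for censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve

def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
        else:
            break
    return s

# Toy data: four hypothetical patients, follow-up in years.
toy_times = [1, 2, 3, 4]
toy_events = [1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
five_year = survival_at(toy_curve, 5)
```

Applying the same function separately to the patients in each stage would give stage-specific curves from which 5-year rates can be read.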
To fit the ANN model, we first randomly divided the data into training (70%) and testing (30%) subsets. Based on the log-rank test, the estimated survival curves of the training and testing data did not differ significantly (P = 0.482). In total, 55 models were fitted (decay of 0.1 to 0.5 and 5 to 15 nodes in the hidden layer), and the best model, with 8 nodes in the hidden layer and a decay of 0.2, was chosen based on the area under the ROC curve (AUC = 0.802).
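The count of 55 models is consistent with a full grid of five decay values and eleven hidden-layer sizes. The sketch below enumerates such a grid and selects the configuration with the highest AUC. The decay step of 0.1 is an assumption inferred from that count, and `stand_in_auc` is a made-up scoring function used only so the example runs; in practice it would be replaced by fitting each model and computing its AUC.

```python
from itertools import product

decays = [0.1, 0.2, 0.3, 0.4, 0.5]   # assumed step of 0.1
hidden_sizes = range(5, 16)          # 5 to 15 nodes inclusive
grid = list(product(decays, hidden_sizes))

def select_best(grid, auc_of):
    """Return the (decay, n_hidden) pair with the highest AUC."""
    return max(grid, key=auc_of)

def stand_in_auc(config):
    """Hypothetical scores, NOT study results; peaks arbitrarily at (0.3, 10)."""
    decay, n_hidden = config
    return 0.8 - abs(decay - 0.3) - abs(n_hidden - 10) / 100

best = select_best(grid, stand_in_auc)
```

Selecting on the same data used for the final comparison can be optimistic, so the selection AUC and the testing AUC are best reported separately.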
After the proportional hazards assumption was confirmed with the log-minus-log plot, we fitted the Cox regression model with the backward conditional method. Table 2 shows the results of both models in determining the importance of the independent variables. Normalized importance (ANN) and P value (Cox regression) were used to order the variables.
ANN Model: Ordered Variable | ANN Model: Normalized Importance | Cox Regression: Ordered Variable | Cox Regression: P Value |
---|---|---|---|
Tumor stage | 0.187 | Tumor stage | 0.001 |
First treatment | 0.138 | Gender | 0.016 |
Family history | 0.135 | Relapse | 0.017 |
Opium or drug user | 0.111 | Family history | 0.066 |
Gender | 0.110 | Opium or drug user | 0.128 |
BMI | 0.080 | Age | 0.201 |
Relapse | 0.068 | BMI | 0.212 |
Age | 0.057 | First treatment | 0.445 |
Tobacco smoking | 0.051 | Tobacco smoking | 0.452 |
Tumor grade | 0.042 | Tumor grade | 0.623 |
Tumor location | 0.021 | Tumor location | 0.767 |
Table 2. Prognostic Factors of CRC Patients’ Survival in ANN and Cox Regression Models
In the 3-layer ANN model, tumor stage, first treatment, family history, opium or drug use, and gender played the major roles in survival prediction. In the Cox regression, tumor stage (P = 0.001), gender (P = 0.016), and relapse (P = 0.017) were statistically significant.
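The ordering of variables in the ANN column of Table 2 amounts to sorting by importance. A short sketch using the values from that table (the variable names in the code are introduced here):

```python
# Importance values copied from the ANN column of Table 2.
ann_importance = {
    "Tumor stage": 0.187,
    "First treatment": 0.138,
    "Family history": 0.135,
    "Opium or drug user": 0.111,
    "Gender": 0.110,
    "BMI": 0.080,
    "Relapse": 0.068,
    "Age": 0.057,
    "Tobacco smoking": 0.051,
    "Tumor grade": 0.042,
    "Tumor location": 0.021,
}

# Sort from most to least important and keep the five leading variables.
ranked = sorted(ann_importance.items(), key=lambda item: item[1], reverse=True)
top_five = [name for name, _ in ranked[:5]]
total = sum(ann_importance.values())
```

The eleven values sum to 1.000, so each can be read as a variable's share of the total importance.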
In the next step, we used the testing data to calculate the prediction accuracy of both models. Table 3 shows the observed cases of censoring and death and the percentage of correct predictions in the ANN and Cox proportional hazards (CPH) models. The area under the ROC curve was 0.759 for the ANN model and 0.544 for the Cox regression model. Accordingly, ANN was more powerful in recognizing true cases and was superior to Cox regression. According to the classification table, the accuracy of ANN and Cox regression was 70.8% and 50.0%, respectively; the higher value for ANN indicates more correct classifications.
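The two criteria used in this comparison can be computed from predicted probabilities and observed status as sketched below. The AUC is calculated as the probability that a randomly chosen death receives a higher score than a randomly chosen censored case, with ties counted as one half. The 0.5 classification threshold and the example scores are assumptions for illustration, not values from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Share of cases classified correctly at the given threshold."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Hypothetical predicted probabilities of death and observed status
# (1 = death, 0 = censored); not study data.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
example_labels = [1, 1, 0, 0, 0, 1]
example_auc = auc(example_scores, example_labels)
example_acc = accuracy(example_scores, example_labels)
```

An AUC near 0.5 corresponds to ranking cases no better than chance, which gives context for the 0.544 obtained with the Cox model.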
5. Discussion
After cardiovascular disease and motor vehicle accidents, cancer is the third leading cause of death in Iran (26). CRC is a common gastrointestinal cancer associated with lifestyle and aging. Although its incidence is higher in Western countries, it is increasing in less-developed countries as a result of lifestyle changes (4). Given the incidence rate of CRC in Iran in recent years and the need for more research in this field, this study was conducted to determine the factors related to CRC patients’ survival using ANN and Cox regression models.
The mean ages of men and women were 60.1 and 55.2 years, respectively, and 26.8% of patients were under 50 years old. In Iran, almost 20% of CRC cases occur in people under 40 years old, whereas only 2% - 8% of patients in developed countries are in this age group (27). Lifestyle changes and Iran’s young population explain why patients are diagnosed at a younger age than in more developed countries.
The 5-year survival rate of CRC patients was estimated at 0.48. Rasouli et al. (28) reported a 5-year survival of 0.33 in Kurdistan Province, Iran. In another study, Marely et al. (17) reported a 5-year survival of over 0.60 in the USA. Preventive measures such as screening, together with diagnosis at early stages, account for the high survival rate in Western countries.
The tumor stage describes the extent of cancer in the body and is one of the most important factors in deciding on the type of treatment (29). In this study, tumor stage was significant in the Cox regression and was also the most important variable for survival prediction in the ANN model. In a study by Gohari et al. (23), the pathologic stage was significant in rectal cancer and was one of the important prognostic factors for patients’ survival in the ANN model.
Relapse has a particularly strong effect on the survival of patients with colorectal cancer (30). O’Connell et al. (31) showed that patients whose initial stage was II had a longer survival time than those with stage III. In our study, relapse was significant in the Cox regression but was not among the most important variables for survival prediction in the ANN model. This variable was also significant in a study conducted in Mashhad using Cox regression (32).
Gender and opium or drug use were important in the ANN model; gender was also statistically significant in the Cox regression. Majek et al. (33) showed that women had a higher age-adjusted 5-year survival rate than men. Moreover, the effect of smoking and drug use on patients’ survival has been reported in other studies (23, 32, 34).
Family history was important for survival prediction only in the ANN model; this finding is consistent with a study by Jasperson et al. (35).
In the next step, we compared the ability of each model to predict survival using the area under the ROC curve and accuracy. Both criteria were higher for ANN, which reflects the power of this model in predicting true cases. Many studies have reported that ANN is superior to classical models.
Gohari et al. (23) reported that ANN was a better approach for prediction and for determining the prognostic factors of colon and rectal cancers. Another study demonstrated that a neural network was more accurate than logistic regression in colon cancer patients (36). Ahmed (37) showed that neural networks had better accuracy than other methods in the classification and survival prediction of patients with colon cancer.
5.1. Conclusions
Although the Cox model estimates the association of variables in terms of hazard ratios (HR), the two models can be compared with regard to their predictive accuracy and their identification of important variables. This study supports the use of the ANN model over Cox regression for survival prediction in CRC patients. Relevant prognostic factors can be determined by ordering the variables by normalized importance. It would also be helpful to compare the results of the ANN model with those of other survival analysis methods. In conclusion, ANN was more efficient and accurate in this study, so it is recommended for predicting survival and determining its risk factors in CRC patients.