1. Background
Endometriosis refers to the presence of endometrial tissue outside the uterine cavity. Proposed etiologies include retrograde menstruation (i.e., the return of menstrual blood into the abdominal cavity), coelomic metaplasia, inflammation, and hormonal dysfunction (1). Endometriosis has many symptoms, including dysmenorrhea, dyspareunia, menorrhagia, chronic pelvic pain, and urinary and gastrointestinal symptoms (2). The definitive diagnosis is made by pathology, through the observation of tissue containing endometrial glandular and stromal cells outside the uterine cavity. It is a common, disabling, and costly disease, affecting approximately 176 million women worldwide, with a prevalence of 5 to 20% that reaches 50% in infertile women. The reported prevalence varies widely among studies because it depends on various factors (3). The disease imposes a financial burden of around 110 billion dollars a year and accounts for a large share of hospitalization and bed rest among women (4, 5).
The major causes of this disease have not been definitively established, but research suggests that genetic factors play an important role in its development, as individuals with affected first- or second-degree relatives are at higher risk of developing endometriosis (6).
Various studies have investigated the factors affecting endometriosis in Iranian infertile women. Using logistic regression, these studies (7, 8) showed an association between endometriosis and socio-economic characteristics, fertility, menstruation, lifestyle, etc. However, the results are somewhat inconsistent because of the wide variety of patient and control groups in the studies (5). Studies assessing the predictive factors have likewise used logistic regression as their prediction model.
Logistic regression is used when the researcher seeks to predict the occurrence or non-occurrence of a phenomenon or an event, that is, when the dependent variable is binary. In this model, the odds ratio is the key indicator used to interpret the effect of each variable (9).
Several other methods are available to predict a response variable from its risk factors. Among them, the artificial neural network (ANN) is a popular method that is commonly used in studies. ANNs have been applied to binary response variables, for example, in assessing the risk of cardiac complications after coronary artery bypass surgery (10), predicting prostate cancer (11), and predicting the chance of pregnancy. In these studies, the researchers proposed the artificial neural network as an efficient method to predict the factors affecting the disease.
ANN is an accepted method for modeling complex real-world problems. Its advantages include requiring no default assumptions about the data and fitting the model from the information contained in the data itself. In effect, the form of the association is learned from the data (12).
Most studies have used logistic regression to predict endometriosis, whereas no study in Iran or other countries has used artificial neural networks for this purpose. Therefore, given the importance of the disease and the limited number of studies in this field, we examined these factors in the present study. The ANN was used to predict the occurrence of endometriosis, and its predictive power was compared with that of the logistic regression model. It is hoped that the results of this study will be useful in plans to prevent and control the disease.
2. Methods
The present study is a case-control study conducted at Rasoul-e-Akram hospital, affiliated to Tehran University of Medical Sciences, Tehran, Iran. To determine the sample size, odds ratios (ORs) between 1.5 and 2.8, as reported for the variables in different studies, were used (13). Based on the sample size required in case-control studies (14), the sample size ranged from 68 to 250; the maximum was adopted, and 500 subjects were selected (250 cases and 250 controls).
The study population included patients with endometriosis diagnosed from 2007 to 2015 by laparoscopy (the case group) and patients without endometriosis confirmed by laparoscopic examination who were diagnosed with other diseases, such as simple and dermoid cysts (the control group). Each control subject was selected simultaneously with the case subject; if a person was not reachable or not available, the next person would be selected. In this study, a checklist was used to collect data based on patient records, which included demographic variables (age, weight, height, marital status, place of birth, address, telephone number, and case number) and main variables such as irregular menstruation, number of pregnancies, number of births, abortion, the family history of endometriosis, pelvic infection, taking oral contraceptives (OCs), dysmenorrhea, dyspareunia, infertility, premenstrual spotting, and dyschezia.
After the data were entered into SPSS software (version 22), followed by data processing and risk factor selection, tables of frequency distributions and descriptive indicators were prepared. Then, the association between the response variable (endometriosis) and each independent variable (irregular menstruation, menstrual cycle duration, amount of menstrual bleeding, number of pregnancies, number of live births, abortion, family history of endometriosis, pelvic infection, use of OCs, dysmenorrhea, dyspareunia, infertility, premenstrual spotting, dyschezia, BMI, age, and marital status) was analyzed by single logistic regression. Variables with a significance level of less than 0.2 were selected to enter the multiple logistic regression, which was conducted with the backward approach. The significant variables in the multiple logistic regression were used to compute the individual risk of endometriosis as the dependent variable.
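The screening step described above can be sketched in a few lines. The sketch assumes that the reported P values come from a two-sided Wald test (z = β / SD(β)), which the text does not state explicitly; the function names are illustrative, and the inputs are the β and SD(β) values reported for two rows of Table 1.

```python
import math


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value of the Wald statistic z = beta / se (normal approximation)."""
    z = abs(beta / se)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def passes_screening(p_value: float, threshold: float = 0.2) -> bool:
    """Screening rule from the Methods: keep variables with p < 0.2."""
    return p_value < threshold


if __name__ == "__main__":
    # (beta, SD(beta)) pairs as reported in Table 1.
    rows = {
        "Marital status (married)": (0.277, 0.236),
        "Irregular menstruation": (0.449, 0.199),
    }
    for name, (beta, se) in rows.items():
        p = wald_p_value(beta, se)
        print(f"{name}: OR={odds_ratio(beta):.3f}, p={p:.3f}, "
              f"enters multiple model: {passes_screening(p)}")
```

Under this assumption, the recomputed values agree closely with the OR and P value columns of Table 1 for these two rows, which is a convenient consistency check on the tabulated coefficients.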
Before the artificial neural network was fitted, the data were randomly divided into training and testing sets. The training set comprised 70% of the study data, and the remaining 30% was used to test the network and select the final network topology. The R package neuralnet (version 3.2.1) was used for the analysis.
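A minimal sketch of such a random 70/30 split is shown below. It is written in Python for illustration only (the study itself used R); the function name, the fixed seed, and the use of 432 records (the total appearing in Table 1) are assumptions made for the example.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and testing sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    subject_ids = list(range(432))  # hypothetical subject identifiers
    train, test = split_train_test(subject_ids)
    print(len(train), len(test))
```

Fixing the seed matters when many candidate networks are compared, because all of them should then be trained and tested on the same partition.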
Finally, using the artificial neural network and logistic regression, the factors affecting endometriosis were evaluated and the efficacy of the two methods was compared. To assess the accuracy of the prediction of the methods and determine the preciseness of their prediction, the area under the ROC curves (AUC) was used.
The upper-left corner of the ROC plot represents 100% sensitivity and 100% specificity. Therefore, the point on the curve closest to this corner has the best combination of sensitivity and specificity. To determine this point, the distance between each point on the curve and the upper-left corner is measured with the following equation:

$$ d = \sqrt{(1 - \text{sensitivity})^2 + (1 - \text{specificity})^2} $$

The point with the smallest distance is selected as the best-fit point.
The area under the curve is a number between zero and one. The closer this number is to one, the higher the model’s prediction power will be (2).
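The two ROC-based quantities used here, the AUC and the point closest to the upper-left corner, can be illustrated with a small sketch. It assumes the usual Euclidean distance to the corner (0, 1) in (1 − specificity, sensitivity) coordinates and a trapezoidal AUC; the function names and the toy labels and scores are hypothetical and are not study data.

```python
import math


def roc_points(labels, scores):
    """Return (fpr, tpr, threshold) triples, one per distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0, math.inf)]  # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives, threshold))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def closest_to_corner(points):
    """Point minimizing sqrt((1 - sensitivity)^2 + (1 - specificity)^2)."""
    # fpr = 1 - specificity and tpr = sensitivity.
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 1, 0, 0, 0]                  # toy outcomes
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]  # toy predicted risks
    pts = roc_points(labels, scores)
    fpr, tpr, threshold = closest_to_corner(pts)
    print("AUC:", auc_trapezoid(pts))
    print("best-fit point:", {"sensitivity": tpr, "specificity": 1 - fpr,
                              "threshold": threshold})
```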
3. Results
The mean and standard deviation (SD) of the age of case and control groups were 34.84 ± 0.62 and 33.75 ± 0.55, respectively, which showed a statistically significant difference (P value = 0.04). The mean and SD of BMI in case and control groups were 24.79 ± 0.62 and 23.19 ± 0.45 kg/m2, respectively, which had no significant difference (P value = 0.4).
Table 1 shows the association between endometriosis and the independent variables of the study. Variables including age, irregular menstruation, menstrual cycle duration, duration of bleeding, number of pregnancies, number of live births, and premenstrual spotting showed a significant association with the response variable at a significance level of 0.2. These variables were candidates to enter the multiple logistic regression analysis; then, using the backward method, the independent variables affecting the response variable were fitted to the final model. The single logistic regression analysis is shown in Table 1 and the multiple logistic regression in Table 2.
Variable | Case | Control | Total | β | SD (β) | OR | P Value |
---|---|---|---|---|---|---|---|
Age, yb | 34.8 ± 0.62 | 33.7 ± 0.55 | 33.3 ± 8.75 | 0.041 | 0.011 | 1.042 | < 0.001 |
BMI, kg/m2b | 24.8 ± 0.62 | 23.2 ± 0.45 | 23.7 ± 3.88 | 0.107 | 0.054 | 1.11 | 0.047 |
Marital status | |||||||
Single | 42 (19.4) | 52 (24.1) | 94 (21.8) | - | - | - | - |
Married | 174 (80.6) | 164 (75.9) | 338 (78.2) | 0.277 | 0.236 | 1.319 | 0.241 |
Regular menstruationb | |||||||
Positive | 96 (44.4) | 73 (33.8) | 169 (39.1) | - | - | - | - |
Negative | 120 (55.6) | 143 (66.2) | 263 (60.9) | 0.449 | 0.199 | 1.567 | 0.024 |
Duration of menstrual cyclesb | |||||||
≤ 27 | 69 (31.9) | 47 (21.8) | 116 (26.9) | 0.598 | 0.349 | 1.818 | 0.087 |
28 | 126 (58.3) | 143 (66.2) | 269 (62.3) | 0.087 | 0.318 | 1.091 | 0.784 |
≥ 29 | 21 (9.7) | 26 (12) | 47 (10.9) | - | - | - | - |
Duration of menstrual bleedingb | |||||||
≤ 4 | 16 (7.6) | 24 (11.1) | 40 (9.3) | - | - | - | - |
5 | 126 (58.3) | 143 (66.2) | 269 (62.3) | 0.279 | 0.345 | 1.322 | 0.419 |
≥ 6 | 74 (34.3) | 49 (22.7) | 123 (28.5) | 0.818 | 0.372 | 2.265 | 0.028 |
Dysmenorrheab | |||||||
Positive | 68 (31.5) | 44 (20.4) | 112 (25.6) | 0.584 | 0.224 | 1.796 | 0.028 |
Negative | 148 (68.5) | 172 (79.6) | 320 (74.1) | - | - | - | - |
Duration of pain | |||||||
1 - 2 days | 2 (2.9) | 3 (6.8) | 5 (4.5) | - | - | - | - |
> 3 | 66 (97.1) | 41 (93.2) | 107 (95.5) | 0.882 | 0.934 | 2.415 | 0.345 |
Number of pregnanciesb | |||||||
0 | 48 (22.2) | 37 (17.1) | 85 (19.7) | - | - | - | - |
1 | 31 (14.4) | 59 (27.3) | 90 (20.8) | -0.904 | 0.312 | 0.405 | 0.004 |
2 | 32 (14.8) | 35 (16.2) | 67 (15.5) | -0.350 | 0.328 | 0.705 | 0.286 |
≥ 3 | 65 (30.1) | 34 (15.7) | 99 (22.9) | 0.388 | 0.304 | 1.474 | 0.203 |
Number of live birthsb | |||||||
0 | 54 (25) | 49 (22.7) | 103 (23.8) | - | - | - | - |
1 | 33 (15.3) | 60 (27.8) | 93 (21.5) | -0.355 | 0.303 | 0.701 | 0.244 |
2 | 30 (13.9) | 36 (16.7) | 66 (15.3) | 0.061 | 0.325 | 1.065 | 0.852 |
≥ 3 | 59 (27.3) | 20 (9.3) | 79 (18.3) | 1.325 | 0.334 | 3.761 | < 0.001 |
Number of abortionb | |||||||
0 | 147 (68.1) | 135 (62.5) | 282 (65.3) | - | - | - | - |
≥ 1 | 29 (13.4) | 30 (13.9) | 59 (13.7) | 0.328 | 0.243 | 1.388 | 0.176 |
Using OCPs | |||||||
Positive | 16 (7.4) | 22 (10.2) | 38 (8.8) | - | - | - | - |
Negative | 160 (74.1) | 143 (66.2) | 303 (70.1) | -0.349 | 0.344 | 0.705 | 0.310 |
Dyspareuniab | |||||||
Positive | 30 (13.9) | 36 (16.7) | 66 (15.4) | 0.586 | 0.224 | 1.796 | 0.009 |
Negative | 147 (68.1) | 129 (59.7) | 276 (63.9) | - | - | - | - |
Family history of endometriosis | |||||||
Positive | - | 3 (1.4) | 3 (0.7) | - | - | - | - |
Negative | 216 (100) | 213 (98.6) | 429 (99.3) | - | - | - | - |
History of pelvic infection | |||||||
Positive | 4 (0.2) | 4 (0.2) | 8 (1.8) | 0.292 | 0.770 | 1.340 | 0.704 |
Negative | 212 (98) | 212 (98) | 425 (98.2) | - | - | - | - |
History of infertilityb | |||||||
Positive | 37 (17.2) | 9 (4.2) | 46 (10.6) | 1.657 | 0.427 | 5.242 | < 0.001 |
Negative | 139 (64.4) | 156 (72.2) | 295 (68.3) | - | - | - | - |
Type of infertility | |||||||
Primary | 26 (33.8) | 9 (15) | 35 (25.5) | 0.128 | 0.241 | 1.136 | 0.597 |
Secondary | 11 (14.3) | - | 11 (8.9) | - | - | - | - |
History of infertility treatment | |||||||
Positive | 37 (48.1) | 9 (15) | 46 (33.6) | 0.138 | 0.272 | 1.157 | 0.502 |
Negative | 1 (1.3) | - | 1 (1.3) | - | - | - | - |
Premenstrual spottingb | |||||||
Positive | 50 (23.1) | 33 (15.3) | 83 (19.2) | 0.513 | 0.249 | 1.670 | 0.039 |
Negative | 167 (79.9) | 183 (84.7) | 349 (80.8) | - | - | - | - |
Pain in defecation | |||||||
Positive | 9 (4.2) | - | 9 (2.1) | - | - | - | - |
Negative | 207 (95.8) | 216 (100) | 423 (97.9) | - | - | - | - |
Table 1. The Results of the Association Between Endometriosis and Independent Variables of the Study
Variable | Coefficients | SD | OR | P Value |
---|---|---|---|---|
Age, y | 0.042 | 0.012 | 1.043 | < 0.001 |
Premenstrual spotting | ||||
Positive | 0.654 | 0.259 | 1.936 | 0.045 |
Negative | - | - | - | - |
Number of live births | ||||
Positive | - | - | - | - |
Negative | -0.261 | 0.258 | 0.771 | 0.012 |
Constant value | -1.131 | 4.00 | 7.977 | 0.005 |
Table 2. The Results of the Multiple Logistic Regression in the Women Under Study
Backward logistic regression analysis identified age, the number of live births, and premenstrual spotting as factors affecting the occurrence of endometriosis. According to the multiple logistic regression, the number of live births is a protective factor against endometriosis, premenstrual spotting increases the odds of endometriosis (OR = 1.936), and each one-year increase in age multiplies the odds of endometriosis by 1.043.
To select the most efficient artificial neural network, the AUC, the percentage of correct predictions, and the mean of squared errors were used; among 1440 networks assessed, 0.92 neural networks with AUC above 0.72 were evaluated. Table 3 lists the most efficient artificial neural networks, those with an AUC above 0.9.
No. | Number of Hidden Layers | Number of Nodes of the First Hidden Layer | Number of Nodes of the Second Hidden Layer | First Activation Function | Second Activation Function | Sum of Squared Errors | AUC | Incorrect Predictions, % |
---|---|---|---|---|---|---|---|---|
1 | 2 | 11 | 7 | Hyperbolic tangent | Sigmoid | 36.77 | 0.911 | 16.6 |
2 | 2 | 11 | 6 | Hyperbolic tangent | Sigmoid | 37.148 | 0.908 | 18.6 |
3 | 2 | 11 | 5 | Hyperbolic tangent | Hyperbolic tangent | 38.843 | 0.911 | 17.6 |
4 | 2 | 12 | 8 | Hyperbolic tangent | Sigmoid | 30.025 | 0.940 | 14.0 |
5 | 2 | 12 | 7 | Hyperbolic tangent | Sigmoid | 36.735 | 0.907 | 16.9 |
6 | 2 | 12 | 8 | Hyperbolic tangent | Sigmoid | 37.449 | 0.901 | 17.6 |
7 | 2 | 11 | 6 | Hyperbolic tangent | Sigmoid | 34.635 | 0.911 | 16.6 |
8 | 2 | 11 | 7 | Hyperbolic tangent | Sigmoid | 35.743 | 0.916 | 15.9 |
9 | 1 | 9 | - | Sigmoid | Sigmoid | 36.837 | 0.905 | 15.6 |
Table 3. The Results of the Best Neural Network Models with Different Numbers of Neurons and Activation Functions
The results of the present study showed that, among the different neural network architectures, the most effective model for predicting the occurrence of endometriosis (AUC = 0.94) had two hidden layers with 12 and 8 nodes and used the hyperbolic tangent as the first activation function and the sigmoid as the second. All variables entered into the logistic regression, including age, BMI, etc., were also entered into the ANN. The variables found to have an important association with endometriosis were the number of live births, age, premenstrual spotting, and BMI (Tables 3 and 4).
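To make the selected architecture concrete, the following sketch shows a forward pass through a network with two hidden layers of 12 and 8 nodes. It is an illustration under stated assumptions, not the fitted model: the weights are random placeholders, the four inputs are hypothetical, and the hyperbolic tangent is assumed to act in the hidden layers with the sigmoid at the output (the table does not say which layers the "first" and "second" functions belong to).

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), computed row by row."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def random_layer(n_in, n_out, rng):
    """Placeholder weights and biases; a real network would learn these."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def build_network(n_inputs, hidden_sizes=(12, 8), seed=0):
    """Layers for n_inputs -> 12 -> 8 -> 1 (hidden sizes as in the best model)."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden_sizes, 1]
    return [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(network, features):
    """Forward pass: tanh in the hidden layers, sigmoid at the output (assumed)."""
    activations = features
    for weights, biases in network[:-1]:
        activations = dense(activations, weights, biases, math.tanh)
    weights, biases = network[-1]
    return dense(activations, weights, biases, sigmoid)[0]


if __name__ == "__main__":
    net = build_network(n_inputs=4)
    # Four hypothetical standardized inputs, e.g. age, BMI, live births, spotting.
    print(predict(net, [0.1, -0.3, 1.0, 0.0]))
```

The output is a number between 0 and 1 that can be read as a predicted risk; thresholding it at different cut-offs is what produces the ROC curve used to compare the models.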
Variable | Importance | Normalized Importance, % |
---|---|---|
Number of live births | 0.183 | 53.2 |
Age | 0.343 | 99.8 |
Premenstrual spotting | 0.131 | 38.3 |
BMI | 0.343 | 100.0 |
Table 4. Independent Variable Importance in ANN
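The normalized importance column is conventionally each importance divided by the largest importance, expressed as a percentage. The sketch below applies that convention to the rounded importances in the table; the convention and the function name are assumptions, and small differences from the reported percentages (for example, 53.2 for the number of live births) would be expected from rounding of the raw importances.

```python
def normalized_importance(importance):
    """Scale each importance by the largest one and express it as a percentage."""
    top = max(importance.values())
    return {name: 100.0 * (value / top) for name, value in importance.items()}


if __name__ == "__main__":
    # Rounded importances as reported in the table above.
    importance = {
        "Number of live births": 0.183,
        "Age": 0.343,
        "Premenstrual spotting": 0.131,
        "BMI": 0.343,
    }
    ranked = sorted(normalized_importance(importance).items(),
                    key=lambda item: item[1], reverse=True)
    for name, percent in ranked:
        print(f"{name}: {percent:.1f}%")
```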
The results of the study showed that ANN with AUC of 0.94 has a higher efficacy than logistic regression with AUC of 0.72 (Figure 1).
The best-fit point in the ROC curve of logistic regression was obtained with a sensitivity of 0.688 and specificity of 0.615 and in the ROC curve of the ANN, it was obtained with a sensitivity of 0.935 and specificity of 0.873.
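As an illustration of how far each best-fit point lies from the ideal upper-left corner of the ROC plot, the reported sensitivities and specificities can be substituted into the Euclidean distance $d = \sqrt{(1 - \text{sensitivity})^2 + (1 - \text{specificity})^2}$ (the standard form of this distance is assumed here):

$$ d_{\text{LR}} = \sqrt{(1 - 0.688)^2 + (1 - 0.615)^2} = \sqrt{0.0973 + 0.1482} \approx 0.496 $$

$$ d_{\text{ANN}} = \sqrt{(1 - 0.935)^2 + (1 - 0.873)^2} = \sqrt{0.0042 + 0.0161} \approx 0.143 $$

The smaller distance for the ANN is consistent with its higher AUC.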
4. Discussion
Endometriosis is a common gynecologic problem in women of reproductive age that presents with pelvic pain, dysmenorrhea, and infertility (3). It is characterized by the presence of endometrial stroma outside the endometrial cavity and myometrium. Although the pelvis is the most common site for endometriosis, endometrial implants may occur in almost any part of the body.
Even though many hypotheses have been proposed to explain why women develop endometriosis, none has been conclusively proven. Given that the main cause of the disease is unknown, identifying the risk factors for endometriosis and taking the necessary measures can be effective in diagnosing the disease and reducing its complications, including infertility.
In the present study, artificial neural network and logistic regression were used to examine factors affecting the occurrence of endometriosis and their efficacy was compared using AUC.
According to the results of the current study, variables, including age, irregular menstruation, menstrual cycle duration, duration of bleeding, number of pregnancies, number of live births, and premenstrual spotting showed significant correlations with the response variable.
The results of the current study showed that women with irregular menstrual cycles are at a greater risk of endometriosis (OR = 1.57). This finding is in line with results reported previously in a similar population (8) and is also confirmed by Matalliotakis et al. in Italy and Collazo and colleagues in Poland (15, 16). A possible explanation is that irregular menstruation increases retrograde menstruation, which in turn increases the risk of developing endometriosis.
In the present study, there was an inverse association between the number of live births and the risk of endometriosis, which is consistent with the results of studies by Burghaus and colleagues in Germany (17), Matalliotakis and colleagues in Italy (15), and Hemmings and colleagues in Canada (18). Based on these and other similar studies, pregnancy and live births are protective factors against endometriosis (8, 15, 17, 18). This may be because menstruation does not occur during pregnancy or, in some women, during lactation; as a result, these women experience fewer menstrual cycles, and the reduced likelihood of retrograde menstruation may act as a protective factor against endometriosis.
The results of this research are also in line with studies by Kirshon et al. and Kennedy et al. on the impact of age on endometriosis (19, 20). With increasing age, women experience more frequent menstrual cycles and may have prolonged menstrual periods, which increases the risk of retrograde menstruation. Another possible reason is that the quality and sensitivity of the body's immune cells decrease with age, so that they may no longer be able to inhibit endometrial cells that migrate to other parts of the body. It is also possible that hormonal disorders and uterine abnormalities, which become more likely with age, increase the risk of retrograde menstruation and thus of endometriosis.
In the present study, premenstrual spotting increased the odds of developing endometriosis (OR = 1.67), which is consistent with the results of other studies (18, 21-23). Frequent spotting may increase the risk of retrograde menstruation, a hypothesis that requires further investigation.
In this study, based on the results of ANN (Table 4), the predictor variables in order of importance were body mass index, age, number of live births, and premenstrual spotting.
The results of the current study showed that ANN with AUC of 0.94 has a higher efficacy than logistic regression with AUC of 0.72. Because no previous study has used ANN in the field of endometriosis, we compare our results with those of studies with a similar design in other diseases.
Siristatidis and colleagues similarly evaluated the efficacy of ANN in some gynecologic diseases and proposed ANN as an appropriate alternative to logistic regression for the prediction of gynecologic diseases (24). In addition, ANN has been shown to classify endometrial lesions properly (25); thus, ANN is also useful in clinical decision-making.
The results of the current study are consistent with the findings of other studies that evaluated the efficiency of ANN in predicting other conditions, including hypertension (26), diabetes (27), coronary artery disease (28), metabolic syndrome (29), complications of diabetes (30), gastric cancer (31), other cancerous lesions (25, 32), and mortality in patients with sepsis (33). In all of these studies, ANN had a higher efficacy than logistic regression in predicting the studied outcomes.
As previous studies using ANN prediction models have also stated, prediction plays a major role in today’s medicine because a causal relationship cannot be established for many diseases. Therefore, a better predictive model may give physicians and researchers a better perspective on diseases. In most studies, ANN had a better fit; however, this depends on the network being trained correctly and the best structure for prediction being found, in which case the network can provide appropriate predictions for new data. This issue is of great importance in health care, especially in the allocation of health resources to high-risk and at-risk patients, and it can reduce the complications of such diseases through proper diagnosis and prompt treatment.
With each one-year increase in age, the odds of endometriosis increased 1.043 times, which corresponds to about a 1.52-fold increase over 10 years (1.043^10 ≈ 1.52). The odds of endometriosis in subjects with premenstrual spotting are about twice those in subjects without it, and for those with no live births, the odds of endometriosis are 1.3 times those of subjects with live births. In ANN, as demonstrated in Table 4, BMI was the most important factor, followed by age, number of live births, and premenstrual spotting.
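The arithmetic behind these odds ratios can be written out explicitly. In a logistic model, the odds ratio for an increase of $k$ units in a variable with coefficient $\beta$ is $e^{k\beta}$; using the coefficients reported in Table 2:

$$ \text{OR}_{1\,\text{year}} = e^{0.042} \approx 1.043, \qquad \text{OR}_{10\,\text{years}} = e^{10 \times 0.042} = 1.043^{10} \approx 1.52 $$

Similarly, the odds ratio for the reverse comparison is the reciprocal of the reported one, so an odds ratio of 0.771 for live births corresponds to

$$ \frac{1}{0.771} \approx 1.30 $$

for subjects with no live births relative to those with live births.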
4.1. Conclusion
In this study, age, irregular menstruation, menstrual cycle duration, duration of menstrual bleeding, number of pregnancies, number of live births, and premenstrual spotting showed significant associations with the response variable. According to the results of the present study, the artificial neural network has greater prediction accuracy than logistic regression and is more suitable for predicting endometriosis. The accurate determination of the factors affecting endometriosis is of great importance and can help prevent the severe complications of this disease, especially infertility, through prompt diagnosis and treatment.