1. Background
2. Objectives
3. Patients and Methods
3.1. Study Population
3.2. General Description of Data
3.3. Pre-Processing and Transformation of Data
| Variables | Cutoff Value | Values b |
|---|---|---|
| Age, y | ||
| Group 1 | 20-34 | 2525 (37.98) |
| Group 2 | 35-49 | 2444 (36.76) |
| Group 3 | 50-64 | 1310 (19.70) |
| Group 4 | ≥ 65 | 368 (5.53) |
| Total length of stay in the city, y | ||
| Group 1 | < 20 | 609 (9.2) |
| Group 2 | 20-39 | 3884 (58.4) |
| Group 3 | ≥ 40 | 2131 (32.1) |
| Education | ||
| Group 1 | ≥ 13 years | 990 (14.89) |
| Group 2 | 6-12 years | 3877 (58.32) |
| Group 3 | ≤ 5 years | 1780 (26.77) |
| Gender | ||
| Female | - | 3762 (56.59) |
| Male | - | 2885 (43.40) |
| Occupation | ||
| Employed | - | 2718 (40.89) |
| Homemaker (women) | - | 3041 (45.74) |
| Student | - | 215 (3.23) |
| Unemployed | - | 642 (9.65) |
| Other | - | 31 (0.49) |
| Marital status | ||
| Divorced | - | 55 (0.82) |
| Married | - | 5512 (82.92) |
| Single (unmarried) | - | 869 (13.07) |
| Widowed | - | 210 (3.15) |
a Data are presented as No. (%).
b The data contain missing values where the percentages do not sum to 100%.
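The grouping in the table above maps each continuous value onto fixed cutoff intervals, then tabulates No. (%) against the full sample. Using the full sample as the denominator is why missing values leave the percentages short of 100%. Below is a minimal sketch of that step for age. It assumes ages are recorded in whole years and missing values are stored as `None`. The function names and toy ages are hypothetical; this is not the study's code or data.

```python
from bisect import bisect_right

# Lower bounds of the age groups in the table: 20-34, 35-49, 50-64, >= 65 years.
AGE_LOWER_BOUNDS = [20, 35, 50, 65]
AGE_LABELS = ["Group 1", "Group 2", "Group 3", "Group 4"]


def age_group(age):
    """Return the age-group label for an age in years, or None if missing or under 20."""
    if age is None:
        return None
    # bisect_right counts how many lower bounds are <= age.
    idx = bisect_right(AGE_LOWER_BOUNDS, age)
    return AGE_LABELS[idx - 1] if idx > 0 else None


def summarize(values, labels):
    """Return {label: (count, percent)}, using the full sample size as the
    denominator so that missing values make the percentages sum to < 100%."""
    n_total = len(values)
    return {
        label: (values.count(label), round(100 * values.count(label) / n_total, 2))
        for label in labels
    }


# Toy example (hypothetical ages, not study data); None marks a missing value.
groups = [age_group(a) for a in [22, 40, 51, 70, None, 34]]
print(summarize(groups, AGE_LABELS))
# {'Group 1': (2, 33.33), 'Group 2': (1, 16.67), 'Group 3': (1, 16.67), 'Group 4': (1, 16.67)}
```

Dividing by the non-missing count instead would force each variable's groups to sum to 100%. The sketch follows the convention stated in footnote b.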
3.4. Analysis Method
3.4.1. Association Rules Mining
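Section 4.1 reports rules found with the Apriori algorithm. Below is a minimal, textbook-style sketch of Apriori's frequent-itemset stage. It assumes each participant has already been encoded as a set of items such as `"IFG"` or `"T2DM"`. The function and `min_support` parameter names are hypothetical. The thresholds the authors used are not given in this section, and this is not their software. Support here is itemset support, the fraction of records that contain every item in the set. This is the quantity Apriori prunes on.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions containing all its items) is at least min_support."""
    n = len(transactions)

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count each surviving candidate with one pass over the data.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Toy transactions (hypothetical, not study data).
toy = [
    frozenset({"IFG", "IGT", "T2DM"}),
    frozenset({"IFG", "IGT"}),
    frozenset({"IFG", "T2DM"}),
    frozenset({"IGT"}),
]
for itemset, sup in apriori(toy, 0.5).items():
    print(sorted(itemset), sup)
```

The prune step is what makes Apriori practical. Any superset of an infrequent itemset is itself infrequent, so such candidates are discarded before the data are scanned. In the toy data, {IGT, T2DM} falls below the threshold. As a result, {IFG, IGT, T2DM} is never counted.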

4. Results
| Variables | Cutoff Value | Values b |
|---|---|---|
| Waist circumference, cm | ||
| Normal | < 90 | 3554 (53.47) |
| Abnormal | ≥ 90 | 2914 (43.84) |
| Wrist circumference, cm | ||
| Women | ||
| Group 1 | < 15.5 | 1058 (28.12) |
| Group 2 | 15.5-16.5 | 1357 (36.07) |
| Group 3 | > 16.5 | 1252 (33.28) |
| Men | ||
| Group 1 | < 17.2 | 934 (32.37) |
| Group 2 | 17.2-18 | 806 (27.94) |
| Group 3 | > 18 | 1109 (38.44) |
| Waist-to-height ratio | ||
| Normal | < 0.5 | 785 (11.80) |
| Abnormal | ≥ 0.5 | 5683 (85.49) |
| Waist-to-hip ratio | ||
| Men | ||
| Normal | < 0.9 | 476 (16.49) |
| Abnormal | ≥ 0.9 | 2373 (82.25) |
| Women | ||
| Normal | < 0.85 | 2172 (57.73) |
| Abnormal | ≥ 0.85 | 1447 (38.46) |
| Body mass index, kg/m² | ||
| Normal | < 25 | 2364 (35.56) |
| Overweight | 25 to < 30 | 2702 (40.64) |
| Obese | ≥ 30 | 1424 (21.42) |
a Data are presented as No. (%).
b The data contain missing values where the percentages do not sum to 100%.
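Several anthropometric variables in the table above use sex-specific cutoffs, namely waist-to-hip ratio and wrist circumference. The same measurement can therefore fall into different categories for men and women. Below is a minimal sketch of these rules. It assumes measurements in cm, kg, and m, with sex coded as `"male"` or `"female"`. The function names are hypothetical. Boundary handling follows the table; for example, a wrist circumference of exactly 16.5 cm in a woman falls in Group 2.

```python
# Sex-specific cutoffs from the table above.
WHR_ABNORMAL_AT = {"male": 0.90, "female": 0.85}  # waist-to-hip ratio
WRIST_BOUNDS_CM = {"male": (17.2, 18.0), "female": (15.5, 16.5)}


def waist_to_hip_class(waist_cm, hip_cm, sex):
    """'Abnormal' if waist/hip is at or above the sex-specific cutoff."""
    return "Abnormal" if waist_cm / hip_cm >= WHR_ABNORMAL_AT[sex] else "Normal"


def waist_to_height_class(waist_cm, height_cm):
    """'Abnormal' if waist/height >= 0.5 (same cutoff for both sexes)."""
    return "Abnormal" if waist_cm / height_cm >= 0.5 else "Normal"


def wrist_group(wrist_cm, sex):
    """Group 1 below the lower bound, Group 2 within [lower, upper], Group 3 above."""
    lower, upper = WRIST_BOUNDS_CM[sex]
    if wrist_cm < lower:
        return "Group 1"
    if wrist_cm <= upper:
        return "Group 2"
    return "Group 3"


def bmi_class(weight_kg, height_m):
    """BMI = weight / height^2 in kg/m^2, categorized as in the table."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


# The same waist-to-hip ratio (0.88) is classified differently by sex.
print(waist_to_hip_class(88, 100, "male"))    # Normal
print(waist_to_hip_class(88, 100, "female"))  # Abnormal
```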
| Variables | Cutoff Value | Values c |
|---|---|---|
| Fasting Plasma Glucose, mg/dL | ||
| Normal | < 100 | 946 (14.23) |
| Impaired Fasting Glucose | 100-126 | 5701 (85.76) |
| Two-hour postprandial plasma glucose, mg/dL | ||
| Normal | < 140 | 5848 (87.97) |
| Impaired Glucose Tolerance | 140-200 | 799 (12.02) |
| Total cholesterol, mg/dL | ||
| Normal | < 200 | 3382 (50.88) |
| Hypercholesterolemia | ≥ 200 | 3265 (49.11) |
| Triglycerides, mg/dL | ||
| Normal | < 150 | 3690 (55.51) |
| Hypertriglyceridemia | ≥ 150 | 2957 (44.48) |
| Cholesterol to high-density lipoprotein ratio | ||
| Normal | < 5.3 | 3841 (57.78) |
| Abnormal | ≥ 5.3 | 2801 (42.13) |
| High-density lipoprotein, mg/dL | ||
| Men | ||
| Low | < 40 | 1893 (65.61) |
| Normal | ≥ 40 | 990 (34.31) |
| Women | ||
| Low | < 50 | 2747 (73.01) |
| Normal | ≥ 50 | 1012 (26.90) |
| Triglyceride to high-density lipoprotein ratio | ||
| Men | ||
| Normal | < 4.7 | 1647 (57.08) |
| Abnormal | ≥ 4.7 | 1236 (42.84) |
| Women | ||
| Normal | < 3.7 | 2358 (62.67) |
| Abnormal | ≥ 3.7 | 1401 (37.24) |
| Chronic Kidney Disease | ||
| CKD | GFR < 60 | 2262 (34.03) |
| Non-CKD | GFR ≥ 60 | 4382 (65.92) |
| Systolic Blood Pressure, mm Hg | ||
| Normal | < 140 | 5946 (89.45) |
| Hypertension | ≥ 140 | 613 (9.22) |
| Diastolic Blood Pressure, mm Hg | ||
| Normal | < 90 | 5852 (88.04) |
| Hypertension | ≥ 90 | 707 (10.64) |
a Abbreviations: CKD, chronic kidney disease; GFR, glomerular filtration rate.
b Data are presented as No. (%).
c The data contain missing values where the percentages do not sum to 100%.
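The lipid variables in the table above include two derived ratios, and two of the cutoffs depend on sex (HDL and the triglyceride to HDL ratio). Below is a minimal sketch that turns one participant's lipid values into the table's categories. It assumes all inputs are in mg/dL and that the ratios are computed directly from those mg/dL values; the table does not state this explicitly. The function name and example values are hypothetical.

```python
HDL_LOW_BELOW = {"male": 40, "female": 50}         # mg/dL
TG_HDL_ABNORMAL_AT = {"male": 4.7, "female": 3.7}
CHOL_HDL_ABNORMAL_AT = 5.3


def lipid_categories(total_chol, tg, hdl, sex):
    """Categorize one participant's lipid profile (all values assumed in mg/dL)."""
    return {
        "Total cholesterol": "Hypercholesterolemia" if total_chol >= 200 else "Normal",
        "Triglycerides": "Hypertriglyceridemia" if tg >= 150 else "Normal",
        "HDL": "Low" if hdl < HDL_LOW_BELOW[sex] else "Normal",
        "Cholesterol/HDL": "Abnormal" if total_chol / hdl >= CHOL_HDL_ABNORMAL_AT else "Normal",
        "TG/HDL": "Abnormal" if tg / hdl >= TG_HDL_ABNORMAL_AT[sex] else "Normal",
    }


# Identical hypothetical lab values: TG/HDL = 180 / 45 = 4.0.
print(lipid_categories(210, 180, 45, "male")["TG/HDL"])    # Normal   (4.0 < 4.7)
print(lipid_categories(210, 180, 45, "female")["TG/HDL"])  # Abnormal (4.0 >= 3.7)
```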
| Variables | Values b |
|---|---|
| History of hospitalization (ever) | |
| Yes | 4566 (68.69) |
| No | 2081 (31.30) |
| History of ischemic heart disease | |
| Yes | 173 (2.60) |
| No | 6474 (97.40) |
| History of non-ischemic heart disease | |
| Yes | 295 (4.43) |
| No | 6352 (95.57) |
| History of hypertension | |
| Yes | 725 (10.90) |
| No | 5922 (89.10) |
| History of hyperlipidemia | |
| Yes | 1161 (17.46) |
| No | 5486 (82.54) |
| Family history of cardiovascular disease in male relatives (father, brother, son) aged < 55 years | |
| Yes | 566 (8.51) |
| No | 6081 (91.49) |
| Family history of cardiovascular disease in female relatives (mother, sister, daughter) aged < 65 years | |
| Yes | 523 (7.86) |
| No | 6124 (92.14) |
| Family history of diabetes in first-degree relatives | |
| Yes | 1731 (26.04) |
| No | 4916 (73.96) |
| Goiter status | |
| Grade 1 and 2 | 1764 (26.53) |
| No goiter | 4883 (73.47) |
| Thyroid nodules | |
| Yes | 393 (5.91) |
| No | 6254 (94.09) |
a Data are presented as No. (%).
b The data contain missing values where the percentages do not sum to 100%.
| Variables | Values b |
|---|---|
| Current cigarette smoking | |
| Yes (daily / occasionally) | 830 (12.48) |
| No | 5817 (87.52) |
| Former cigarette smoking | |
| Yes (daily / occasionally) | 489 (7.35) |
| No | 6158 (92.65) |
| Exposure to secondhand smoke at home or at work | |
| Yes | 1690 (25.42) |
| No | 4957 (74.57) |
| Physical activity levels | |
| Low (exercise or physical labor fewer than three times a week) | 4426 (66.58) |
| Normal (exercise or physical labor more than three times a week) | 2221 (33.42) |
| Use of diet or exercise for the management of hyperlipidemia | |
| Yes | 559 (8.40) |
| No | 6088 (91.59) |
| Use of diet or exercise for the management of hypertension | |
| Yes | 319 (4.79) |
| No | 6328 (95.20) |
| Use of antihypertensive drugs in the past month | |
| Yes | 324 (4.87) |
| No | 6323 (95.12) |
| Use of lipid-lowering drugs in the past month | |
| Yes | 150 (2.25) |
| No | 6497 (97.74) |
| Use of diuretic drugs in the past month | |
| Yes | 112 (1.68) |
| No | 6535 (98.32) |
| Use of thyroid drugs in the past month | |
| Yes (prescribed / non-prescribed) | 166 (2.49) |
| No | 6481 (97.51) |
| Use of aspirin in the past month | |
| Yes (prescribed / non-prescribed) | 178 (2.67) |
| No | 6469 (97.33) |
a Data are presented as No. (%).
b The data contain missing values where the percentages do not sum to 100%.
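Before Apriori can be applied, each participant's categorized variables from the tables above must be expressed as a set of items. Below is a minimal sketch. It assumes one dictionary per participant, with missing values stored as `None`. The `variable=value` item format and the function name are hypothetical choices, not necessarily how the authors encoded their data.

```python
def to_transaction(record):
    """Encode one participant as a set of 'variable=value' items, skipping missing values."""
    return frozenset(f"{var}={val}" for var, val in record.items() if val is not None)


# Hypothetical participant built from variables in the tables above.
participant = {
    "Current cigarette smoking": "No",
    "Physical activity levels": "Low",
    "Family history of diabetes": "Yes",
    "Use of aspirin in the past month": None,  # missing value
}
print(sorted(to_transaction(participant)))
# ['Current cigarette smoking=No', 'Family history of diabetes=Yes', 'Physical activity levels=Low']
```

Skipping missing values, rather than encoding them as an item such as `...=missing`, keeps missingness out of rule antecedents. This section does not say whether the authors made the same choice.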
4.1. Results of Association Rules Mining Using the Apriori Algorithm
| Rule Number | Antecedent | Consequent | Support b | Confidence c | Lift d |
|---|---|---|---|---|---|
| 1 | IFG = yes, IGT = yes, BMI ≥ 30, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.8 | 75.0 | 6.6 |
| 2 | IFG = yes, IGT = yes, BMI ≥ 30, Marital status = Married, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.4 | 75.0 | 6.6 |
| 3 | IFG = yes, IGT = yes, BMI ≥ 30, HDL < 50, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.3 | 75.6 | 6.7 |
| 4 | IFG = yes, IGT = yes, BMI ≥ 30, wrist circumference ≥ 16.5, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.2 | 76.5 | 6.7 |
| 5 | IFG = yes, IGT = yes, BMI ≥ 30, waist-to-hip ratio ≥ 0.85, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.2 | 76.5 | 6.7 |
| 6 | IFG = yes, IGT = yes, Family history of diabetes = yes, waist-to-height ratio ≥ 0.5 | Type 2 DM | 2.1 | 78.2 | 6.9 |
| 7 | IFG = yes, IGT = yes, BMI ≥ 30, HDL < 50, Marital status = Married | Type 2 DM | 2.1 | 75.6 | 6.7 |
a Abbreviations: IFG, Impaired Fasting Glucose; IGT, Impaired Glucose Tolerance; BMI, Body Mass Index; HDL, High-Density Lipoprotein; Type 2 DM, Type 2 Diabetes Mellitus.
b The percentage of records in the data for which all antecedents are true.
c The percentage of records satisfying the antecedents for which the consequent is also true.
d The ratio of the rule’s confidence to the support of the consequent (the percentage of all records for which the consequent is true).
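The three rule metrics can be computed directly from encoded records. Below is a minimal sketch. It takes support as antecedent support (footnote b), confidence as the conditional percentage P(consequent | antecedents), and lift as confidence divided by the overall share of records containing the consequent. The reported values are consistent with this reading. For rule 1, 75.0 / 6.6 ≈ 11.4%, which would be the implied share of records with type 2 DM. The function names and toy records are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support %, confidence %, lift) for the rule antecedent -> consequent.

    support    = % of all records in which every antecedent item holds
    confidence = % of those records in which the consequent also holds
    lift       = confidence / (share of all records containing the consequent)
    """
    n = len(transactions)
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        raise ValueError("no record satisfies the antecedent")
    both = sum(1 for t in matching if consequent <= t)
    consequent_count = sum(1 for t in transactions if consequent <= t)

    support = 100 * len(matching) / n
    confidence = 100 * both / len(matching)
    lift = (confidence / 100) / (consequent_count / n)
    return support, confidence, lift


def implied_consequent_prevalence(confidence_pct, lift):
    """Invert the lift definition: consequent prevalence (%) = confidence / lift."""
    return confidence_pct / lift


# Toy data (hypothetical): 10 records, 4 match {IFG, IGT}, 3 of those have T2DM,
# and 5 records in total have T2DM.
records = (
    [frozenset({"IFG", "IGT", "T2DM"})] * 3
    + [frozenset({"IFG", "IGT"})]
    + [frozenset({"T2DM"})] * 2
    + [frozenset({"IFG"})] * 4
)
print(rule_metrics(records, frozenset({"IFG", "IGT"}), frozenset({"T2DM"})))
# (40.0, 75.0, 1.5)
print(round(implied_consequent_prevalence(75.0, 6.6), 1))  # 11.4, from rule 1
```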
| Rule Number | Antecedent | Consequent | Support b | Confidence c | Lift d |
|---|---|---|---|---|---|
| 1 | IGT = yes, IFG = yes, CHO to HDL ≥ 5.3, occupation status = employed, waist-to-hip ratio ≥ 0.9 | Type 2 DM | 2.2 | 65.1 | 6.2 |
| 2 | IGT = yes, IFG = yes, length of stay in the city ≥ 40, wrist circumference ≥ 18, waist-to-hip ratio ≥ 0.9 | Type 2 DM | 1.9 | 66.1 | 6.3 |
| 3 | IGT = yes, IFG = yes, CKD = yes, physical activity level = low, waist-to-hip ratio ≥ 0.9 | Type 2 DM | 1.8 | 65.4 | 6.2 |
| 4 | IGT = yes, IFG = yes, wrist circumference ≥ 18, occupation status = employed, waist-to-height ratio ≥ 0.5 | Type 2 DM | 1.8 | 69.2 | 6.6 |
a Abbreviations: CHO, Cholesterol; CKD, Chronic Kidney Disease; HDL, High-Density Lipoprotein; IFG, Impaired Fasting Glucose; IGT, Impaired Glucose Tolerance; Type 2 DM, Type 2 Diabetes Mellitus.
b The percentage of records in the data for which all antecedents are true.
c The percentage of records satisfying the antecedents for which the consequent is also true.
d The ratio of the rule’s confidence to the support of the consequent (the percentage of all records for which the consequent is true).
