1. Context
2. Evidence Acquisition
Key artificial intelligence (AI)-powered applications in thyroid disorders. The “*” symbol indicates that the application of AI in the management of thyroid nodules is not explored in the current study, although it is addressed in a separate survey conducted by the authors (5).
3. Results
3.1. Thyroid Function Test and Routine Clinical Laboratory Data
| First Author, Year, Reference | Aim | Technique | Dataset | Sample Size | Performance Metrics |
|---|---|---|---|---|---|
| Cheng et al., 2022 (16) | Diagnosis of thyroid dysfunction from thyroid datasets (TFT) | DT, LR, XGB, and SVM | 21 features, including TFT | 11565 Nl and 1170 elevated TSH from 1 center | XGB: AUC 0.87, and ACC 0.86 |
| Islam et al., 2022 (17) | Same | CatB, Extra-Trees, ANN, LGBM, SVC, KNN, RF, XGB, DT, and GaussianNB | UCI sick-euthyroid dataset (25 features) | 2870 sick and 292 Nl | ANN: F1-score 0.957, precision 0.957, recall 0.959, and ACC 0.9587 |
| Abbad Ur Rehman et al., 2021 (19) | Same | KNN, NB, SVM, LR, and DT | DHQ Teaching Hospital thyroid dataset (10 features) | 170 Nl, 66 hyper, and 73 hypo | NB: ACC 100%, recall 100%, and F1-score 100% |
| Rasitha, 2016 (14) | Same | LDA | UCI hypothyroid dataset (29 features) | 3481 Nl, 194 compensated hypo, 95 primary hypo, and 2 secondary hypo | Precision 0.996, recall 0.996, ACC 99.62%, and ROC 0.996 |
| Yadav and Pal, 2020 (20) | Same | DT, RF, CART, and bagging ensemble model | UCI thyroid disease dataset (30 features) | 3710 patients (2 classes) | Bagging ensemble model: ACC 100% |
| Chaganti et al., 2022 (15) | Same | RF, LR, SVM, ADA, GBM and CNN, LSTM, and CNN-LSTM | UCI thyroid disease dataset (30 features) | 233 primary hypo, 359 compensated hypo, 346 increasing binding proteins, 456 concurrent non-thyroidal illness, and 400 Nl | RF: F1-score 0.99, precision 0.99, recall 0.99, and ACC 0.99 |
| Mir and Mittal, 2020 (18) | Same | Boosting, bagging, NB, SVM, and J48 | 21 features, including TFT | 489 Nl, 488 hyper, and 487 hypo from 1 center | SVM: ACC 99.08%, precision 0.991, recall 0.991, and ROC 0.994 |
| Yoshimura Noh et al., 2024 (21) | Diagnosis of thyroid dysfunctions from routine laboratory data | ANN (Prediction one) and LR | JHEP (11 features), JND (9 features), and Ito (32 features) datasets including routine lab | 20653 GD, 3435 painless thyroiditis, 4266 Nl, and 18937 HT from 1 center | Thyrotoxicosis: AUC 0.977; hypothyroidism: AUC 0.877 |
| Hu et al., 2022 (22) | Same | GBDT, SVM, LR, and ANN | EMRs (23 features, including routine lab) | 176727 patients from 4 centers | Hyperthyroidism: AUC 93.8%; hypothyroidism: AUC 90.9% |
| Ghali et al., 2020 (23) | Prediction of TSH by macroelements and vitamins | ANFIS, ANN, and MLR | 7 vitamins and macronutrients | Blood sample of 1 patient | ANFIS: R² 0.914 |
| Shin et al., 2023 (24) | Detection of hyperthyroidism | LGBM | 662 pairs of TFT and HR | 175 patients (2 classes) from 1 center | Sensitivity 86.14%, specificity 98.28%, NPV 95.32%, and PPV 94.57% |
| Choi et al., 2022 (25) | Same | DL | 174331 ECGs for training, 48648 for external validation | 146672 patients from 2 centers | AUC 0.926 for internal validation and AUC 0.883 for external validation |
Abbreviations: DT, decision tree; LR, logistic regression; XGB, extreme gradient boosting; SVM, support vector machine; CatB, CatBoost; ANN, artificial neural network; GaussianNB, Gaussian naive Bayes; LGBM, light gradient-boosting machine; SVC, support vector classifier; KNN, K-nearest neighbors; RF, random forest; LDA, linear discriminant analysis; CART, classification and regression tree; BP-AdaBoost, back propagation-adaptive boosting; CNN, convolutional neural network; ANFIS, adaptive neuro-fuzzy inference system; MLR, multiple linear regression; ADA, AdaBoost; GBM, gradient boosting machine; LSTM, long short-term memory; DL, deep learning; NL, normal; AUC, area under curve; ACC, accuracy; TFT, thyroid function tests.
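The threshold-based metrics reported in the table above (ACC, precision/PPV, recall/sensitivity, specificity, NPV, and F1-score) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not code from any of the cited studies; the names `ConfusionCounts` and `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy
    fn: int  # diseased, predicted healthy


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard threshold-based metrics from confusion counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)   # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    ppv = _ratio(c.tp, c.tp + c.fp)           # also called precision
    npv = _ratio(c.tn, c.tn + c.fn)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; they do not come from any study in the table.
    example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
    for name, value in binary_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because accuracy, PPV, and NPV depend on class prevalence, values from studies with very different ratios of normal to abnormal cases are not directly comparable, whereas sensitivity and specificity are computed within each class.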
3.2. Ultrasound and Scintigraphy
| First Author, Year, Reference | Aim | Technique | Dataset | Sample Size | Performance Metrics |
|---|---|---|---|---|---|
| Acharya et al., 2014 (31) | Diagnosis of thyroid dysfunctions | SVM, DT, fuzzy classifier, and KNN | 7 features from US images | 232 Nl and 294 HT from 1 center | Fuzzy classifier: ACC 84.6% |
| Zhang et al., 2022 (32) | Same | DL | Features from US images, US videos, and 6 features of TFT | 37424 HT and 69089 Nl from 1 hospital | By US videos + TFT: AUC 0.949 and ACC 0.892 |
| Zhao et al., 2022 (33) | Same | DL | Features from US images | 20666 HT and 18613 non-HT from 2 centers | ACC 0.892, AUC 0.940, and F1-score 0.892 |
| Vasile et al., 2021 (34) | Same | DL | Features from US images | 767 autoimmune, 672 micro-nodular, 720 nodular, and 638 Nl from 4 centers | ACC 98.78% and AUC 0.98 |
| Qiao et al., 2021 (28) | Same | DL | 1430 thyroid scintigraphies | 175 NL, 834 GD, and 421 subacute thyroiditis from 1 center | Subacute thyroiditis: F1-score 84.98%, precision 77.99%, recall 93.33%, and ACC 89.00%; GD: F1-score 88.62%, precision 93.36%, recall 84.33%, and ACC 92.78% |
| Yang et al., 2021 (30) | Same | DL | 3389 thyroid scintigraphies from 3 centers | 4 classes of scintigraphy pattern | Overall ACC 92.73% |
| Zhao et al., 2023 (29) | Same | DL | 3194 thyroid SPECT | 742 Nl, 808 GD, 826 subacute thyroiditis, and 818 tumors from 3 centers | Subacute thyroiditis: F1-score 0.958, recall 93.9%, precision 97.6%, and AUC 0.992; GD: F1-score 0.981, recall 100.0%, precision 96.3%, and AUC 0.999 |
| Kikuchi et al., 2023 (27) | Same | LGBM | 7013 F-18 FDG PET/CT scans | 182 hypo, 265 hyper, and 6566 Nl from 1 center | Hypothyroidism: AUC 0.77; hyperthyroidism: AUC 0.78 |
Abbreviations: SVM, support vector machine; DT, decision tree; DL, deep learning; KNN, K-nearest neighbors; US, ultrasound; HT, Hashimoto's thyroiditis; TFT, thyroid function tests; SPECT, single-photon emission computed tomography scan; GD, Graves' disease; LGBM, light gradient boosting machine; F-18 FDG PET/CT, fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography; NL, normal; AUC, area under curve; ACC, accuracy.
3.3. Thyroid Ultrasound Computer-Aided Diagnosis Systems
3.4. Pregnancy and Major Depressive Disorders
| First Author, Year, Reference | Aim | Technique | Dataset | Sample Size | Performance Metrics |
|---|---|---|---|---|---|
| Stroek et al., 2023 (41) | Congenital hypothyroidism screening | RF | Features from the Dutch national database | 458 CH-T, 82 CH-C, 2332 false-positive referrals, and 1670 Nl | ACC 0.77 |
| Sun et al., 2021 (42) | Prediction of pregnancy outcome | LR, RF, XGB, and DL | Obstetrics and pre- and post-conception serum TSH features | 3428 deliveries from 1 center | XGB: preterm birth AUC 0.812 and low Apgar score AUC 0.987; RF: induction AUC 0.650 |
| Araya et al., 2021 (43) | Prediction of GDM | PCA | 29 thyroidal and non-thyroidal features in trimesters 1-2 | 39 pregnancies from 1 center | NA |
| Mennickent et al., 2023 (44) | Same | LR, L-SVM, PLS-DA, CART, and XGB | 75 thyroidal and non-thyroidal features in trimesters 1-2 | 12 GDM and 54 NGT from 3 centers | PLS-DA: AUC 0.940 |
| Zhou et al., 2022 (45) | Prediction of preterm delivery | GAM | Features from routine prenatal examination | 3176 preterm births (2127 spontaneous and 1049 iatrogenic) from 1 center | NA |
| Zhang et al., 2021 (46) | Prediction of postpartum depression | RF, DT, XGB, LR, and MLP | EMRs (32 features) | 14187 non-PPD and 1010 PPD from 1 center for training, 50459 non-PPD and 3513 PPD from another center for validation | LR: AUC 0.937 training and 0.886 validation |
| Yuan et al., 2023 (47) | Prediction of abortion | LR and XGB | 48 features | 340 abortions and 677 IVF-treated deliveries from 1 center | XGB: AUC 0.759 and F1-score 0.566 |
| Yang et al., 2023 (40) | Prediction of suicide attempt | LASSO | Hamilton depression and anxiety symptoms and biological features | 1372 non-attempts and 208 attempts from 1 center | AUC 0.72 |
| Li et al., 2021 (38) | Same | GBDT | Hamilton depression and anxiety symptoms and biological features | 1372 non-attempts, 235 recent attempts, and 111 late attempts from 1 center | Recent attempts: ACC 87%; late attempts: ACC 88% |
| Qiao et al., 2022 (39) | Prediction of MDD prognosis | SVM | Hamilton depression and anxiety symptoms and TFT features | 2086 MDD from 1 center | ROC-AUC 0.86 |
Abbreviations: GDM, gestational diabetes mellitus; NGT, normal glucose tolerance; LR, logistic regression; XGB, extreme gradient boosting; SVM, support vector machine; RF, random forest; PCA, principal component analysis; L-SVM, linear support vector machine; MLP, multilayer perceptron; LASSO, least absolute shrinkage and selection operator; CART, classification and regression tree; GAM, generalized additive model; DT, decision tree; GBDT, gradient-boosting decision tree; DL, deep learning; PLS-DA, partial least-squares discriminant analysis; EMR, electronic medical record; NL, normal; AUC, area under curve; ACC, accuracy; MDD, major depressive disorder.
3.5. Bioinformatics, Exposure, Radioiodine Therapy, and Levothyroxine Dose Adjustment
| First Author, Year, Reference | Aim | Technique | Dataset | Sample Size | Performance Metrics |
|---|---|---|---|---|---|
| Atas, 2023 (50) | Prediction of concomitant autoimmune diseases with HT | SVM, RF, LR, KNN, MLP, and an ML hybrid model | OMIM, PubMed, Entrez Gene on NCBI, NCBI dbSNP, and Swiss-Prot databases | 162 genes | ACC 0.815, precision 0.731, recall 1.0, and F1-score 0.800 |
| Li et al., 2024 (51) | HT diagnosis based on genes | LASSO | HRA001684, GSE29315 and GSE163203 datasets | 2000 highly variant genes | NA |
| Shen et al., 2021 (52) | Prediction of genes of hyperthyroidism | RW-RVM, RF, ANN, and NB | DisGeNET | 269 genes | AUC 0.90 |
| Liu et al., 2023 (54) | Screening TSHR agonists | RF, MLP, SVM, and GAT | Updated TSHR agonist dataset from PubChem fingerprints | 7 molecular representations | RF: AUC 0.984 and ACC 0.941 |
| Xu et al., 2022 (55) | Detection of TSHR inhibitory chemicals | RF, XGB, and LR | Tox21 database | 5952 compounds from a cAMP analysis | RF: ACC 0.85, recall 0.89, and AUC 0.92 |
| Gao et al., 2021 (58) | Prediction of ¹³¹I therapeutic dose | BPNN, RBFNN, SVM, BP-AdaBoost, and RF | EMRs (17 features) | 353 patients from several centers | RF: ACC 100% |
| Duan et al., 2022 (57) | Prediction of hypothyroidism after RAIT | ML | EMR (138 clinical and lab test features) | 471 GD patients from 1 center | AUC 0.74 and F1-score 0.74 |
| Chen et al., 2019 (59) | Levothyroxine dosage post-thyroidectomy | DT | LT4 doses and TSH levels | 320 patients from 1 center | Correctly predicted dose adjustment 75%, confidence interval = 65% - 82% |
| Barrio et al., 2023 (60) | Same | ANN, RF, OLS, and LR | Demographic, clinical, and laboratory data | 951 patients from 1 center | Met postoperative TSH goal: 45.3% |
| Hemmati et al., 2023 (61) | Same | Fuzzy logic | NA | Thyrosim application to simulate thyroid hormone courses of a virtual thyroidectomized patient | NA |
Abbreviations: HT, Hashimoto’s thyroiditis; GD, Graves’ disease; TSHR, thyroid stimulating hormone receptor; RAIT, radioactive iodine therapy; LT4, levothyroxine; LR, logistic regression; XGB, extreme gradient boosting; SVM, support vector machine; ANN, artificial neural network; KNN, K-nearest neighbors; RF, random forest; OLS, ordinary least squares; BP-AdaBoost, back propagation-adaptive boosting; RBFNN, radial basis function neural network; MLP, multilayer perceptron; LASSO, least absolute shrinkage and selection operator; RW-RVM, random walk-relevance vector machine; GAT, graph attention network; BPNN, back propagation neural network; EMR, electronic medical record; NL, normal; AUC, area under curve; ACC, accuracy.
3.6. Thyroid-Associated Ophthalmopathy
| First Author, Year, Reference | Aim | Technique | Dataset | Sample Size | Performance Metrics |
|---|---|---|---|---|---|
| Lee et al., 2022 (62) | TAO diagnosis | DL | Orbital CT scans | 99 mild GO, 94 moderate-to-severe, and 95 Nl from 1 center | AUC 0.895 - 0.979 |
| Lin et al., 2024 (63) | Same | DL | Orbital CT scans | 459 mild, 355 severe, and 373 Nl from 1 center | ACC 89.5% and AUC 0.96 - 0.99 |
| Lin et al., 2021 (64) | Same | DL | Orbital MRI | 50 active phase and 110 inactive phase from 1 center | ACC 0.863, precision 0.680, and F1-score 0.712 |
| Karlin et al., 2023 (65) | Same | DL | External orbital photographs | 2288 images from 1 clinical dataset | ACC 89.2%, recall 93.4%, precision 79.7%, and F1 score 86.0% |
| Yoo et al., 2020 (66) | Prediction of post-orbital decompression surgery appearance | Generative adversarial network (GAN) | Facial photographs | 109 pairs of matched pre- and postoperative facial images from a Google image search | ACC 90.9% and AUC 0.957 |
| Zhai et al., 2021 (67) | Prediction of the therapeutic efficacy of IV glucocorticoids | Binary LR | Orbital MRI and clinical characteristics | 35 responsive and 28 unresponsive orbits | AUC 0.844 |
Abbreviations: TAO, thyroid-associated ophthalmopathy; LR, logistic regression; DL, deep learning; CT, computed tomography scan; MRI, magnetic resonance imaging; NL, normal; AUC, area under curve; ACC, accuracy.
