1. Context
1.1. Rationale
1.2. Introduction to Artificial Intelligence Technologies
1.3. A Summary of the Approach to Thyroid Nodules
Diagnostic approach to thyroid nodules based on Harrison's Principles of Internal Medicine. Green areas represent diagnostic topics, and yellow areas relate to treatment. This review focuses on the advancements of artificial intelligence (AI) in the highlighted topics (15).
2. Methods
3. Results
| Year | Aim | Technique | Dataset | Sample Size | Performance Metrics (Testing Cohort) | Reference |
|---|---|---|---|---|---|---|
| 2024 | Using AI for US multi-tissue segmentation | DL (UNet) + attention gating and pyramid pooling modules | US images | 4600 US images with ≥ 23000 annotated regions | DSC 81.88 | (16) |
| 2021 | Using AI for dynamic CEUS TNOD diagnosis | DL (hierarchical temporal attention network) | CEUS images from 1 center | 336 lesions’ CEUS images (77 Nodular Goiter, 84 Adenoma, 101 PTC, and 74 PTMC) | ACC 80.18 and F1 score 79.90 | (17) |
| 2023 | Using AI for US TNOD segmentation | DL (UNet) + boundary-preserving assembly transformer | A US images public dataset (TN3k) and a private one from 1 center | Public 3493 and private 328 US images | TN3k: ACC 97.22, F1 score 84.23, DSC 83.64, and AUC 92.03/private: ACC 97.80, F1 score 85.81, DSC 85.63, and AUC 92.19 | (18) |
| 2023 | Using AI for US TNOD segmentation | DL (BTNet: Boundary attention transformer net) | A US images public dataset (DDTI) and a private one from 6 centers | DDTI 626 and private 532 US images | DDTI: DSC 0.757/private: DSC 0.892 | (19) |
| 2019 | Comparing linear and nonlinear ML models for TNOD classification | Ridge-penalty, Lasso-penalty, Elastic net, RF, k-SVM, ANN, k-NN, and NB | US features of TNODs with a confirmed pathological diagnosis: Size, margins, shape, aspect ratio, capsule, hypoechoic halo, vascularity, echo, cervical LN status, calcification, and composition | 501 benign and 678 malignant | Overall AUC 0.928-0.954 (RF: AUC 0.989 training/0.954 testing) | (20) |
| 2019 | Using an ML model for US and USE malignant TNOD identification | LR, LDA, RF, k‐SVM, AdaBoost, k‐NN, ANN, NB, and CNN | US and USE features of 2064 TNODs that underwent hemi‐ or total thyroidectomy at 1 center: USE grade, etc. | 1314 benign and 750 malignant | AUC 0.859-0.924 (RF: AUC 0.986 training/0.924 validation) | (21) |
| 2020 | Using ML for US follicular adenoma and carcinoma differentiation | ANN and SVM | 10 features out of 96 radiomics features extracted from 348 US from 2 centers | 252 adenoma and 96 carcinoma | ANN: Sensitivity 32.3, specificity 74.1, and ACC 79.4/SVM: Sensitivity 90.1, specificity 41.7, and ACC 69.0 | (22) |
| 2022 | Using AI for US TNOD diagnosis based on ACR TI-RADS | CNN (InceptionResNetV2) | 10 features covering TI-RADS categorization extracted from 1588 US TNODs who underwent hemi‐ or total thyroidectomy from 2 centers | PTC 484, FTC 14, MTC 1, nodular hyperplasia 987, follicular adenoma 70, and thyroiditis 32 | AUC 0.91 | (23) |
| 2021 | Using DL for US benign and malignant TNOD differentiation | DL (ThyNet) | Training: 18049 US images from 2 centers/Testing: 4305 images from 7 centers | 22354 US images | AUROC 0.922 | (24) |
| 2022 | Using AI for US benign and malignant TNOD differentiation | Meta-analysis | PubMed, Cochrane Library, Embase, Web of Science, China Biology Medicine, and China National Knowledge Infrastructure | 25 studies with 17429 US TNOD images | Sensitivity 0.88 (0.85 - 0.90), Specificity 0.81 (0.74 - 0.86), and AUC 0.92 (0.89 - 0.94) | (25) |
| 2020 | Comparing a CAD system performance and 3 levels of experienced radiologists in TNOD US evaluation | DL (S-Detect) | Patients evaluated with US and cytology for TNOD from 1 center | 197 patients | S-Detect: ACC 88.48, sensitivity 92, specificity 87.9, NPV 98.40 / 1-month and 4-year experienced radiologists: ACC 83.03, Sensitivity 64 and 72, Specificity 86.4 and 85, NPV 93.08 and 94.44/9-year radiologists: ACC 95.76, Sensitivity 84, Specificity 97.9, NPV 97.16 | (26) |
| 2020 | Comparing a CAD system performance and 4 levels (1, 4, 9, and 20 years) of experienced radiologists in TNOD US evaluation | DL (S-Detect) | Patients evaluated with US and cytology for TNOD from 1 center | 204 TNODs in 181 patients | S-Detect: ACC 77.0, sensitivity 91.3, specificity 65.2, NPV 90.1, AUC 0.782 / 1-year experienced radiologist: ACC 63.7 (with S-Detect: 75.0), sensitivity 95.7 (94.6), specificity 37.5 (58.9), NPV 91.3 (93.0), AUC 0.666 (0.767) / 20-year: ACC 84.8 (85.3), sensitivity 96.7 (97.8), specificity 75.0 (75.0), NPV 96.6 (97.7), AUC 0.859 (0.864) | (27) |
| 2020 | Using DL for CT cervical LNM diagnosis | CNNs: VGG16, VGG19, InceptionV3, InceptionResNetV2, DenseNet121, ResNet, DenseNet169, and Xception | 3838 axial CT images from 698 thyroid cancer patients (PTC 689, FTC 5, MTC 3, and PDTC 1) | 3606 benign and 232 malignant LNs | AUROC 0.846 (0.784-0.884) (Xception: AUROC 0.884 external/0.942 internal validations) | (28) |
| 2024 | Using ML for US radiomics LNM diagnosis | Meta-analysis | PubMed, Embase, Cochrane, and Web of Science | 27 studies (16410 thyroid cancer patients, 6356 with LNM) | SROC for clinical features 0.76, for radiomics features 0.84, and for both 0.81 | (29) |
| 2024 | Using AI for nuclear medicine radiomics in thyroid diseases assessment | Meta-analysis | PubMed, Scopus, and Web of Science | 17 studies with 9627 patients | Not reported | (30) |
| 2023 | Using DL for in situ adequacy screening of unstained FNAB samples | DL (FNA-Net): MTL classifier + Faster R-CNN, Inception-ResNet v2, U-Net, and TensorFlow Object Detection API | FNABs | 6 patients with 21 slides and 287 cytopathologic images | AUC 0.84 and F1 score 0.81 | (31) |
| 2022 | Using ML for ROI identification on FNA WSI | TL: CNN (VGG11) + supervised learning + ML classifier + Ordinal regression | Thyroidectomy specimens with a previous FNAB | 908 FNABs: 799 training/109 testing (84 benign and 25 malignant) | AUC 0.931 for WSI-TBS and 0.896 for ROI-TBS | (32) |
| 2016 | Using ML for TNOD malignancy risk evaluation | ANN + resilient back propagation training algorithm | FNABs and surgical specimens of patients who underwent thyroid resection | 345 patients | ACC 64.5 and AUC 0.72 | (33) |
| 2020 | Using ML for TNOD FNA evaluation | RBFN + image analysis algorithms | Thyroidectomy specimens with a previous FNAB | 41324 nuclear measurement by image analysis from 288 benign and 159 malignant patients | Sensitivity 81.4, specificity 90.0, and ACC 86.9 | (34) |
| 2018 | Using ML for FNA follicular adenoma and follicular carcinoma differentiation | ANN | FNABs | 48 FNABs | AUC 1 and ACC 100 | (35) |
| 2020 | Using ML for FNA malignancy prediction | TL: CNN (VGG11) + multiple instance learning + ML classifier + ordinal regression | Thyroidectomy specimens with a previous FNAB | 908 WSIs from 659 patients: 799 training/109 testing | AUC 0.932, Sensitivity 92.0, and Specificity 90.5 | (36) |
| 2022 | Using AI for image analysis of ThinPrep-prepared FNABs | GBM, ETC | 20 FNABs of AUS/FLUS cases and 20 FNABs of benign TNODs | 400 low-power (100x) and 400 high-power (400x) images | AUC 0.75 for low- and 0.74 for high-power | (37) |
| 2023 | Using DL for FNA diagnosis | CNN (EfficientNetV2) + data augmentation + Gradient-weighted Class Activation Mapping (Grad-CAM) + stochastic neighbor embedding (t-SNE) | 393 FNABs | 148395 microscopic images of FNAB | AUC 0.49 for PDTC and 0.91 for MTC, others AUC > 0.95 | (38) |
| 2022 | Using AI for protein-based TNOD classification | ANN + feature selection and feature importance evaluation algorithms | 19 protein biomarkers from 1724 thyroid tissue samples proteomes | 1161 TNODs from 1133 patients comprising 288 TNODs for retrospective and 294 for prospective external validations | ACC 91 for training, 89 for retrospective, and 85 for prospective validations | (39) |
| 2021 | Using ML for predictive and diagnostic power of PPARγ targets for PTC evaluation | PPARGi: ML-Powered Personalized Scoring Index comprising 10 PPARγ targets, RF, SVM, k-NN, ANN, and LR | Datasets selected from public functional genomics data repository: TCGA-THCA, MMD-THCA (PTC), and MMD-THCA (ATC) | Three pairs of monozygotic twins with PTC | AUC 0.828 - 0.998 | (40) |
| 2022 | Using DL for drug response prediction by integrating bulk and scRNA-seq data | DL, TL | 6 public scRNA-seq datasets: GDSC, CCLE, etc. | 1280 cancer cell lines, 1557 drugs/chemical compounds, and their expression profiles on 15962 genes | F1 score 0.892 and AUROC 0.898 | (41) |
| 2023 | Using AI for CT radiomics ATC/PDTC from DTC differentiation | RF | Thyroid cancer patients who underwent CECT at 1 center | ATC/PDTC 32 and PTC 58, FTC 40 | AUROC radiomics features 0.883, radiomics and clinical 0.908, ACC 84.6% and 86.5% | (42) |
| 2020 | Using ML multiparametric MRI radiomics for PTC aggressiveness prediction | 22 ML algorithms including LR, SVM, GBC, etc. | 120 TNOD patients who underwent MRI and hemi- or total thyroidectomy at 1 center | 1393 MRI features from 71 non-aggressive and 49 aggressive | LASSO feature selection + GBC: AUC 0.874 training and 0.915 testing | (43) |
| 2021 | Using ML for PTC central LNM prediction based on preoperative and intraoperative clinicopathological characteristics | LR, GBM, XGBoost, RF, DT, and ANN | T1-T2, cN0 PTC patients who underwent thyroidectomy at 1 center | 619 central LNM- and 652 central LNM+ | AUROC 0.695 - 0.750 (XGBoost 0.750) | (44) |
| 2020 | Using ML for PTC central LNM prediction based on clinical characteristics and US features | RF, ANN, DT, GBDT, XGBoost, and AdaBoost | 22 variables from 1103 patients who underwent thyroidectomy from 1 center | 491 central LNM- and 612 central LNM+ | AUC 0.680-0.731 (GBDT: AUC 0.731) | (45) |
| 2022 | Using ML for lung metastasis prediction based on clinicopathological characteristics | SVM, LR, XGBoost, DT, RF, and k-NN | Demographical and clinicopathological data from 9950 thyroid cancer patients from the SEER database: TN stage, age, sex, race, laterality, year of diagnosis, histological type, and LM | 212 lungM+ and 9738 lungM- | RF: ACC 0.99, F1-score 0.72, and AUC 0.99 | (46) |
| 2021 | Using ML for bone metastasis prediction based on clinicopathological characteristics | LR, RF, AdaBoost, DT, NB, SVM | Demographical and clinicopathological characteristics from 17138 thyroid cancer patients from the SEER database: Marital status, insurance status, grade, etc. | 166 boneM+ and 16972 boneM- | RF: AUC 0.917 and ACC 0.904 | (47) |
| 2021 | Using ML for recurrence prediction based on EMRs | Inductive logic programming | Information extracted from EMRs of 783 patients with ≥ 5 years F/U after total thyroidectomy, central LN dissection, and RAIT from 1 center | 54 recurrences and 729 recurrence-free | ACC 71.4 | (48) |
| 2022 | Using ML for FTC prognosis prediction | XGBoost, LightGBM, RF, LR, AdaBoost, GaussianNB, KNN, SVM, and MLP | 11 variables from the SEER database: Region, surgical methods, lymphadenectomy, TNM stage, etc. | 6891 patients with FTC with a median F/U of 64 months | XGBoost: AUROC 0.886 | (49) |
| 2023 | Using AI and bioinformatics for LNM prediction by key genetic variations and endocrine-disrupting chemicals | Different R packages with LASSO, SVM, and RF | Immune cell abundance identifier (ImmuCellAI), genomics of drug sensitivity in cancer (GDSC), and Human Protein Atlas (HPA) databases | 12 hub genes: ERBB3, etc. | ERBB3 as a diagnostic marker for thyroid cancer (AUC = 0.89), high LNM potential (AUC = 0.75), and LNM+ (AUC = 0.86) | (50) |
| 2023 | Using ML for PTC prognosis prediction based on molecular identifiers | ML (HighLifeR) | 502 cases annotated by the Cancer Genome Atlas Project | 82 genes: BRAFV600E, RAS, EZH2-HOTAIR pathway mutations, etc. | Not reported | (51) |
| 2022 | Using AI for guided US therapeutic effect evaluation of thoracoscopic thyroidectomy on PTC | MVA, DAS | Patients diagnosed with PTC by imaging or FNAB | Experimental: 94 patients, control: 119 patients | P < 0.05, not reported | (52) |
| 2021 | Using DL for recurrent laryngeal nerve identification during thyroidectomy | CNN (ResNeXt50-32 × 4d) + Mask R-CNN | Various images of recurrent laryngeal nerve and surrounding tissues | 277 images of 130 patients | DSC 0.707 | (53) |
| 2024 | Using DL for parathyroid gland identification during endoscopic thyroidectomy | DL: YOLOX | 838 endoscopic thyroidectomy videos | 32482 images extracted from videos | P < 0.001, not reported | (54) |
| 2023 | Using AI for the severity of postoperative scars prediction | CNN (ResNet-50) + convolutional block attention module + t-SNE + Grad-CAM | Images of thyroidectomy scars and clinical data | 1283 patients | ROC-AUC 0.896 for imaging and 0.912 for imaging and clinical features | (55) |
| 2023 | Using DL for RAIT dosimetry optimization | ANN + Adam optimizer | DTC patients who underwent RAIT dosimetry, using images and blood samples gathered at the initial 4, 24, and 48 hr post administration | 83 patients | P = 0.351, not reported | (56) |
| 2024 | Using ML for RAIT success prediction in low-risk PTC by clinical data and radiomics | LR with lasso and ridge | Characteristics of low-risk PTC patients who underwent total or near total thyroidectomy and RAIT: Age, sex, and pre-ablative serum Tg | 130 patients | AUC 0.78 | (57) |
| 2018 | Using ML for thyroid cancer patients' family members' radiation exposure dose estimation | ANN | Characteristics of RAI-treated TC patients: Age, gender, home area, education, BMI, release dose rate, administrated residual activity, etc. | 99 family members of 52 patients | ROC-AUC 0.957 | (58) |
| 2022 | Using AI for targeting AKT1 peptide design for ATC treatment | Not reported | Peptide synthesis datasets | 96 plates | IC50 18.2 mM in 8303C and 12.4 mM in 8505C cells | (59) |
| 2023 | Using AI for new therapeutic target identification (Kir5.1) | Deep docking (VirtualFlow) | Gene Expression Omnibus, Cancer Genome Atlas, and TCGA databases, etc. | 68 pairs of primary tumors and para-tumor tissues (6 benign, 36 PTC, and 26 PTMC) | Not reported | (60) |
Abbreviations: RF, random forest; SVM, support vector machine; ANN, artificial neural network; k-NN, k-nearest neighbor; NB, naive Bayes; LR, logistic regression; LDA, linear discriminant analysis; CNN, convolutional neural network; GBM, gradient boosting machine; DT, decision tree; GBDT, gradient boosting decision tree; MLP, multilayer perceptron; MTL, multi-task learning; TL, transfer learning; RBFN, radial basis function network; ETC, extra tree classifier; MVA, minimum variance algorithm; DAS, delay-and-sum algorithm; TNOD, thyroid nodule; LNM, lymph node metastasis; PTMC, papillary thyroid microcarcinoma; PDTC, poorly differentiated thyroid carcinoma; DTC, differentiated thyroid carcinoma; TC, thyroid cancer; ATC, anaplastic thyroid carcinoma; TBS, thyroid cytopathology Bethesda system; AUS, atypia of undetermined significance; FLUS, follicular lesion of undetermined significance; FNAB, fine-needle aspiration biopsy; DSC, Dice similarity coefficient; ACC, accuracy; AUC, area under the curve; SROC, summary receiver operating characteristics; ROC-AUC, area under the receiver operating characteristic curve.
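For readers less familiar with the performance metrics in the table, the following is a minimal Python sketch of their standard textbook definitions: sensitivity, specificity, NPV, ACC, and F1 score from confusion-matrix counts, and DSC from two segmentation masks. It is not taken from any of the cited studies. The names `ConfusionCounts` and `dice`, and the example counts, are hypothetical and chosen only for illustration. Note that the studies report these values on either a 0-1 or a 0-100 scale (for example, DSC 81.88 versus DSC 0.757), so the scale should be checked before comparing rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary task (e.g., malignant = positive). Hypothetical helper."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of truly positive cases that were detected.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of truly negative cases that were correctly ruled out.
    return c.tn / (c.tn + c.fp)


def npv(c: ConfusionCounts) -> float:
    # Negative predictive value: how often a negative prediction is correct.
    return c.tn / (c.tn + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all cases classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(c: ConfusionCounts) -> float:
    # Harmonic mean of precision and sensitivity (recall).
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


def dice(pred: set, truth: set) -> float:
    """Dice similarity coefficient between two masks given as sets of pixel coordinates."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


if __name__ == "__main__":
    # Hypothetical counts, not drawn from any study in the table.
    counts = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    print("sensitivity:", round(sensitivity(counts), 3))
    print("specificity:", round(specificity(counts), 3))
    print("NPV:", round(npv(counts), 3))
    print("ACC:", round(accuracy(counts), 3))
    print("F1:", round(f1_score(counts), 3))

    # Two toy 3-pixel masks that share 2 pixels.
    predicted = {(0, 0), (0, 1), (1, 1)}
    reference = {(0, 1), (1, 1), (1, 2)}
    print("DSC:", round(dice(predicted, reference), 3))
```

In a binary segmentation setting, DSC and the F1 score computed over pixels are the same quantity, which is why segmentation studies in the table sometimes report both with similar values. AUC is not sketched here because it requires continuous prediction scores rather than fixed counts.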

