This study proposed and internally validated a radiomics-based machine-learning framework for fetal weight percentile classification using fetal ultrasound images acquired in the second and third trimesters. The workflow included image preprocessing, manual ROI segmentation of the head, abdomen, and femur, radiomics feature extraction, dimensionality reduction using PCA, and supervised classification using multiple machine-learning algorithms.
A key step in ultrasound image analysis is preprocessing because fetal ultrasound images are inherently affected by speckle noise and acquisition-related artifacts. In the present study, several commonly used filters were evaluated and compared using SNR and MSE. The findings suggested that a median filter with a small window size of 3 and wavelet-based denoising at levels 2 and 4 provided the most favorable noise reduction while preserving anatomical detail. These results are consistent with prior reports indicating that median filtering can effectively reduce noise and artifacts in medical images (8).
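The filter comparison described above can be illustrated with a minimal sketch. It assumes a noise-free reference image is available, which is only true for synthetic or phantom data, and it assumes the common definition of SNR in decibels as the ratio of reference signal power to residual error power; the study's exact SNR formula is not stated here. The function names and the synthetic test image are hypothetical.

```python
import numpy as np

def median_filter_3x3(img):
    """Apply a 3x3 median filter (window size 3) with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image, then take the per-pixel median.
    windows = np.stack(
        [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    )
    return np.median(windows, axis=0)

def mse(reference, test):
    """Mean squared error between a reference image and a test image."""
    return float(np.mean((reference - test) ** 2))

def snr_db(reference, test):
    """SNR in dB (assumed definition): signal power over residual error power.
    Undefined when the two images are identical (zero residual)."""
    noise_power = np.sum((reference - test) ** 2)
    return float(10.0 * np.log10(np.sum(reference ** 2) / noise_power))

# Synthetic stand-in for an ultrasound frame: a smooth intensity ramp
# corrupted by multiplicative (speckle-like) Gaussian noise.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(50.0, 200.0, 64), (64, 1))
noisy = clean * (1.0 + rng.normal(0.0, 0.2, clean.shape))
filtered = median_filter_3x3(noisy)

print(f"MSE noisy={mse(clean, noisy):.1f} filtered={mse(clean, filtered):.1f}")
print(f"SNR noisy={snr_db(clean, noisy):.1f} dB "
      f"filtered={snr_db(clean, filtered):.1f} dB")
```

A lower MSE and a higher SNR after filtering indicate better noise suppression. On real scans, where no clean reference exists, these metrics would have to be computed under a different convention.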
Following preprocessing and manual segmentation, a large number of quantitative features were extracted using radiomics. Aggregation of features across the fetal head, abdomen, and femur resulted in a high-dimensional feature space of 1715 features per case. Because high-dimensional feature sets can increase the risk of overfitting, particularly when the sample size is limited, PCA was used to reduce dimensionality before model training.
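The dimensionality-reduction step can be sketched as follows for a feature matrix of 200 cases by 1715 features. This is an illustrative sketch rather than the study's implementation: the 95% explained-variance threshold, the z-score standardization, and the function name `pca_reduce` are assumptions, and random numbers stand in for real radiomics features.

```python
import numpy as np

def pca_reduce(features, variance_to_keep=0.95):
    """Project features onto the leading principal components.

    The retained-variance threshold is an assumed value, not one
    reported in the study.
    """
    # Standardize each feature; constant features are left at zero.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    z = (features - mean) / std

    # PCA via singular value decomposition of the standardized matrix.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)

    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    k = min(k, len(s))
    return z @ vt[:k].T, explained[:k]

# Placeholder data with the dimensions described in the text.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 1715))
reduced, explained = pca_reduce(features)
print(reduced.shape, round(float(explained.sum()), 3))
```

With only 200 cases, at most 199 components can carry variance after centering, so the reduced space is far smaller than the original 1715 features. When PCA is combined with cross-validation, fitting the projection on the training folds only helps avoid information leakage into the held-out fold.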
Among the evaluated classifiers, the ensemble model achieved the best overall performance for fetal weight percentile classification. This observation aligns with previous studies reporting that ensemble or hybrid strategies may improve robustness in biomedical classification tasks by combining complementary decision rules. Prior work by Tao et al. (9) and a study presented at the International Conference on Communications (10) similarly reported that performance can vary substantially across algorithms and that combined approaches can yield competitive results in fetal growth-related prediction tasks.
Despite these promising findings, several limitations should be acknowledged. First, this retrospective, single-center study included a moderate sample size of 200 cases, which may limit generalizability. Second, ROIs were delineated manually, which may introduce observer dependence and reduce scalability. Third, model evaluation relied on random 10-fold cross-validation without external validation; therefore, the reported performance should be interpreted as internal validation. Finally, given the multiclass setting and potential class imbalance, reporting additional metrics, such as macro-F1, balanced accuracy, and class-wise sensitivity/specificity, would provide a more complete assessment than accuracy alone.
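The additional metrics mentioned above can be computed directly from a multiclass confusion matrix. The sketch below uses three hypothetical class labels (0, 1, 2) and invented predictions to show why accuracy alone can mislead under class imbalance; none of the numbers are results from this study.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm):
    """One-vs-rest sensitivity, specificity, and F1 for each class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(cm[r][k] for r in range(n)) - tp
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        results.append({"sensitivity": sens, "specificity": spec, "f1": f1})
    return results

def summarize(y_true, y_pred, n_classes):
    """Accuracy, balanced accuracy (mean sensitivity), and macro-F1."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    per_class = per_class_metrics(cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / len(y_true)
    balanced = sum(m["sensitivity"] for m in per_class) / n_classes
    macro_f1 = sum(m["f1"] for m in per_class) / n_classes
    return accuracy, balanced, macro_f1, per_class

# Invented imbalanced example: a classifier that always predicts the majority class.
y_true = [0] * 2 + [1] * 16 + [2] * 2
y_pred = [1] * 20
accuracy, balanced, macro_f1, per_class = summarize(y_true, y_pred, 3)
print(f"accuracy={accuracy:.3f} balanced={balanced:.3f} macro_f1={macro_f1:.3f}")
```

In this toy case the accuracy is 0.80, yet the balanced accuracy is about 0.33 and the macro-F1 about 0.30, because the two minority classes are never detected.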
Accordingly, the proposed approach should be considered a decision-support tool rather than a clinically deployable system. Future work should focus on external validation using independent multicenter datasets, evaluating stratified validation protocols, and investigating semiautomated or automated segmentation approaches to improve reproducibility and clinical feasibility.
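A stratified validation protocol, as suggested above, keeps the class proportions approximately equal across folds. The following is a minimal sketch of one way to build stratified folds; the class distribution of 20/160/20 among 200 cases is a hypothetical example, and the function name `stratified_folds` is illustrative.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Split case indices into folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal cases of this class round-robin; the running offset spreads
        # any remainder over different folds for each class.
        for i, idx in enumerate(members):
            folds[(offset + i) % n_folds].append(idx)
        offset += len(members)
    return folds

# Hypothetical class distribution for 200 cases.
labels = [0] * 20 + [1] * 160 + [2] * 20
folds = stratified_folds(labels, n_folds=10)
for fold in folds[:2]:
    print(len(fold), sorted(Counter(labels[i] for i in fold).items()))
```

Each fold serves once as the held-out set while the remaining nine folds are used for training. With purely random splitting, a small class could be nearly absent from some folds, which makes per-class metrics unstable.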
5.1. Limitations and Future Directions
Several limitations of the current study warrant acknowledgment. First, the retrospective, single-center design using a dataset of 200 cases may limit the generalizability of the findings. Although random 10-fold cross-validation was employed for internal model assessment, external validation using independent datasets from diverse patient populations and clinical settings is crucial to confirm the model’s robustness and performance in real-world scenarios. Future research should prioritize the collection and analysis of such external data to further validate the proposed radiomics-based framework.
Second, manual segmentation of ROIs, although performed under expert supervision, introduces interobserver variability. Efforts to develop and validate semiautomated or fully automated segmentation algorithms could enhance reproducibility and scalability, thereby facilitating wider clinical adoption.
Third, the current model serves as a decision-support tool. Although the model is promising, its translation into routine clinical practice requires further prospective validation, integration into clinical workflows, and rigorous assessment of its impact on clinical decision-making and patient outcomes.
5.2. Conclusions
This study demonstrated the potential of radiomics analysis of fetal ultrasound images for fetal weight percentile classification: a radiomics-driven machine-learning pipeline classified fetal weight percentile categories with encouraging internal validation performance. Among the evaluated models, the ensemble classifier showed the best results, supporting the potential utility of ensemble learning for ultrasound-based fetal growth assessment, pending further external validation.