Computed Tomography-Based Radiomics Analysis of Different Machine Learning Approaches for Differentiating Pulmonary Sarcomatoid Carcinoma and Pulmonary Inflammatory Pseudotumor

An-Lin Zhang; Yan-Mei Fu; Zhi-Yang He

doi:10.5812/ijradiol-139568

IJ Radiology

An Innovative Journal in the Field of Radiology



Author(s):

An-Lin Zhang ¹,*, Yan-Mei Fu ¹, Zhi-Yang He ²

¹ The Affiliated Hospital of Panzhihua University, Panzhihua City, China

² The First Affiliated Hospital of Chongqing Medical University, Chongqing, China

IJ Radiology: Vol. 20, issue 4; e139568

Published online: Apr 02, 2024

Article type: Research Article

Received: Aug 10, 2023

Accepted: Mar 02, 2024

How to Cite: Zhang A, Fu Y, He Z. Computed Tomography-Based Radiomics Analysis of Different Machine Learning Approaches for Differentiating Pulmonary Sarcomatoid Carcinoma and Pulmonary Inflammatory Pseudotumor. I J Radiol. 2024;20(4):e139568. https://doi.org/10.5812/ijradiol-139568.

Abstract

Background:

Differentiating between pulmonary sarcomatoid carcinoma (PSC) and pulmonary inflammatory pseudotumor (PIP) is challenging using current conventional diagnostic methods. This lack of distinction significantly impacts subsequent clinical treatment decisions.

Objectives:

This study was conducted to construct an effective method to distinguish between PSC and PIP based on commonly used computed tomography (CT) images.

Patients and Methods:

A total of 14 patients with PSC and 76 patients with PIP who had undergone CT imaging were retrospectively included in the study. Radiomics features were extracted from non-enhanced CT images, and correlation analysis was performed to reduce redundancy. The final radiomics signature was then identified using the least absolute shrinkage and selection operator (LASSO). Logistic regression (LR), classification and regression trees (CART), support vector machine (SVM), k-nearest neighbors (KNN), and gradient boosting machine (GBM) were used to construct the radiomics models. The performance of these different radiomics models was evaluated using the receiver operating characteristic curve.

Results:

A total of 1186 radiomics features were extracted from non-enhanced CT images. After dimensionality reduction and selection, 7 valuable features were identified. Of the 5 machine learning models evaluated for differentiating between PSC and PIP, the GBM-based radiomics model performed best, with an accuracy of 0.922, area under the curve (AUC) of 0.98, F1 score of 0.967, and log loss of 0.161. In a multivariable logistic regression that also included conventional clinical-radiological features, only the GBM-based radiomics model was a significant factor (odds ratio [OR] = 8.119; P = 0.006).

Conclusion:

The implementation of the GBM-based radiomics model has the potential to improve the ability to differentiate between PSC and PIP, thereby influencing the timeliness of subsequent surgical interventions and even the prognosis of patients.

Keywords

CT-Based Radiomics

Pulmonary Sarcomatoid Carcinoma

Pulmonary Inflammatory Pseudotumor

1. Background

Pulmonary sarcomatoid carcinoma (PSC) is a rare and aggressive subtype of non-small cell lung cancer (NSCLC) characterized by the presence of both carcinomatous and sarcomatous components. It represents a small proportion of lung malignancies, accounting for approximately 0.5% (1). Pulmonary sarcomatoid carcinomas are highly heterogeneous and malignant tumors that exhibit resistance to radiotherapy and chemotherapy, posing challenges in identifying effective treatment regimens (2, 3). Clinical guidelines and expert consensus statements specifically addressing PSC may have limitations (4, 5). Typically, management strategies for PSC align with those established for NSCLC. Surgical resection is the primary treatment modality whenever feasible, and it has shown significant improvement in overall survival for early-stage, operable PSC cases, with 5-year survival rates ranging from 11% to 19.5% (6-8).

Conventional imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) provide insights into tumor extent, lymph node involvement, and distant metastases. However, the radiological features of PSC are not entirely specific and can resemble other subtypes of NSCLC or even benign entities (9). Therefore, a comprehensive pathological assessment is necessary for a definitive diagnosis. Early detection and diagnosis of PSC are crucial for improving patient outcomes.

Pulmonary inflammatory pseudotumor (PIP) is a non-neoplastic lesion characterized by excessive cell proliferation (10). However, on CT scans, PIPs often present as solid pulmonary masses, which can be misinterpreted as malignancies such as PSCs. This poses a diagnostic challenge and can influence subsequent clinical treatments, including the decision for surgery. Currently, certain CT features have been used to differentiate PIPs from peripheral lung cancers (11, 12), aiming to distinguish between benign and malignant masses based on CT images. However, there is a lack of literature investigating the differentiation of PIPs and PSCs based on clinical and radiological characteristics. This knowledge gap limits our understanding of these rare diseases, makes them harder to tell apart, and can affect the timeliness of surgical interventions and prognosis.

Radiomics has emerged as a promising technique for the in-depth analysis of quantitative features extracted from medical images that are not easily identified by the naked eye. It enhances diagnostic efficiency and reduces misdiagnosis rates (13), particularly in the diagnosis and prognosis of lung nodules and solid masses. Machine learning, an advanced technique in computer science, uses algorithms to identify patterns within large datasets and make data-driven predictions or decisions (14). Although few reports exist on the application of machine learning to differentiate between PSCs and PIPs (15-17), there is a clinical imperative to proactively distinguish between them through noninvasive CT scans, especially during the initial diagnosis phase.

2. Objectives

This study aimed to assess the feasibility of using radiomics-based machine learning methods to differentiate between PIPs and PSCs based on CT images. The reliability and generalizability of radiomics models rely on their accuracy, which necessitates comparing different machine learning approaches. This evaluation will support the effectiveness and timeliness of clinical treatment decisions in the next steps. By focusing on specific radiomics features within these lesions, we aimed to improve diagnostic accuracy compared to methods based solely on clinical and radiological features. Additionally, our study aimed to provide a preliminary reference for future research on this rare disease.

3. Patients and Methods

3.1. Patient Population

This retrospective, single-center study was conducted with institutional review board approval (2023HR23), and written consent was waived due to its retrospective nature. The study included 113 patients who were enrolled between March 2013 and August 2022 at our hospital. Among them, 21 patients had PSCs, and 92 patients had PIPs. We specifically recruited patients who had their first chest CT images taken at our hospital before treatment. Inclusion criteria were (1) non-contrast chest CT scans with thin-slice sections measuring 0.625 - 1.25 mm; (2) lesions with a maximum diameter equal to or greater than 1 cm; and (3) confirmed pathological diagnosis. Patients were excluded if they had received preoperative chemotherapy, radiotherapy, or chemoradiation treatment, had been diagnosed with chronic systemic diseases or other malignancies, or if the quality of their thoracic CT images was unsatisfactory. Ultimately, a total of 90 patients (27 males; mean age, 58.1 years) were included in the study, comprising 14 patients with PSCs and 76 patients with PIPs. A flowchart illustrating the patient recruitment process is provided in Appendix 1 of the Supplementary File.

3.2. Computed Tomography Technique

Dual-source CT scanning (Somatom Definition, Siemens Medical Solutions, Germany) was performed on all subjects without the use of contrast. Thin-slice CT scans were acquired in a craniocaudal direction, covering the entire lung from the apex to the base. The following parameters were used for the chest CT examination: Detector collimation ranging from 1 - 5 mm, beam pitch ranging from 0.75 - 1.75 mm, tube voltage of 100 - 120 kV, automatic tube current modulation, detector collimation of 128 × 0.6 mm, and image matrix of 512 × 512. The reconstructed images had a slice thickness of 0.625 - 1.250 mm and were presented with window levels of 1600, 600, 350, and 35 HU for visualizing the lung and mediastinal anatomy.

3.3. Radiological and Clinical Data Acquisition

The CT images and relevant clinical data were obtained from picture archiving and communication systems, as well as the electronic medical records of the patients. The collected clinical parameters included age, sex, smoking status, prolonged fever, cough history, sputum, bloody sputum, hemoptysis, pleural pain, asymptomatic status, and white blood cell count. Two experienced radiologists, one with 15 years and the other with 20 years of experience in CT diagnosis, independently reviewed and evaluated the chest CT images of all subjects without access to their clinical data. The following radiological characteristics were recorded: Location, maximum diameter of the lesion, boundary of the lesion, shape of the lesion, lobulation sign, burr signs, vacuole sign, air bronchial sign, necrotic zone, calcifications, halo sign, satellite lesions, interlobular septal thickening, pleural indentation, pleural effusion, and mediastinal or hilar lymphadenopathy. To ensure the reproducibility of the results, a subset of 50 subjects was randomly selected and re-evaluated by the same observer after a 1-month interval.

3.4. Radiomics Feature Acquisition

The segmentation of the pulmonary lesions was performed by a radiologist specialized in chest CT diagnosis with a decade of experience. For this task, the radiologist used ITK-SNAP software version 3.8.0. The contouring process involved delineating the complete tumor margin along the lesion edge. Prior to the analysis, image stacks were manually masked to remove extraneous structures outside of the alveolar cavity, including blood vessels and large airways.

The radiomics features were extracted from the medical images using PyRadiomics version 3.0, an open-source Python package for radiomics feature extraction. A flowchart illustrating the process of building the radiomics model is shown in Figure 1.

Figure 1.

The flowchart for building the radiomics model

A range of features was included for further analysis, including first-order statistics, intensity histogram statistics, shape descriptors (both 2-D and 3-D), and texture features such as gray level dependence matrix and gray level size zone matrix.

Initially, the radiomics features underwent standard score (z score) transformation to address dimensional heterogeneity caused by variations in value scales among features. Subsequently, correlation analysis and the least absolute shrinkage and selection operator (LASSO) method were used to reduce the dimensionality of the extracted features. Pearson correlation analysis was applied to normally distributed data, while Spearman correlation analysis was used for non-normally distributed data. Features with a correlation coefficient exceeding 0.9 were removed from further analyses. The study ultimately implemented a LASSO regression model with 5-fold cross-validation to select radiomics features with nonzero coefficients. A comprehensive diagram of the methodology is provided in Figure 1.
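
The standardization and correlation-filtering steps described above can be sketched in a few lines of Python. This is an illustration rather than the authors' code: the feature names and values are hypothetical, only Pearson correlation is shown, the absolute value of the coefficient is compared with the 0.9 threshold, and keeping the first feature of each correlated pair is an assumption, since the text does not state which member of a pair was removed.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) SD 1."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        # A constant column carries no information; map it to zeros.
        scaled_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def drop_correlated(rows, names, threshold=0.9):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    cols = list(zip(*rows))
    kept = []
    for j in range(len(cols)):
        if all(abs(pearson(cols[j], cols[k])) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Hypothetical toy data: 5 lesions x 3 features (feat_b nearly duplicates feat_a).
names = ["feat_a", "feat_b", "feat_c"]
rows = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
    [5.0, 10.5, 3.0],
]
scaled = zscore_columns(rows)
selected = drop_correlated(scaled, names, threshold=0.9)
print(selected)
```

In this toy example feat_b is removed because it is almost perfectly correlated with feat_a, while feat_c is retained.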

3.5. Development of Machine Learning Models

After reducing the dimensionality of the radiomics features with the LASSO method, we constructed radiomics models with the goal of identifying the classifier that best discriminated between PIPs and PSCs in this dataset. To this end, we applied 5 well-established machine learning algorithms that are widely used for classification tasks (18, 19): Logistic regression (LR), classification and regression trees (CART), support vector machine (SVM), k-nearest neighbors (KNN), and gradient boosting machine (GBM). Using several algorithms allowed us to compare models with different strengths and to select the one with the best accuracy and predictive power. The “caret” R package was used to implement these 5 machine learning approaches. During model training, a control object was defined using 10-fold cross-validation, and the summary function was set to “twoClassSummary,” enabling the retrieval of predicted probabilities for the different models. Subsequently, LR, SVM, CART, KNN, and GBM were trained using the corresponding functions “glm,” “svmLinear,” “rpart,” “knn,” and “gbm,” respectively.
The diagnostic performance of each model was evaluated based on metrics such as area under the curve (AUC), accuracy (ACC), F1 score, and log loss using receiver operating characteristic curves (ROC) and confusion matrices. The optimal radiomics model was selected based on its superior performance.
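
For reference, the four evaluation metrics named above can be computed from true labels and predicted probabilities as in the following Python sketch. The labels and probabilities are hypothetical, the 0.5 decision threshold is an assumption, and the AUC is computed with the pairwise (Mann-Whitney) formulation; the study itself obtained these metrics through R.

```python
import math

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return (tp + tn) / (tp + fp + tn + fn)

def f1_score(y_true, y_prob, threshold=0.5):
    tp, fp, _, fn = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def roc_auc(y_true, y_prob):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 3 positive and 2 negative cases.
y_true = [1, 1, 1, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.2]
acc = accuracy(y_true, y_prob)
f1 = f1_score(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
ll = log_loss(y_true, y_prob)
print(acc, f1, auc, ll)
```

Higher ACC, F1 score, and AUC indicate better discrimination, whereas a lower log loss indicates better-calibrated probabilities, which is why the model with the lowest log loss is preferred.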

3.6. Statistical Analysis

For the statistical analysis, we used R software version 3.6.3 and Python software version 3.5.6. To evaluate interobserver variability and reliability, we used the intraclass correlation coefficient (ICC). An ICC value greater than 0.8 indicated high repeatability (20). Student's t test was used for continuous variables, and the results were reported as mean ± SD. Categorical variables were analyzed using either the chi-square test or Fisher's exact test, and the results were presented as numbers (percentages). Subsequently, a multivariable LR analysis was performed on the selected statistically significant features to identify the ultimate predictor variables for model development. A 2-tailed P value less than 0.05 was considered statistically significant.
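
As a worked illustration of the categorical comparisons, the following Python sketch implements a two-sided Fisher's exact test for a 2 × 2 table and applies it to the pleural indentation counts reported in Table 2 (PIP: 62 without and 14 with; PSC: 6 without and 8 with). It is a sketch only: the source does not state whether the chi-square or Fisher's exact test was applied to this particular variable, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the P value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x under fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Rows: PIP, PSC; columns: pleural indentation absent, present (Table 2).
p_value = fisher_exact_two_sided(62, 14, 6, 8)
print(p_value)
```

For these counts the P value is well below 0.05, in line with the significant difference in pleural indentation reported in Table 2.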

4. Results

4.1. Patients’ Population and Radiological Characteristics

In our study, a total of 90 patients were enrolled, including 14 with PSC and 76 with PIP. Detailed clinical and radiological features are presented in Tables 1 and 2.

Table 1. Comparison of Demographic and Clinical Data Between Pulmonary Sarcomatoid Carcinoma and Pulmonary Inflammatory Pseudotumor Patients ^a

Variables	Total (n = 90)	PIP (n = 76)	PSC (n = 14)	P-Value
Age (y)	58.1 ± 9.2	57.3 ± 9.3	62.4 ± 7.2	0.030
Sex				0.056
Female	63 (70)	50 (65.8)	13 (92.9)
Male	27 (30)	26 (34.2)	1 (7.1)
Smoking status				0.009
Never	38 (42.2)	37 (48.7)	1 (7.1)
Smoker	52 (57.8)	39 (51.3)	13 (92.9)
Prolonged fever				0.027
No	77 (85.6)	68 (89.5)	9 (64.3)
Yes	13 (14.4)	8 (10.5)	5 (35.7)
Cough history				1.000
No	21 (23.3)	18 (23.7)	3 (21.4)
Yes	69 (76.7)	58 (76.3)	11 (78.6)
Sputum				0.573
No	35 (38.9)	31 (40.8)	4 (28.6)
Yes	55 (61.1)	45 (59.2)	10 (71.4)
Bloody sputum				0.058
No	61 (67.8)	55 (72.4)	6 (42.9)
Yes	29 (32.2)	21 (27.6)	8 (57.1)
Hemoptysis				0.286
No	72 (80)	59 (77.6)	13 (92.9)
Yes	18 (20)	17 (22.4)	1 (7.1)
Pleural pain				1.000
No	70 (77.8)	59 (77.6)	11 (78.6)
Yes	20 (22.2)	17 (22.4)	3 (21.4)
Asymptomatic				0.651
No	10 (11.1)	8 (10.5)	2 (14.3)
Yes	80 (88.9)	68 (89.5)	12 (85.7)
Total white blood cell count	6.6 (5.2, 9.3)	6.6 (5.3, 9.2)	6.6 (5, 10)	0.881


Table 2. Comparison of Radiological Findings Between Pulmonary Sarcomatoid Carcinoma and Pulmonary Inflammatory Pseudotumor Patients

Variables	Total (n = 90)	PIP (n = 76)	PSC (n = 14)	P-Value
Location				0.943
Upper lobe	49 (54.4)	42 (55.3)	7 (50)
Middle lobe and lower lobe	41 (45.6)	34 (44.7)	7 (50)
Boundary of the lesion				1.000
Well-defined	35 (38.9)	30 (39.5)	5 (35.7)
Ill-defined	55 (61.1)	46 (60.5)	9 (64.3)
Lobulation sign				0.084
No	48 (53.3)	44 (57.9)	4 (28.6)
Yes	42 (46.7)	32 (42.1)	10 (71.4)
Burr signs				1.000
No	64 (71.1)	54 (71.1)	10 (71.4)
Yes	26 (28.9)	22 (28.9)	4 (28.6)
Vacuole sign				0.841
No	44 (48.9)	38 (50)	6 (42.9)
Yes	46 (51.1)	38 (50)	8 (57.1)
Air bronchial sign				0.156
No	52 (57.8)	41 (53.9)	11 (78.6)
Yes	38 (42.2)	35 (46.1)	3 (21.4)
Necrotic zone				0.406
No	38 (42.2)	34 (44.7)	4 (28.6)
Yes	52 (57.8)	42 (55.3)	10 (71.4)
Calcifications				1.000
No	82 (91.1)	69 (90.8)	13 (92.9)
Yes	8 (8.9)	7 (9.2)	1 (7.1)
Halo sign				0.637
No	47 (52.2)	41 (53.9)	6 (42.9)
Yes	43 (47.8)	35 (46.1)	8 (57.1)
Satellite lesions				0.211
No	61 (67.8)	49 (64.5)	12 (85.7)
Yes	29 (32.2)	27 (35.5)	2 (14.3)
Interlobular septal thickening				0.133
No	61 (67.8)	54 (71.1)	7 (50)
Yes	29 (32.2)	22 (28.9)	7 (50)
Pleural indentation				0.005
No	68 (75.6)	62 (81.6)	6 (42.9)
Yes	22 (24.4)	14 (18.4)	8 (57.1)
Pleural effusion				0.483
No	71 (78.9)	61 (80.3)	10 (71.4)
Yes	19 (21.1)	15 (19.7)	4 (28.6)
Mediastinal or hilar lymphadenopathy				< 0.001
No	63 (70)	60 (78.9)	3 (21.4)
Yes	27 (30)	16 (21.1)	11 (78.6)
Maximum diameter of the lesion (mm)	44.7 (38.2, 60.9)	44.5 (38.2, 56.8)	62 (40.2, 80.5)	0.146


Our results showed significant differences between PSC and PIP patients in terms of pleural indentation, mediastinal or hilar lymphadenopathy, age, smoking status, and prolonged fever.

4.2. Machine Learning Models and Performances

In terms of image interpretation, the consistency among the observers was assessed using the ICC, which ranged from 0.977 to 1.000. These high ICC values indicate strong agreement among the observers in interpreting the images. Initially, a total of 1186 radiomics features were extracted from plain CT scans. To prioritize the most informative and relevant features, correlation-based feature reduction and LASSO selection were applied. Through this process, 7 features with nonzero coefficients were identified (Figure 2).

Figure 2.

Radiomics features with nonzero coefficients retained by the least absolute shrinkage and selection operator.
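
How an L1 penalty leaves only some features with nonzero coefficients can be illustrated with a small coordinate-descent LASSO in Python. This is a simplified sketch under stated assumptions: it uses a squared-error loss, a fixed penalty `lam`, and hypothetical standardized data, whereas the source does not report the loss function or the penalty value, and the 5-fold cross-validation used to choose the penalty is not shown.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw for the current w (all zeros at the start)
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, lam) / norm
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                w[j] = new_wj
    return w

# Hypothetical data: feat_a drives the outcome, feat_b is unrelated to it.
feature_names = ["feat_a", "feat_b"]
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = lasso_coordinate_descent(X, y, lam=0.5)
nonzero = [name for name, wj in zip(feature_names, weights) if wj != 0.0]
print(weights, nonzero)
```

The coefficient of the unrelated feature is shrunk exactly to zero, so only feat_a would be carried forward, which mirrors the way the 7 features in Figure 2 were retained.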

These 7 features were used to build the radiomics models. Subsequently, the performance of 5 different radiomics models (LR, SVM, GBM, KNN, and CART) was assessed. The diagnostic potential of each model and their ability to differentiate between PIP and PSC were evaluated using ROC curve analysis. Figure 3 displays the ROC curves of all 5 models, highlighting their respective performance.

Figure 3.

The receiver operating characteristic curves of all machine learning models

To further evaluate the diagnostic efficacy of the models, multiple evaluation metrics (including AUC, F1 score, and log loss) were considered. Among the investigated models, GBM exhibited the highest level of diagnostic accuracy, with an AUC value of 0.980, F1 score of 0.967, and log loss of 0.161, as presented in Table 3 and Figure 4.

Table 3. Comparison of Diagnostic Performance Among Different Models

Model	ACC	F1 Score	AUC	Log Loss
SVM	0.878	0.938	0.914	0.273
CART	0.900	0.939	0.876	0.255
GBM	0.922	0.967	0.980	0.161
KNN	0.833	0.919	0.875	0.285
LR	0.878	0.942	0.915	0.242


Figure 4.

Diagnostic efficacy analysis of all machine learning models

4.3. Multivariable Logistic Regression Analysis for the Combined Model

Our study aimed to develop a combined model that integrates both the radiomics model and clinical-radiological features, including age, pleural indentation, mediastinal or hilar lymphadenopathy, smoking status, and prolonged fever. Logistic regression analysis demonstrated that only the GBM-based radiomics model served as a significant factor (odds ratio [OR] = 8.119; 95% CI, 3.761-9.079; P = 0.006; Table 4).

Table 4. Multivariate Logistic Regression Analysis to Discriminate Between Pulmonary Inflammatory Pseudotumor and Pulmonary Sarcomatoid Carcinoma

Variables	B	Wald	OR (95% CI)	P-Value
Age	0.12	0.184	1.127 (0.676 ~ 3.108)	0.668
Smoking status	-0.149	0.001	0.862 (0 ~ 131)	0.981
Fever	2.225	0.001	9.252 (0.044 ~ 1588.911)	0.979
Mediastinal or hilar lymphadenopathy	8.034	1.287	3084.472 (4.468 ~ 10807)	0.257
Pleural indentation	7.048	1.077	1150.752 (0.915 ~ 1541)	0.299
GBM model	34.33	3.388	8.119 (3.761 ~ 9.079)	0.006

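
In a logistic regression, the odds ratio of a predictor is the exponential of its coefficient, OR = exp(B). The short Python sketch below applies this relation to the B values listed in Table 4; it only illustrates the arithmetic, and the 1% tolerance used for the comparison is an assumption that allows for rounding of the tabulated coefficients.

```python
import math

# (B, tabulated OR) pairs copied from Table 4.
table4 = {
    "Age": (0.12, 1.127),
    "Smoking status": (-0.149, 0.862),
    "Fever": (2.225, 9.252),
    "Mediastinal or hilar lymphadenopathy": (8.034, 3084.472),
    "Pleural indentation": (7.048, 1150.752),
}

# Odds ratio implied by each coefficient.
odds_ratios = {name: math.exp(b) for name, (b, _) in table4.items()}

# Relative difference between exp(B) and the tabulated OR.
rel_diff = {
    name: abs(odds_ratios[name] - tabulated) / tabulated
    for name, (_, tabulated) in table4.items()
}

for name in table4:
    print(f"{name}: exp(B) = {odds_ratios[name]:.3f}, "
          f"relative difference = {rel_diff[name]:.4f}")

# Coefficient of the GBM model row, evaluated the same way.
gbm_or = math.exp(34.33)
print(f"GBM model: exp(B) = {gbm_or:.3e}")
```

For the five clinical-radiological rows, exp(B) reproduces the tabulated OR to within rounding. For the GBM row, exp(34.33) is on the order of 10^14 with leading digits of about 8.1, so the tabulated 8.119 may be the mantissa of a value reported in scientific notation; this reading is an inference from the arithmetic and is not stated in the source.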

5. Discussion

We conducted an investigation to assess the efficacy of CT-based radiomics models in differentiating between PSCs and PIPs. Our findings highlighted that the GBM-based radiomics model exhibited superior diagnostic accuracy compared to other machine learning approaches. Furthermore, we developed an integrated CT-based model that incorporated clinical predictors and the GBM radiomics model to differentiate between PSCs and PIPs, with promising results.

In our investigation, various clinical features were observed in the patients. Notably, patients with PSC exhibited pleural indentation, advanced age, mediastinal or hilar lymphadenopathy, and a history of smoking. Conversely, patients with PIP commonly presented with persistent fever. Accumulating evidence suggests an association between mediastinal lymphadenopathy and smoking history in the development of PSC. Recent studies have reported mediastinal lymph node involvement in nearly half of the patients at the time of diagnosis, indicating a high likelihood of lymphatic spread in this condition. Additionally, the presence of mediastinal lymph node metastasis has been correlated with an increased risk of distant metastasis and reduced overall survival. These findings underscore the importance of early detection and accurate staging of lymph node involvement in the management of PSC (21, 22). Zhao et al. (23) demonstrated a higher prevalence of smoking history in patients with PSC and its close association with poor prognosis. These findings emphasize the critical role of smoking cessation in reducing the incidence and mortality of PSC. On the other hand, a study by Kim et al. also showed that fever was a common symptom associated with PIP and can be a useful feature in guiding diagnosis (24). This finding is consistent with the results of our study.

Currently, there are relatively few studies on the imaging characteristics of pulmonary pleomorphic carcinoma. In contrast to PIPs, PSCs are often accompanied by pleural indentation on imaging.

In this study, we conducted a radiomics-based analysis to differentiate between PSCs and PIPs. We used logistic multivariable analysis to integrate a GBM-based radiomics model with clinical predictors. This combination was based on the potential synergy between the quantitative information captured by radiomics and the valuable insights provided by clinically relevant predictors. By combining these factors, our model offers a more comprehensive understanding of the underlying mechanisms and characteristics associated with each tumor type, resulting in improved overall performance. However, the LR analysis revealed that only the GBM-based radiomics model was a significant factor. Radiomics, as an intelligent calculation-based and non-invasive approach, uses original images to construct models, enabling the extraction of additional information that can reflect potentially relevant phenotypic features based on tumor heterogeneity. This approach provides valuable insights for both diagnosis and prognosis (15, 16).

In this study, the GBM-based radiomics model exhibited the highest diagnostic performance, as indicated by its exceptional values in terms of ACC, AUC, and F1 score, while achieving the lowest log loss value. Gradient boosting machine has established itself as a powerful machine learning algorithm, offering several advantages that make it well-suited for radiomics analysis. First, GBM operates as an ensemble model, iteratively combining multiple weak learners (decision trees). This iterative process allows GBM to learn from misclassified cases and focus on more intricate samples, thereby enhancing its overall performance. Moreover, GBM possesses a remarkable capacity to handle complex interactions between predictors through gradient optimization. Radiomics features often capture intricate patterns and textures within medical images, which can prove challenging to interpret using simple linear models like LR. Gradient boosting machine’s inherent ability to capture nonlinear relationships and interactions between features enables it to adapt to the complex and heterogeneous nature of radiomics data. Additionally, GBM incorporates regularization techniques such as shrinkage and feature subsampling, effectively mitigating overfitting and bolstering the model's generalizability. Regularization helps avert the risk of erroneously identifying spurious or irrelevant features, thereby yielding a more robust and accurate prediction model in radiomics analysis.
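
The mechanism described above, in which weak learners are added one at a time and each contribution is damped by shrinkage, can be illustrated with a deliberately small gradient boosting classifier in Python. This is a teaching sketch and not the "gbm" implementation used in the study: the weak learners are single-split stumps fitted by least squares to the negative gradient of the log loss, leaf values are plain residual means rather than the Newton-step estimates used by production libraries, subsampling is omitted, and the data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, left_mean, right_mean)
    return best[1:]

def fit_gbm(X, y, n_trees=50, learning_rate=0.1):
    """Boost stumps on the log loss; assumes both classes are present in y."""
    pos_rate = sum(y) / len(y)
    f0 = math.log(pos_rate / (1 - pos_rate))  # initial log-odds
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the log loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, thr, left_mean, right_mean))
        # Shrinkage: add only a fraction of each stump's prediction.
        scores = [s + learning_rate * (left_mean if row[j] <= thr else right_mean)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    score = f0 + sum(learning_rate * (lm if row[j] <= thr else rm)
                     for j, thr, lm, rm in stumps)
    return sigmoid(score)

def mean_log_loss(model, X, y):
    return -sum(yi * math.log(predict_proba(model, row))
                + (1 - yi) * math.log(1 - predict_proba(model, row))
                for row, yi in zip(X, y)) / len(y)

# Hypothetical data: the first feature separates the classes, the second is noise.
X = [[0.1, 1.0], [0.2, 0.9], [0.3, 1.1], [0.7, 1.0], [0.8, 0.95], [0.9, 1.05]]
y = [0, 0, 0, 1, 1, 1]
model_1 = fit_gbm(X, y, n_trees=1)
model_50 = fit_gbm(X, y, n_trees=50)
print(mean_log_loss(model_1, X, y), mean_log_loss(model_50, X, y))
print(predict_proba(model_50, [0.85, 1.0]), predict_proba(model_50, [0.15, 1.0]))
```

Each additional stump corrects part of the remaining error, so the training log loss falls as trees are added, while the learning rate limits how much any single tree can change the prediction.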

The findings of the present study are consistent with those of previous studies. For instance, a study reported that the GBM algorithm outperformed LR in predicting early gastric cancer (25). Another study demonstrated that GBM, when combined with dimensionality reduction techniques, achieved higher accuracy and AUC values compared to other machine learning models for predicting peritoneal metastasis of gastric cancer (26). Overall, GBM remains widely employed in the field of machine learning due to its prowess in handling complex data and generating precise predictions.

This study had several limitations that should be acknowledged. First, it was conducted retrospectively in a single center, which may have resulted in potential selection bias and limited the generalizability of the findings to other settings or populations. Second, the study was constrained by a small sample size due to the low incidence of PSC, making it challenging to collect an adequate number of medical images. To further validate the results, it is crucial to prospectively recruit a larger number of patients in future studies. It is worth noting that the differentiation between PSC and PIP has received limited attention in the existing literature, particularly regarding CT-based radiomics analysis, which could extract more image information for differentiation. Therefore, this preliminary study serves as a foundation for future research in this area. Third, the lack of directly related pathological tissue samples hindered the ability to verify the findings and limited the capacity to test hypotheses surrounding this matter, relying exclusively on prior reports instead.

In conclusion, our study suggests that the radiomics model based on GBM demonstrates exceptional diagnostic performance compared to conventional diagnostic methods. This highlights its potential utility as a differentiation approach for distinguishing between PSCs and PIPs. These findings have significant implications for informing future clinical and surgical strategies.

Footnotes

Authors' Contribution: Study concept and design: A. L. Zhang and Y. M. Fu; analysis and interpretation of data: A. L. Zhang, Y. M. Fu, and Z. Y. He; drafting of the manuscript: A. L. Zhang and Y. M. Fu; critical revision of the manuscript for important intellectual content: A. L. Zhang; statistical analysis: A. L. Zhang, Y. M. Fu, and Z. Y. He.
Conflict of Interests: The authors declare no conflict of interest.
Data Availability: The dataset presented in the study is available on request from the corresponding author during submission or after publication.
Ethical Approval: This retrospective, single-center study was conducted with institutional review board approval (2023HR23), and written consent was waived due to its retrospective nature.
Funding/Support: The authors state that this work has not received any funding.

References

1.
Steuer CE, Behera M, Liu Y, Fu C, Gillespie TW, Saba NF, et al. Pulmonary Sarcomatoid Carcinoma: An Analysis of the National Cancer Data Base. Clin Lung Cancer. 2017;18(3):286-92. [PubMed ID: 28043773]. https://doi.org/10.1016/j.cllc.2016.11.016.
2.
Li X, Wu D, Liu H, Chen J. Pulmonary sarcomatoid carcinoma: progress, treatment and expectations. Ther Adv Med Oncol. 2020;12. [PubMed ID: 32922522]. [PubMed Central ID: PMC7450456]. https://doi.org/10.1177/1758835920950207.
3.
Yang Z, Xu J, Li L, Li R, Wang Y, Tian Y, et al. Integrated molecular characterization reveals potential therapeutic strategies for pulmonary sarcomatoid carcinoma. Nat Commun. 2020;11(1):4878. [PubMed ID: 32985499]. [PubMed Central ID: PMC7522294]. https://doi.org/10.1038/s41467-020-18702-3.
4.
Zhang L, Lin W, Yang Z, Li R, Gao Y, He J. Multimodality Treatment of Pulmonary Sarcomatoid Carcinoma: A Review of Current State of Art. J Oncol. 2022;2022:8541157. [PubMed ID: 35368903]. [PubMed Central ID: PMC8975648]. https://doi.org/10.1155/2022/8541157.
5.
Wang F, Yu X, Han Y, Gong C, Yan D, Yang L, et al. Chemotherapy for advanced pulmonary sarcomatoid carcinoma: a population-based propensity score matching study. BMC Pulm Med. 2023;23(1):262. [PubMed ID: 37454075]. [PubMed Central ID: PMC10350265]. https://doi.org/10.1186/s12890-023-02541-1.
6.
Lin Y, Yang H, Cai Q, Wang D, Rao H, Lin S, et al. Characteristics and Prognostic Analysis of 69 Patients With Pulmonary Sarcomatoid Carcinoma. Am J Clin Oncol. 2016;39(3):215-22. [PubMed ID: 25068469]. https://doi.org/10.1097/COC.0000000000000101.
7.
Xie Y, Lin Z, Shi H, Sun X, Gu L. The Prognosis of Pulmonary Sarcomatoid Carcinoma: Development and Validation of a Nomogram Based on SEER. Technol Cancer Res Treat. 2022;21. [PubMed ID: 35730203]. [PubMed Central ID: PMC9228655]. https://doi.org/10.1177/15330338221109647.
8.
Sun L, Dai J, Chen Y, Duan L, He W, Chen Q, et al. Pulmonary Sarcomatoid Carcinoma: Experience From SEER Database and Shanghai Pulmonary Hospital. Ann Thorac Surg. 2020;110(2):406-13. [PubMed ID: 32268141]. https://doi.org/10.1016/j.athoracsur.2020.02.071.
9.
Tang W, Wen C, Pei Y, Wu Z, Zhong J, Peng J, et al. Preoperative CT findings and prognosis of pulmonary sarcomatoid carcinoma: comparison with conventional NSCLC of similar tumor size. BMC Med Imaging. 2023;23(1):105. [PubMed ID: 37580691]. [PubMed Central ID: PMC10424330]. https://doi.org/10.1186/s12880-023-01065-8.
10.
Cerfolio RJ, Allen MS, Nascimento AG, Deschamps C, Trastek VF, Miller DL, et al. Inflammatory pseudotumors of the lung. Ann Thorac Surg. 1999;67(4):933-6. [PubMed ID: 10320231]. https://doi.org/10.1016/s0003-4975(99)00155-1.
11.
Wang XL, Shan W. Application of dynamic CT to identify lung cancer, pulmonary tuberculosis, and pulmonary inflammatory pseudotumor. Eur Rev Med Pharmacol Sci. 2017;21(21):4804-9. [PubMed ID: 29164583].
12.
Liu C, Ma C, Duan J, Qiu Q, Guo Y, Zhang Z, et al. Using CT texture analysis to differentiate between peripheral lung cancer and pulmonary inflammatory pseudotumor. BMC Med Imaging. 2020;20(1):75. [PubMed ID: 32631330]. [PubMed Central ID: PMC7339470]. https://doi.org/10.1186/s12880-020-00475-2.
13.
Mayerhoefer ME, Materka A, Langs G, Häggström I, Szczypiński P, Gibbs P, et al. Introduction to Radiomics. J Nucl Med. 2020;61(4):488-95. [PubMed ID: 32060219]. [PubMed Central ID: PMC9374044]. https://doi.org/10.2967/jnumed.118.222893.
14.
Wang S, Summers RM. Machine learning and radiology. Med Image Anal. 2012;16(5):933-51. [PubMed ID: 22465077]. [PubMed Central ID: PMC3372692]. https://doi.org/10.1016/j.media.2012.02.005.
15.
Hassani C, Varghese BA, Nieva J, Duddalwar V. Radiomics in Pulmonary Lesion Imaging. AJR Am J Roentgenol. 2019;212(3):497-504. [PubMed ID: 30620678]. https://doi.org/10.2214/AJR.18.20623.
16.
Thawani R, McLane M, Beig N, Ghose S, Prasanna P, Velcheti V, et al. Radiomics and radiogenomics in lung cancer: A review for the clinician. Lung Cancer. 2018;115:34-41. [PubMed ID: 29290259]. https://doi.org/10.1016/j.lungcan.2017.10.015.
17.
Parmar C, Grossmann P, Bussink J, Lambin P, Aerts HJWL. Machine Learning methods for Quantitative Radiomic Biomarkers. Sci Rep. 2015;5:13087. [PubMed ID: 26278466]. [PubMed Central ID: PMC4538374]. https://doi.org/10.1038/srep13087.
18.
Shehab M, Abualigah L, Shambour Q, Abu-Hashem MA, Shambour MKY, Alsalibi AI, et al. Machine learning in medical applications: A review of state-of-the-art methods. Comput Biol Med. 2022;145:105458. [PubMed ID: 35364311]. https://doi.org/10.1016/j.compbiomed.2022.105458.
19.
Li J, Zhu Y, Dong Z, He X, Xu M, Liu J, et al. Development and validation of a feature extraction-based logical anthropomorphic diagnostic system for early gastric cancer: A case-control study. EClinicalMedicine. 2022;46:101366. [PubMed ID: 35521066]. [PubMed Central ID: PMC9061989]. https://doi.org/10.1016/j.eclinm.2022.101366.
20.
Wang B, Xu Y, Wan P, Shao S, Zhang F, Shao X, et al. Right Atrial Fluorodeoxyglucose Uptake Is a Risk Factor for Stroke and Improves Prediction of Stroke Above the CHA₂DS₂-VASc Score in Patients With Atrial Fibrillation. Front Cardiovasc Med. 2022;9:862000. [PubMed ID: 35872918]. [PubMed Central ID: PMC9304590]. https://doi.org/10.3389/fcvm.2022.862000.
21.
Mochizuki T, Ishii G, Nagai K, Yoshida J, Nishimura M, Mizuno T, et al. Pleomorphic carcinoma of the lung: clinicopathologic characteristics of 70 cases. Am J Surg Pathol. 2008;32(11):1727-35. [PubMed ID: 18769330]. https://doi.org/10.1097/PAS.0b013e3181804302.
22.
Sun J, Jiang Z, Shan T, Yang R, Kong D, Rui J, et al. Characteristics and Prognostic Analysis of 55 Patients With Pulmonary Sarcomatoid Carcinoma. Front Oncol. 2022;12:833486. [PubMed ID: 35592676]. [PubMed Central ID: PMC9113756]. https://doi.org/10.3389/fonc.2022.833486.
23.
Zhao C, Gao S, Xue Q, Tan F, Gao Y, Mao Y, et al. Clinical characteristics and prognostic factors of pulmonary sarcomatoid carcinoma. J Thorac Dis. 2022;14(10):3773-81. [PubMed ID: 36389311]. [PubMed Central ID: PMC9641323]. https://doi.org/10.21037/jtd-22-393.
24.
Kim JH, Cho JH, Park MS, Chung JH, Lee JG, Kim YS, et al. Pulmonary inflammatory pseudotumor--a report of 28 cases. Korean J Intern Med. 2002;17(4):252-8. [PubMed ID: 12647641]. [PubMed Central ID: PMC4531693]. https://doi.org/10.3904/kjim.2002.17.4.252.
25.
Lee HD, Nam KH, Shin CM, Lee HS, Chang YH, Yoon H, et al. Development and Validation of Models to Predict Lymph Node Metastasis in Early Gastric Cancer Using Logistic Regression and Gradient Boosting Machine Methods. Cancer Res Treat. 2023;55(4):1240-9. [PubMed ID: 36960625]. [PubMed Central ID: PMC10582533]. https://doi.org/10.4143/crt.2022.1330.
26.
Zhou C, Wang Y, Ji MH, Tong J, Yang JJ, Xia H. Predicting Peritoneal Metastasis of Gastric Cancer Patients Based on Machine Learning. Cancer Control. 2020;27(1):1073274820968900. [PubMed ID: 33115287]. [PubMed Central ID: PMC7791448]. https://doi.org/10.1177/1073274820968900.
