A Hybrid Intelligent Approach to Breast Cancer Diagnosis and Treatment Using Grey Wolf Optimization Algorithm

Background: Breast cancer is the second leading cause of death in women. The advent of machine learning (ML) has opened up a world of possibilities for the discovery and formulation of drugs, an exciting development that could revolutionize the pharmaceutical industry. By leveraging ML algorithms, researchers can now identify disease-related targets with greater accuracy. Additionally, ML techniques can be used to predict the toxicity and pharmacokinetics of potential drug candidates. Objectives: The main purpose of ML techniques, such as feature selection (FS) and classification, is to develop a learning model based on datasets. Methods: This paper proposed a hybrid intelligent approach using a Binary Grey Wolf Optimization Algorithm and a Self-Organizing Fuzzy Logic Classifier (BGWO-SOF) for breast cancer diagnosis. The proposed FS approach can not only reduce the complexity of the feature space but also avoid overfitting and improve the learning process. The performance of the proposed approach was evaluated on the Wisconsin Diagnostic Breast Cancer dataset using the 10-fold cross-validation technique. Although the performance of breast cancer detection is highly dependent on classification accuracy, most good classification methods have an essential flaw: they simply seek to maximize classification accuracy while ignoring the costs of misclassification among various categories. This is even more important in classification problems where the initial set of features is large. With such a large number of features, it is of special interest to search for a dependency between an optimal number of selected features and the accuracy of the classification model. Results: In the experiments, standard performance evaluation metrics, including accuracy, F-measure, precision, sensitivity, and specificity, were computed. The evaluation results demonstrated that the BGWO-SOF approach achieves 99.70% accuracy and a 99.66% F-measure, which outperforms other state-of-the-art methods.
Conclusions: During the comparison of the results, it was observed that the proposed approach gives better or more competitive results than other state-of-the-art methods. By leveraging the power of ML algorithms and artificial intelligence (AI) and the findings of the current study, we can optimize the selection of natural pharmaceutical products for the treatment of breast cancer and maximize their efficacy.

Artificial intelligence (AI) algorithms can serve as appropriate models to assist physicians in diagnosing breast cancer and categorizing patients. The integration of AI in the field of medical science is of paramount importance for improving the accuracy and performance of disease diagnosis (4-6). Promising reports on machine learning (ML) methods and intelligent techniques in breast cancer diagnosis research indicate their potential to enhance diagnostic performance (7, 8). In recent years, various ML techniques have been applied to the diagnosis and classification of breast cancer to distinguish between malignant and benign cases.
Dutta et al. (9) introduced a classification approach for data mining in medicine. The approach applied an improved fireworks optimization algorithm with the best selection strategy (IFWABS) in a multilayer neural network. The IFWABS approach was tested on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, resulting in a testing accuracy of 96.98%. Dora et al. (10) suggested a novel classification approach based on the Gauss-Newton method and a sparse representation technique (GNRBA) for breast cancer diagnosis. The efficiency of the GNRBA approach was examined on the UCI WDBC dataset, resulting in a classification accuracy of 98.48%. Saygili (11) conducted research on several classification methods for breast cancer; the best accuracy was obtained by the multi-layer perceptron (MLP) neural network technique (98.41%). Jafari-Marandi et al. (12) offered a novel breast cancer diagnosis approach based on a life-sensitive method, a self-organizing map neural network, and an error-driven learning model (LS-SOED).
This method employed a decision-oriented neural network based on hybrid supervised and unsupervised learning.
The resulting classification accuracy of LS-SOED was 96.19% based on the WDBC dataset.
Wang et al. (13) suggested a novel breast cancer classification method applying a support vector machine based on a weighted area under the curve and an ensemble learning model (WAUCE). The simulation outcomes revealed that WAUCE achieved a 98.76% accuracy. Rao et al. (14) developed a new method based on the coherently integrated artificial bee colony optimization algorithm and gradient boosting decision tree technique (ABCODT) to select optimal feature subsets.
Several UCI datasets, including the WDBC dataset, were subjected to the ABCODT approach; tests on the WDBC dataset yielded a classification accuracy of 97.18%. Liu et al. (15) employed a new hybrid technique for breast cancer detection in which the information gain method, simulated annealing algorithm, and genetic wrapper approach (IGSAGAW) were used to select optimal features. The GWO algorithm offers an optimal trade-off between local search and global search.
The GWO algorithm has the advantages of having fewer control parameters and favorable convergence. The SOF classifier is highly objective, non-parametric, and highly adaptable.
This work hybridizes the BGWO algorithm and the SOF classifier to validate the supervised learning model effectively. The BGWO-SOF hybrid approach efficiently removes unimportant features from the feature space and generates an optimal feature subset while preserving the latent structure of the dataset.

Methods
In this section, the materials and methods applied in the proposed approach are described. First, the breast cancer dataset used in this work is introduced. Then, the GWO algorithm and the SOF classifier are explained. Finally, the proposed approach is presented.

Dataset
The WDBC dataset was obtained from the UCI Machine Learning Repository and utilized in this work (24) (29,30).
Feature selection removes irrelevant and redundant features from a dataset; therefore, this process plays a vital role in the development of learning models. Selecting the appropriate and optimal feature subset can be challenging due to the complex and unpredictable interrelationships between features (31, 32). Since FS is an NP-hard problem (33, 34), several optimization algorithms have been proposed to overcome its limitations. These algorithms include particle swarm optimization (PSO) (35, 36), ant colony optimization (ACO) (37, 38), whale optimization (WO) (39, 40), and GWO (41). Feature selection approaches based on optimization algorithms can efficiently explore large search spaces and often yield results that closely approximate the global solution.
These approaches systematically eliminate unnecessary and redundant features, which can, in many cases, enhance the performance of learning models by reducing uncertainty and overfitting issues (42-44).

Grey Wolf Optimization Algorithm
Optimization algorithms are procedures for finding near-optimal solutions to complex, multi-dimensional optimization problems. One such algorithm is the GWO algorithm, a promising optimization technique based on swarm intelligence (SI) (45). This algorithm has garnered the attention of many researchers across various optimization domains (46-48). What sets the GWO algorithm apart from other evolutionary and swarm intelligence techniques are its distinctive characteristics: it requires minimal parameter tuning, effectively balances global and local search, and demonstrates favorable convergence. Moreover, it is known for its simplicity of implementation, adaptability, and scalability (49).
The GWO algorithm mimics the natural hierarchy of leadership and the hunting mechanism observed in grey wolves. It employs four types of grey wolves to simulate the leadership hierarchy: alpha (α), beta (β), delta (δ), and omega (ω). Grey wolves tend to live in packs, with group sizes ranging from a minimum of 5 to a maximum of 12. The hierarchy among grey wolves is highly structured: in mathematical terms, α represents the best solution, followed by β and δ as the next-preferred solutions, with ω covering the remaining candidates. The GWO algorithm posits that the α, β, and δ wolves lead the hunt (optimization), while the ω wolves follow these three leaders (45). The crucial phase of the hunt occurs when a wolf encircles its prey. The encircling behavior of the GWO algorithm is represented by Equations (1) and (2):

y(t + 1) = p(t) − A · D (1)

where t denotes the current iteration, and p and y are the positions of the prey and a grey wolf, respectively. The distance vector D is calculated as

D = |C · p(t) − y(t)| (2)

where A and C are coefficient vectors that are determined using Equations (3) and (4):

A = 2 · lb · r1 − lb (3)

C = 2 · r2 (4)
where r1 and r2 are random vectors in the range [0, 1], and the lb components are linearly lowered from 2 to 0. The hunting process is typically directed by α, while β and δ occasionally participate. However, the exact position of the best solution (prey) in the problem's abstract search space is unknown. Therefore, α is taken as the best candidate solution for simulating the hunting behavior, while β and δ are presumed to have a better idea of where the prey could be found. As a result, the three best solutions obtained so far are retained, and the omegas and other search agents update their positions toward those of the best search agents.
Equation (5) is utilized to update the positions of the wolves:

y(t + 1) = (W1 + W2 + W3) / 3 (5)

where W1, W2, and W3 are formulated in Equations (6) to (8), respectively.
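As an illustration of the encircling and position-update rules described above, a single continuous GWO iteration might be sketched as follows. This is purely illustrative: the paper reports a MATLAB implementation, and the function and variable names here are our own.

```python
import numpy as np

def gwo_step(wolves, fitness, t, maxit, rng):
    """One illustrative iteration of the continuous GWO update.

    wolves: (n_wolves, dim) array of positions; fitness: callable to minimize.
    """
    # Rank the pack: alpha, beta, delta are the three best solutions so far.
    order = np.argsort([fitness(w) for w in wolves])
    alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]

    lb = 2 - t * (2 / maxit)  # coefficient linearly lowered from 2 to 0
    new_wolves = np.empty_like(wolves)
    for i, y in enumerate(wolves):
        parts = []
        for leader in (alpha, beta, delta):
            r1, r2 = rng.random(y.shape), rng.random(y.shape)
            A = 2 * lb * r1 - lb          # coefficient vector A
            C = 2 * r2                    # coefficient vector C
            D = np.abs(C * leader - y)    # distance to the leader
            parts.append(leader - A * D)  # move toward this leader
        new_wolves[i] = np.mean(parts, axis=0)  # average of the three moves
    return new_wolves
```

Note that when t reaches maxit, lb becomes 0 and every wolf collapses onto the mean of the three leaders, which is the intended shift from exploration toward exploitation.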

Binary GWO
The binary version of the GWO algorithm is referred to as BGWO, in which each solution is a vector of 0's and 1's.

Emary et al. proposed a new BGWO algorithm for FS tasks (50). In this algorithm, the update equation is driven by three binary component vectors, w1, w2, and w3, which pull each wolf toward the three best solutions (α, β, and δ). The position of a wolf is updated using the GWO principle while maintaining the binary constraint; the scholars modified the GWO update as detailed in Equations (14) to (23). For the α component in dimension d,

w1^d = 1 if (wα^d + bstepα^d) ≥ 1, and 0 otherwise (14)

where wα^d is the α wolf position in dimension d, and bstepα^d is a binary step in dimension d that can be determined as in Equation (15):

bstepα^d = 1 if cstepα^d ≥ urand, and 0 otherwise (15)

where urand is a random number generated from a standard uniform distribution in (0, 1), and cstepα^d is the continuous-valued step size in dimension d, computed using the sigmoidal function in Equation (16):

cstepα^d = 1 / (1 + exp(−10 · (A1^d · Dα^d − 0.5))) (16)

where A1^d and Dα^d are determined by Equation (17). The β and δ components, w2^d and w3^d, are computed analogously from wβ^d, bstepβ^d, and cstepβ^d (Equations 18 to 20) and from wδ^d, bstepδ^d, and cstepδ^d (Equations 21 to 22), respectively.

Additionally, the SOF classifier is highly objective and non-parametric: it does not rely on a predefined model with parameters but derives all associated meta-parameters directly from the data itself.
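A per-dimension binarization step of this kind, following Emary et al.'s sigmoid-based transfer (the slope of 10 and the 0.5 shift are one common formulation; all names here are ours), can be sketched as:

```python
import math
import random

def binarize_component(w_leader_d, A_d, D_d, rng=random):
    """Illustrative binary step for one dimension toward one leader wolf.

    The continuous step is squashed through a sigmoid; the binary step
    fires when it exceeds a uniform random draw, and the resulting
    component is forced back into {0, 1}.
    """
    cstep = 1.0 / (1.0 + math.exp(-10.0 * (A_d * D_d - 0.5)))  # sigmoidal step
    bstep = 1 if cstep >= rng.random() else 0                  # binary step
    return 1 if (w_leader_d + bstep) >= 1 else 0               # binary position
```

The full update would compute such a component for each of α, β, and δ and then apply a per-dimension stochastic crossover over the three outputs.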
Depending on the complexity of the problem and the availability of computational resources, the SOF classifier can address issues at various levels of granularity or detail.
Furthermore, it supports both online and offline learning and can classify data using various dissimilarity/distance criteria.Therefore, the SOF is a versatile classifier known for its excellent performance across a range of problems.
In this paper, the offline learning mode of the SOF classifier is utilized. In the AnYa-type rule, xin signifies the input vector and "∼" signifies similarity, which can also be considered a fuzzy degree of membership/satisfaction (55); pro_i (i = 1, 2, ..., Np) represents the ith prototype of the class, and Np is the number of prototypes discovered from the data samples of this class. Different strategies, such as the "fuzzily weighted average", might be used to determine the label for a specific data sample.
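For illustration, a minimal prototype-based classification step in this spirit might look as follows. The exponential similarity kernel and the "winner takes all" label choice are illustrative assumptions of this sketch, not the exact SOF operators.

```python
import numpy as np

def classify(x, prototypes_by_class):
    """Label x by the class whose prototype is most similar to it.

    prototypes_by_class: dict mapping a class label to a list of
    prototype vectors. The similarity kernel below (exponential decay
    with squared Euclidean distance) is an illustrative choice.
    """
    best_class, best_sim = None, -1.0
    for label, protos in prototypes_by_class.items():
        # Fuzzy degree of membership to each prototype of this class.
        sims = [np.exp(-np.sum((np.asarray(x) - np.asarray(p)) ** 2))
                for p in protos]
        s = max(sims)  # strongest firing prototype of the class
        if s > best_sim:
            best_class, best_sim = label, s
    return best_class
```

A "fuzzily weighted average" strategy would instead combine the similarities of several prototypes per class rather than taking only the maximum.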
The fuzzy rule training procedures of the separate classes have no effect on each other. For the remainder of this section, we suppose that the training procedure is conducted using the data samples of the cth class (c = 1, 2, ..., C), denoted by {x}_c. Prototypes are identified using the densities and mutual distributions of the data samples.
It is important to note that after a data sample is selected into list {r}, it cannot be selected a second time.
Prototypes, denoted by {p}_0, are then recognized as the local maxima of the ordered multi-modal densities, D^MM_kc(r), according to Condition 1:

Condition 1: IF (D^MM_kc(r_i) > D^MM_kc(r_(i−1)) AND D^MM_kc(r_i) > D^MM_kc(r_(i+1))) THEN r_i ∈ {p}_0

After all of the prototypes have been recognized, some less representative ones might remain in {p}_0, necessitating a filtering process to eliminate them from {p}_0.
Before beginning the filtering process, the prototypes attract nearby data samples to form data clouds (55), in a manner similar to a Voronoi tessellation (58): each data sample is associated with its winning (nearest) prototype. After all the data clouds are generated around the available prototypes {p}_0, the data cloud centers, denoted by {ϕ}_0, are obtained, and the multi-modal densities at the centers are computed. Following that, for each data cloud, say the ith one (ϕ_i ∈ {ϕ}_0), the set of the centers of its neighboring data clouds, denoted {ϕ}_i^neighboring, is identified.

Dehghan MJ and Azizi A
where ϕ_j ∈ {ϕ}_0 and ϕ_i ≠ ϕ_j; G^(c,L)_kc is the average radius of the local influential region, corresponding to the Lth granularity level (L = 1, 2, 3, ...), derived from the cth class data in offline mode. Finally, the most representative prototypes of the cth class, denoted by {p}_c, are chosen from the centers of the available data clouds that fulfill Condition 3 (58):

Condition 3: IF (D^MM(ϕ_i) > max(D^MM(ϕ_j)), ϕ_j ∈ {ϕ}_i^neighboring) THEN ϕ_i ∈ {p}_c

Once the representative prototypes {p}_c of the cth class are recognized, the AnYa-type fuzzy rule can be constructed as follows, where N_c denotes the number of prototypes in {p}_c:

IF (xin ∼ pro_1) OR (xin ∼ pro_2) OR ... OR (xin ∼ pro_Nc) THEN (class c) (29)

The Hybrid Intelligent Method
Taking into consideration the advantages of BGWO and SOF and the importance of breast cancer classification, this study proposes an intelligent approach for distinguishing benign from malignant breast cancers. In the proposed approach, BGWO acts as an FS technique to select the effective and optimal features, while SOF functions as the classifier to evaluate the performance of these optimal features. The procedure of the proposed approach (BGWO-SOF) is described below.
The procedure begins with normalizing the values of the WDBC dataset and initializing the parameters of the BGWO algorithm. Then, the K-fold cross-validation technique (with K = 10) is employed to assess how effectively the classification approach can predict the tumor characteristics of an unknown instance. The dataset is divided into 10 equally sized subsets; in each fold, 9 subsets serve as the training data (90% of the dataset), and 1 subset (10% of the dataset) is reserved for testing.
In each fold, the algorithm executes the optimization process by generating an initial population of candidate solutions (i.e., individuals) within the search space.Each position of an individual is represented as a vector with N elements, where N is the number of features in the dataset.
A value of 0 indicates that a feature is not selected, whereas a value of 1 indicates that the corresponding feature is selected. Each individual in the feature space constitutes a set of candidate features. The BGWO solution representation is depicted in Figure 1 (for example, N = 10).
Then, the fitness for each solution (X i ) is calculated.
Because the approach's primary goal is to improve classification performance, the quality of a solution is determined by two key criteria: the number of BGWO-selected features in the solution and the SOF classifier's error rate. Therefore, the optimal solution is the combination of features with the fewest selected features and the highest classification performance. In this paper, the following fitness function, evaluated by the SOF classifier, is applied to assess the quality of the features selected by the BGWO:

fitness = ω · SOF_Error + ϕ · (f / F)

where SOF_Error is the SOF's error rate on the WDBC dataset using the selected features, F is the total number of features in the breast cancer dataset, and f is the number of selected features in the solution. ω ∈ (0, 1) and ϕ = 1 − ω are balance factors between SOF_Error and the number of selected features, controlling the relative importance of feature space reduction and classification performance. In the present experiments, ϕ is set to 0.05.
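Under these definitions, the fitness computation reduces to a weighted sum; a minimal sketch (function and argument names are ours):

```python
def bgwo_fitness(sof_error, n_selected, n_total, phi=0.05):
    """Fitness of a candidate feature subset: a weighted sum of the SOF
    error rate and the fraction of selected features. phi = 0.05 follows
    the paper's setting; omega = 1 - phi weights the error term."""
    omega = 1.0 - phi
    return omega * sof_error + phi * (n_selected / n_total)
```

Lower fitness is better, so between two subsets with equal error, the one with fewer selected features wins.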
At each iteration, the fitness (classification error rate) of the new solution is compared to that of the previous solution, and if it shows an improvement, the new solution is retained. The process is repeated until the total number of iterations reaches a limit called Maxit. Finally, the set of selected features is used in the SOF learning model. This procedure is repeated until every subset has been used for both the training and testing phases.
Finally, the evaluation metrics results from the 10 iterations are averaged to produce reliable statistical results.Figure 2 shows a flowchart of the proposed approach.

Results
This section presents the experimental setup, including the dataset and parameter settings, the evaluation metrics, and the experimental results.

Experimental and Parameter Settings
This study evaluated the performance of BGWO-SOF using the WDBC breast cancer dataset. For the experimental executions, this study utilized an Intel(R) Core(TM) i5-8250U CPU at 1.6 GHz with 8 GB of RAM, running MATLAB 2018 on a Windows 10 (64-bit) operating system.
The BGWO-SOF was executed with a population size of 20.
The approach was formulated as an optimization problem, and BGWO was run for 50 iterations.In the SOF classifier, the Euclidean distance was employed as a dissimilarity metric with a granularity level equal to 12. To prepare for experimentation, the values from the WDBC dataset were normalized.Subsequently, a 10-fold cross-validation technique was employed to assess the performance of the proposed approach.
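As an illustration of the normalization step (the paper does not state its exact scaling scheme; min-max scaling to [0, 1] is a common choice and is assumed here):

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column to [0, 1] (an assumed, common scheme)."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    rng = np.where(mx - mn == 0, 1.0, mx - mn)  # guard constant columns
    return (X - mn) / rng
```

In a cross-validation setting, the scaling parameters would ideally be computed on the training folds only and then applied to the test fold, to avoid information leakage.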

Evaluation Metrics
This work focused on a two-class classification problem, where a classifier produces two discrete results: negative and positive. When a positive instance is correctly classified as positive, it is a true positive (TP); conversely, a false negative (FN) occurs when a positive instance is incorrectly classified as negative. Similarly, when a negative instance is correctly classified as negative, it is a true negative (TN); however, an incorrect classification of a negative instance as positive is a false positive (FP).
In the WDBC dataset, patient instances with benign tumors are labeled as 0 (negatives), whereas those with malignant tumors are labeled as 1 (positives).
True negatives are instances where the actual class was negative and the predicted class was also negative; true positives are instances where both the actual and predicted classes are positive. False negatives occur when a record's true class is positive but it is predicted as negative, whereas FPs occur when the record's actual class is negative but the predicted class is positive. The ROC curve does not depend on the class distribution, making it useful for evaluating classifiers that predict rare events.
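The evaluation metrics used in this study follow directly from these confusion-matrix counts; a minimal sketch:

```python
def metrics(tp, tn, fp, fn):
    """Standard evaluation metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall; driven by FNs
    specificity = tn / (tn + fp)   # driven by FPs
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity,
            "f_measure": f_measure}
```

Because missing a malignant tumor is the costliest error in cancer diagnosis, sensitivity (which penalizes FNs) deserves particular attention alongside overall accuracy.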

Experimental Results
The first experiment evaluates the performance of the present proposed approach (BGWO-SOF) and the SOF classifier without feature selection capability (SOF-WFS).
The classification ability of the method is assessed using 10-fold cross-validation. Table 2 shows the experimental results as the average of the 10-fold cross-validation outcomes. It shows that with BGWO-SOF, classification accuracy and the other metrics improved dramatically, with the BGWO-SOF approach outperforming the SOF-WFS method: the average accuracy of the SOF method was 80.54%, whereas the best accuracy of BGWO-SOF is 99.702%, while the number of features was reduced by approximately 64%. Since the 10-fold cross-validation method was applied in this study, the ROC curve is generated from the 10 folds. Figure 4 illustrates the ROC curve for the 10 folds, demonstrating the classifier's efficiency and its capability to distinguish between classes.
Figure 5 presents the number of features selected by the proposed approach in each iteration. The proposed approach selects the best and optimal subset of features, reducing the feature count by 62% in the FS process. Figure 6 indicates the frequencies of the selected features using the BGWO-SOF approach on the WDBC dataset over 10 iterations.

Discussion
In this study, the BGWO algorithm was combined with the SOF classifier to validate the supervised learning model. Additionally, the average MCC value (close to 1) suggests that both classes are predicted effectively. Figure 7 compares the present study's results with the findings of previous research. The results demonstrate that the proposed approach outperforms other state-of-the-art methods in all evaluation metrics.
In general, the results of the proposed approach are satisfactory for all the evaluation metrics.The evaluation results indicated that the proposed hybrid approach can not only effectively reduce the feature space dimensions but also ensure the efficiency and acceptable performance of the classification model.The experimental and comparison results for all the evaluation metrics demonstrate that the proposed approach outperforms other well-known methods in classification.
Since we used the standard WDBC dataset for this study, there was no access to the BI-RADS stages of the cases.
Jundishapur J Nat Pharm Prod. 2023; 18(4):e142058.

It is important to note that, despite using the WDBC dataset from the UCI Machine Learning Repository, the proposed approach does not have any specific limitations for testing on real datasets. In practice, when using a real dataset, the following operations must be taken into account to ensure a clean dataset: Features must be extracted from real samples.
Valid samples must be separated, and damaged or invalid samples must be removed.
An expert doctor must classify the data samples as benign or malignant.

Conclusions
Breast cancer ranks among the leading causes of mortality worldwide, particularly among women.This pressing issue has prompted extensive research in the field of medicine.The primary objective of this study was to introduce an intelligent approach to breast cancer detection, with the aim of aiding clinical practitioners in making more informed decisions in the future.
In this paper, we proposed a novel hybrid intelligent approach that combines the BGWO algorithm with the SOF classifier. The performance of this hybrid approach was assessed using the WDBC dataset and a stratified 10-fold cross-validation. Various standard performance evaluation metrics, including accuracy, F-measure, precision, sensitivity, and specificity, were employed. The evaluation results demonstrated that the proposed approach achieves 99.70% accuracy and a 99.66% F-measure, outperforming other state-of-the-art methods.

A cost-sensitive support vector machine (CSSVM) was used as the classifier in this hybrid technique (IGSAGAW-CSSVM). The results of experiments on the UCI WDBC dataset demonstrated a classification accuracy of 95.7%. Lu et al. (16) proposed a novel breast cancer classification method using a genetic optimization algorithm and an online gradient boosting model (GAOGB). On the WDBC dataset, the GAOGB method achieved a classification accuracy of 94.51%. Abdar et al. (17) presented a novel breast cancer diagnosis method utilizing a two-layer nested ensemble model based on stacking and voting ensemble techniques and a Naïve Bayes classifier (SV-Naïve-Bayes). The presented method achieved a classification accuracy of 98.07% when applied to the UCI WDBC dataset. Dalwinder et al. (18) suggested a new breast cancer classification approach using a feature weighting technique based on the ant lion optimization algorithm and a multilayer neural network classifier (FW-BPNN). The approach was tested on the WDBC dataset, resulting in a classification accuracy of 99.3%.

lb = 2 − t · (2 / Maxit) (12)

where t denotes the current iteration and Maxit denotes the maximum number of iterations permitted in the GWO.

y(t + 1) = crossover(w1, w2, w3) (23)

where w1, w2, and w3 are binary vectors reflecting the influence of a wolf's movement toward the α, β, and δ grey wolves, respectively, computed as described above; a simple stochastic crossover technique is applied in each dimension to combine the w1, w2, and w3 outputs.

Self-Organizing Fuzzy Logic

Gu and Angelov introduced a novel classifier model based on SOF (51). The SOF classifier utilizes non-parametric statistical operators to objectively reveal essential data patterns, even in the absence of empirically acquired data samples. It identifies local peaks within the multi-modal data distribution to serve as prototypes.
The SOF classifier's offline method involves independently detecting prototypes for each class and constructing a zero-order fuzzy rule of the AnYa type from the identified prototypes of each class. The AnYa-type fuzzy rule-based scheme was introduced in (52) as an alternative to commonly used fuzzy rule-based schemes, such as Takagi-Sugeno (53) or Mamdani (54) models; the antecedent (IF) part of AnYa-type fuzzy rules is streamlined into a more concise, objective, and non-parametric vector structure without requiring the definition of ad-hoc membership functions, as needed in the two aforementioned predecessors. A zero-order fuzzy rule of the AnYa type has the following form:

IF (xin ∼ pro_1) OR (xin ∼ pro_2) OR ... OR (xin ∼ pro_N) THEN (class)

Here, K_c is the number of data samples of the cth class {x}_c, and U_ck is the number of unique data samples of the cth class.

Then, the data samples are ordered into a list {r} according to their multi-modal density values and mutual distances. The data sample with the largest multi-modal density is identified as the first element: r_1 = argmax(D^MM(u_i)). Next, the unique data sample closest to r_1 in terms of distance is selected as the second element: r_2 = argmin(d(r_1, u^c_i)), i = 1, 2, ..., U_ck − 1. The minimal distance to r_2 identifies the third element of list {r}, denoted r_3. The entire list {r} is built by repeating this procedure until every data sample has been chosen, and based on list {r}, the multi-modal densities of the ordered samples are computed. Neighboring data clouds are determined by Condition 2:

Condition 2: IF d²(ϕ_i, ϕ_j) ≤ G^(c,L)_kc THEN ϕ_i ∈ {ϕ}_j^neighboring

Figure 1. Solution representation of feature selection

The experiments were conducted on the WDBC dataset, resulting in four possible outcomes in the confusion matrix, as shown in Figure 3. When dealing with cancer diagnosis results, minimizing FNs is crucial. This study employed standard classification performance metrics.

Figure 2. Flowchart of the proposed approach

Figure 3. Confusion matrix in a two-class classification

Figure 7. Accuracy comparison of the proposed approach to state-of-the-art methods (abbreviations: weighted area under the receiver operating characteristic curve ensemble (WAUCE); artificial bee colony and gradient boosting decision tree algorithm (ABCoDT); information gain directed simulated annealing genetic algorithm wrapper (IGSAGAW); cost-sensitive support vector machine (CSSVM); genetic algorithm-based online gradient boosting (GAOGB); back-propagation neural networks (FW-BPNN); genetic programming (GP); generalized feature selection algorithm (GeFeS); group method data handling (GMDH); interaction feature selection algorithm based on neighborhood conditional mutual information (NCMI IFS); eagle strategy optimization (ESO); gravitational search optimization (GSO); binary grey wolf optimization algorithm and a self-organizing fuzzy logic classifier (BGWO-SOF))

In addition to accuracy, this study examined metrics such as recall, precision, F-measure, and specificity (Table 3). The average F-measure value, close to 1 (0.99742), indicates that the proposed approach has very low numbers of FPs and FNs, signifying excellent precision and recall.

Table 3. Comparison of Results with State-of-the-art Methods