Abstract
Background:
Conventional light microscopy is the main method for the laboratory confirmation of malaria; however, it has three main disadvantages: (i) it is time-consuming and labor-intensive; (ii) its results depend heavily on good techniques, reagents and microscopes; (iii) treatment decisions are often made without the microscopy result because of long delays in delivering it to the clinician. Hence, fast automatic detection of the disease is urgently needed for prompt diagnosis and treatment.
Objectives:
This work aims to present a computer-assisted diagnosis system for the malaria parasite with an improved classification accuracy rate.
Materials and Methods:
This study was conducted using 400 confirmed images of blood slides infected with malaria parasite. MATLAB was used to implement the computations. Using five extracted features (flat texture, saturation channel histogram, color histogram, gradient, and granulometry) and six classifiers (k-Nearest Neighbors (k-NN), 1-Nearest Neighbor (1-NN), decision tree (DT), Fisher, linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA)), images were classified into two classes: parasitic and nonparasitic. Classifier fusion was then performed using several algorithms: mean, min, max, stack, median, Adaboost, and bagging.
Results:
Among the six individual classifiers, the highest accuracy, 92%, was obtained with the k-NN classifier. Among the fusion methods, the highest accuracy, 95.5%, was obtained with the Adaboost algorithm.
Conclusions:
Comparing classification by multiple classifier fusion with classification by each classifier separately shows that classifier fusion is more effective in enhancing the detection accuracy.
Keywords:
Malaria; Classification; Decision Trees; k-Nearest Neighbor; Fisher; Adaboost; Multiple Classifiers Fusion
1. Background
Malaria is the most important parasitic disease and one of the major health problems in a number of countries, particularly in tropical countries. Its importance stems from its high prevalence and mortality. The total annual number of malaria cases in the world is five times higher than that of AIDS, tuberculosis and measles; 300 to 500 million clinical cases of malaria are diagnosed annually and three children die of malaria every minute (1).
According to the World Health Organization report, 3.4 billion people live in 104 countries with endemic malaria. In 2012, 207 million malaria cases and 627,000 deaths were recorded. Children under 5 years old, with a reported death rate of 90%, were the most vulnerable group among patients. Malaria is the most important parasitic disease in Iran, with 444 identified active foci in the southern parts of the country, where 1% of the population lives and is at risk (1).
Malaria is caused by a parasite of the Plasmodiidae family. Among all types of Plasmodium parasite, only five can cause malaria in humans: falciparum, vivax, ovale, malariae, and knowlesi. Infected female Anopheles mosquitoes spread malaria parasites by biting humans. These mosquitoes are nocturnal, so they bite humans at night. A mosquito can also become infected by biting an infected human; it then transfers the parasite to other people when it bites them (2).
The microscopic examination of a stained blood film remains the standard way of detecting and identifying malaria parasites; however, microscopy has three main disadvantages: (i) it is time-consuming and labor-intensive; (ii) its results depend heavily on good techniques, reagents and microscopes; (iii) treatment decisions are often made without the microscopy result because of long delays in delivering it to the clinician. Hence, to treat the disease in time, automatic detection of malaria from Giemsa-stained blood smears is needed; it is of significant value for accurate and quick diagnosis, especially during epidemics. Automatic detection of the parasite's life stage also allows the stage and severity of the disease to be identified.
Endemic malaria persists in some parts of southern Iran, and the weather conditions in the cleared areas are suitable for breeding the Anopheles mosquito vectors; in addition, travelers and immigrants arrive from neighboring countries with a high prevalence of malaria. Together, these factors make it a priority to design a fast system for detecting malaria cases in the community so that they can be treated promptly.
In this study, Giemsa-stained images of different blood elements are classified into parasite and nonparasite categories, and then the stages of Plasmodium vivax are detected. First, features were extracted; then, images were classified into the two classes, parasitic and nonparasitic. To increase the efficiency of the pattern recognition system, the fusion of several classifiers was used. The proposed method is therefore an important step toward diagnosing suspected malaria cases, and their prompt treatment can protect the whole community from indigenous malaria transmission.
2. Objectives
This work aims to present a computer-assisted diagnosis system for the malaria parasite with an improved classification accuracy rate. Such a system is of great importance, especially in remote areas, for diagnosing and treating patients in time and thereby preventing malaria epidemics.
3. Materials and Methods
To enhance the classification accuracy, a method consisting of the following steps was introduced. First, 400 images of malaria parasites and nonparasites (platelets, white blood cells, etc.) were selected from the resources of the London School of Hygiene and Tropical Medicine (3). Then, several features were extracted from these images and used to classify the images with several widely used classifiers. The features and classifiers are explained below.
3.1. Extracted Features
3.1.1. Gradient
In general, the gradient is a vector that points in the direction of the greatest rate of increase, and its magnitude measures that rate (4). Here, the gradient is the rate of change of the gray-scale values in an image; for each pixel, it can be computed from the differences between this pixel and its neighboring pixels. The gradient values of the pixels in a noninfected red blood cell are small, whereas those of a trophozoite-infected cell are high. This feature is therefore used to distinguish noninfected red blood cells from red blood cells infected with malaria parasites.
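As a minimal sketch of this feature, the gradient magnitude can be approximated with central differences. The function names, the edge handling, the use of the mean magnitude as a summary, and the toy 5 × 5 "cells" are all assumptions made for illustration; the original MATLAB implementation is not described in the text.

```python
import math

def gradient_magnitude(img):
    """Central-difference gradient magnitude of a 2-D gray-scale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Indices are clamped at the border (edge replication, an assumption).
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            out[y][x] = math.hypot(gx, gy)
    return out

def mean_gradient(img):
    """Average gradient magnitude, used here as a single illustrative feature value."""
    g = gradient_magnitude(img)
    return sum(map(sum, g)) / (len(g) * len(g[0]))

# Hypothetical toy data: a uniform cell and a cell with one dark, parasite-like spot.
flat_cell = [[100] * 5 for _ in range(5)]
spotted_cell = [row[:] for row in flat_cell]
spotted_cell[2][2] = 20

print(mean_gradient(flat_cell), mean_gradient(spotted_cell))
```

The uniform cell yields a zero mean gradient, while the dark spot raises it, which is the behavior the feature relies on.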
3.1.2. Color Histogram
The color distributions in noninfected red blood cells and in red blood cells infected with malaria parasites are different; thus, the color histogram is used as a feature to differentiate infected from noninfected red blood cells. A color histogram represents the distribution of colors in an image. After the color space is divided into a finite number of subspaces, each corresponding to one color, the histogram is computed by counting the number of pixels whose colors fall in each subspace. Although the color histogram is most often used with three-dimensional spaces such as red, green, blue (RGB) or hue, saturation, value (HSV), it can be computed for any kind of color space (5).
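The counting procedure can be sketched as follows, assuming 8-bit RGB pixels and a uniform split of each channel into `bins_per_channel` intervals. Both the bin count and the pixel values are hypothetical choices for illustration, not parameters reported in the study.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram; the color cube is split into bins_per_channel**3 subspaces."""
    b = bins_per_channel
    hist = [0] * b ** 3
    for r, g, bl in pixels:
        # Map each 0..255 channel value to its interval index 0..b-1.
        ri, gi, bi = r * b // 256, g * b // 256, bl * b // 256
        hist[(ri * b + gi) * b + bi] += 1
    n = len(pixels)
    return [count / n for count in hist]

# Hypothetical pixels: three pale cell pixels and one darker stained pixel.
pixels = [(200, 150, 160)] * 3 + [(90, 30, 120)]
hist = color_histogram(pixels)
print([(i, v) for i, v in enumerate(hist) if v > 0])
```

The resulting vector (64 values for 4 bins per channel) can be used directly as a feature vector for a classifier.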
3.1.3. Flat-Texture
The flat texture is defined as the difference between an image and its median-filtered version (6). The flat texture image $I_{FT}$ is computed as follows:

$$I_{FT}(x, y) = I_E(x, y) - \mathrm{med}_r(I_E)(x, y)$$

where $\mathrm{med}_r$ denotes median filtering, r is the size of the median operator window, and $I_E(x, y)$ is the original image after pre-processing. The best performance is obtained for r between 15 and 25. Since the textures of noninfected and infected red blood cells differ, this feature was assessed on multiple images, and it revealed the distinction between noninfected red blood cells and red blood cells infected with the malaria parasite.
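The definition above (image minus its median-filtered version) can be sketched in a few lines. This is an illustrative version only: the window is clipped at the image borders (an assumption), and a small window r = 3 is used so that the toy 5 × 5 image stays readable, whereas the text recommends r between 15 and 25 for real images.

```python
from statistics import median

def median_filter(img, r):
    """Median over an r x r window (r odd); the window is clipped at the borders."""
    h, w, k = len(img), len(img[0]), r // 2
    return [[median(img[j][i]
                    for j in range(max(y - k, 0), min(y + k + 1, h))
                    for i in range(max(x - k, 0), min(x + k + 1, w)))
             for x in range(w)] for y in range(h)]

def flat_texture(img, r):
    """Flat texture: the image minus its median-filtered version."""
    med = median_filter(img, r)
    return [[img[y][x] - med[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]

# Hypothetical toy image: uniform background with a single dark spot.
spotted = [[100] * 5 for _ in range(5)]
spotted[2][2] = 20
ft = flat_texture(spotted, 3)
print(ft[2][2])
```

The smooth background is cancelled by the subtraction, so only the small dark structure remains in the flat texture image.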
3.1.4. Granulometry
The various elements of blood differ in size: white blood cells are about two to three times larger than red blood cells, and platelets are smaller than red blood cells. The granulometry feature was used to separate the blood components from one another according to their size. In binary images, the size of grains can be computed using a series of mathematical morphology opening operations (7).
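A minimal sketch of a granulometry on a binary image is given below: the image is opened (erosion followed by dilation) with squares of increasing size, and the foreground area that survives each opening is recorded. The square structuring element, the function names and the toy image are assumptions for illustration; the structuring elements used in the study are not stated.

```python
def erode(img, k):
    """Binary erosion with a (2k+1) x (2k+1) square; pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= j < h and 0 <= i < w and img[j][i]
                     for j in range(y - k, y + k + 1)
                     for i in range(x - k, x + k + 1)))
             for x in range(w)] for y in range(h)]

def dilate(img, k):
    """Binary dilation with a (2k+1) x (2k+1) square, clipped at the borders."""
    h, w = len(img), len(img[0])
    return [[int(any(img[j][i]
                     for j in range(max(y - k, 0), min(y + k + 1, h))
                     for i in range(max(x - k, 0), min(x + k + 1, w))))
             for x in range(w)] for y in range(h)]

def granulometry(img, max_k):
    """Foreground area remaining after opening with squares of half-width 0..max_k."""
    return [sum(map(sum, dilate(erode(img, k), k))) for k in range(max_k + 1)]

# Hypothetical binary image: one large object (5 x 5) and one small object (2 x 2).
img = [[0] * 12 for _ in range(12)]
for y in range(1, 6):
    for x in range(1, 6):
        img[y][x] = 1
for y in range(8, 10):
    for x in range(8, 10):
        img[y][x] = 1

sizes = granulometry(img, 3)
print(sizes)  # prints [29, 25, 25, 0]
```

The drops in the area curve (from 29 to 25, then from 25 to 0) mark the sizes at which the small and the large object disappear, which is how objects of different sizes can be told apart.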
3.1.5. Saturation Channel Histogram
Due to Giemsa staining, the parasite nucleus is dark purple, which is easily visible in the saturation channel of the HSV space. This feature enables the parasite to be separated from the rest of the image (8).
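A histogram of the saturation channel can be sketched with the standard-library `colorsys` module. The number of bins and the two pixel colors are hypothetical values chosen for illustration.

```python
import colorsys

def saturation_histogram(pixels, bins=8):
    """Normalized histogram of the HSV saturation channel of 8-bit RGB pixels."""
    hist = [0] * bins
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(s * bins), bins - 1)] += 1   # s == 1.0 falls in the last bin
    return [count / len(pixels) for count in hist]

# Hypothetical pixels: pale cytoplasm (low saturation) and a dark purple nucleus.
pale, purple = (230, 200, 210), (90, 20, 130)
sat_hist = saturation_histogram([pale] * 3 + [purple])
print(sat_hist)
```

The strongly stained pixel lands in a high-saturation bin, well separated from the pale pixels, which is the property the feature exploits.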
3.2. Classifiers
Classification assigns an unknown pattern to one of the known classes on the basis of its features. To train the classifiers, 400 images were used. The images contain four types of Plasmodium parasites: falciparum, vivax, ovale, and malariae. All images were prepared at a size of 480 × 640. To compare the results, the following criteria were used: sensitivity (SE), specificity (SP), precision (PR), and accuracy:

$$\mathrm{SE} = \frac{TP}{TP + FN}, \quad \mathrm{SP} = \frac{TN}{TN + FP}, \quad \mathrm{PR} = \frac{TP}{TP + FP}, \quad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of parasites correctly detected as parasitic by the classifier; FP is the number of nonparasites wrongly detected as parasitic; TN is the number of nonparasites correctly detected as nonparasitic; and FN is the number of parasites wrongly detected as nonparasitic.
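For illustration, the criteria can be computed from the four confusion-matrix counts as below, taking SE, SP and PR to be the usual sensitivity, specificity and precision. The counts in the example are hypothetical and are not the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and accuracy from confusion-matrix counts."""
    return {
        "SE": tp / (tp + fn),                    # parasites that were detected
        "SP": tn / (tn + fp),                    # nonparasites that were rejected
        "PR": tp / (tp + fp),                    # detections that are real parasites
        "ACC": (tp + tn) / (tp + fp + tn + fn),  # all correct decisions
    }

# Hypothetical counts, for illustration only.
m = diagnostic_metrics(tp=45, fp=5, tn=90, fn=10)
print(m)
```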
The classifiers used are briefly described as follows:
A) Linear discriminant analysis (LDA) classifier: this is one of the simplest and most widely used statistical classifiers; in its standard form, it is a binary classifier. Using the maximum a posteriori (MAP) estimate and assuming that the class-conditional probability of the features is Gaussian with equal covariance matrices across classes, the classes can be separated with this classifier (9).
B) Quadratic discriminant analysis (QDA) classifier: this classifier is very close to LDA; like LDA, QDA assumes that the measurements of each class are normally distributed (10). However, unlike LDA, QDA does not assume that the class covariances are identical; after some rearrangement, the separating surface between classes is quadratic (11).
C) k-Nearest Neighbor (k-NN) classifier: to find the class of a feature vector F, the K training vectors nearest to F (according to Euclidean distance) are considered first. The class to which most of these vectors belong is then assigned to F (12). The 1-NN classifier is the special case K = 1.
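The k-NN rule just described fits in a few lines. The two-dimensional feature vectors and labels below are hypothetical; in the study the feature vectors would come from the five extracted features.

```python
import math
from collections import Counter

def knn_predict(train, labels, f, k=3):
    """Majority class among the k training vectors nearest to f (Euclidean distance)."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], f))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 2-D feature vectors, for illustration only.
train = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9)]
labels = ["nonparasite"] * 3 + ["parasite"] * 2

print(knn_predict(train, labels, (0.85, 0.85), k=3))  # prints parasite
print(knn_predict(train, labels, (0.1, 0.1), k=1))    # 1-NN; prints nonparasite
```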
D) Fisher's classifier: this discriminant is based on projecting the data of the classes. Suppose there are two classes whose data are two-dimensional. Fisher's discriminant then seeks the line in the two-dimensional space onto which the projected data of the two classes are separated most effectively (12).
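For the two-class, two-dimensional case described here, the projection direction has a closed form, w = Sw⁻¹(m1 − m0), where Sw is the within-class scatter matrix and m0, m1 are the class means. The sketch below assumes this textbook formulation and places the decision threshold midway between the projected means; both the threshold choice and the data are illustrative assumptions.

```python
def fisher_direction(class0, class1):
    """Fisher direction w = Sw^-1 (m1 - m0) for two classes of 2-D points."""
    def mean(pts):
        n = len(pts)
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        return sxx, sxy, syy

    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1       # within-class scatter [[a, b], [b, c]]
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Assumed threshold: midpoint of the two projected class means.
    threshold = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, threshold

def fisher_predict(w, threshold, p):
    """Class 1 if the projection of p onto w exceeds the threshold, else class 0."""
    return int(w[0] * p[0] + w[1] * p[1] > threshold)

# Hypothetical 2-D data, for illustration only.
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 5), (7, 6), (7, 4), (8, 5)]
w, threshold = fisher_direction(class0, class1)
print(w, threshold)
```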
E) Decision tree (DT) classifier: tree classifiers form a large family of nonlinear classifiers that act as multistage decision-making systems. To decide among several classes, classes are rejected stage by stage until a finally accepted class is reached. A tree consists of nodes connected sequentially by branches. Each node holds a certain condition; a decision is made at that node on the basis of this condition and, if necessary, the process moves on to the next node (10).
3.3. Classifier Fusion
Classifier fusion consists of two parts. The first part is the creation of appropriate base classifiers: selecting the types of classifiers, the number of classifiers, and suitable features for each classifier. The second part is the combination of the classifiers' outputs so as to achieve the best pattern classification results. How to combine the classifier outputs depends on the problem at hand. There are different rules for combining classifier outputs; the methods used here are described briefly below (13).
A) Voting: in this method, the decision of each classifier about the class of the input sample is counted as a vote, and the final decision is made on the basis of the votes of the different classifiers. The input sample is assigned to the class that receives the most votes (14).
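B) The min, max, mean, and median algorithms: these algorithms require classifiers whose outputs are C-dimensional vectors $[S_{i,1}, S_{i,2}, \ldots, S_{i,C}]$ (15). In other words, if the output of the i-th classifier is the vector $[S_{i,1}, S_{i,2}, \ldots, S_{i,C}]$ (assuming L base classifiers and C different classes), then its new output can be written as $[S'_{i,1}, S'_{i,2}, \ldots, S'_{i,C}]$, in which each score is normalized by the sum of that classifier's scores:

$$S'_{i,j} = \frac{S_{i,j}}{\sum_{k=1}^{C} S_{i,k}}$$

Therefore, we have:

$$\sum_{j=1}^{C} S'_{i,j} = 1$$

The fixed fusion algorithms, namely mean, maximum, minimum, and median, can then be defined for each class j as follows, respectively:

$$Q_j^{\mathrm{mean}} = \frac{1}{L}\sum_{i=1}^{L} S'_{i,j}, \quad Q_j^{\max} = \max_{i} S'_{i,j}, \quad Q_j^{\min} = \min_{i} S'_{i,j}, \quad Q_j^{\mathrm{med}} = \operatorname*{median}_{i} S'_{i,j}$$

The input sample is assigned to the class j with the largest fused value $Q_j$.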
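The fixed rules can be sketched as below. The sketch assumes that each classifier's scores are first normalized to sum to one and that the class with the largest fused score wins; both steps are assumptions made for illustration, and the score vectors are hypothetical.

```python
from statistics import mean, median

def normalize(scores):
    """Scale one classifier's scores so that they sum to 1 (assumed normalization)."""
    total = sum(scores)
    return [s / total for s in scores]

def fuse(outputs, rule="mean"):
    """Combine L score vectors of length C; returns (winning class index, fused scores)."""
    rules = {"mean": mean, "median": median, "max": max, "min": min}
    norm = [normalize(o) for o in outputs]
    fused = [rules[rule](column) for column in zip(*norm)]   # one value per class
    return fused.index(max(fused)), fused

# Hypothetical outputs of L = 3 classifiers for C = 2 classes (0 = parasite, 1 = nonparasite).
outputs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
for rule in ("mean", "median", "max", "min"):
    print(rule, fuse(outputs, rule)[0])
```

In this toy case the mean, max and min rules pick class 0 while the median rule picks class 1, which illustrates why the choice of rule can change the final accuracy.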
C) Stacking: in this method, the input feature vector is first classified separately by several classifiers, called level-0 classifiers. The outputs of these classifiers are then fed as input to another classifier, called the level-1 classifier, which makes the final decision (14).
D) Bagging algorithm: consider the training data set Z = [Z1, Z2, …, Zn], in which Zi is the i-th sample. Drawing n samples from Z by bootstrap sampling (sampling with replacement) yields a new training set of the same size as Z. Repeating this process L times gives L different training sets, each of the same size as Z. In the bagging algorithm, L base classifiers are trained on these L training sets and then voting is performed among them (14).
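A minimal sketch of bagging follows, using a 1-NN classifier as the base learner. The base learner, the number of learners, the fixed random seed and the toy data are all assumptions chosen to keep the example short and reproducible.

```python
import math
import random
from collections import Counter

def nn1(train, labels):
    """Base learner: a 1-NN classifier built on the given training set."""
    return lambda f: labels[min(range(len(train)), key=lambda i: math.dist(train[i], f))]

def bagging(Z, y, L=15, seed=0):
    """Train L base learners on bootstrap samples of (Z, y); predict by majority vote."""
    rng = random.Random(seed)
    n = len(Z)
    learners = []
    for _ in range(L):
        idx = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        learners.append(nn1([Z[i] for i in idx], [y[i] for i in idx]))

    def predict(f):
        return Counter(h(f) for h in learners).most_common(1)[0][0]
    return predict

# Hypothetical 2-D training data: class 0 = nonparasite, class 1 = parasite.
Z = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.2, 0.2),
     (0.9, 0.8), (0.8, 0.9), (0.85, 0.85), (0.9, 0.9)]
y = [0] * 4 + [1] * 4

bagged = bagging(Z, y)
print(bagged((0.82, 0.88)), bagged((0.12, 0.18)))
```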
E) Adaboost algorithm: this meta-algorithm fuses many weak classifiers to generate a classifier with a high accuracy rate. It uses classifiers whose accuracy is above the random level (e.g. in binary classification, each classifier has a success rate larger than 0.5). However, Adaboost can also use classifiers whose error rate is higher than the random level, since the coefficients attributed to these classifiers in the final linear combination of classifiers would be negative. The algorithm is iterative; at each iteration, the weights of correctly classified cases are decreased and the weights of misclassified cases are increased. It has been shown that Adaboost is less prone to overfitting than other classification methods (14).
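The reweighting and the weighted linear combination can be sketched with discrete AdaBoost over threshold "stumps". This is a textbook variant written for illustration, not the study's MATLAB configuration (which used decision trees as base classifiers); the one-dimensional data set is hypothetical and was chosen so that no single stump can classify it correctly.

```python
import math

def train_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w; labels are +1/-1."""
    best = None
    for d in range(len(X[0])):
        for t in sorted({x[d] for x in X}):
            for sign in (1, -1):
                pred = [sign if x[d] > t else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, d, t, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, d, t, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # negative if err > 0.5
        model.append((alpha, d, t, sign))
        # Correctly classified samples are down-weighted, misclassified ones up-weighted.
        w = [wi * math.exp(-alpha * yi * (sign if x[d] > t else -sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the weighted linear combination of the stumps."""
    score = sum(a * (s if x[d] > t else -s) for a, d, t, s in model)
    return 1 if score > 0 else -1

# Hypothetical 1-D data that no single threshold can separate.
X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]

one_round = adaboost(X, y, rounds=1)
three_rounds = adaboost(X, y, rounds=3)
print(sum(predict(one_round, x) == yi for x, yi in zip(X, y)),
      sum(predict(three_rounds, x) == yi for x, yi in zip(X, y)))  # prints 4 6
```

A single stump classifies four of the six samples correctly, while the combination of three weighted stumps classifies all six, which illustrates how weak classifiers are fused into a stronger one.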
4. Results
In Giemsa-stained blood cell images, infected red blood cells can be distinguished from noninfected ones because the color distributions of the two kinds of cells are different; the color histogram feature is therefore very helpful for differentiating noninfected and infected red blood cells.
Table 1 shows the results of the six individual classifiers. The highest detection accuracy, 92%, belongs to k-NN, and the lowest, 84%, belongs to the LDA classifier.
Classifier | LDA | k-NN | 1-NN | Fisher | QDA | DT |
---|---|---|---|---|---|---|
Accuracy (%) | 84 | 92 | 90.2 | 85 | 86 | 88.5 |
Table 2 shows the results of fusing the classifiers of Table 1; the highest accuracy, 93.25%, was obtained with the mean and median rules, and the lowest accuracy, 88%, with the min rule.
Combining Rule | Mean | Median | Max | Min | Stack |
---|---|---|---|---|---|
Accuracy (%) | 93.25 | 93.25 | 91.5 | 88 | 91.5 |
Table 3 shows the results of fusing the k-NN and 1-NN classifiers; the accuracy was 90.08% for all rules.
The Results of the Fusion of the Classifiers k-Nearest Neighbors (k-NN) and 1-Nearest Neighbor (1-NN) With Different Combining Rules
Combining Rule | Mean | Median | Max | Min | Stack |
---|---|---|---|---|---|
Accuracy (%) | 90.08 | 90.08 | 90.08 | 90.08 | 90.08 |
Table 4 shows the results of fusing the k-NN, 1-NN and Fisher classifiers; the highest accuracy, 91.74%, was obtained with the mean rule, and the lowest accuracy, 89.81%, with the stack rule.
The Results of the Fusion of the Classifiers k-Nearest Neighbors (k-NN), 1-Nearest Neighbor (1-NN) and Fisher With Different Combining Rules
Combining Rule | Mean | Median | Max | Min | Stack |
---|---|---|---|---|---|
Accuracy (%) | 91.74 | 90.63 | 89.88 | 89.88 | 89.81 |
In Table 5, the bagging rule was applied to the base classifiers and the results were compared. The best performance was obtained by the tree classifier with 30 repetitions, with an accuracy of 94.8%.
Accuracy (%) by Number of Base Classifiers
Classifier | 6 | 12 | 30 |
---|---|---|---|
k-NN | 91.75 | 92.25 | 92.25 |
1-NN | 91.75 | 92 | 92 |
Fisher | 86 | 84.75 | 86.25 |
LDA | 84.75 | 85.5 | 85.5 |
QDA | 87.25 | 87.5 | 87.5 |
DT | 93.75 | 94.25 | 94.8 |
In Table 6, the Adaboost rule was applied to the base classifiers and the results were compared. The best performance was obtained by the tree classifier with 150 repetitions, with an accuracy of 95.5%.
Adaboost Accuracy Using Several Classifiers
Accuracy (%) by Number of Base Classifiers
Classifier | 6 | 20 | 100 | 150 |
---|---|---|---|---|
k-NN | 90.25 | 90.75 | 89.75 | 89.75 |
1-NN | 89.5 | 90 | 89.75 | 89.75 |
Fisher | 88.75 | 89.5 | 91.25 | 89.75 |
LDA | 85.75 | 86.75 | 85.75 | 85.75 |
QDA | 84.75 | 83.75 | 82.75 | 82.75 |
DT | 95 | 94 | 95.25 | 95.5 |
In Table 7, sensitivity (SE), specificity (SP), and precision (PR) were calculated using tree classifiers with the Adaboost fusion rule; the SE and SP of the proposed method are then compared with those of Tek et al. (3).
The Results of Detecting Parasite Using the Adaboost Algorithm and Comparing With Previous Method
Method | SE (%) | SP (%) | PR (%) |
---|---|---|---|
Proposed method | 93.3 | 97.3 | 96.5 |
Tek’s method (3) | 72.4 | 97.6 | - |
Comparing the Accuracy of the Proposed Algorithm With Previous Methods
5. Discussion
Di Ruberto et al. (8) utilized two components of the HSV color space, the hue and the saturation, for delimiting parasitic regions. This method may lead to false detections, because it uses a parabolic model based on these two components to handle nonuniform illumination in images. To discriminate between RBCs and parasites, Ross et al. (18) proposed a method based on histogram thresholding, but this method fails when there is no distinct valley in the image histogram. Sio et al. (19) presented an algorithm that determines the boundaries between cell and parasite via edge detection. Their method led to the “MalariaCount” software, which automatically quantifies the parasite level in the blood.
In Table 8, the accuracy of the proposed method is compared with that of two other methods. The results support the efficiency of the proposed algorithm.
Diaz et al. (20) introduced a method to separate images into three regions: parasite, red blood cell and background; it used a color segmentation technique and supervised classifiers. The article presents a simple method for red blood cell and parasite detection with no classification of parasites. No details are given on the filtering process performed to separate the relevant objects of interest. The system assumes a constant color tone in the input images, since only luminance differences are corrected. Tek et al. (3) classified the stained pixels as parasite or nonparasite; in that paper, a distance-weighted k-NN classifier was used with four selected features: color histogram, Hu moments, relative shape measurements vector, and color correlogram. The relative shape measurements vector is formed of simple measurements representing the object shape. According to the results of that study, the most successful feature for classifying the stained objects as parasite/nonparasite was the combination of correlogram, Hu moments and relative shape measurements.
The fusion of classifiers plays a significant role in pattern detection. Three arguments support this claim:
- Each classifier can recognize patterns correctly to some extent, but no classifier can recognize all patterns correctly in every condition. In general, for a particular application, a single classifier cannot deliver the required recognition performance by itself and a fusion of several classifiers is needed.
- Different features are different representations of a pattern, each carrying a particular kind of information about that pattern. Extracting different features is needed for a pattern to be recognized. For instance, combining a person's fingerprint, iris and voice is a common way to establish identity (21). Fusing the results of classifiers obtained from different features can improve the efficiency of a pattern detection system.
- Extracting several features produces a large feature vector. Analyzing large feature vectors with a single classifier lengthens the processing time, which causes problems in biometric systems. Classifier fusion makes it possible to divide a large feature vector into smaller vectors that are processed simultaneously by smaller and simpler classifiers (22). The final classifier is built from the collected results of these classifiers. In short, fusing the results of classifiers can improve the efficiency of a pattern detection system, especially for complex patterns.
As shown in Table 2, fusing classifiers with a suitable rule raises the accuracy of the algorithm. Which classifiers to fuse, and with which rule, must be chosen carefully. Fusing all the classifiers of Table 1 gives better results than fusing the classifier subsets in Tables 3 and 4; moreover, the Adaboost rule performs best among the rules. The accuracy of the algorithm was then compared with previous methods, especially those of Das et al. (17), Kumarasamy et al. (16), and Malihi et al. (23); this comparison shows that the proposed method enhances the detection accuracy.
To sum up, this study dealt with the fusion of classifiers to increase the accuracy of malaria diagnosis. To this end, the color histogram, granulometry, flat texture, saturation channel histogram, and gradient features were first extracted from the images. The images were then classified by six classifiers: k-NN, Fisher, 1-NN, DT, QDA, and LDA. Finally, classifier fusion using the mean, min, max, stack, median, Adaboost and bagging rules was applied to increase the accuracy. The results showed that classifier fusion with the Adaboost rule outperformed the other rules and that the parasite detection accuracy increased compared with the individual classifiers. Therefore, classifier fusion helps to increase the accuracy of diagnosis.
References
1. WHO. World Malaria Report. 2013. Available from: http://www.who.int/entity/malaria/publications/world_malaria_report_2013/wmr2013_no_profiles.pdf.
2. Isle M. Malaria. 1 ed. New York: Rosen Pub; 2001.
3. Tek FB, Dempster AG, Kale İ. Parasite detection and identification for automated thin blood film malaria diagnosis. Comput Vision Image Understanding. 2010;114(1):21-32. https://doi.org/10.1016/j.cviu.2009.08.003.
4. Korn GA, Korn TM. Mathematical handbook for scientists and engineers: definitions, theorems, and formulas for reference and review. Dover ed. Mineola, N.Y: Courier Corporation; 2000.
5. Bashkov EA, Kostyukova NS. To the Estimation of Image Retrieval Effectiveness Using 2D-color Histograms. J Automation Info Sci. 2006;38(11):74-80. https://doi.org/10.1615/J.
6. Rodenacker K, Bengtsson E. A feature set for cytometry on digitized microscopic images. Anal Cell Pathol. 2003;25(1):1-36. [PubMed ID: 12590175].
7. Dougherty ER, Kraus EJ, Pelz JB, editors. Image Segmentation By Local Morphological Granulometries. Geoscience and Remote Sensing Symposium, 1989 IGARSS'89 12th Canadian Symposium on Remote Sensing. 1989; Canada. p. 1220-3.
8. Di Ruberto C, Dempster AG, Khan S, Jarra B. Analysis of infected blood cell images using morphological operators. Image Vision Comput. 2002;20(2):133-46. https://doi.org/10.1016/s0262-8856(01)00092-0.
9. Theodoridis S, Koutroumbas K. Pattern recognition. 3 ed. San Diego, CA: Academic Press; 2006.
10. Cover TM. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. Electronic Computers IEEE Transactions. 1965;14(3):326-34.
11. Theodoridis S, Koutroumbas K. Pattern Recognition and Neural Networks. In: Paliouras G, Karkaletsis V, Spyropoulos C, editors. Machine Learning and Its Applications. Heidelberg, Berlin: Springer; 2001. p. 169-95. https://doi.org/10.1007/3-540-44673-7_8.
12. Shashua A. On the Relationship Between the Support Vector Machine for Classification and Sparsified Fisher's Linear Discriminant. Neural Processing Letters. 1999;9(2):129-39. https://doi.org/10.1023/a:1018677409366.
13. Nabavi K, Kabir E. Classifiers Fusion: Variety Creation and Fusion Rules. J Sci Res Iran comput Soc. 2005;3(2):95-107.
14. Lam L, Suen SY. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Transactions Sys Man Cybernetics - Part A Sys Humans. 1997;27(5):553-68. https://doi.org/10.1109/3468.618255.
15. Zhang T. Statistical behavior and consistency of classification methods based on convex risk minimization. Ann Statistics. 2003;32(1):56-134. https://doi.org/10.1214/aos/1079120130.
16. Kumarasamy SK, Ong SH, Tan KSW. Robust contour reconstruction of red blood cells and parasites in the automated identification of the stages of malarial infection. Machine Vision Applications. 2011;22(3):461-9.
17. Das D, Ghosh M, Chakraborty C, Maiti AK, Pal M, editors. Probabilistic prediction of malaria using morphological and textural information. Image Information Processing (ICIIP), 2011 International Conference on. 2011. p. 1-6.
18. Ross NE, Pritchard CJ, Rubin DM, Duse AG. Automated image processing method for the diagnosis and classification of malaria on thin blood smears. Med Biol Eng Comput. 2006;44(5):427-36. [PubMed ID: 16937184]. https://doi.org/10.1007/s11517-006-0044-2.
19. Sio SW, Sun W, Kumar S, Bin WZ, Tan SS, Ong SH, et al. MalariaCount: an image analysis-based program for the accurate determination of parasitemia. J Microbiol Methods. 2007;68(1):11-8. [PubMed ID: 16837087]. https://doi.org/10.1016/j.mimet.2006.05.017.
20. Díaz G, Gonzalez F, Romero E. Infected Cell Identification in Thin Blood Images Based on Color Pixel Classification: Comparison and Analysis. In: Rueda L, Mery D, Kittler J, editors. Progress in Pattern Recognition, Image Analysis and Applications. Heidelberg, Berlin: Springer; 2007. p. 812-21.
21. Jain A, Nandakumar K, Ross A. Score normalization in multimodal biometric systems. Pattern Recognition. 2005;38(12):2270-85. https://doi.org/10.1016/j.patcog.2005.01.012.
22. Duda RO, Hart PE, Stork DG. Pattern classification. 2 ed. New York: Wiley; 2001.
23. Malihi L, Ansari-Asl K, Behbahani A, editors. Malaria parasite detection in giemsa-stained blood cell images. Machine Vision and Image Processing (MVIP), 2013 8th Iranian Conference on. 2013; Iran. p. 360-5.