Pixel Selection by Successive Projections Algorithm Method in Multivariate Image Analysis for a QSAR Study of Antimicrobial Activity for Cephalosporins and Design New Cephalosporins

authors:

Ahmadreza Amraei a, Ali Niazi a,b,*, Mohammad Alimmoradi a, Bahram Delfan c

a Department of Chemistry, Arak Branch, Islamic Azad University, Arak, Iran.
b Department of Chemistry, Central Tehran Branch, Islamic Azad University, Tehran, Iran.
c Department of Physiology and Pharmacology, Lorestan University of Medical Sciences, Khorramabad, Iran.

how to cite: Amraei A, Niazi A, Alimmoradi M, Delfan B. Pixel Selection by Successive Projections Algorithm Method in Multivariate Image Analysis for a QSAR Study of Antimicrobial Activity for Cephalosporins and Design New Cephalosporins. Iran J Pharm Res. 2018;17(4):e124847. https://doi.org/10.22037/ijpr.2018.2286.

Abstract

Thirty-one cephalosporin compounds were modeled using multivariate image analysis applied to quantitative structure-activity relationship (MIA-QSAR). The acid dissociation constants (pKa) of cephalosporins play a fundamental role in their mechanism of activity, so the antimicrobial activity of cephalosporins was related to their first pKa by different models. Bidimensional molecular structures (images) were used to calculate pixel descriptors. Pixels were selected by the successive projections algorithm (SPA) to obtain simple MIA-QSAR models based on a small subset of pixels. In the present study, the performance of SPA pixel selection for the partial least squares (PLS) model was evaluated. The root mean square error of prediction (RMSEP) was 0.402, 0.315, and 0.160 for the principal component regression (PCR), PLS, and SPA-PLS models, respectively. Among the three methods, SPA-PLS gave the lowest prediction error and is potentially useful for predicting the pKa of cephalosporins. The study showed that the structural influence on pKa is greatest in regions a, b, and c.

Introduction

Cephalosporins (cephems) are a broad group of antibiotics chemically related to the β-lactam class, with enormous medicinal applications (1). Cephalosporins exhibit good antibacterial properties against a broad class of bacteria, including Gram-positive and Gram-negative bacteria (2). Like penicillins, cephalosporins prevent bacterial cell wall synthesis. Properties such as antimicrobial activity, chemical stability, solubility, and acid-base behavior depend to an enormous extent on cephalosporin structure (1). Resistance against antibiotics is a major problem in the medical community, so it is expedient that new cephalosporins be designed (3). The pKa plays a fundamental role in the mechanism of activity of a drug in various biological fluids, primarily the blood, and in its capability to interact with components of these fluids and with other drugs (1). The activities used in QSAR models include biological activity, chemical measurements, toxicity, and bioavailability, and they serve as the dependent variable in building a model (4-6). QSAR is one of the well-established key areas in chemometrics for gathering useful information for medicinal chemistry, the design of new drugs, and toxicity studies. QSAR models have been built that successfully predict activity and identify the factors influencing it, and they have ultimately been used to design more effective compounds (7-11). The steps necessary to obtain a MIA-QSAR model are drawing the molecular structures, calculating the molecular descriptors (pixels), splitting the data into training and validation sets, selecting pixels, building the model between the selected variables and the activity, and finally validating the model (8). Because of the large number of descriptors in MIA-QSAR, a major step in constructing the model is the selection of a subset of pixels that maximizes information content.
Variable selection techniques in MIA-QSAR (12-17) play a key role in work of this nature because of the high-dimensional data sets. Multivariate calibration methods such as PCR and PLS can be effective in dealing with a high variable/object ratio and with collinearity (18). SPA is a forward selection method that starts with one variable and incorporates a new one at each iteration until a specified number (N) of variables is reached. Previous studies have shown that SPA can be used successfully as a variable selection method (19-23). In recent years, many applications of image analysis have been developed to solve a variety of problems, owing to its rapid, low-cost analysis. Image analysis is a wide field of study that includes classical studies on gray-scale or red-green-blue (RGB) images. Esbensen and Geladi demonstrated that multivariate image analysis may provide useful information in chemistry, although the descriptors do not have a direct physicochemical meaning since they are binaries (24). In MIA-QSAR (25-27), bidimensional images have been shown to contain chemical information that allows chemical structures to be related to activities. This study emphasizes the application of 2D images, that is, structures of compounds drawn with any appropriate software, whose pixels serve as descriptors in QSAR (28, 29). The obtained MIA-QSAR model was then applied to predict the pKa of 4 cephalosporin compounds.

Experimental

Hardware and Software

A SONY personal computer (4 GB RAM) running the Windows 8 operating system and MATLAB (Version 7.8.0 (R2009), MathWorks Inc.) was used. The PLS calculations were carried out using the PLS-Toolbox (Version 4.0) (Eigenvector Technologies). The SPA program was written in MATLAB by M.C.U. Araujo et al.; the source code was made available by the authors upon our request. Molecular structures were drawn with Chem Office software (Version 2010). The Kennard-Stone program was written in MATLAB according to the algorithm (30).

Data Set

The first pKa values of 31 cephalosporins were taken from the article by V.G. Alekseev (1). The chemical structures of these cephalosporins and their corresponding first pKa data are listed in Table 1. To create a reliable MIA-QSAR model, the data set was split into training and prediction sets according to the Kennard-Stone algorithm, a classic technique for extracting a representative subset of molecules from a given data set. The algorithm selects molecules that are uniformly distributed over the space defined by the candidates. The molecules are selected consecutively: the first two objects are the two that lie farthest apart from each other, the third is the one farthest from the first two, and so on (31).
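The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB program used in the study: it assumes Euclidean distances and the usual maximin reading of "farthest from the objects already selected", and the function name `kennard_stone` and the toy coordinates are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return (selected, remaining) index lists following the Kennard-Stone rule.

    Assumption: Euclidean distance in whatever descriptor space `points` lives in.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    # Full pairwise distance table
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: start with the two objects that are farthest apart
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Step 2: repeatedly add the object whose nearest selected neighbour is farthest away
    while len(selected) < n_select:
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy data (hypothetical): five objects on a line
toy = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
train_idx, test_idx = kennard_stone(toy, 3)
print(train_idx, test_idx)  # [0, 1, 2] [3, 4]
```

For a split like the one reported here, the call would take the 31 descriptor vectors and `n_select=23`, leaving 8 objects for prediction.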

Table 1

Chemical structures of cephalosporins and their corresponding pKa

Multivariate Image Analysis Descriptors

The descriptors in the MIA-QSAR method are the pixels of the structure images. The pixels chosen by SPA were correlated with the dependent variable to build the MIA-QSAR model. The bidimensional structure of each compound in Table 1 was systematically drawn using Chem Office software and then converted to a bitmap in a 240 × 160 pixel workspace. All the cephalosporin structures were positioned in the same way: a pixel on the sulfur atom was fixed at the 110 × 80 coordinate, since all images must afterwards be superimposed to obtain maximum similarity. Each structure gave one 2D image; superposition (alignment) of the 31 images gives a three-way array of dimension 31 × 240 × 160, which was unfolded into a two-way array (matrix) of dimension 31 × 38400. The many columns with zero variance were removed.

Results and Discussion

Multivariate Image Analysis Descriptors

The MIA-QSAR model was built by correlating these pixels with the activities of the cephalosporins. The bidimensional structure of each cephalosporin in Table 1 was drawn using Chem Office software with the same font and size, and then converted to a bitmap in a 240 × 160 pixel workspace. All molecular structures were systematically fixed at a given coordinate. In this study, the point at the 110 × 80 coordinate (the sulfur atom) was used as the reference in the alignment step, as illustrated in Figure 1. Since each molecular image is bidimensional, alignment of the 31 images gives a three-way array of 31 × 240 × 160 pixels. Each image was unfolded into a vector of 38400 elements, and the 31 vectors were grouped to form a 31 × 38400 matrix. To reduce memory use, the columns with zero variance were eliminated, and finally all pixel data were mean centered.

Figure 1. 2D images and unfolding step of the 31 chemical structures to give the X-matrix. The arrow in the structure indicates the coordinate of a pixel common to the whole series of compounds, used in the 2D alignment step
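The unfolding, zero-variance filtering, and mean-centering steps can be illustrated with a small pure-Python sketch. It is not the authors' MATLAB code: the helper names are hypothetical, and tiny 2 × 3 toy "images" stand in for the 240 × 160 bitmaps.

```python
def unfold(images):
    """Flatten each 2D image (list of pixel rows) into one row of the X-matrix."""
    return [[pixel for row in image for pixel in row] for image in images]


def drop_zero_variance(X):
    """Keep only the columns whose value changes across samples."""
    keep = [j for j in range(len(X[0])) if any(row[j] != X[0][j] for row in X)]
    return [[row[j] for j in keep] for row in X], keep


def mean_center(X):
    """Subtract the column mean from every entry."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means


# Three toy 2 x 3 images (hypothetical pixel values)
images = [
    [[0, 255, 0], [255, 0, 0]],
    [[0, 255, 255], [255, 0, 0]],
    [[0, 0, 0], [255, 0, 0]],
]
X = unfold(images)                    # 3 x 6 matrix
X_kept, kept = drop_zero_variance(X)  # only columns 1 and 2 vary
X_centered, col_means = mean_center(X_kept)
print(kept)        # [1, 2]
print(X_centered)  # [[85.0, -85.0], [85.0, 170.0], [-170.0, -85.0]]
```

With the real data, the same three steps would turn 31 images into a 31 × 38400 matrix and then into the much smaller set of varying pixels.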

Principal Component Analysis of the Data Set

The number of real dimensions of the data set was determined using PCA. PCA is also used to make two- and three-dimensional plots of the data for visual examination, in order to diagnose collinearity and homogeneity in the data set, to reduce high dimensionality, and to detect outliers and identify clusters. PCA was performed within the calculated image descriptor space for the whole data set. The number of principal components (PCs) is less than or equal to the number of original variables, and the first PCs retain the greatest amount of variation and valuable information in the data set. A total of 248 pixel descriptors were initially subjected to PCA for the entire data set of 31 compounds. The PCA results showed that three PCs (PC1, PC2, and PC3) accounted for 90.56% of the overall variability: PC1 = 67.66%, PC2 = 11.95%, and PC3 = 10.72% (Figure 2); thus the image domain may be expressed mainly in terms of these 3 dimensions. Since most of the variability is accounted for by the first three PCs, their score plot is a reliable tool for examining the spatial distribution of the points of the data set. As shown in Figure 2, there is no obvious clustering among the compounds. A good data distribution is very important for developing reliable and robust QSAR models, because the ability and quality of the prediction depend on the data set used to build the model. For modeling, the data set was divided into a training set (23 compounds) and a prediction set (8 compounds) according to the Kennard-Stone algorithm. According to Figure 2, the compounds in each subset appear reasonably well spread over the space of the principal components.

Figure 2. Principal components of the 2D image descriptors for the data set: (a) PC1 versus PC2, (b) PC1 versus PC3, and (c) PC2 versus PC3

PCR and PLS Modeling

The main step in the MIA-QSAR method is establishing the relationship between the pixels and the dependent variable. PLS and PCR were used as the multivariate calibration methods. The root mean square error of cross-validation (RMSECV) was used to find the appropriate number of latent variables (LVs) for the best model (Figure 3). An F-test was used to determine the significance of RMSECV values greater than the minimum. The RMSECV was minimal at 7, 6, and 4 LVs for PCR, PLS, and SPA-PLS, respectively; thus the optimum number of LVs for the training set with the SPA-PLS method was chosen to be 4. Before modeling, the data set was mean centered.

Figure 3. The RMSECV versus number of latent variables
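As a rough illustration of how an RMSECV value is obtained, the sketch below runs a leave-one-out loop around a pluggable model. Two assumptions are made that the text does not confirm: the cross-validation scheme is taken to be leave-one-out, and a one-variable least-squares line (`fit_line`, `predict_line`, both hypothetical helpers) stands in for the PCR and PLS models. In practice the loop would be repeated for each candidate number of latent variables and the resulting RMSECV values compared.

```python
import math


def rmsecv_loo(X, y, fit, predict):
    """Leave-one-out RMSECV: refit without sample i, then predict sample i."""
    squared_error = 0.0
    for i in range(len(y)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        squared_error += (predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / len(y))


def fit_line(X, y):
    """Stand-in model: ordinary least squares on the first column only."""
    xs = [row[0] for row in X]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(y) / len(y)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, y)) / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept


# Toy data (hypothetical): an exactly linear response gives an RMSECV of essentially zero
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [1.0, 3.0, 5.0, 7.0]
cv_error = rmsecv_loo(X_toy, y_toy, fit_line, predict_line)
print(cv_error < 1e-9)  # True
```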

SPA-PLS Modeling

One of the key problems in building a robust and fast model is selecting a subset of molecular descriptors (pixels) instead of using all descriptors. SPA is a forward variable selection method that uses simple operations in the vector space of the variables (pixels) to obtain subsets with small collinearity. The data set was mean centered before SPA-PLS was performed. After pixel selection by SPA, the selected pixels were used to run PLS. With SPA-PLS, the number of latent variables was reduced to 4 (Figure 3). The areas selected by SPA are shown in Figure 4. This work shows that SPA can be a good, reliable, and fast technique for pixel selection in MIA-QSAR. The pixels chosen by SPA indicate that the structural influence on pKa is greatest in regions a, b, and c (Figure 4). Among these three regions, region b, which has more alterations by different functional groups, has more influence on pKa than the other regions.

Figure 4. Regions selected by the successive projections algorithm
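The projection step at the heart of SPA can be sketched as follows. This is a minimal illustration and not the MATLAB program of Araujo et al.: it builds a single chain of variables from one starting column and omits the later stage in which candidate chains are compared by their validation error. The function name `spa_chain` and the toy matrix are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spa_chain(X, start, n_select):
    """Build one SPA chain of column indices, beginning at column `start`.

    At each step every unselected column is projected onto the subspace
    orthogonal to the most recently selected column, and the column with the
    largest remaining norm (least collinear with those already chosen) is added.
    """
    n_rows, n_cols = len(X), len(X[0])
    if not 1 <= n_select <= n_cols:
        raise ValueError("n_select must be between 1 and the number of columns")

    # Working copies of the columns; these are overwritten by their projections
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    selected = [start]

    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = dot(ref, ref)
        best_j, best_norm2 = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            if ref_norm2 > 0.0:
                coef = dot(cols[j], ref) / ref_norm2
                cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm2 = dot(cols[j], cols[j])
            if norm2 > best_norm2:
                best_j, best_norm2 = j, norm2
        selected.append(best_j)
    return selected


# Toy matrix (hypothetical): column 1 is an exact multiple of column 0
X_toy = [
    [1.0, 2.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
print(spa_chain(X_toy, start=0, n_select=3))  # [0, 3, 2]
```

In the toy example the collinear column 1 is passed over, which is the behaviour that makes SPA attractive for highly redundant pixel data.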

Model Validation and Prediction of pKa

Table 2 presents the pKa values predicted by the PCR, PLS, and SPA-PLS methods together with the percent relative errors of prediction. The correlation between reference and predicted pKa for SPA-PLS was suitable, with R2 = 0.9351 and intercept = 5576. The statistical results presented in Table 2 clearly indicate that the SPA-PLS model has good quality with low prediction errors. In addition, to evaluate the predictive ability of the different models, the root mean square error of prediction (RMSEP), the relative standard error of prediction (RSEP), and the cross-validated coefficient (Q2) can be used.

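$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (y_{\mathrm{pred}} - y_{\mathrm{obs}})^{2}}{n}}$$

Eq. 1

$$\mathrm{RSEP}(\%) = 100 \times \sqrt{\frac{\sum_{i=1}^{n} (y_{\mathrm{pred}} - y_{\mathrm{obs}})^{2}}{\sum_{i=1}^{n} (y_{\mathrm{obs}})^{2}}}$$

Eq. 2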

where ypred and yobs are the predicted and observed values of a sample, and n is the number of samples in the validation set. These statistical parameters, which indicate the quality of the models, are displayed in Table 3.
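The two error measures can be checked numerically with a short sketch. It assumes that RMSEP is the square root of the mean squared residual and that RSEP divides the residual sum of squares by the sum of squared observed values, which is the usual reading of these definitions. The function names are hypothetical; the numbers are the observed and SPA-PLS predicted pKa values of the eight prediction-set compounds in Table 2.

```python
import math


def rmsep(y_pred, y_obs):
    """Root mean square error of prediction."""
    n = len(y_obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(y_pred, y_obs)) / n)


def rsep_percent(y_pred, y_obs):
    """Relative standard error of prediction, in percent."""
    residual_ss = sum((p - o) ** 2 for p, o in zip(y_pred, y_obs))
    observed_ss = sum(o ** 2 for o in y_obs)
    return 100.0 * math.sqrt(residual_ss / observed_ss)


# Prediction set, compounds 8, 24, 27, 20, 21, 25, 18, 19 (Table 2)
y_obs = [2.60, 1.62, 2.85, 2.10, 2.91, 3.20, 2.50, 2.03]
y_spa_pls = [2.56, 1.83, 2.81, 2.21, 2.57, 3.07, 2.42, 1.97]

print(round(rmsep(y_spa_pls, y_obs), 3))  # 0.159
```

The value obtained from the rounded table entries agrees with the reported SPA-PLS RMSEP of 0.160 to within rounding.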

Table 2

Observed and calculated values of pKa using PCR, PLS and SPA-PLS models

Compound number   Observed   PCR                     PLS                     SPA-PLS
(Table 1)         pKa        Predicted   Error (%)   Predicted   Error (%)   Predicted   Error (%)
8                 2.60       2.36        -10.17      2.51        -3.58       2.56        -1.56
24                1.62       1.97        17.76       1.90        14.73       1.83        11.47
27                2.85       2.51        -13.54      2.54        -12.20      2.81        -1.42
20                2.10       2.58        18.60       2.50        16.00       2.21        4.97
21                2.91       2.34        -24.35      2.53        -15.01      2.57        -13.22
25                3.20       2.60        -23.07      2.67        -19.85      3.07        -4.23
18                2.50       2.34        -6.83       2.6         3.84        2.42        -3.30
19                2.03       1.79        -13.40      2.17        6.45        1.97        -3.04

Table 3

Comparison of the statistical parameters by different QSAR models for the prediction of the pKa

Methods    PCR      PLS      SPA-PLS
NLV^a      7        6        4
RMSEP      0.402    0.3157   0.160
REP (%)    25.62    20.11    10.11
R2         0.4629   0.7375   0.9351
Q2         0.3340   0.5900   0.8960

Molecular Design

As an application of the developed method, the SPA-PLS model was used to predict the pKa of 4 new cephalosporin compounds for which biological tests have not yet been performed. Table 4 presents the chemical structures of the 4 new cephalosporin compounds and their pKa values calculated by the proposed method.

Table 4

Structural modification of cephalosporins and pKa predicted by SPA-PLS

Conclusion

In this work, a QSAR model based on multivariate image analysis alone, built with SPA-PLS, was proposed and shown to be useful for predicting the pKa of 31 cephalosporins. The SPA-PLS model gave accurate predictions and good correlation in calibration. Pixel selection improved the predictive quality of the MIA-QSAR model. Unlike other methods, pixel selection using SPA appears to be fast and reproducible. Our study showed that pixel selection by the SPA method is useful for those designing novel cephalosporins.

References

1. Alekseev VG. Drug Synthesis Methods and Manufacturing Technology. Pharm. Chem. J. 2010;44:16-26.

2. Liua J, Zhoua L, Zuo Z. Antibacterial Activities of Carbapenem Derivatives and Quantitative Structure–Activity Relationship for Drug Design. J. Chem. Inf. Comput. Sci. 2008;27:1216-26.

3. Epstein ME, Amodio-Groton M, Sadick NS. Antimicrobial agents for the dermatologist I β-Lactam antibiotics and related compounds. J. Am. Acad. Dermatol. 1997;37:149-65. [PubMed ID: 9270499].

4. Liao SY, Xu LC, Qian L, Zheng KC. QSAR and Action Mechanism of Troxacitabine Prodrugs with Antitumor Activity. J. Theor. Comput. Chem. 2007;6:947-58.

5. Shen Q, Shi W-M, Kong W. Modified tabu search approach for variable selection in quantitative structure-activity relationship studies of toxicity of aromatic compounds. Artif. Intell. Med. 2010;49:61-6. [PubMed ID: 20153153].

6. Zhang YQ, Huang X, Jiang X, Huang S. Modified Ant Colony Optimization Algorithm and Its Application in Variable Selection of QSAR of Polychlorinated Organic. J. Theor. Comput. Chem. 2009;8:783-98.

7. Shamsipur M, Zare-Shahabadi V, Hemmateenejad B, Akhond M. An efficient variable selection method based on the use of external memory in ant colony optimization Application to QSAR/QSPR studies. Anal. Chim. Acta. 2009;646:39-46. [PubMed ID: 19523554].

8. Niazi A, Jameh-Bozorghi S, Nori-Shargh D. Prediction of Acidity Constants of Thiazolidine-4-carboxylic acid derivatives using ab Initio and Genetic Algorithm-Partial Least Squares. Turk. J. Chem. 2006;30:619-28.

9. Edrakia N, Umashankar Das, Hemateenejad B, Jonathan R Dimmockb, Miri R. Comparative QSAR Analysis of 3,5-bis (Arylidene)-4-Piperidone Derivatives: the Development of Predictive Cytotoxicity Models. Iran. J. Pharm. Res. 2016;15:425-37. [PubMed ID: 27642313].

10. Ayatollahi AM, Zarei SM, Memarnejadian A, Ghanadian M, Heydarian Moghadam M, Kobarfard F. Triterpene Constituents of Euphorbia Erythradenia Bioss and their Anti-HIV Activity. Iran. J. Pharm. Res. 2016;15:19-27. [PubMed ID: 28228800].

11. Ghasemi JB, Asadpour S, Abdolmaleki A. Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticide, herbicides, and organohalides by multivariate chemometrics methods. Anal. Chim. Acta. 2007;588:200-6. [PubMed ID: 17386811].

12. Antunes JE, Freitas MP, Rittner R. Bioactivities of a series of phosphodiesterase type 5 (PDE-5) inhibitors as modelled by (MIA-QSAR). J. Med. Chem. 2008;43:1632-8.

13. Bitencour M, Freitas MP, Rittner R. The MIA-QSAR method for the prediction of bioactivities of possible acetylcholinesterase inhibitors. Arch. Pharm. 2012;345:723-8.

14. Nunes CA, Freitas MP. Introducing new dimensions in MIA-QSAR: a case for chemokine receptor inhibitors. Eur. J. Med. Chem. 2013;62:297-300. [PubMed ID: 23357311].

15. Freitas MP. MIA-QSAR modelling of anti-HIV-1 activities of some, 2-amino-6-aryl sulfonylbenzonitriles and their thio and sulfinyl congeners. Org. Biomol. Chem. 2006;4:1154-60. [PubMed ID: 16525561].

16. Goodarzi M, Freitas MP, Ferreira EB. Influence of Changes in 2-D Chemical Structure Drawings and Image Formats on the Prediction of Biological Properties Using MIA-QSAR. QSAR Comb. Sci. 2009;28:458-64.

17. Sarkhosh M, Khorshidi N, Niazi A, Leardi R. Application of genetic algorithms for pixel selection in multivariate image analysis for a QSAR study of trypanocidal activity for quinine compounds and design new Quinone compound. Chemometr. Intell. Lab. Syst. 2014;139:168-74.

18. Geladi P, Kowalski BR. Partial Least-Squares regression: a tutorial. Anal. Chim. Acta. 1986;185:1-17.

19. César Ugulino Araújo M, Cristina Bezerra Saldanha T, Kawakami Harrop Galvão R, Yoneyama T, Caldas Chame H, Visani V. The successive projections algorithm for variable selection in spectroscopic multi component analysis. Chemometr. Intell. Lab. Syst. 2001;57:65-73.

20. Ye S, Wang D, Min S. Successive projections Algorithm combined with uninformative variable elimination for spectral variable selection. Chemom. Intell. Lab. Syst. 2008;91:194-9.

21. Marreto PD, Zimer AM, Faria RC, Mascaro LH, Pereira EC, Fragoso WD, Lemos SG. Multivariate linear regression with variable selection by successive Projections Algorithm applied to the analysis of anodic stripping voltammetry data. Electrochim. Acta. 2014;127:68-78.

22. Acebal CC, Grünhut M, Lista AG, Band BSF. Successive projections algorithm applied to spectral data for the simultaneous determination of flavour enhancers. Talanta. 2010;82:222-6. [PubMed ID: 20685460].

23. Kompany-zareh M, Akhlaghi Y. Correlation Weighted Successive Projections Algorithm as a Novel Method for Variable Selection in QSAR Studies: Investigation of Anti-HIV Activity of HEPT Derivatives. J. Chemom. 2007;21:239-50.

24. Prats-Montalbán JM, de Juan A, Ferrer A. Multivariate Image Analysis: A review with applications. Chemom. Intell. Lab. Syst. 2011;107:1-23.

25. Geladi P, Esbensen K. Can Image Analysis Provide Information Useful in Chemistry. J. Chemom. 1989;3:419-29.

26. Geladi P. Some Special topics in multivariate image analysis. Chemom. Intell. Lab. Syst. 1990;9:375-90.

27. Akrami A, Niazi A. Application of MIA for a QSAR Study of Inhibitory Activity of DHP Derivatives and Design of New Compounds with Using WT and GA for Pixel Processing. Polycycl. Aromat. Compd. 2017;37:442-55.

28. Freitas MP, Brown SD, Martins J. MIA-QSAR: A Simple 2D Image-based Approach for Quantitative Structure activity Relationship Analysis. J. Mol. Struct. 2005;738:149-54.

29. Garkani-Nejad Z, Poshteh-Shirani M. Application of Multivariate Image Analysis in Modeling C-13-NMR Chemical Shifts of Mono Substituted Pyridines. Talanta. 2010;83:225-32. [PubMed ID: 21035668].

30. Kennard RW, Stone LA. Computer Aided Design of Experiments. Technometrics. 1969;11:137-48.

31. Bagheban Shahri F, Niazi A, Akrami A. Application of Wavelet and Genetic Algorithms for QSAR Study on 5-Lipoxygenase Inhibitors and Design New Compounds. J. Mex. Chem. Soc. 2015;59:203-10.