Abstract
Keywords
QSAR; 4-Oxo-1,4-dihydroquinoline; 4-Oxo-4H-pyrido[1,2-a]pyrimidine; Neural network
Introduction
The causative agent of acquired immune deficiency syndrome (AIDS) is the human immunodeficiency virus type 1 (HIV-1) (1-3). During the past three decades, the combination of antiretroviral drugs in highly active antiretroviral therapy (HAART) regimens has transformed HIV infection from a fatal disease into a manageable chronic condition (4, 5). However, resistance to marketed anti-HIV drugs is increasing at an alarming rate. Thus, there is a need to develop new agents with modified scaffolds that act through different mechanisms.
Although several in-vivo and in-vitro assay methods are available for screening the biological activity of chemicals, they are costly and time-consuming. Computational methods have therefore been developed as an alternative tool for predicting the properties of chemicals (6). One such method is the quantitative structure–activity relationship (QSAR) study, which plays a critical role in rational drug design (7-8). The major aim of a QSAR study is to develop quantitative models that predict properties of compounds, such as biological activity, and these models can shorten the drug discovery process (9-10). Biological activity is treated as a function of molecular descriptors derived from the chemical structures of a set of molecules, so a model built on those calculated descriptors can be used to estimate the activities of new compounds. Applying QSAR models usually requires variable selection to construct coherent models (13-15). Over the years, different calculated descriptors and variable selection methods have been used to build QSAR models capable of accurately predicting the anti-HIV-1 activity of compounds (12-13). In this study, we employed the stepwise (SW) selection method for variable selection in multiple linear regression (MLR). In addition, wavelet transformation and an artificial neural network were used as a nonlinear system for QSAR modeling. The aim of this paper is to find an efficient method for building an accurate quantitative relationship between the molecular structure and the anti-HIV-1 activity of novel 4-oxo-1,4-dihydroquinoline and 4-oxo-4H-pyrido[1,2-a]pyrimidine derivatives. The findings can be helpful for designing new active derivatives.
Experimental
Data set
A data set of twenty-five 4-oxo-1,4-dihydroquinoline and 4-oxo-4H-pyrido[1,2-a]pyrimidine derivatives, which had been synthesized and evaluated as anti-HIV-1 agents in our laboratory, was selected (16-17). The anti-HIV-1 activities of the molecules, measured as the inhibition rate of p24 expression (IR) in cell culture, were converted into the corresponding logarithmic values (log IR). The total set of molecules was randomly divided into a training set (20 compounds) for generating the QSAR model and a test set (5 compounds) for validating the quality of the model. The general chemical structures and biological activity values of all the compounds are shown in Table 1.
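The data preparation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the base-10 logarithm, the fixed random seed, and the compound numbering 1 to 25 are assumptions, and the function names are hypothetical.

```python
import math
import random

def split_dataset(ids, n_test, seed=0):
    """Randomly split compound identifiers into training and test lists."""
    rng = random.Random(seed)          # fixed seed (assumed) for a reproducible split
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[n_test:]), sorted(shuffled[:n_test])

def log_ir(inhibition_rate):
    """Convert a p24 inhibition rate into log IR (base 10 is an assumption)."""
    return math.log10(inhibition_rate)

compound_ids = list(range(1, 26))      # 25 compounds, numbered 1..25 for illustration
train_ids, test_ids = split_dataset(compound_ids, n_test=5)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the 20/5 partition repeatable, which matters for so small a data set because the test-set statistics can change noticeably from one random split to another.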
Software
The geometry optimization was performed with HyperChem (version 8.0; Hyper Chem, Alberta, Canada) (18). The molecular descriptors were calculated with the DRAGON 2.1 software (19). SPSS (version 13.0; SPSS Inc., Chicago, IL, USA) was employed for the simple MLR analysis (20). The SW-ANN calculations were performed in MATLAB (version 7.0; MathWorks, Natick, MA, USA).
Molecular descriptors and geometry optimization
The chemical structures of the molecules were drawn using the HyperChem 8.0 software. Pre-optimization was conducted using the molecular mechanics force field (MM+) procedure included in HyperChem, and the molecular structures were then optimized by the semi-empirical AM1 method using the Polak-Ribiere algorithm until the root mean square gradient reached 0.01 kcal mol⁻¹. The resulting geometries were transferred into the Dragon software package to calculate descriptors of the following classes: constitutional descriptors, topological descriptors, molecular walk counts, BCUT descriptors, Galvez topological charge indices, 2D autocorrelations, charge descriptors, aromaticity indices, Randic molecular profiles, geometrical descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, and empirical descriptors. The 842 descriptors were first analyzed for constant or near-constant variables, and descriptors containing a high percentage (>90%) of identical values across the 25 molecules were discarded. Secondly, the correlations among the descriptors and with the activity of the molecules were calculated, and collinear descriptors (i.e., pairs with a correlation coefficient greater than 0.9) were detected. Among collinear descriptors, the one exhibiting the highest correlation with the activity was retained and the others were removed from the data matrix. The remaining descriptors were collected in an n×m data matrix (D), where n = 25 and m = 243 are the numbers of compounds and descriptors, respectively.
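The two pre-filtering steps above can be sketched in a few lines. This is one possible reading of the procedure rather than the authors' implementation: descriptors are ranked by their absolute correlation with the activity and kept greedily, so that of any collinear group only the best-correlated member survives. The function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def near_constant(column, max_identical=0.9):
    """True if more than `max_identical` of the values are identical."""
    counts = {}
    for v in column:
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values()) / len(column) > max_identical

def prefilter(descriptors, activity, r_max=0.9):
    """descriptors: dict mapping name -> list of values (one per compound).
    Returns the descriptor names kept after both filtering steps."""
    # Step 1: drop constant and near-constant descriptors.
    kept = [n for n, col in descriptors.items() if not near_constant(col)]
    # Step 2: rank by |correlation with activity|, then keep a descriptor
    # only if it is not collinear with any descriptor already kept.
    kept.sort(key=lambda n: abs(pearson(descriptors[n], activity)), reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(descriptors[name], descriptors[s])) <= r_max
               for s in selected):
            selected.append(name)
    return selected

# Toy data, for illustration only (not taken from the study).
toy = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.1, 6.0, 8.2, 10.0],   # nearly collinear with d1
    "d3": [1.0, 1.0, 1.0, 1.0, 1.0],    # constant
    "d4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_activity = [1.1, 2.0, 2.9, 4.2, 5.0]
print(prefilter(toy, toy_activity))
```

In the toy example the constant descriptor is removed in the first step, and only one of the two nearly collinear descriptors survives the second.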
Artificial neural networks as a nonlinear system predictor
In mathematics, regression analysis is a process for estimating the relationships among variables. In some circumstances it can also be used to infer causal relationships between the independent and dependent variables. Regression analysis is widely used for prediction and forecasting and includes many techniques for building an appropriate model from several independent variables. Among these techniques, artificial neural networks (ANNs) are frequently applied because of their function-approximation capability. In this paper, we used a multi-layer perceptron (MLP) network, the best-known form of ANN, as a predictor of the output values (21-23). However, a large number of descriptors as inputs makes the MLP network more complex. To avoid this, a stepwise variable selection stage is used to choose the most suitable descriptors as network inputs. The procedure is depicted schematically in Figure 1.
Results
Variable selection
Stepwise regression is the step-by-step iterative building of a regression model with automated choice of the independent variables. Based on the statistical significance of each variable in the regression, the stepwise method tries one independent variable at a time and either includes it in the model or removes it according to its p-value. At each step, the p-value of an F-statistic is computed to test models with and without the candidate variable. The available data set is a 20×243 matrix, where 20 is the number of training compounds and 243 is the number of variables. At the end of this stage, the best set of calculated descriptors was selected as network inputs for training the MLP network.
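The selection loop can be illustrated with a simplified sketch. It differs from the procedure described above in two ways that should be kept in mind: it performs forward selection only (no removal step), and it uses a fixed F-to-enter threshold instead of a p-value, so that no statistics library is needed. The threshold of 4.0, the limit of four variables, and all function names are assumptions for illustration, and the descriptors are assumed to have been screened for collinearity beforehand.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
               for i in range(n))

def forward_stepwise(descriptors, y, f_enter=4.0, max_vars=4):
    """Add, one at a time, the descriptor with the largest partial F value."""
    selected = []
    current = rss([], y)                      # intercept-only model
    while len(selected) < max_vars:
        best = None
        for name in descriptors:
            if name in selected:
                continue
            cols = [descriptors[s] for s in selected] + [descriptors[name]]
            dof = len(y) - len(cols) - 1      # residual degrees of freedom
            if dof <= 0:
                continue
            new = rss(cols, y)
            f_value = (current - new) / max(new / dof, 1e-12)
            if best is None or f_value > best[1]:
                best = (name, f_value, new)
        if best is None or best[1] < f_enter:
            break                             # no candidate is significant enough
        selected.append(best[0])
        current = best[2]
    return selected
```

Calling `forward_stepwise(descriptors, log_ir_values)` with a dictionary of descriptor columns returns the names in the order they entered the model. A full stepwise procedure would also re-test the variables already in the model after each addition and remove any that are no longer significant.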
Results of SW-MLR method
The MLR analysis with stepwise selection and elimination of variables was employed to relate the anti-HIV-1 activity to different sets of descriptors. The SW-MLR analysis yielded one model with four variables (the closest to the ratio of five training molecules per descriptor). The model showed good statistical parameters for the training set but low generalization and prediction ability for the test set (Tables 1 and 2). It is described by the following equation:
log IR = –6.21(±2.53) + 0.92(±0.14) GATS6v – 25.12(±6.36) JGI5 + 8.10(±2.61) ISH – 41.21(±3.20) R6p+
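As a worked illustration of how the equation is applied, the following sketch evaluates it for one set of descriptor values. The coefficients are those reported above; the descriptor values in `example` are hypothetical and do not correspond to any compound in the study.

```python
# Coefficients of the SW-MLR equation as reported in the text.
INTERCEPT = -6.21
COEFFS = {"GATS6v": 0.92, "JGI5": -25.12, "ISH": 8.10, "R6p+": -41.21}

def predict_log_ir(descriptors):
    """Return the log IR predicted by the SW-MLR equation."""
    return INTERCEPT + sum(c * descriptors[name] for name, c in COEFFS.items())

# Hypothetical descriptor values, for illustration only.
example = {"GATS6v": 1.0, "JGI5": 0.02, "ISH": 0.9, "R6p+": 0.03}
print(round(predict_log_ir(example), 4))
```

Because the coefficients of JGI5 and R6p+ are large and negative, small changes in these two descriptors shift the predicted log IR far more than comparable changes in GATS6v.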
Table 1. Chemical structures and experimental and predicted activities for 4-oxo-1,4-dihydroquinoline and 4-oxo-4H-pyrido[1,2-a]pyrimidine analogs by SW-MLR.
Table 2. Statistical parameters of the SW-MLR model
SEE (training set) | R2 (training set) | R2 (test set) | F | Q2LOO |
---|---|---|---|---|
0.10 | 0.93 | 0.30 | 47.44 | 0.84 |
The leave-one-out cross-validated coefficient (Q2LOO) of the SW-MLR model was 0.84, which indicates the internal reliability of the proposed model. The plots of predicted versus experimental log IR obtained by SW-MLR modeling are shown in Figure 2.
Although this procedure produced good results for the training set (R2 = 0.93), it did not perform well on the test set (R2 = 0.30). Therefore, the SW-MLP method was used to build a model appropriate for both the training and test sets.
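The difference between a fitted R2 and a leave-one-out Q2 can be made concrete with a short sketch. It assumes one common definition, Q2 = 1 − PRESS/SS_total, where PRESS is the sum of squared errors of predictions made for each compound by a model fitted without it. To stay short, the example fits a single-descriptor line instead of the four-descriptor model; the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one descriptor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def r_squared(y, y_pred):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def q2_loo(x, y):
    """Leave-one-out Q2: refit without each compound, then predict it."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    press = sum((a - b) ** 2 for a, b in zip(y, preds))
    mean_y = sum(y) / len(y)
    return 1.0 - press / sum((a - mean_y) ** 2 for a in y)
```

For a least-squares model, Q2 computed this way never exceeds the training R2, so a high Q2 shows internal consistency of the training set. It does not guarantee good predictions for an external test set, which is the situation reported above for the SW-MLR model.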
Results of SW-MLP method
We used a multi-layer perceptron neural network with a back-propagation training algorithm to predict new output values after the training stage. The input vector to the network was provided by the variables selected in the previous stage, while the output layer was composed of a single neuron. The number of hidden layers and the number of nodes in each hidden layer affect the generalization capability of the network. The network structure was therefore set to 40×30×5×1 neurons. Training the MLP network involves finding the values of the connection weights that minimize an error function between the actual network output and the corresponding target value in the training set. The training algorithm used here is based on resilient propagation. To build the neural network, the data set was divided into two parts: 20 compounds were used for training and the remaining 5 were used to test the network performance. Figure 3 illustrates the results obtained by feeding the training and test inputs to the MLP network during these stages.
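The core of resilient propagation is a per-weight step size that depends only on the sign of the gradient. The sketch below shows that update rule on a toy quadratic error function; it is not the authors' MATLAB code. It uses the variant without weight backtracking, and the increase and decrease factors (1.2 and 0.5) and step limits are commonly used defaults rather than values taken from this study. In a real MLP, `grad_fn` would return the back-propagated gradient of the training error.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minimize(grad_fn, weights, n_iter=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function with sign-based resilient propagation updates."""
    w = list(weights)
    steps = [step_init] * len(w)      # one adaptive step size per weight
    prev = [0.0] * len(w)             # gradient from the previous iteration
    for _ in range(n_iter):
        g = list(grad_fn(w))
        for i in range(len(w)):
            if g[i] * prev[i] > 0:    # same sign as before: enlarge the step
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif g[i] * prev[i] < 0:  # sign change: minimum was overshot
                steps[i] = max(steps[i] * eta_minus, step_min)
                g[i] = 0.0            # skip the update for this weight once
            w[i] -= sign(g[i]) * steps[i]
            prev[i] = g[i]
    return w

# Toy error function E(w) = (w0 - 3)^2 + (w1 + 2)^2, for illustration only.
def toy_gradient(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 2.0)]

print(rprop_minimize(toy_gradient, [0.0, 0.0]))
```

Because only the sign of the gradient is used, the update is insensitive to the gradient magnitude, which is one reason this family of algorithms is popular for training multi-layer networks.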
Discussion
QSAR results can provide useful chemical insights for designing new compounds. For this purpose, the descriptors that appear in the resulting models are interpreted below (24).
GATS6v (Geary autocorrelation of lag 6 weighted by van der Waals volume) is one of the 2D autocorrelation descriptors and appears in the SW-MLR model. In this descriptor, the Geary coefficient is a distance-type function of a physico-chemical property calculated for each atom of the molecule, such as atomic mass or polarizability. The atoms of the molecule thus represent a set of discrete points in space, and the atomic property is the function evaluated at those points. The physico-chemical property in this case is the van der Waals volume. GATS6v has a positive coefficient, which indicates that increasing the value of this van der Waals volume-weighted descriptor improves the log IR value.
JGI5 (mean topological charge index of order 5) is the second descriptor in the SW-MLR model. Topological charge indices were proposed to evaluate the charge transfer between pairs of atoms, and therefore the global charge transfer in the molecule. JGI5 is closely related to molecular branching. This descriptor has a significant negative effect on the anti-HIV-1 activity of the analogs; the negative sign indicates that the log IR value is inversely related to this descriptor. Consequently, increased molecular branching decreases the log IR value.
The third descriptor is ISH (standardized information content on the leverage equality), which has a positive influence on the activities of the analogs. ISH mainly encodes information on molecular symmetry: if all the atoms have different leverage values, i.e., the molecule does not have any element of symmetry, ISH is equal to one. Conversely, if all the atoms have equal leverage values (a perfectly symmetric theoretical case), ISH is equal to zero. According to the SW-MLR model, decreasing molecular symmetry improves the anti-HIV-1 activity of the compounds.
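The two limiting cases described above can be reproduced with a short sketch. It assumes the usual Shannon-type form of a standardized information content, in which atoms are grouped into classes of equal leverage and the total information content is divided by its maximum value. The exact definition should be checked against the Handbook of Molecular Descriptors (24); the rounding tolerance used to decide that two leverages are equal is a hypothetical choice.

```python
import math
from collections import Counter

def ish(leverages, ndigits=6):
    """Standardized information content on leverage equality (assumed form).

    ISH = (A*log2(A) - sum_g N_g*log2(N_g)) / (A*log2(A)),
    where A is the number of atoms and N_g the size of each class of
    atoms sharing the same leverage value.
    """
    a = len(leverages)
    if a < 2:
        return 0.0
    groups = Counter(round(h, ndigits) for h in leverages)
    total_info = a * math.log2(a) - sum(n * math.log2(n) for n in groups.values())
    return total_info / (a * math.log2(a))

print(ish([0.1, 0.2, 0.3, 0.4]))      # all leverages different
print(ish([0.25, 0.25, 0.25, 0.25]))  # all leverages equal
```

Intermediate values arise when only some atoms share a leverage value, so under this form the descriptor increases as the molecule becomes less symmetric.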
The fourth descriptor of the SW-MLR model is R6p+ (R maximal autocorrelation of lag 6 weighted by polarizability). This is a GETAWAY-type descriptor related to the polarizability of the atoms in the molecule. It has a negative coefficient, which indicates that the log IR is inversely related to the polarizability of the molecules.
Based on the SW-MLR model, it is concluded that polarizability, molecular symmetry, branching, and the atomic van der Waals volumes affect the anti-HIV-1 activity of the studied compounds. The results of the SW-MLP model confirmed that polarizability, molecular symmetry, and the atomic van der Waals volumes are important molecular properties that influence the anti-HIV-1 activity of the training and test set compounds. The SW-MLP model also indicated that the atomic masses and electronegativities of the atoms have significant effects on the activity of the compounds.
Conclusion
A QSAR analysis was performed on a series of 4-oxo-1,4-dihydroquinoline and 4-oxo-4H-pyrido[1,2-a]pyrimidine derivatives using MLR, artificial neural network, and filtering methods. In total, 842 theoretically derived descriptors were calculated for each molecule. The best set of calculated descriptors was selected with the stepwise method. Multiple linear regression and an artificial neural network, as a nonlinear system, were used for QSAR modeling. Both models exhibited good statistical quality for the training set. However, the SW-MLP (nonlinear system) was found to be superior to the SW-MLR with respect to the test set predictions. Based on the results of the QSAR models, electronegativity, the atomic masses, the atomic van der Waals volumes, molecular symmetry, and polarizability were found to be important factors controlling the anti-HIV-1 activity.
References
1. Barre-Sinoussi F, Chermann JC, Rey F. Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science. 1983;220:868-871.
2. Gallo RC, Sarin PS, Gelmann EP. Isolation of human T-cell leukemia virus in acquired immune deficiency syndrome (AIDS). Science. 1983;220:865-867.
3. Levy JA. Pathogenesis of human immunodeficiency virus infection. Microbiol. Rev. 1993;57:183-289. [PubMed ID: 8464405].
4. Richman DD. HIV chemotherapy. Nature. 2001;410:995-1001. [PubMed ID: 11309630].
5. Palella FJ, Delaney KM, Moorman AC, Loveless MO, Fuhrer J, Satten GA, Aschman DJ, Holmberg SD. Declining morbidity and mortality among patients with advanced human immunodeficiency virus infection. HIV Outpatient Study Investigators. N. Engl. J. Med. 1998;338:853-860. [PubMed ID: 9516219].
6. Young DC. Computational Drug Design: A Guide for Computational and Medicinal Chemists. New Jersey: John Wiley & Sons, Inc; 2009. p. 171-175.
7. Ketabforoosh SHME, Amini M, Vosooghi M, Shafiee A, Azizi E, Kobarfard F. Synthesis, evaluation of anticancer activity and QSAR study of heterocyclic esters of caffeic acid. Iran. J. Pharm. Res. 2013;12:705-719. [PubMed ID: 24523750].
8. Inamdar P, Bhandari S, Sonawane B, Hole A, Jadhav C. Structure optimization of neuraminidase inhibitors as potential anti-influenza (H1N1 inhibitors) agents using QSAR and molecular docking studies. Iran. J. Pharm. Res. 2014;13:49-65. [PubMed ID: 24734056].
9. Wang Y, Wang XW, Cheng YY. A computational approach to botanical drug design by modeling quantitative composition–activity relationship. Chem. Biol. Drug Des. 2006;68:166-172. [PubMed ID: 17062014].
10. Freitas MP, da Cunha EFF, Ramalho TC, Goodarzi M. Multimode methods applied on MIA descriptors in QSAR. Curr. Comput. Aided Drug Des. 2008;44:273-282.
11. Xuan S, Wu Y, Chen X, Liu J, Yan A. Prediction of bioactivity of HIV-1 integrase ST inhibitors by multilinear regression analysis and support vector machine. Bioorg. Med. Chem. Lett. 2013;23:1648-1655. [PubMed ID: 23395655].
12. Yasri A, Hartsough D. Toward an optimal procedure for variable selection and QSAR model building. J. Chem. Inf. Comput. Sci. 2001;41:1218-1227. [PubMed ID: 11604021].
13. Knaggs MH, McGuigan C, Harris SA, Heshmati P, Cahard D, Gilbert IH, Balzarini J. A QSAR study investigating the effect of L-alanine ester variation on the anti-HIV activity of some phosphoramidate derivatives of d4T. Bioorg. Med. Chem. Lett. 2000;10:2075-2078. [PubMed ID: 10999474].
14. Lameira J, Medeiros IG, Reis M, Santos AS, Alves CN. Structure–activity relationship study of flavone compounds with anti-HIV-1 integrase activity: A density functional theory study. Bioorg. Med. Chem. 2006;14:7105-7112. [PubMed ID: 16890444].
15. Saíz-Urra L, Pérez González M, Fall Y, Gómez G. Quantitative structure–activity relationship studies of HIV-1 integrase inhibition Getaway descriptors. Eur. J. Med. Chem. 2007;42:64-70.
16. Hajimahdi Z, Zabihollahi R, Aghasadeghi MR, Zarghi A. Design, synthesis and docking studies of new 4-hydroxyquinoline-3-carbohydrazide derivatives as anti-HIV-1 agents. Drug Res. 2013;63:192-197.
17. Hajimahdi Z, Zarghi A, Zabihollahi R, Aghasadeghi MR. Synthesis, biological evaluation, and molecular modeling studies of new 1,3,4-oxadiazole- and 1,3,4-thiadiazole-substituted 4-oxo-4H-pyrido[1,2-a]pyrimidines as anti-HIV-1 agents. Med. Chem. Res. 2013;22:2467-2475.
18. HyperChem Release 8. Hypercube, Inc; Available from: URL: http://www.hyper.com.
19.
20. SPSS for Windows. Statistical Package for IBM PC, SPSS Inc; Available from: URL: http://www.spss.com.
21. Haykin S. Neural Networks: A Comprehensive Foundation. Singapore: Pearson Education, Inc; 1999.
22. Zupan JJ, Gasteiger J. Neural Networks for Chemists. An Introduction. Weinheim (Germany): VCH Publishers; 1993.
23. Pivetta T, Isaia F, Trudu F, Pani A, Manca M, Perra D, Amato F, Havel J. Development and validation of a general approach to predict and quantify the synergism of anti-cancer drugs using experimental design and artificial neural networks. Talanta. 2013;115:84-93. [PubMed ID: 24054565].
24. Todeschini R, Consonni V. Handbook of Molecular Descriptors. Weinheim: Wiley-VCH; 2002.