Polycyclic aromatic hydrocarbons (PAHs) are abundant and important non-ionic environmental pollutants. Most of the PAHs released worldwide each year originate from anthropogenic emissions (1). PAHs are formed during anthropogenic activities such as the incomplete combustion or pyrolysis of carbon-containing materials like coal, wood, oil products and garbage in air-deficient environments (2). Like other similar contaminants, they are found globally in varying amounts in soils (3, 4) and can be detected in a variety of waters and wastewaters. They are sparingly soluble in water, but dissolve in organic solvents and are highly lipophilic (5). Phenanthrene, a PAH, is an important organic pollutant, mainly because of its wide distribution in the environment and its carcinogenic and mutagenic properties (6, 7). The chemical structure and attributes of phenanthrene are shown in Figure 1 (8).
Figure 1. Chemical Structure of Phenanthrene
Organic pollutants are of concern because they have toxic effects on living organisms, including humans. These toxic effects can be either acute or chronic and include disruption of the endocrine, reproductive and immune systems, neurobehavioral disorders, and carcinogenicity (8).
Depending on the PAH type and the mode of exposure, exposure to these pollutants has been associated with an increased risk of cancer in different tissues, including the bladder, lung, stomach, skin and scrotum (9). As an organic pollutant, phenanthrene is also toxic and can enter the human body through ingestion, inhalation, or skin absorption. It is known as a human skin photosensitizer and mild allergen, and under specialized conditions, it is mutagenic to human microbial system (10).
The presence of PAHs in the environment, and especially in soil, poses a serious risk to human health through the food chain. To describe the fate and behavior of organic contaminants, numerous environmental researchers have validated and used two parameters: (i) the soil sorption coefficient (Kd) and (ii) the soil organic carbon sorption coefficient (Koc). These parameters indicate how strongly a contaminant sorbs to surfaces at the water/solid interface, and thus reflect its environmental mobility and persistence (11). The higher their values, the more strongly the pollutant is sorbed at the interface and, consequently, the less mobile it is (12).
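As a minimal sketch of how the two coefficients relate, the snippet below uses the standard textbook definitions, which are not spelled out in the text above and are therefore assumptions here: Kd as the ratio of the sorbed concentration to the dissolved concentration at equilibrium, and Koc as Kd normalized by the organic carbon mass fraction of the soil (f_oc). The function names and all numerical values are hypothetical.

```python
def sorption_coefficient(c_sorbed_mg_per_kg, c_dissolved_mg_per_l):
    """Kd (L/kg): sorbed concentration divided by dissolved concentration."""
    if c_dissolved_mg_per_l <= 0:
        raise ValueError("dissolved concentration must be positive")
    return c_sorbed_mg_per_kg / c_dissolved_mg_per_l


def organic_carbon_coefficient(kd, f_oc):
    """Koc (L/kg): Kd normalized by the organic carbon mass fraction (0-1)."""
    if not 0 < f_oc <= 1:
        raise ValueError("f_oc must be a fraction in (0, 1]")
    return kd / f_oc


# Hypothetical equilibrium values, for illustration only (not measurements).
kd = sorption_coefficient(c_sorbed_mg_per_kg=50.0, c_dissolved_mg_per_l=0.25)
koc = organic_carbon_coefficient(kd, f_oc=0.02)
print(kd, koc)
```

Under these assumed definitions, two soils with the same Kd but different organic carbon fractions yield different Koc values, which is why both parameters are reported.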
Several approaches, such as the constant partition coefficient model, the parametric Kd model, and empirical equations, have been suggested for estimating Kd values. General equations for predicting Kd, mostly derived empirically by statistical analysis, take a linear or nonlinear polynomial form; however, their accuracy is not satisfactory (13).
Because experimental conditions vary, laboratory-based determinations of Koc are subject to measurement error. Even when this variation is statistically acceptable, the measurements are costly and laborious (14). Koc is therefore commonly estimated from statistical relationships with other properties, such as the octanol/water partition coefficient (Kow), water solubility (S), bioconcentration factors (BCF), and molecular descriptors (e.g. the first-order molecular connectivity index) (15, 16). The relationships suggested in the literature were obtained by regression and are expressed in log-log form: Log (Koc) = a Log (S, Kow, or BCF) + b, where a and b are constants. Chemical property estimation programs like AUTOCHEM estimate Koc from Kow using the equation 'Log (Koc) = -0.55 Log (Kow) + 1.377'. The CHEMEST computer program, which also estimates chemical properties, allows the user to estimate Koc using equations similar to those used by AUTOCHEM (17). Karickhoff et al. (18) studied Kow and Koc for a series of polycyclic aromatics and chlorinated hydrocarbons, including phenanthrene, and obtained a correlation coefficient of 0.98 between them. Evaluating the detailed sorption behavior of four PAH compounds with various chemical and structural characteristics, Means et al. (19) reported a highly significant correlation between Koc and Kow, both in log form (R² = 0.98). Karickhoff (20) developed equations for estimating Koc from S and Kow. The correlation coefficients between Koc and Kow in linear and logarithmic form were 0.994 and 0.997, respectively. Investigating the relationship between topological indices and the sorption coefficient (Koc), Tao and Lu (21) analyzed molecular connectivity indices and polarity correction factors for 543 chemicals, using stepwise regression to assess their effect on a linear model. They subsequently developed a linear model using three molecular connectivity indices along with a set of polarity correction factors, whose R² values were greater than 0.86. Toul et al. (22) obtained an empirical relationship between Koc and Kow that is applicable over a wide range of values of both parameters and to a wide range of pollutants and sorbents. Based on various topological molecular descriptors, Mishra et al. (23) also built several quantitative structure-activity relationship (QSAR) models to estimate the Koc of substituted anilines and phenols, and reported that a tetra-parametric model was optimal for modeling such compounds. However, the complexity of soil and environmental behavior has motivated studies seeking simpler models that require fewer and more easily obtained input data.
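The log-log regression form cited above, Log (Koc) = a Log (Kow) + b, can be illustrated with an ordinary least-squares fit. The following is only a sketch: the function name `fit_log_log` is hypothetical, and the Kow/Koc pairs are made-up values chosen for illustration, not data from any of the cited studies.

```python
import math


def fit_log_log(kow_values, koc_values):
    """Least-squares fit of log10(Koc) = a * log10(Kow) + b; returns (a, b, r2)."""
    xs = [math.log10(v) for v in kow_values]
    ys = [math.log10(v) for v in koc_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic, made-up values (not measurements), for illustration only.
kow = [1e2, 1e3, 1e4, 1e5, 1e6]
koc = [6.0e1, 5.5e2, 7.0e3, 4.8e4, 6.1e5]
a, b, r2 = fit_log_log(kow, koc)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.3f}")
```

The same function applies unchanged if S or BCF is substituted for Kow as the predictor.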
Artificial neural networks (ANNs), like other flexible, systematic methods that are better suited than empirical models, have been used to capture nonlinear relationships and complicated interactions (i.e. hidden relationships between input variables). Recently, ANNs have become common tools for predicting contamination levels and the concentrations of various effluents and chemicals in drinking water, wastewater, and groundwater. ANNs have been applied successfully to diverse problems (24, 25). Likewise, several researchers have demonstrated the applicability of ANNs to adsorption systems (26-28). Gao et al. (14) used linear regression and ANN to predict Koc from Kow and S. Diaconu et al. (27) used ANN to estimate the amount and rate of phosphate adsorption to soil, and confirmed the capability of ANN. Similarly, Snidgha (28) applied an optimization approach to build a three-layer neural network for predicting the efficiency of phenol removal from aqueous solution, using peat soil as an adsorbent.
To the best of our knowledge, no study has modeled the phenanthrene sorption coefficients (Kd and Koc) using soil organic matter (SOM) as input data, particularly with the aid of artificial neural networks. Therefore, the aim of the current study was to investigate the accuracy of ANN models that require minimal input data for estimating the Kd and Koc of phenanthrene.
1.1. Artificial Neural Networks Model
An ANN is an information-processing tool inspired by biological nervous systems; it consists of many highly interconnected processing elements (neurons) that work together to solve a problem. Because they can extract meaningful relationships from complex or imprecise data, ANNs are used to detect patterns and trends that are too complicated for humans to identify (27). As in natural networks, some neurons receive the problem data (input layer neurons), others process them (hidden layer neurons), and another group presents the answer (output layer neurons) (29). The number of neurons in the input and output layers is therefore determined by the given problem, whereas the number of neurons in the hidden layer is set by the designer (decision maker). Network training is the procedure of determining the connection weights between neurons so as to find the set of weights that minimizes the error. To allocate the connection weights, data gathered from examples of the given problem are used. A computer program then uses this information to determine the relative weights and to represent the behavior of the problem. This process corresponds to fitting the network to the training data (27). The values assigned to the input layer are multiplied by the weights of the connections to the cells of the next layer and transferred to that layer. In each cell of the next layer, all the weighted inputs are summed and the sum is passed through the cell's activation function, yielding the cell's output. The values obtained in the last layer are the responses offered for the problem; they are compared with the observed values and accepted as the final answers if the calculation error is acceptable (30). The usual algorithm for training networks is back propagation (BP). In BP, which is a supervised learning method, error values are calculated after each learning cycle and the weight correction signals are then propagated back through the network. One of the most important steps in determining the optimal structure of an ANN is choosing the number of neurons in the hidden layer that gives the lowest error, which is done by trial and error (31). Compared with other methods, an advantage of the ANN model is that it does not need prior information about the relationships between inputs and outputs. In addition, it is less sensitive to errors in the input data. In other words, using a minimum of measured parameters, this model is able to predict the variation of the target variables precisely (32).
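The forward pass and the back-propagation update described above can be sketched in a few lines. This is a deliberately small illustration under stated assumptions, not the network used in this study: it has one input, one hidden layer with a hyperbolic tangent activation and a single linear output neuron, and it uses plain per-sample gradient descent with a hypothetical learning rate. The training pairs are made-up values.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Weighted inputs are summed and passed through tanh; output neuron is linear."""
    hidden = [math.tanh(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    y = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return hidden, y


def train_step(samples, params, lr=0.05):
    """One learning cycle of back propagation; returns the mean squared error."""
    w_hidden, b_hidden, w_out, b_out = params
    total = 0.0
    for x, target in samples:
        hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out[0])
        err = y - target
        total += err * err
        for j, h in enumerate(hidden):
            # Error signal propagated back through the tanh derivative (1 - h^2).
            grad_hidden = err * w_out[j] * (1.0 - h * h)
            w_out[j] -= lr * err * h
            w_hidden[j] -= lr * grad_hidden * x
            b_hidden[j] -= lr * grad_hidden
        b_out[0] -= lr * err
    return total / len(samples)


random.seed(0)
n_hidden = 3  # hypothetical size; in practice chosen by trial and error
params = (
    [random.uniform(-1, 1) for _ in range(n_hidden)],  # input-to-hidden weights
    [random.uniform(-1, 1) for _ in range(n_hidden)],  # hidden biases
    [random.uniform(-1, 1) for _ in range(n_hidden)],  # hidden-to-output weights
    [0.0],                                             # output bias (mutable)
)
# Made-up normalized (input, target) pairs, for illustration only.
samples = [(0.1, 0.2), (0.3, 0.35), (0.5, 0.5), (0.7, 0.6), (0.9, 0.65)]
errors = [train_step(samples, params) for _ in range(500)]
print(errors[0], errors[-1])
```

Repeating the loop with different values of `n_hidden` and comparing the final errors mirrors the trial-and-error selection of the hidden layer size.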
1.2. Artificial Neural Networks Description
To design and train the ANNs, a set of input and output data was required: the organic content as the input and the phenanthrene sorption coefficients as the outputs.
Because the number of data points was limited, and to obtain more reliable results, cross-validation was used to select the best-performing models. Cross-validation provides a means of building different training/testing splits such that each data point appears in the testing set exactly once. The procedure is simple: (i) split the data into n equal-sized groups; (ii) for i = 1 to n, select group i as the testing set and the other (n-1) groups as the training set; (iii) train the model on the training set and evaluate it on the testing set. Each such iteration is called a fold. In general practice, n = 10 (10-fold cross-validation) is widely accepted, as it provides an accurate estimate of the generalization error of the model (33).
Since there were 32 input samples in this study, eight-fold cross-validation was used. The input data were divided into eight equal (or nearly equal) sized parts (folds), and eight iterations of training and validation were conducted. In each iteration, a different fold was held out for validation and the remaining folds were used for training. The trained model was then used to predict the held-out validation data, so that each network was built and then assessed on data it had not seen. Because it allows reliable testing on a small data set through repeated computational runs, this procedure seems superior to a simple single train-and-test split, and it results in eight networks.
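The splitting scheme described above can be sketched as follows for 32 samples and eight folds. The function name `k_fold_splits` and the fixed shuffle seed are illustrative assumptions; the text does not state how samples were assigned to folds.

```python
import random


def k_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) for each fold of a shuffled data set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th index, so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(n_samples=32, n_folds=8))
for train, test in splits:
    print(len(train), len(test))  # 28 training and 4 testing samples per fold
```

Each of the eight (train, test) pairs would be used to fit and evaluate one network, giving eight networks in total.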
Normalizing the inputs is essential to avoid reducing the speed and accuracy of the network and to bring the data values onto a common scale (34). After the data were normalized using Equation 1, the mean of the data series was 0.5 (35).

Equation 1.
where xn is the normalized value, x is the actual value, xm is the mean value, xmin is the minimum value, and xmax is the maximum value of the parameter.
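Equation 1 itself is not reproduced in this text. One form that is consistent with the symbols defined above and with the stated mean of 0.5 is xn = 0.5 + 0.5 (x - xm) / (xmax - xmin). The sketch below uses this form as an assumption; it is not necessarily the exact equation used by the authors, and the input values are hypothetical.

```python
def normalize(values):
    """Assumed form of Equation 1: xn = 0.5 + 0.5 * (x - xm) / (xmax - xmin)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("values must not all be equal")
    x_mean = sum(values) / len(values)
    return [0.5 + 0.5 * (x - x_mean) / (x_max - x_min) for x in values]


# Hypothetical organic matter contents (%), for illustration only.
som = [0.8, 1.5, 2.1, 3.4, 5.0]
normalized = normalize(som)
print(sum(normalized) / len(normalized))  # approximately 0.5, up to rounding
```

With this assumed form, every normalized value lies between 0 and 1 and the series mean is 0.5 by construction, because the deviations from the mean sum to zero.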
For the ANN modeling, a multilayer perceptron (MLP) network was implemented in MATLAB 7.6. The adjustment of the weights between layers, known as the training process, was repeated until the difference between the observed and predicted data was minimized. The Levenberg-Marquardt learning rule was used, with sigmoid and hyperbolic tangent (Tansig) transfer functions (31). Finally, the number of neurons in each hidden layer was determined by trial and error.
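For reference, the two transfer functions named above can be written out explicitly. The sketch assumes that "sigmoid" refers to the logistic sigmoid (MATLAB's `logsig`) and uses the `tansig` definition 2 / (1 + exp(-2n)) - 1, which is mathematically equivalent to tanh(n). The Python function names simply mirror the MATLAB ones; this is not the authors' code.

```python
import math


def logsig(n):
    """Logistic sigmoid, 1 / (1 + exp(-n)), written to avoid overflow for large |n|."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)
    return e / (1.0 + e)


def tansig(n):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2n)) - 1, equivalent to tanh(n)."""
    return 2.0 * logsig(2.0 * n) - 1.0


for n in (-2.0, 0.0, 2.0):
    print(n, logsig(n), tansig(n), math.tanh(n))
```

The logistic sigmoid maps inputs to the range (0, 1), whereas the hyperbolic tangent maps them to (-1, 1).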