1. Background
Fasciolosis is an important helminthic disease caused by liver flukes, such as Fasciola gigantica. It affects a wide variety of domestic and wild animals, including cattle, sheep, goats, buffalo, deer, and rabbits, as well as humans, all around the world, especially in temperate and tropical climates. This host diversity contributes notably to the global spread of the infection (1). Fasciola gigantica is one of the most important parasitic helminths in Asia and Africa, affecting individual farmers and small farming communities. It causes severe economic losses by decreasing animal production and increasing livestock mortality. It is estimated that fasciolosis inflicts over US$3 billion in losses on the global animal industry (2). Fasciolosis has also been reported in humans in numerous countries. Iran is one of the countries where fasciolosis is a serious issue (3). Juvenile stages of Fasciola migrate through the circulatory system to the liver and mature into adults in the biliary ducts. The pathogenesis and clinical features of the infection are mainly related to the migration of the parasite to the liver and the presence of adults in the biliary ducts (4).
Today, anti-parasitic drugs against fasciolosis are less effective due to the emergence of drug resistance, which has become a serious problem in veterinary and medical sciences (5); thus, current efforts focus on designing a novel vaccine against the infection. Some candidates, including protease enzymes, have been proposed as vaccine targets (6). Proteases are secreted by different stages of parasites to facilitate parasitism and help the pathogens survive in their host for a long time (7). The cathepsin L (Cat L) enzyme, found in all stages of the liver flukes, plays a significant role in the biological functions of the parasites, such as migration, nutrition, and immune evasion (2). Vaccination with Cat L has resulted in varying protection rates against fasciolosis (8).
Bioinformatics has become a significant part of biology. Bioinformatics tools are indispensable for comparing, analyzing, and interpreting genetic and genomic data, understanding evolutionary aspects of molecular biology, and predicting protein structure. Increasing knowledge about various aspects of epitopes will support the development and construction of novel epitope-based vaccines against pathogens.
2. Objectives
As far as we know, there is no published report on bioinformatics analysis of the Cat L sequence of F. gigantica newly excysted juveniles (FgNEJ). Therefore, the current study used bioinformatics methods to predict and identify B-cell and T-cell epitopes and general features of Cat L of FgNEJ.
3. Methods
3.1. Post-Translation Modification and Topology Prediction
The amino acid sequence of Cat L of FgNEJ was retrieved from GenBank, National Center for Biotechnology Information (NCBI), through the server https://www.ncbi.nlm.nih.gov/nuccore/FJ617001.1/. The molecular weight (MW) and the isoelectric point (pI) of the protein sequence were calculated using the pI/Mw computing tool (http://www.expasy.ch/tools/pi_tool.html). Then, the SignalP 3.0 server (https://services.healthtech.dtu.dk/) was employed for signal peptide prediction. The WolfPSORT server (http://wolfpsort.seq.cbrc.jp/) was applied to predict the subcellular localization.
3.2. Secondary Structure Prediction
The secondary structure of the sequence was predicted using the GOR IV server, which is available at http://npsa-pbil.ibcp.fr/.
3.3. B-cell Epitope Prediction
Several bioinformatics servers and software packages, including BCPREDS (B-cell epitope prediction server) (http://ailab.ist.psu.edu/bcpred/predict.html), IEDB (Immune Epitope Database) (http://tools.iedb.org/bcell/), ABCpred (Artificial neural network based B-cell epitope prediction) (http://crdd.osdd.net/), and Bcepred (B-cell epitope prediction), were used to predict and analyze B-cell epitopes on the Cat L sequence. BCPREDS combines the SSK (subsequence kernel) and SVM (support vector machine) methods to predict continuous B-cell epitopes (9). The software takes a single amino acid sequence in plain format as input and gives each epitope a single score. Epitopes with high scores are listed in a table. The program was run with the following default settings: epitope length, 20 amino acids; classifier specificity, 75%; overlap filter enabled. The continuous antibody epitopes of the sequence were also predicted using the IEDB server. This server applies the following parameters to predict B-cell epitopes: flexibility (10), hydrophilicity (11), antigenicity (12), beta-turn (13), and surface accessibility (14). IEDB presents the results as graphs and tables. On the graphs, the Y-axes depict the corresponding score for each residue (averaged over the specified window), while the X-axes depict the residue positions in the sequence. The tables provide the calculated scores for each residue. ABCpred is a bioinformatics program that uses an Artificial Neural Network (ANN), with 65.93% accuracy, to predict B-cell epitopes. It predicts B-cell epitopes based on scores obtained from a trained recurrent neural network. The output data are displayed in graphical and tabular forms. The epitopes are shown in blue in the graphical form, and peptides with a high score are listed in a table (15). The default parameters of this program were used in the present study.
Bcepred predicts epitopes from physicochemical properties, including flexibility, turns, accessibility, antigenicity, and hydrophilicity, with 58.7% accuracy (16).
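The window averaging behind the per-residue propensity graphs described above can be sketched as follows. This is a minimal illustration, not the IEDB implementation: the function names, the window size of 3 in the example, and the threshold are assumptions chosen for readability, and the input is any list of per-residue scores taken from a propensity scale.

```python
from typing import List


def window_average(scores: List[float], window: int) -> List[float]:
    """Average per-residue scores over a sliding window.

    Returns one value per full window (len(scores) - window + 1 values);
    the i-th value is assigned to the residue at the center of window i.
    """
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def positions_above(averages: List[float], threshold: float, window: int) -> List[int]:
    """1-based positions of window centers whose average exceeds the threshold."""
    half = window // 2
    return [i + half + 1 for i, value in enumerate(averages) if value > threshold]


# Toy scores (hypothetical, not taken from any published scale).
toy_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_averages = window_average(toy_scores, window=3)
toy_hits = positions_above(toy_averages, threshold=2.5, window=3)
```

Residues at the reported positions are the ones that would appear above the threshold line on such a graph.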
3.4. Prediction of MHC-I and MHC-II Binding Epitopes
For the prediction of peptides binding to Major Histocompatibility Complex (MHC) class I and class II molecules, the NetMHCcons 1.1 (17) (http://www.cbs.dtu.dk/services/NetMHCcons/) and NetMHCIIpan 3.2 (18) (http://www.cbs.dtu.dk/) online servers were applied, respectively. In NetMHCcons, the user selects the MHC molecule from a long list of alleles and uploads protein sequences. NetMHCIIpan predicts binding to human MHC class II isotypes, including HLA-DR, HLA-DP, and HLA-DQ, and to mouse molecules (H-2) (19). These servers report the half maximal inhibitory concentration (IC50) in nanomolar (nM) and the %Rank. In NetMHCcons, a peptide with a rank value under 0.5% or a binding affinity (IC50) under 50 nM is predicted as a strong binder, and a peptide with a rank value under 2% or a binding affinity (IC50) under 500 nM is predicted as a weak binder. In NetMHCIIpan, peptides are predicted as strong and weak binders if the rank value is below 2% and 10%, respectively. According to previous studies (20), five frequently occurring alleles (HLA-A01:01, HLA-A02:01, HLA-A26:01, HLA-B27:05, and HLA-B39:01) were chosen as human MHC class I molecules, and a 5-allele HLA reference set (DRB1_0301, DRB1_0701, DRB3_0101, DRB5_0101, and HLA-DPA10103-DPB110201) was chosen as human MHC class II molecules. By default, the length of predicted peptides was 10 amino acids in NetMHCcons and 15 amino acids in NetMHCIIpan.
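The binder thresholds stated above can be written as a small classification rule. The sketch below only restates those cutoffs; the function names and the "NB" (non-binder) label for peptides that meet neither criterion are assumptions made for illustration and are not part of the servers' interface.

```python
from typing import Optional


def _below(value: Optional[float], cutoff: float) -> bool:
    """True when a value is available and strictly below the cutoff."""
    return value is not None and value < cutoff


def classify_mhc1(rank_pct: Optional[float] = None,
                  ic50_nm: Optional[float] = None) -> str:
    """MHC class I rule as described for NetMHCcons.

    SB: %Rank < 0.5 or IC50 < 50 nM; WB: %Rank < 2 or IC50 < 500 nM.
    """
    if _below(rank_pct, 0.5) or _below(ic50_nm, 50.0):
        return "SB"
    if _below(rank_pct, 2.0) or _below(ic50_nm, 500.0):
        return "WB"
    return "NB"  # assumed label for peptides meeting neither criterion


def classify_mhc2(rank_pct: float) -> str:
    """MHC class II rule as described for NetMHCIIpan (rank only)."""
    if rank_pct < 2.0:
        return "SB"
    if rank_pct < 10.0:
        return "WB"
    return "NB"  # assumed label


# Hypothetical inputs, not server output.
example_by_affinity = classify_mhc1(ic50_nm=40.0)
example_by_rank = classify_mhc1(rank_pct=0.4, ic50_nm=400.0)
```

Because the two class I criteria are joined by "or", a peptide can be labelled a strong binder on %Rank alone even when its IC50 exceeds 50 nM.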
3.5. Cytotoxic T lymphocyte (CTL) Epitopes Prediction
The CTLpred online server (21) (http://www.imtech.res.in/raghava/ctlpred/index.html) was employed to predict CTL epitopes. It predicts CTL epitopes directly, using information from T-cell epitopes rather than MHC binders. The server is based on two machine learning techniques: Artificial Neural Network (ANN) and Support Vector Machine (SVM). The current study used the combined method (ANN+SVM), with 76% accuracy, to predict CTL epitopes on the protein sequence. The default cutoff scores for ANN (0.51) and SVM (0.36) were used to distinguish epitopes from non-epitopes.
4. Results
4.1. Post-translation Modification and Topology Prediction
The complete amino acid sequence of Cat L submitted to the SignalP 3.0 server revealed a signal peptide of 15 amino acids, with a cleavage site between positions 15 and 16 (VFA-SN) (Figure 1). The WolfPSORT subcellular localization analysis classified Cat L of FgNEJ into 32 different families of proteins, distributed as follows: extracellular, 24; lysosomal, 4; integral membrane protein, 1; mitochondrial inner membrane, 1; endoplasmic reticulum membrane, 1; cytoplasmic, 1; and peroxisomal, 1. The MW of Cat L was predicted to be 37 kDa. Also, the aliphatic index of the protein sequence was calculated to be 61.82.
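The text does not say which tool produced the aliphatic index. As a hedged illustration, the sketch below uses the standard definition (Ikai's formula, as implemented in ExPASy ProtParam): the mole percent of alanine, plus 2.9 times that of valine, plus 3.9 times the sum of those of isoleucine and leucine. The function name is hypothetical, and the example input is the 9-residue peptide MRLFILAIL from Table 6, so its value is not comparable to the whole-protein figure of 61.82.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(aa) is the mole percent (100 * count / length) of each residue.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Short peptide from Table 6, used only to exercise the formula.
fragment_index = aliphatic_index("MRLFILAIL")
```

Applying the same function to the full Cat L sequence would be the way to compare against the reported value.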
4.2. Secondary Structure Prediction
The secondary structure analysis was performed using the GOR secondary structure prediction online server. The results showed that the random coil, alpha-helix, and extended strand ratios on the Cat L sequence were 164 (50.31%), 77 (23.62%), and 85 (26.07%), respectively (Figure 2).
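The reported percentages follow directly from the residue counts, which together account for 326 residues. A short check, with variable and function names chosen only for illustration:

```python
from typing import Dict

# Residue counts per secondary-structure state, as reported by GOR above.
gor_counts: Dict[str, int] = {
    "random coil": 164,
    "alpha-helix": 77,
    "extended strand": 85,
}


def composition_percent(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert residue counts to percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}


total_residues = sum(gor_counts.values())
gor_percent = composition_percent(gor_counts)
```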
4.3. B-cell Epitopes Prediction
The BCPREDS server was applied to predict linear B-cell epitopes on the sequence. Peptides scoring above the threshold were considered B-cell epitopes. Table 1 presents the 5 epitopes with the highest scores.
Position | Epitope | Score |
---|---|---|
188 | GLETESSYPYKAEEGPCKYD | 0.97 |
159 | QLVDCSGDYGNRGCSGGFME | 0.965 |
272 | LVVGYGTQDGTDYWIVKNSW | 0.954 |
90 | RASDIHSHGIPYEANDRAVP | 0.866 |
117 | FGYVTEVKDQGDCGSCWAFS | 0.842 |
Table 1. Linear B-cell Epitopes Predicted on Cat L of FgNEJ Using BCPREDS
Furthermore, the results of the IEDB prediction are shown in Figure 3. The minimum, average, and maximum scores for the parameters used were as follows: beta-turn (0.567, 1.010, 1.414), surface accessibility (0.064, 1.00, 4.847), flexibility (0.890, 0.988, 1.092), antigenicity (0.865, 1.016, 1.178), and hydrophilicity (-7.243, 1.697, 6.829). Subsequently, the ABCpred server was applied to predict the epitopes of Cat L of FgNEJ. It ranked the predicted epitopes by score; a higher score indicates a higher probability that the peptide is an epitope. This server predicted 32 epitopes on the sequence. The highest score (0.96) was obtained for the linear epitope HGYIRMARNRDNMCGI. The ten highest-scoring epitopes are listed in Table 2. Also, the Bcepred server predicted epitopes based on 5 physicochemical parameters (Table 3). These results demonstrated potential B-cell epitopes on the Cat L sequence of FgNEJ, which are key targets for further diagnostic and vaccine studies.
Figure 3. Output of the IEDB server for the following parameters: (A) hydrophilicity, (B) antigenicity, (C) flexibility, (D) surface accessibility, and (E) beta-turn. A residue with a higher score has a higher probability of being part of an epitope (such residues are colored yellow on the graphs).
Rank | Sequence | Start Position | Score |
---|---|---|---|
1 | HGYIRMARNRDNMCGI | 298 | 0.96 |
2 | YGTQDGTDYWIVKNSW | 276 | 0.94 |
3 | GSLWGDHGYIRMARNR | 292 | 0.92 |
4 | YKAEEGPCKYDSRLGV | 197 | 0.91 |
5 | DWREFGYVTEVKDQGD | 113 | 0.91 |
6 | GLETESSYPYKAEEGP | 188 | 0.90 |
7 | QQLVDCSGDYGNRGCS | 158 | 0.90 |
8 | KHIQEHNIRHDLGLVT | 49 | 0.89 |
9 | RGGIYASRNCSSEKLN | 253 | 0.89 |
10 | HEWKRMYNKEYNGVDD | 22 | 0.89 |
Table 2. B-cell Epitopes Predicted on Cat L of FgNEJ Using ABCpred
Parameter | Epitope Sequence |
---|---|
Hydrophilicity | NGVDDAH, DDAHRRN, PYEANDR, YEANDRA, EVKDQGD, VKDQGDC, KDQGDCG, DQGDCGS, QGDCGSC, DCSGDYG, SGDYGNR, EYNGVDD |
Antigenicity | RLFILAI, LFILAIL, FILAILT, ILAILTF, IRHDLGL, RHDLGLV, HDLGLVT, DLGLVTY, LGLVTYT, RAVPESI, FGYVTEV, FASLPVV, ASLPVVE, SLPVVEP, LPVVEPF |
Flexibility | NGVDDAH, REIPRAS, EIPRASD, IPRASDI, AVPESID, VTEVKDQ, TEVKDQG, EVKDQGD, VKDQGDC, KDQGDCG, DQGDCGS, QGDCGSC, YKAEEGP |
Accessibility | YNKEYN, NKEYNG, KRMYNK, RMYNKE, DDAHRR, YEANDR, QYMKNQ, YMKNQK, ARNRDN, RNRDNM |
Turn | GVFASND, ASNDDLW, SNDDLWH, YNKEYNG, NKEYNGV, KEYNGVD, EYNGVDD, YNGVDDA, HSHGIPY, SHGIPYE, KDQGDCG, GDCGSCW, GNRGCSG, GDYGNRG |
Table 3. B-cell Epitopes Predicted on Cat L of FgNEJ Using Bcepred
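Several of the peptides reported by BCPREDS and ABCpred cover the same stretch of the sequence. One way to make this visible is to compare the residue ranges implied by each start position and peptide length. The sketch below is an illustrative cross-check and not part of the study's method; the helper names and the minimum overlap of 8 residues are arbitrary assumptions.

```python
from typing import Dict, List, Tuple

# Start position -> peptide, copied from the BCPREDS and ABCpred tables.
BCPREDS: Dict[int, str] = {
    188: "GLETESSYPYKAEEGPCKYD",
    159: "QLVDCSGDYGNRGCSGGFME",
    272: "LVVGYGTQDGTDYWIVKNSW",
    90: "RASDIHSHGIPYEANDRAVP",
    117: "FGYVTEVKDQGDCGSCWAFS",
}
ABCPRED: Dict[int, str] = {
    298: "HGYIRMARNRDNMCGI", 276: "YGTQDGTDYWIVKNSW",
    292: "GSLWGDHGYIRMARNR", 197: "YKAEEGPCKYDSRLGV",
    113: "DWREFGYVTEVKDQGD", 188: "GLETESSYPYKAEEGP",
    158: "QQLVDCSGDYGNRGCS", 49: "KHIQEHNIRHDLGLVT",
    253: "RGGIYASRNCSSEKLN", 22: "HEWKRMYNKEYNGVDD",
}


def span(start: int, peptide: str) -> Tuple[int, int]:
    """Inclusive residue range covered by a peptide."""
    return start, start + len(peptide) - 1


def overlap_length(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of residue positions shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def shared_regions(first: Dict[int, str], second: Dict[int, str],
                   min_overlap: int = 8) -> List[Tuple[int, int, int]]:
    """List (start_in_first, start_in_second, shared_residues) for overlapping pairs."""
    hits = []
    for s1, p1 in sorted(first.items()):
        for s2, p2 in sorted(second.items()):
            shared = overlap_length(span(s1, p1), span(s2, p2))
            if shared >= min_overlap:
                hits.append((s1, s2, shared))
    return hits


consensus = shared_regions(BCPREDS, ABCPRED)
```

With these inputs, four of the five BCPREDS peptides share at least 8 residues with one or more of the ten ABCpred peptides; the peptide starting at position 90 does not.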
4.4. Prediction of MHC-I and MHC-II Binding Epitopes
Data from the NetMHCcons and NetMHCIIpan servers are presented in Tables 4 and 5, respectively. For each MHC class, 5 alleles were selected. The server output comprised a table containing the position, the allele used, the predicted peptide, the affinity (IC50, nM), the %Rank, and the binding level (strong or weak). Five peptides with high affinity to MHC molecules were selected for each allele. The results revealed several epitopes that bind strongly to MHC I and MHC II molecules in Cat L. The strongest binders belonged to HLA-A01:01 (MHC I) and DRB1_0701 (MHC II).
Allele and Position | Peptide | Affinity (IC50, nM) | Binding Level |
---|---|---|---|
HLA-A01:01 | |||
71 | FTDMTFEEFK | 396.76 | SB |
74 | MTFEEFKAKY | 454.22 | SB |
109 | ESIDWREFGY | 1097.07 | SB |
110 | SIDWREFGYV | 5500.22 | SB |
175 | FMEHAYEYLY | 55.68 | SB |
HLA-A02:01 | |||
3 | FILAILTFGV | 4.57 | SB |
25 | RMYNKEYNGV | 25.41 | SB |
248 | FLMYRGGIYA | 14.17 | SB |
311 | GIASFASLPV | 39.38 | SB |
277 | TQDGTDYWIV | 289.91 | WB |
HLA-A26:01 | |||
74 | MTFEEFKAKY | 47.34 | SB |
109 | ESIDWREFGY | 44.84 | SB |
135 | SATGAMEGQY | 1874.27 | WB |
185 | EVGLETESSY | 534.26 | WB |
HLA-B27:05 | |||
38 | HRRNIWEENV | 784.45 | WB |
39 | RRNIWEENVK | 138.16 | SB |
55 | IRHDLGLVTY | 223.61 | WB |
300 | IRMARNRDNM | 168.78 | WB |
303 | ARNRDNMCGI | 1311.49 | WB |
HLA-B39:01 | |||
57 | HDLGLVTYTL | 309.35 | WB |
113 | WREFGYVTEV | 405.44 | WB |
262 | EHAYEYLYEV | 229.74 | WB |
264 | TQDGTDYWIV | 149.03 | WB |
Table 4. Prediction of MHC-I Binding Epitopes on Cat L of FgNEJ Using NetMHCcons (SB, strong binder; WB, weak binder)
Allele and Position | Peptide | Affinity (IC50, nM) | Binding Level |
---|---|---|---|
DRB1_0301 | |||
50 | HIQEHNIRHDLGLVT | 223.82 | WB |
51 | IQEHNIRHDLGLVTY | 107.03 | SB |
52 | QEHNIRHDLGLVTYT | 89.42 | SB |
53 | EHNIRHDLGLVTYTL | 79.73 | SB |
54 | HNIRHDLGLVTYTLG | 87.56 | SB |
DRB1_0701 | |||
310 | MCGIASFASLPVVEP | 38.94 | SB |
309 | NMCGIASFASLPVVE | 35.04 | SB |
307 | RDNMCGIASFASLPV | 47.03 | WB |
308 | DNMCGIASFASLPVV | 33.80 | SB |
311 | CGIASFASLPVVEPF | 44.05 | WB |
DRB3_0101 | |||
50 | HIQEHNIRHDLGLVT | 221.36 | WB |
51 | IQEHNIRHDLGLVTY | 125.50 | WB |
52 | QEHNIRHDLGLVTYT | 115.17 | WB |
53 | EHNIRHDLGLVTYTL | 101.56 | WB |
54 | HNIRHDLGLVTYTLG | 121.59 | WB |
DRB5_0101 | |||
296 | GDHGYIRMARNRDNM | 41.12 | WB |
297 | DHGYIRMARNRDNMC | 42.07 | WB |
298 | HGYIRMARNRDNMCG | 39.00 | WB |
299 | GYIRMARNRDNMCGI | 48.01 | WB |
300 | YIRMARNRDNMCGIA | 112.03 | WB |
HLA-DPA10103-DPB110201 | |||
71 | QFTDMTFEEFKAKYL | 115.56 | SB |
72 | FTDMTFEEFKAKYLR | 140.86 | WB |
73 | TDMTFEEFKAKYLRE | 203.14 | WB |
74 | DMTFEEFKAKYLREI | 166.61 | WB |
75 | MTFEEFKAKYLREIP | 172.08 | WB |
Table 5. Prediction of MHC-II Binding Epitopes on Cat L of FgNEJ Using NetMHCIIpan (SB, strong binder; WB, weak binder)
4.5. Prediction of CTL Epitopes
CTL epitopes were predicted with the CTLpred server and ranked by score. The 10 epitopes with the highest scores are presented in Table 6.
Peptide Rank | Start Position | Sequence | Score (ANN/SVM) |
---|---|---|---|
1 | 1 | MRLFILAIL | 1 |
2 | 209 | RLGVAKVNG | 1 |
3 | 278 | TQDGTDYWI | 1 |
4 | 16 | SNDDLWHEW | 0.990 |
5 | 89 | PRASDIHSH | 0.990 |
6 | 176 | FMEHAYEYL | 0.990 |
7 | 187 | VGLETESSY | 0.990 |
8 | 63 | VTYTLGLNQ | 0.980 |
9 | 105 | DRAVPESID | 0.980 |
10 | 231 | HLVGDKGPA | 0.980 |
Table 6. Predicted CTL Epitopes of Cat L of FgNEJ Using CTLpred
5. Discussion
Parasitic infections are a major threat to humans and animals, leading to serious diseases (22, 23). The control strategy for Fasciola infection is currently based on anthelmintic drugs. However, due to the long-term use of these drugs, drug resistance has been reported in many countries, including Australia, Ireland, and Argentina (24). Vaccination seems to be the best way to control fasciolosis (25). Vaccines are sustainable, cost-effective, and environmentally friendly tools that significantly boost immunity against pathogens. Conventional vaccines mostly consist of killed or attenuated pathogens, which may sometimes be dangerous for the host due to problems in the preparation process. Thus, it is safer to apply an epitope-based vaccine, which induces and stimulates an immune response against the specific pathogen (26). Different antigens have been described as vaccine targets against fasciolosis. It has been reported that cysteine proteinases are promising vaccine antigens as they play a vital role in host-parasite interaction. Cathepsins belong to the family of cysteine proteases expressed in Fasciola species. Several isoforms of Cat L and B are expressed in different stages of Fasciola parasites (27).
One of the most valuable ways to design and deliver an epitope-based vaccine is to analyze the signal peptide and antigen properties using bioinformatics tools (20, 28). Preliminary bioinformatics analysis of potential vaccine candidates can significantly reduce the number of animals used in vaccine trials and provides essential data about the protein of interest in a simple way (29). In order to construct an ideal vaccine that protects against pathogens, it is crucial to understand the entire structure of their antigens (30). Therefore, computational analysis can be pivotal in the vaccine design process. The present study employed various bioinformatics tools to characterize Cat L of FgNEJ in support of vaccine design. The signal peptide is the most common feature researchers analyze when searching an amino acid sequence for secreted or membrane-bound proteins (31). In the present study, the bioinformatics analysis of Cat L revealed an N-terminal signal peptide.
The prediction of protein structures improves our knowledge of the function and biological aspects of the protein. The secondary structure plays a significant role in epitope function (32). In the current study, the largest share of the secondary structure belonged to the random coil, which is typically exposed on the protein surface and may form potential epitopes (33). By contrast, alpha-helix and beta-turn are commonly found in the internal part of the protein, where they stabilize it through high chemical-bond energy, and they do not seem to act as epitopes.
The epitope is the part of the antigen recognized by the B cells and T cells of the mammalian immune system. Only a few amino acids of a protein (instead of the whole protein) can provide a sufficient protective response. Therefore, predicting or identifying this part of the protein can be vital to understanding the mechanisms of immunity and the pathogenesis of pathogens (32). Most importantly, these epitopes can be used to design epitope-based vaccines (34). Prediction of B-cell and T-cell epitopes is a key step in vaccine design, the development of diagnostic reagents, and understanding antigen-antibody interactions at the molecular level (20). Linear B-cell epitopes in Cat L of FgNEJ were predicted using established bioinformatics software to find potential targets for vaccines against Fasciola infection.
The current study applied bioinformatics tools such as BCPREDS, ABCpred, Bcepred, and IEDB to predict B-cell epitopes on Cat L of FgNEJ. BCPREDS identified 5 potential antigenic epitopes on the sequence. Based on the data from ABCpred, of the 32 epitopes predicted on Cat L, the 10 highest-scoring epitopes can be used as targets in vaccine research. Additionally, we applied Bcepred to predict linear B-cell epitopes. This software uses physicochemical properties such as hydrophilicity, accessibility, flexibility/mobility, exposed surface, and turns to predict B-cell epitopes, and each of these properties predicted several B-cell epitopes on Cat L. Some protein parameters, such as flexibility, hydrophilicity, turns, accessibility, polarity, exposed surface, and antigenic propensity, have been associated with the location of continuous epitopes. In the present study, the data obtained from IEDB also revealed some potential epitopes on Cat L of FgNEJ. Taken together, the results from these tools indicate potential epitopes on this protein that may be strong candidates for designing an effective vaccine against fasciolosis, pending testing in animal models. A study of antibody recognition of cathepsin L characterized linear B-cell epitopes on the F. hepatica Cat L protein using sera from F. hepatica-infected or vaccinated cattle in two independent experiments, and these results were confirmed by other studies (35-37). That study also demonstrated that animals vaccinated with Cat L showed a reduced fluke burden together with elicited antibodies (8). Another study, on the Plasmodium vivax AMA-1 protein, obtained similar results using the same bioinformatics tools, reported several promising epitopes on AMA-1, and suggested that these tools are valid for predicting and identifying epitopes on proteins (20).
MHC molecules present T-cell epitopes to T-cells. Peptide binding to MHC is a major factor in the selection of potential epitopes and an essential step in presenting the antigen to T-cells. Prediction of peptide-MHC binding epitopes could be vital for constructing a reliable epitope-based vaccine against infections (38). The prediction and characterization of both CD8 and CD4 T-cell epitopes on a protein provide important information for understanding the pathogenesis of infection (39). Another major step in vaccine development is the identification of CTL-stimulating peptides. In this study, 10 epitopes were predicted using the CTLpred software. Cytotoxic T-cell responses are induced through a pathway in which intracellular antigens are processed into epitopes that serve as common targets.
5.1. Conclusions
Designing a vaccine against Fasciola infection remains a major necessity. In fact, no commercial vaccine is available to control human and animal fasciolosis. High-quality vaccines can be designed through accurate analysis of antigens with validated bioinformatics tools. This requires the identification of potential antigens that elicit robust protective responses. Computational analysis provides substantial information for recognizing and characterizing proteins with immunogenic properties, which facilitates finding promising epitopes for vaccine design. In the present study, various properties, structures, and B-cell and T-cell epitopes of Cat L of FgNEJ were predicted using bioinformatics tools, suggesting potential epitopes on Cat L for designing an efficient vaccine against fasciolosis. Moreover, the data from the current study can serve as baseline information for further in vivo studies.