1. Background
Globally, an estimated 185 million people have been infected with hepatitis C virus (HCV), one of the major causes of cirrhosis and hepatocellular carcinoma (1). The HCV genome is an approximately 9.6-kilobase, positive-sense single-stranded RNA that encodes three structural (C, E1 and E2) and seven non-structural (p7, NS2, NS3, NS4A, NS4B, NS5A and NS5B) proteins, flanked by 5’ and 3’ untranslated regions (UTR) (2). E1 and E2 are type I transmembrane proteins with an N-terminal ectodomain and a C-terminal domain (3) and contain 6 and 11 glycosylation sites, respectively (4, 5). These proteins mediate viral entry by interacting with CD81 and scavenger receptor class B member 1 (SRB1) (6-8). Glycosylation of the envelope proteins plays an essential role in ensuring the correct conformation for virus entry (5, 9) and in antigenic variation (10). HCV E2 glycosylation sites interact directly with cell surface receptors, allowing the virus to enter the cell (11, 12). Glycosylation sites may also mask important epitopes from host antibody responses (13, 14). B-cell epitopes are essential for eliciting the desired immune responses (15, 16), and deglycosylation of E1 proteins can influence the number of epitopes and modulate immune recognition of antigens (17). The E1-derived peptide p35 (amino acids (aa) 315–323) (18) and the conserved synthetic E2 peptides p37 (aa 517–531) and p38 (aa 412–419) have been reported to neutralize HCV particles and are important components of a candidate peptide vaccine (19). The molecular targets of HCV direct-acting antivirals (DAAs) currently in development are mainly non-structural proteins such as the NS3 protease, NS5A and the NS5B RNA-dependent RNA polymerase (RdRp) (20). Recently, considerable progress has been made in understanding HCV entry (21, 22) and in developing entry inhibitors (20, 21, 23, 24).
Many patients do not respond to currently available therapy; therefore, there is an urgent need to develop effective HCV vaccines and specific therapeutic drugs. Because both E1 and E2 are hypervariable, it is difficult to design vaccines or therapeutic drugs against them. Genotype 5a accounts for over 50% of HCV infections in South Africa (25).
2. Objectives
This study aimed to characterize genotype 5a E1 and E2 sequences to determine possible glycosylation sites, conserved B-cell epitopes and peptides in HCV that could be useful targets in the design of vaccines and entry inhibitors.
3. Patients and Methods
3.1. Study Population
This study included 18 genotype 5a samples collected from treatment-naive HCV infected patients at Dr. George Mukhari Academic Hospital (DGMAH), north-west of Pretoria, South Africa, from 2007 to 2011. Patients’ demographics and genotyping based on 5’UTR were previously described in detail (25). Six of 18 samples were sequenced as part of the genotype 5a near-full length analysis previously described (26). DGMAH is an academic hospital serving a population of around 4 million from both rural and urban areas. It is a referral hospital for patients from the North West, Mpumalanga, Limpopo and the northwest part of Pretoria, Gauteng. The Medunsa Research and Ethics Committee approved the study.
3.2. PCR and Sequencing
Viral RNA was extracted from 140 μL of serum using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. HCV RNA was converted into cDNA using RevertAid™ RT-PCR (Fermentas, Vilnius, Lithuania). The cDNA was amplified in three overlapping fragments (Table 1) covering the complete E1 and E2 regions. Direct sequencing was performed on an ABI 3500XL (Inqaba Biotechnological Industry, PTY, Ltd, Pretoria, South Africa) using the second-round PCR primers. Sequence fragments were assembled using ChromasPro 1.5 (www.technelysium.com.au/chromas.html). All sequences were aligned with MAFFT (mafft.cbrc.jp/alignment/server/) and translated into amino acids using BioEdit (27).
Primer | Position | Sequence | Reference |
---|---|---|---|
Fragment A | | | |
F1A | 1088 | GAC CAT TTC ATC ATC ATG TCC CA | this study |
R1A | 1425 | TGT ATG CGG CGG CGA ACA AGA CC | this study |
F2A | 1113 | CTT CGG AGG GCC GTT GAC TAC TTA GCG | this study |
R2A | 1413 | CGA ACA AGA CCC CCC AGT GGG | this study |
Fragment B | | | |
M105 | 1292 | ATG GCA TGG GAC ATG ATG ATG | (27) |
R1B | 2061 | TAG GCC CTA AGT TGC AGG GTG GA | this study |
M106 | 1298 | TGG GAC ATG ATG ATG AAT TGG | (27) |
R2B | 2022 | CAA ACC CTG TGG AAT TCA TCC AG | this study |
Fragment C | | | |
F1C | 1743 | GGC TGG GGA ACT ATC AGC TAT | this study |
R1C | 2636 | AAA CCC ATG AGT CCC CGC AGC C | this study |
F2C | 1773 | TCG GGC CCC AGT GAT GAC AAG | this study |
R2C | 2612 | AGC CGC GTT TAG GAC AAT GAC GTT CT | this study |
Table 1. Sequences of the HCV Primers Used in This Study
3.3. Analysis of N-Linked Glycosylation Sites
N-glycosylation sites were predicted using the online prediction server NetNGlyc version 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/), which predicts N-glycosylation sites in proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons. The networks can identify 86% of the glycosylated and 61% of the non-glycosylated sequons, with an overall accuracy of 76%.
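The Asn-Xaa-Ser/Thr motif that NetNGlyc evaluates can be located with a simple pattern scan. The following Python sketch is illustrative only and is not the NetNGlyc method: it lists candidate sequons (assuming the usual convention that Xaa is any residue except proline) and assigns no probability to them. The function name `find_sequons` and the `offset` parameter are hypothetical.

```python
import re

# Candidate N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein, offset=1):
    """Return the position of the Asn in every candidate sequon.

    `offset` is the position assigned to the first residue of `protein`,
    so positions can be reported in polyprotein numbering.
    """
    return [m.start() + offset for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # A short fragment whose first residue is assumed to sit at position 304.
    print(find_sequons("CNCSIYSGH", offset=304))
```

A scan of this kind only identifies where a sequon occurs; whether the site is likely to be glycosylated still depends on the sequence context that the neural networks assess.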
3.4. Prediction of B-Cell Epitopes
To identify B-cell epitopes, 16-mer epitopes were predicted using ABCpred (http://www.imtech.res.in/raghava/abcpred/) at the default threshold of 0.51, using a consensus sequence of the 18 genotype 5a sequences created in BioEdit. The ABCpred server predicts B-cell epitopes with an artificial neural network trained on fixed-length patterns (28). The antigenicity of all predicted epitopes was analyzed using the VaxiJen v2.0 online antigen prediction server (www.ddg-pharmfac.net/vaxijen/). Epitopes with an antigenic score greater than 0.4 were considered antigenic. VaxiJen v2.0 classifies antigens based on the physicochemical properties of proteins without recourse to sequence alignment. All predicted epitopes were analyzed for conservation using the IEDB conservancy tool (http://tools.immuneepitope.org/tools/conservancy/iedb_input) at a threshold of 100% conservation, compared with 406, 221, 98, 33, 45 and 45 randomly selected sequences from HCV genotypes 1a, 1b, 2, 3, 4 and 6, respectively.
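The two mechanical steps in this workflow, enumerating fixed-length windows and scoring conservancy at a 100% identity threshold, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ABCpred or IEDB implementation: conservancy is approximated here as the percentage of sequences containing the epitope as an exact substring, and the function names `kmers` and `conservancy` are hypothetical.

```python
def kmers(sequence, k=16):
    """Return (1-based start position, window) for every k-residue window."""
    return [(i + 1, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def conservancy(epitope, sequences):
    """Percentage of sequences containing the epitope as an exact substring.

    Exact matching stands in for a 100% identity threshold (an assumption
    made for this sketch).
    """
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if epitope in seq)
    return 100.0 * hits / len(sequences)


if __name__ == "__main__":
    # Toy sequences, not study data.
    panel = ["AAGPVYAA", "GPVY", "GPAY", "TTTT"]
    print(kmers("ABCDE", k=3))
    print(conservancy("GPVY", panel))
```

In practice each window would be scored by the prediction server, and only windows passing the score threshold would be carried forward to the conservancy step.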
3.5. Peptide Design
Sequence analysis of the peptides was performed using the ProtParam online tool (30). ProtParam computes parameters including the molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, instability index, aliphatic index and grand average of hydropathicity (GRAVY). To check for post-translational modifications, the predicted peptides were analyzed for N-linked glycosylation as described above and for phosphorylation using the NetPhos 2.0 program (29). NetPhos 2.0 produces neural network predictions for serine, threonine and tyrosine phosphorylation sites in sequences. Only motifs with a NetPhos score of 0.7 or greater were considered.
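Two of the ProtParam parameters listed above are simple functions of amino acid composition and can be reproduced directly. The Python sketch below assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as 100 × (x_Ala + 2.9 x_Val + 3.9 (x_Ile + x_Leu)), where x is the mole fraction of each residue. The function names are hypothetical, and the code is an illustration rather than the ProtParam implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    peptide = peptide.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def aliphatic_index(peptide):
    """Aliphatic index: 100 * (x_Ala + 2.9 * x_Val + 3.9 * (x_Ile + x_Leu))."""
    peptide = peptide.upper()
    n = len(peptide)
    x_ala = peptide.count("A") / n
    x_val = peptide.count("V") / n
    x_ile_leu = (peptide.count("I") + peptide.count("L")) / n
    return 100.0 * (x_ala + 2.9 * x_val + 3.9 * x_ile_leu)


if __name__ == "__main__":
    for pep in ("HWGVLFAAAY", "FLLLADAR"):
        print(pep, round(gravy(pep), 3), round(aliphatic_index(pep), 2))
```

Positive GRAVY values indicate an overall hydrophobic peptide, and a higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains.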
3.6. GenBank Accession Numbers
Sequences were submitted to GenBank under the accession numbers KC767835 - KC767846.
4. Results
4.1. Sequence Alignment and Genetic Distances
Alignment of the 18 genotype 5a sequences with a reference sequence from GenBank showed that most regions of the genotype 5a E1 and E2 proteins were conserved, except hypervariable region 1 (HVR1), which was highly variable as expected. Intragroup genetic distances between the sequences in this study ranged from 8% to 17%, with an average distance of 13% (Table 2).
No. | Sequence | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ZADGM7890 | ||||||||||||||||||
2 | ZADGM6544 | 0.12 | |||||||||||||||||
3 | ZADGM4227 | 0.13 | 0.12 | ||||||||||||||||
4 | ZADGM1908 | 0.14 | 0.12 | 0.13 | |||||||||||||||
5 | ZADGM1707 | 0.10 | 0.12 | 0.11 | 0.12 | ||||||||||||||
6 | ZADGM651 | 0.13 | 0.13 | 0.14 | 0.13 | 0.12 | |||||||||||||
7 | ZADGM308 | 0.12 | 0.12 | 0.12 | 0.12 | 0.10 | 0.11 | ||||||||||||
8 | ZADGM6485 | 0.13 | 0.10 | 0.13 | 0.12 | 0.12 | 0.12 | 0.10 | |||||||||||
9 | ZADGM4124 | 0.13 | 0.13 | 0.13 | 0.13 | 0.12 | 0.14 | 0.13 | 0.12 | ||||||||||
10 | ZADGM2439 | 0.14 | 0.15 | 0.15 | 0.15 | 0.12 | 0.14 | 0.12 | 0.13 | 0.14 | |||||||||
11 | ZADGM2352 | 0.13 | 0.14 | 0.14 | 0.14 | 0.13 | 0.12 | 0.11 | 0.12 | 0.14 | 0.13 | ||||||||
12 | ZADGM525gp | 0.14 | 0.14 | 0.15 | 0.14 | 0.12 | 0.15 | 0.14 | 0.14 | 0.14 | 0.16 | 0.16 | |||||||
13 | ZADGM869 | 0.14 | 0.14 | 0.15 | 0.13 | 0.13 | 0.10 | 0.13 | 0.13 | 0.14 | 0.14 | 0.14 | 0.17 | ||||||
14 | ZADGM3013 | 0.15 | 0.14 | 0.14 | 0.15 | 0.12 | 0.15 | 0.13 | 0.12 | 0.15 | 0.15 | 0.15 | 0.17 | 0.15 | |||||
15 | ZADGM0518 | 0.09 | 0.12 | 0.11 | 0.12 | 0.08 | 0.13 | 0.10 | 0.11 | 0.11 | 0.12 | 0.13 | 0.14 | 0.13 | 0.11 | ||||
16 | ZADGM2582 | 0.14 | 0.14 | 0.13 | 0.14 | 0.13 | 0.13 | 0.12 | 0.13 | 0.14 | 0.13 | 0.13 | 0.16 | 0.15 | 0.15 | 0.13 | |||
17 | ZADGM2088 | 0.14 | 0.13 | 0.13 | 0.13 | 0.12 | 0.12 | 0.11 | 0.12 | 0.13 | 0.14 | 0.12 | 0.13 | 0.14 | 0.14 | 0.12 | 0.12 | ||
18 | ZADGM1104 | 0.13 | 0.14 | 0.13 | 0.14 | 0.12 | 0.13 | 0.09 | 0.12 | 0.14 | 0.14 | 0.14 | 0.13 | 0.15 | 0.15 | 0.12 | 0.14 | 0.13 |
Table 2. Genetic Distances in E1 and E2 Sequences of Genotype 5a in This Study
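A pairwise distance of this kind can be illustrated with the uncorrected p-distance, the proportion of aligned positions at which two sequences differ. The distance model used to produce the values above is not stated, so the Python sketch below is an assumption chosen for simplicity, and the function name `p_distance` is hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


if __name__ == "__main__":
    # Toy alignment: one mismatch in ten compared sites.
    print(p_distance("ACGTACGTAC", "ACGTACGTAA"))
```

Applying the function to every pair of aligned sequences yields a lower-triangular matrix with the same layout as the table above.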
4.2. Analysis of E1 and E2 N-Linked Glycosylation
The E1 and E2 proteins of the 18 sequences were analyzed for possible glycosylation sites. Differences in the probability of glycosylation in E1 and E2 were observed in most sequences. Whereas other studies reported five N-linked glycosylation sites in the E1 region, all strains in the current study showed three or four glycosylation sites, except ZADGM2088, which showed two; the N325 site was not predicted as a glycosylation site in any sequence. In the E2 region, three sequences (ZADGM1104, ZADGM1707 and ZADGM3013) showed nine glycosylation sites, while the remaining sequences varied in the number of glycosylation sites. In ZADGM308, N430 was replaced by H, while in ZADGM6544, N448 was replaced by D. Site N476 was found in only 6 of the 18 analyzed sequences. The E2 sites N423 and N576 were not predicted as glycosylation sites in any genotype 5a sequence in this study (Table 3).
No. | Sequence | E1 196 | E1 209 | E1 234 | E1 305 | E1 325 | E1 No. of Sites | E2 417 | E2 430 | E2 448 | E2 476 | E2 533 | E2 541 | E2 557 | E2 623 | E2 645 | E2 No. of Sites |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | ZADGM7890 | + | ++ | ++ | + | - | 4 | + | ++ | - | - | + | ++ | + | + | + | 7 |
2 | ZADGM6544 | + | ++ | + | - | - | 3 | + | ++ | - | - | - | ++ | + | + | + | 6 |
3 | ZADGM4227 | + | ++ | ++ | + | - | 4 | ++ | ++ | - | - | + | - | + | + | + | 6 |
4 | ZADGM1908 | + | ++ | + | - | - | 3 | ++ | + | - | + | ++ | + | + | + | - | 7 |
5 | ZADGM1707 | + | ++ | + | + | - | 4 | ++ | + | + | + | + | ++ | + | + | + | 9 |
6 | ZADGM651 | + | ++ | ++ | + | - | 4 | ++ | + | - | - | + | + | + | + | - | 6 |
7 | ZADGM308 | + | ++ | ++ | - | - | 3 | ++ | - | + | - | + | ++ | + | + | - | 6 |
8 | ZADGM6485 | + | ++ | ++ | - | - | 3 | ++ | ++ | - | + | + | ++ | + | + | - | 7 |
9 | ZADGM4124 | + | ++ | + | + | - | 4 | ++ | + | - | - | + | + | + | + | + | 7 |
10 | ZADGM2439 | + | ++ | + | + | - | 4 | ++ | + | - | + | + | + | + | + | - | 7 |
11 | ZADGM2352 | + | ++ | ++ | - | - | 3 | ++ | ++ | - | - | + | ++ | + | + | - | 6 |
12 | ZADGM525gp | - | ++ | ++ | + | - | 3 | ++ | + | - | - | - | + | + | + | + | 6 |
13 | ZADGM869 | + | ++ | + | - | - | 3 | + | ++ | - | - | + | + | + | + | + | 7 |
14 | ZADGM3013 | + | ++ | + | + | - | 4 | ++ | ++ | ++ | + | + | ++ | + | + | + | 9 |
15 | ZADGM0518 | + | ++ | + | - | - | 3 | ++ | ++ | ++ | - | - | ++ | + | + | - | 6 |
16 | ZADGM2582 | + | ++ | + | - | - | 3 | ++ | + | - | + | + | ++ | + | + | + | 8 |
17 | ZADGM2088 | - | ++ | ++ | - | - | 2 | ++ | + | - | - | + | + | + | + | - | 6 |
18 | ZADGM1104 | + | ++ | ++ | - | - | 3 | ++ | ++ | + | + | + | + | + | + | + | 9 |
Table 3. Probability of Glycosylation in E1 and E2 Sequences
4.3. B-Cell Epitopes Prediction
Three conserved antigenic B-cell epitopes were predicted for genotype 5a sequences in the E2 region. Epitope E2(504-519) (GPVYCFTPSPVVVGTT) had the highest antigenic score of 1.1613, while E2(675-690) (LPCSFTPTPALSTGLI) and E2(685-700) (LSTGLIHLHQNIVDTQ) had antigenic scores of 0.5340 and 0.6639, respectively. In the conservancy analysis, epitope E2(504-519) was highly conserved among other genotypes, while epitopes E2(675-690) and E2(685-700) were variable (Table 4).
Position | Predicted Epitopes | Antigen score | Genotype 1a | Genotype 1b | Genotype 2 | Genotype 3 | Genotype 4 | Genotype 6 |
---|---|---|---|---|---|---|---|---|
504 | GPVYCFTPSPVVVGTT | 1.1613 | 92 | 97 | 88 | 94 | 89 | 73 |
675 | LPCSFTPTPALSTGLI | 0.5340 | 0 | 0 | 0 | 0 | 0 | 0 |
685 | LSTGLIHLHQNIVDTQ | 0.6639 | 0 | 0 | 0 | 0 | 0 | 2 |
Table 4. Predicted B-Cell Epitopes of HCV Genotype 5a, Their Antigenicity Scores and Conservancy (Percentage) in Different Genotypes
4.4. Peptide Design
From the consensus sequences of genotype 5a E1 and E2, eleven short peptides of 8-28 amino acids were designed from the highly conserved residues. Five peptides of 9-16 amino acids were derived from the E1 region, while six peptides of 8-28 amino acids were derived from E2. Three of the peptides had a predicted post-translational modification, namely N-linked glycosylation, although at a low probability. No serine, threonine or tyrosine phosphorylation sites were predicted in any of the peptides. Most peptides had properties suggesting that they could be useful for designing entry inhibitors (Table 5).
Position | Peptide | Length | Molecular Weight | Theoretical pI | Extinction Coefficient (M-1 cm-1) | Instability Index | Aliphatic Index | GRAVY | Hydrophobic Amino Acids (%) | N-linked Glycosylation | Phosphorylation |
---|---|---|---|---|---|---|---|---|---|---|---|
201 | YHTNDCPNSSI | 14 | 1611.7 | 5.08 | 2980 | 17.36 | 69.29 | -0.343 | 21.4 | + | - |
262 | VDYLAGGAA | 9 | 835.9 | 3.80 | 1490 | -3.53 | 108.89 | 0.867 | 22.2 | - | - |
304 | CNCSIYSGH | 9 | 983 | 6.72 | 1615 | 5.69 | 43.33 | -0.056 | 11.1 | ++ | - |
314 | TGHRMAWDMMMNWSPT | 16 | 1952.2 | 6.41 | 11000 | 29.16 | 6.25 | -0.706 | 37.5 | - | - |
352 | HWGVLFAAAY | 10 | 1134.3 | 6.74 | 6990 | -4.25 | 98 | 1.040 | 40 | - | - |
562 | VKTCGAPPC | 9 | 875 | 8.03 | 125 | 30.68 | 43.33 | 0.311 | 11.1 | - | - |
585 | TDCFRKHP | 8 | 1003.1 | 7.92 | 0 | 5.15 | 0 | -1.512 | 12.5 | - | - |
645 | ACNWTRGERCDL | 12 | 1423.5 | 6.1 | 5625 | 27.31 | 40.83 | -0.908 | 16.7 | + | - |
664 | LSPLLHTTTQ | 10 | 1110.2 | 6.74 | - | 37.86 | 117 | 0.020 | 30 | - | - |
675 | AILPCSFTPTPALSTGLIHLHQNIVDTQ | 28 | 2988.4 | 5.97 | 0 | 28.19 | 115 | 0.421 | 32.1 | - | - |
725 | FLLLADAR | 8 | 918.1 | 5.84 | - | -1.86 | 171.25 | 1.225 | 50 | - | - |
Table 5. Predicted Peptides for HCV E1 and E2 Conserved in Genotype 5a
5. Discussion
Genotype 5 is the most conserved HCV genotype and is classified into only one subtype (5a) (26). This study was designed to identify conserved sequences of the E1 and E2 proteins in order to predict antigenic epitopes and peptides that could serve as targets for vaccine design and as potential entry inhibitors. Several structural and sequence analysis tools were used for in-silico analysis of the E1 and E2 regions. HCV genotype 5a sequences were found to be conserved in most regions of the E1 and E2 proteins. The most variable region within the study sequences was HVR1, which has been reported to differ by up to 80% between HCV genotypes and subtypes (31). Although highly variable, HVR1 is reported to be the only region that contains a neutralization determinant targeted by the immune response (32). As expected given the variability of HVR1, genetic distances between the sequences in this study were high, ranging from 8% to 17%, with an average distance of 13%. Variability within HVR1 is one of the reasons why human antibodies raised against HCV E2 epitopes do not protect against multiple viral infections (19). In this study, analysis of N-linked glycosylation sites revealed that glycosylation sites in genotype 5a sequences were not as conserved as those reported for other genotypes. Site N476, with a level of 75% conservation among different genotypes, was previously reported to be absent from genotype 5a sequences (5) but was found in six of the 18 sequences analyzed here. As reported previously, the E2 sites N423 and N576 were absent in all genotype 5a sequences, including the 18 sequences from this study, which is notable because these two sites were reported to be 99-100% conserved across all genotypes (5). Glycosylation sites were reported to be highly conserved among different genotypes (9). These sequence variations in genotype 5a glycosylation sites could be useful in designing an efficient vaccine that helps the host produce a strong antibody response.
E2 is the main target of neutralizing antibody responses, and variation in this region is thought to be related to the maintenance of persistent infection through emerging escape variants and the subsequent development of chronic infection (33, 34). Recently, a linear region of E2 encompassing amino acids 434 to 446 has been reported to elicit non-neutralizing antibodies that can inhibit the neutralizing activity of antibodies targeting amino acids 412 to 423 (35). However, a study by Tarr et al. reported conflicting results, showing that human antibodies that target the region encompassing amino acids 434 to 446 are not inhibitory but are capable of neutralizing HCVpp and HCVcc entry (36). All B-cell epitopes reported in this study were predicted to be antigenic, which implies that these epitopes may be important for inducing the desired immune response. The E2(504-519) epitope was the most conserved among other genotypes. Recently, a study by Ikram et al. reported conserved epitopes in genotype 3a that were also conserved among other genotypes (37). Highly conserved epitopes might influence the immunogenic potential, since variability within epitopes can increase the chance of immune escape (38). Short polypeptides derived from the envelope sequences of other viruses have been used to investigate protein interactions involved in viral entry, and some antiviral agents have been successfully developed (39). Envelope protein peptide inhibitors for other viruses in the same family as HCV, such as dengue and West Nile viruses, were shown to inhibit viral entry (40, 41). In HCV, the post-binding entry step was prevented using peptides derived from the C-terminal region of E2, which plays an important role in the HCV entry process (42). In this study, conserved peptides were derived that can be used as targets for therapeutic purposes; only three peptides had glycosylation sites, at low probability, and no phosphorylation sites were predicted.
Post-translational modifications such as glycosylation and phosphorylation affect the stability of therapeutic peptides (43). Using HCV glycoproteins in therapeutic strategies may offer protection against HCV infection (44). In conclusion, genotype 5a sequences are conserved and can be used to design epitopes and peptides. The results identified antigenic, conserved predicted B-cell epitopes and stable peptides with few post-translational modifications. These epitopes and peptides are potential candidates for designing entry inhibitors and vaccines able to cover a global population, especially where genotype 5a is common. Further investigations should analyze these peptides to better understand their involvement in blocking HCV entry.