Hepatitis C virus (HCV) belongs to the genus Hepacivirus in the family Flaviviridae. Its genome contains a single open reading frame (ORF) of almost 9400 nucleotides flanked by untranslated regions (UTRs) at the 5′ and 3′ ends (1). HCV is a single-stranded, positive-sense RNA virus. Translation is initiated by an internal ribosome entry site (IRES) in the 5′ UTR and takes place at the endoplasmic reticulum. The resulting polyprotein precursor is processed by cellular and viral proteases into ten proteins: three structural proteins (core, E1, and E2) and seven nonstructural proteins (p7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B) (2). Based on nucleotide heterogeneity, HCV is classified into different genotypes. In Pakistan, genotype 3a is predominant (3). The high genetic heterogeneity of HCV results from its error-prone RNA-dependent RNA polymerase (RdRp) and is a significant obstacle to vaccine development (4). Genotypes are further divided into multiple closely related subtypes, which differ from each other by about 15% in nucleotide sequence (5, 6).
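As an illustration of how a nucleotide divergence figure of this kind can be computed, the following minimal Python sketch calculates the uncorrected pairwise distance (p-distance) between two pre-aligned sequences. The function names, the toy sequences, and the use of 15% as a hard cutoff are assumptions made for illustration only; they are not the classification procedure used in the cited studies, which rely on longer regions and evolutionary models.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Treat RNA (U) and DNA (T) notation as equivalent.
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    compared = 0
    mismatches = 0
    for a, b in zip(a_norm, b_norm):
        # Skip gaps and ambiguity codes; only unambiguous bases are compared.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


# Illustrative cutoff taken from the ~15% figure in the text (an assumption,
# not a validated classification rule).
SUBTYPE_THRESHOLD = 0.15


def within_subtype_range(seq_a: str, seq_b: str,
                         threshold: float = SUBTYPE_THRESHOLD) -> bool:
    """True if the two sequences diverge by less than the threshold."""
    return p_distance(seq_a, seq_b) < threshold


# Toy 20-nt sequences (hypothetical, not real HCV isolates).
reference = "ATGAGCACGAATCCTAAACC"
close_variant = "ATGAGCACAAATCCTAAGCC"    # 2 of 20 sites differ
distant_variant = "ATGTGCACAAATCCCAAGCC"  # 4 of 20 sites differ

print(p_distance(reference, close_variant))
print(within_subtype_range(reference, distant_variant))
```

In practice the comparison would be run on properly aligned sequences of a defined genomic region, and the p-distance would usually be replaced by a model-corrected distance before any tree is built.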
A large number of closely related viral variants are continuously produced in this progressive disease because of mutations at the nucleotide level and the high replication rate of HCV. In vivo, these variants circulate as quasispecies (5, 7). HCV genotypes have distinct geographical distributions. Genotypes 1, 2, and 3 are found worldwide, but their relative prevalence varies (8). Genotypes also differ in clinical management and epidemiology. New HCV genotypes are identified and classified chiefly by pairwise sequence similarity or by phylogenetic analysis of nucleotide sequences (9). Phylogenetic trees built from core sequences can be topologically identical to those derived from complete genomic sequences (10). Despite its limited power for phylogenetic analysis, previous studies have selected the core gene for determining evolutionary relationships among viral isolates (11, 12). The HCV core protein is a basic RNA-binding protein that forms the viral capsid. It is released as a 191-amino-acid (aa) precursor of approximately 23 kDa containing three conserved domains (a 122-aa N-terminal hydrophilic domain, a 50-aa hydrophobic C-terminal domain, and a 20-aa signal peptide domain).
Besides forming the viral capsid, the core protein interacts with various cellular proteins and pathways that play an essential role in the HCV life cycle (13). Many studies have demonstrated that the highly conserved core protein contributes to the pathogenesis and progression of the disease through its ability to interact with a wide spectrum of viral and cellular proteins, including proto-oncogenes. In particular, amino acid substitutions at positions 70 and 91 of the core region have been identified as predictors of hepatocellular carcinoma (HCC) among genotype 1b-infected patients in Japan (14). Hence, these polymorphisms might serve as surrogate markers for hepatic disease stage or eventual progression to HCC (15). Although there is consensus that genotype 3a predominates in Pakistan, variable frequencies of other HCV genotypes have been reported (16). Phylogenetic analysis plays a central role in understanding evolution, ecology, biodiversity, and viral genomics.
Phylogenetic analysis also underpins the identification and naming of organisms. The distribution and genetic diversity of the HCV genome are poorly documented in non-Western countries. Efficient HCV classification and typing are important to clinical practice, and the widest possible range of HCV variants needs to be sought out. The high genetic diversity of HCV remains one of the major obstacles to vaccine development. However, only a few local studies have focused on HCV 3a core protein mutations, and their results have been variable. Further evaluation is needed to determine the significance of specific mutations in each genotype and how they affect the response to treatment.
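The kind of core protein comparison discussed here can be sketched in a few lines of Python: given a reference and a sample amino acid sequence that have already been aligned, list the substitutions and pick out those at positions of interest. The function names and the short toy sequences are hypothetical and stand in for real aligned core sequences; with full-length data one would pass positions such as 70 and 91 rather than the toy positions used below.

```python
from typing import Iterable, List


def list_substitutions(reference: str, sample: str) -> List[str]:
    """Return substitutions in the form <ref residue><1-based position><sample residue>."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for position, (ref_aa, sample_aa) in enumerate(zip(reference, sample), start=1):
        # Ignore alignment gaps and unknown residues; report true substitutions only.
        if ref_aa in "-X" or sample_aa in "-X":
            continue
        if ref_aa != sample_aa:
            substitutions.append(f"{ref_aa}{position}{sample_aa}")
    return substitutions


def substitutions_at(reference: str, sample: str,
                     positions: Iterable[int]) -> List[str]:
    """Keep only the substitutions that fall on the requested 1-based positions."""
    wanted = set(positions)
    return [
        sub for sub in list_substitutions(reference, sample)
        if int(sub[1:-1]) in wanted
    ]


# Toy 10-residue sequences (hypothetical, for illustration only).
toy_reference = "MSTNPKPQRK"
toy_sample = "MSTLPKPQRQ"

print(list_substitutions(toy_reference, toy_sample))
print(substitutions_at(toy_reference, toy_sample, positions=[10]))
```

Position numbering here follows the aligned reference, so any gaps introduced by the alignment would need to be accounted for before comparing results with positions reported in the literature.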
RNA viruses evolve under strong selection pressures, and changes in their epidemiological patterns have been reported frequently. In developing countries like Pakistan, the lack of screening and surveillance resources and unsatisfactory health care conditions are critical factors in the spread of HCV infection. Accurate knowledge of circulating HCV strains from phylogenetic studies, combined with an in-depth understanding of the underlying genetic substitutions, is therefore an important step in mapping epidemiological distribution and in designing strategies for infection control and drug development. HCV genotype 3a is the most prevalent genotype in Pakistan, yet few studies have examined complete core protein sequences and their phylogeny in the north-west of the country. Local studies have reported a genotype 3a prevalence of 26% to 45% (17, 18). According to a national survey conducted in 2007, 4.8% of the Pakistani population was infected with HCV, which corresponds to nearly 9.9 million HCV cases in Pakistan, whereas the Center for Disease Analysis estimates 7.1 million HCV cases in the country (19). Changes in the frequency distribution of circulating HCV genotypes necessitate more research in this regard.