1. Background
Hepatitis B virus (HBV) infection poses an enormous threat to public health. According to the World Health Organization (WHO), more than 240 million people have chronic (long-term) liver infections, and more than 780,000 people die annually from the acute or chronic consequences of hepatitis B (1). HBV is reported to account for about 55% of cancer risk (2).
The complete genome sequence of HBV was first determined by Galibert et al. in 1979 (3). Since then, a growing body of research has been devoted to sequencing the genomes of different HBV isolates. Japanese scholars cloned three HBV genomes of subtype adw isolated from the sera of chronic and asymptomatic HBV carriers (AsC) and demonstrated a 3.9% - 5.6% divergence in DNA sequence among the three clones. Compared with two adw-serotype clones isolated in the US, the divergence was 8.3% - 9.3%. Based on the sequences of the 18 clones available at that time, HBV clones were divided into four genotypes (A, B, C, and D), defined by a divergence of ≥ 8% in the complete genome sequence (4). As more complete genome sequences of various HBV clones were described, genotypes E, F, G and H were identified using phylogenetic analysis. Ten genotypes and several subtypes of HBV have been recognized to date (5), and each genotype has been broken down into several subgenotypes. Several recombinant genotypes have also been discovered. The prevalence of HBV infection differs worldwide, with significant regional variation (6). By 1 August 2013, there were 75373 HBV-related sequences in the NCBI database, 5066 of which were complete genome sequences. These sequences were from wild-type clones, natural mutants, drug-induced dominant mutants with immune tolerance and drug resistance, as well as cloned sequences from quasi-species in patients with different diseases worldwide. They are highly valuable for investigating the polymorphisms, genotype characteristics, epidemiologic trends and phylogeny of HBV (7). HBV infection is highly prevalent in Asia and moderately prevalent in India and Russia (8). The largest number of complete genome sequences of HBV in the nucleic acid database is from mainland China, accounting for about one-fourth of the total. Hong Kong and Japan are in second and third place, followed by Africa, Europe and the US.
2. Objectives
The aim of this study was to assess the distribution and epidemiologic trends of HBV genotypes and subtypes in countries neighboring China based on genome analysis, in order to support clinical practice.
3. Materials and Methods
3.1. Data Collection
A total of 5066 complete genome sequences of HBV were screened from the NCBI database. Of these sequences, 952 were from HBV clones isolated in 14 countries neighboring China (Japan, Vietnam, India, Burma, Thailand, Laos, Cambodia, Indonesia, the Philippines, North Korea, South Korea, Mongolia, Malaysia and Russia). The sequences were genotyped online using the NCBI analysis tool (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi). They were assigned to serotypes according to the amino acids at positions 122, 160, 127, 134, 159, 177 and 178 in the S region of HBV (9). Preliminary analysis of the sequences from this region suggested that the distance among B subgenotypes (B3, B5, B7, B8, B9) was less than 4%, so we selected genotype B for further analysis. The S, C, X and P open reading frames of 172 complete genotype B sequences (colonial sequences and incomplete sequences were excluded) from Japan (20), Indonesia (58), Thailand (15), Vietnam (16), the Philippines (6), Cambodia (2) and Malaysia (55) were chosen for nucleotide sequence analysis (10). Genotype B was seen only in these seven countries.
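As a rough illustration of position-based serotyping, the sketch below assigns only the major determinants (d/y and w/r) from residues 122 and 160 of the small S protein. The rule used (Lys or Arg at each position) is the commonly cited one and is an assumption here; the criteria actually applied in this study are those of reference (9). The sub-determinants that depend on positions 127, 134, 159, 177 and 178 are not resolved, and the function names and toy sequences are hypothetical.

```python
# Commonly cited determinant rules (an assumption, not taken from this paper):
#   residue 122: K -> d, R -> y
#   residue 160: K -> w, R -> r
D_OR_Y = {"K": "d", "R": "y"}
W_OR_R = {"K": "w", "R": "r"}


def major_serotype(hbsag: str) -> str:
    """Return 'adw', 'ayw', 'adr', 'ayr' or 'unknown' for an S protein sequence."""
    hbsag = hbsag.upper()
    if len(hbsag) < 160:
        return "unknown"  # too short to contain position 160
    first = D_OR_Y.get(hbsag[121])   # position 122 (1-based)
    second = W_OR_R.get(hbsag[159])  # position 160 (1-based)
    if first is None or second is None:
        return "unknown"  # residue not covered by the simple rule
    return "a" + first + second


def toy_hbsag(res122: str, res160: str, length: int = 226) -> str:
    """Build a hypothetical poly-alanine sequence with chosen residues at 122 and 160."""
    residues = ["A"] * length
    residues[121] = res122
    residues[159] = res160
    return "".join(residues)


if __name__ == "__main__":
    for r122, r160 in [("K", "K"), ("R", "K"), ("K", "R"), ("R", "R")]:
        print(r122, r160, major_serotype(toy_hbsag(r122, r160)))
```

Resolving the full serotype (for example adw2 versus adw4) would need the additional positions listed above and is deliberately left out of this sketch.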
3.2. Phylogenetic Analysis
DNAStar was used for sequence alignment. The alignment results were imported into MEGA 5.1 and analyzed with the Kimura 2-parameter model, the neighbor-joining method and 1000 bootstrap replicates. The phylogenetic tree was constructed from the complete genome sequences of HBV clones, and the genetic distances were calculated (11).
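For readers unfamiliar with the Kimura 2-parameter model, the following minimal sketch computes the pairwise distance d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. It is not the MEGA implementation; the function name and the rule for skipping gaps and ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument of the logarithm not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences: one transition among ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining procedure would then cluster into a tree.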
3.3. Statistical Analysis
Data were analyzed using SPSS 10.0 (Chicago, IL). Groups were compared using chi-square and t-tests. P < 0.05 was considered statistically significant.
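The Pearson chi-square statistic used for such comparisons can be sketched in a few lines. The example below is not the SPSS procedure and uses invented toy counts, not the study data; the function name is hypothetical. It returns the statistic and the degrees of freedom for a table of counts (rows could be countries and columns genotypes).

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a rows x columns table of counts.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


if __name__ == "__main__":
    # Hypothetical counts: two countries (rows) by two genotypes (columns).
    toy_counts = [[30, 10], [10, 30]]
    stat, dof = pearson_chi_square(toy_counts)
    # For 1 degree of freedom, the 0.05 critical value is 3.841.
    print(stat, dof, stat > 3.841)
```

The P value itself would come from the chi-square distribution with the returned degrees of freedom, which a statistics package provides.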
4. Results
4.1. Genotype Distribution
Genotype C accounted for the highest proportion of HBV sequences (40.65%, 387/951) among the 14 countries neighboring China and was therefore considered the dominant genotype. The next most common genotypes were B, D and A, accounting for 19.87% (189/951), 18.51% (176/951) and 11.46% (109/951), respectively. The numbers of sequences belonging to genotypes F, G, H, I and J were 5, 2, 6, 19 and 3, respectively (Figure 1).
The Pearson chi-square test (χ2 = 1765.2, P < 0.0001) indicated that the 14 countries differed considerably in HBV genotype distribution. A t-test was performed for the 10 genotypes. Genotype C showed the largest difference in distribution among the countries (t = 2.97, P < 0.05), followed by genotype B (t = 2.364, P < 0.05). The differences in the distribution of the other genotypes were relatively small (P > 0.05).
4.2. Recombinant Genotype
Based on the literature, 45 sequences belonged to recombinant genotypes, accounting for 5.94% (12-19). Malaysia had the greatest diversity of recombinant genotypes and the largest number of recombinant genotype sequences (51.11%; Table 1).
4.3. Serotype Distribution
All 10 serotypes of HBV were found in the 14 countries neighboring China, with adw2 and adrq+ dominant, accounting for 29.38% (260/885) and 24.75% (219/885), respectively. Serotype adw4 was found only in 12 sequences from the Philippines, and adw3 was found in one sequence each from Malaysia and Japan (Figure 2).
4.4. Correspondence Between Serotype and Genotype
The association between HBV serotype and genotype in 14 countries neighboring China is shown in Table 2.
Serotype | Genotypes |
---|---|
ayw1 | B, C, I, D |
ayw2 | A, C, D |
ayw3 | B, C, D |
ayw4 | A, B, C |
ayr | C, D |
adw2 | A, B, C, E, I, H |
adw3 | B, H |
adw4 | A, B, C, H, F |
adrq+ | B, C, D, E, J |
adrq- | C |
Table 2. Correspondence Between Serotype and Genotype
4.5. Sequence Alignment
4.5.1. Phylogenetic Tree of Genotype B and the Difference Between the Sub-Genotypes
According to the reference sequences reported in the literature, genotype B could be further divided into eight subgenotypes (B1, B2, B3, B4, B5, B7, B8 and B9) (Figure 3, Table 3).
The branch of B2 was far away from those of the other subgenotypes, and B3 was located closer to the trunk. B2 and B3 do not belong to one monophyletic group but lie on different branches.
Subgenotype | B1 | B2 | B3 | B4 | B5 | B7 | B8 |
---|---|---|---|---|---|---|---|
B2 | 0.032 ± 0.003 | | | | | | |
B3 | 0.048 ± 0.004 | 0.037 ± 0.003 | | | | | |
B4 | 0.056 ± 0.003 | 0.046 ± 0.003 | 0.056 ± 0.003 | | | | |
B5 | 0.050 ± 0.004 | 0.038 ± 0.003 | 0.027 ± 0.002 | 0.057 ± 0.003 | | | |
B7 | 0.049 ± 0.004 | 0.038 ± 0.003 | 0.020 ± 0.001 | 0.055 ± 0.003 | 0.027 ± 0.002 | | |
B8 | 0.050 ± 0.004 | 0.041 ± 0.003 | 0.028 ± 0.002 | 0.058 ± 0.003 | 0.031 ± 0.002 | 0.028 ± 0.002 | |
B9 | 0.046 ± 0.004 | 0.036 ± 0.003 | 0.025 ± 0.002 | 0.054 ± 0.003 | 0.025 ± 0.002 | 0.025 ± 0.002 | 0.028 ± 0.003 |
Table 3. Mean Nucleotide Sequence Divergence Over Complete Genome Sequences Between HBV Subgenotypes
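The 4% divergence criterion for separating subgenotypes can be checked directly against the mean values in Table 3. The sketch below is only an illustration: it copies the table means (ignoring the standard errors) and lists the subgenotype pairs whose mean divergence falls below the threshold. The variable and function names are hypothetical.

```python
# Column order of Table 3 (B9 appears only as a row).
COLUMNS = ["B1", "B2", "B3", "B4", "B5", "B7", "B8"]

# Lower-triangular mean divergences copied from Table 3 (standard errors omitted).
MEAN_DIVERGENCE = {
    "B2": [0.032],
    "B3": [0.048, 0.037],
    "B4": [0.056, 0.046, 0.056],
    "B5": [0.050, 0.038, 0.027, 0.057],
    "B7": [0.049, 0.038, 0.020, 0.055, 0.027],
    "B8": [0.050, 0.041, 0.028, 0.058, 0.031, 0.028],
    "B9": [0.046, 0.036, 0.025, 0.054, 0.025, 0.025, 0.028],
}


def pairs_below(threshold=0.04):
    """Return (subgenotype_a, subgenotype_b, divergence) for pairs under the threshold."""
    found = []
    for row_label, values in MEAN_DIVERGENCE.items():
        for col_label, divergence in zip(COLUMNS, values):
            if divergence < threshold:
                found.append((col_label, row_label, divergence))
    return found


if __name__ == "__main__":
    for a, b, d in pairs_below():
        print(f"{a}-{b}: {d:.3f}")
```

With these values, every pair among B3, B5, B7, B8 and B9 falls below 4%, while no pair involving B4 does.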
4.5.2. Alignment of Complete Genomes of Genotype B
Four open reading frames (S, P, X and C) of HBV genotype B were analyzed. No specificity was found in the P, X and C regions, in contrast to the S region. The alignment shows that the amino acid sequences translated from the S open reading frame reflect the heterogeneity of HBV from each country (Figure 4).
The genotype B subgenotypes found in Indonesia and Malaysia (B3, B5, B7, B8 and B9) shared six specific amino acid sites in the S-ORF: AA39 D; AA55 S; AA152 N; AA154 V; AA158 A and AA160 S. For most subgenotypes, I and N were found at AA56 and AA84, respectively. However, B7 has a T at AA56; B9 has an M at AA84; B1 from Japan has H and I at these two positions, respectively; and in sequences from Vietnam, AA84 was I in 87.5% of cases.
In B4, AA158 was A in all sequences, with T at AA164, F at AA165 and Y at AA374. Subgenotype B4 from the other four countries has L, V, I, L and F at these positions, respectively. For Indonesia, B8 has a specific T at AA164 in four sequences.
Generally, there would be no more than four types of amino acids appearing at one position in the complete genome; however, in P-ORF, seven types of amino acids (C, N, S, G, V, I, and D) were found at AA69. In some sequences from Malaysia and Indonesia, five types of amino acids were detected at this position. Moreover, at AA73 there were five types of amino acids (K, D, N, Q, and E). The sequences from Malaysia had all five types of amino acids.
4.6. Stop and Start Codons
The stop codon of the S-ORF was TAA in 141 HBV sequences belonging to genotype B. The stop codon of the P-ORF was TGA. In the C-ORF, the stop codon was TAG in 95.74% of the sequences; only six sequences had TAA as the stop codon. In the X-ORF, the stop codon was TAA in 92.9% of the sequences, with the exceptions of TGA in one case and TAG in eight cases (figures omitted).
There was a start codon mutation in the pre-S2 region, with ATG mutated into GTG (DQ993682, AB205122, GQ924641) or ATA (JQ801474, GQ924646, GQ924625, JQ027325) or ACG (JQ027315).
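A tally of this kind can be sketched as follows. The example assumes that each ORF has already been extracted as an in-frame nucleotide string that includes its start and stop codons; the function name and the toy sequences are hypothetical and are not taken from the study data.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def tally_terminal_codons(orf_sequences):
    """Count the first and last codon of each in-frame ORF sequence.

    Sequences shorter than two codons or not a multiple of three are skipped.
    A final codon that is not TAA, TAG or TGA is counted under 'none'.
    """
    starts, stops = Counter(), Counter()
    for seq in orf_sequences:
        seq = seq.upper()
        if len(seq) < 6 or len(seq) % 3 != 0:
            continue
        starts[seq[:3]] += 1
        last = seq[-3:]
        stops[last if last in STOP_CODONS else "none"] += 1
    return starts, stops


if __name__ == "__main__":
    # Toy ORFs: two end in TAA, one in TGA; one begins with GTG instead of ATG.
    toy_orfs = ["ATGGCCTAA", "GTGGCCTAA", "ATGAAATGA"]
    starts, stops = tally_terminal_codons(toy_orfs)
    print(dict(starts))
    print(dict(stops))
```

Dividing each count by the number of sequences tallied gives usage percentages comparable to those reported for each ORF.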
The number of patients with HBV infection reported by the WHO for each district and country of the world was positively correlated with the number of HBV sequences in the database isolated from the corresponding district and country.
5. Discussion
As shown in Figure 1, the genotype compositions of HBV from adjacent countries tend to be similar, which reflects a geographical factor. The two Southeast Asian countries, Indonesia and Malaysia, are located near the equator, between the Indian and Pacific Oceans. Both countries have many islands and share a border extending for 874 km. The dominant genotypes of HBV were B and C in both countries. For Mongolia and Russia (north of China), D was the dominant genotype. Furthermore, the diversity of genotypes differs among countries. Japan has a high diversity of HBV genotypes, probably because of its frequent exchanges with foreign countries owing to its special historical background and developed economy (22). During these exchanges, different genotypes of HBV were introduced. In contrast, fewer genotypes were detected in developing Asian countries such as Laos, Burma and North Korea. With relatively small populations spread over vast areas of land, Russia and Mongolia reported only two genotypes in our study. Although India ranks second in the world in population size, it attracts far fewer immigrants because of its underdevelopment (7). Therefore, only four genotypes of HBV have been detected in India. Indeed, land area, population size and level of development are all important factors that determine the diversity of HBV genotypes and subgenotypes in a country.
An association was found between genotype and serotype. In our study, this association in the 14 countries neighboring China seems to deviate from previous results. Thedja MD et al. (20) reported that, on a global scale, serotype ayw1 mainly corresponds to genotype A and very rarely to B; serotype ayw3 corresponds to D in most cases and rarely to C; serotype ayw4 is expected to correspond to E; and serotype adw2 often corresponds to A, B and G and rarely to C. However, in our results, ayw1 corresponded to B in 57 sequences (79.2%) and no sequence corresponded to A. Although ayw3 corresponded to D in 90 sequences, a correspondence to C was found in a few sequences. Serotype ayw4 did not correspond to E in our study, but only to A, B and C. In 72 and 96 sequences, adw2 corresponded to genotypes A and B, respectively; however, a correspondence was also found between adw2 and C (41, 15.7%). This result may reflect a regional characteristic, or sampling bias may be suspected.
In the phylogenetic tree, the branch of subgenotype B2 was far away from those of the other subgenotypes, and B2 was phylogenetically more ancient. According to a previous study (23), B9 was phylogenetically more ancient than B3 and B7, but in the phylogenetic tree constructed in our study, B3 was located closer to the trunk. This discrepancy may be attributed to differences in the selection of sequences belonging to each subgenotype.
Phylogenetic analysis showed that B2 and B3 do not belong to one monophyletic group but lie on different branches. This result contradicts the principle that a genotype and its subgenotypes should belong to one monophyletic group (21). Although B7, B5, B9, B4 and B1 have high bootstrap values (> 87%), only B4 and B1 have a sequence divergence of more than 4% from other sequences. Thus, only B4 and B1 can be accurately differentiated, which is consistent with a previous study (24). Because B3, B5, B7, B8 and B9 are recombinant subgenotypes between B and C (25-28), their sequence divergence was below 4% and they can be clustered into quasi-subgenotype B3 (29).
The pre-S2 region of HBV isolated from Vietnam contains three specific amino acid positions (AA158 A, AA164 T, and AA165 F), and the corresponding sequences all belong to subgenotype B4; however, whether this feature can be used as a criterion for differentiating B4 should be verified with a larger sample. For the sequences from Indonesia and Malaysia, six specific amino acid positions appear simultaneously in subgenotypes B3, B5, B7, B8 and B9, which further supports clustering B3, B5, B7, B8 and B9 into quasi-subgenotype B3. More evidence needs to be gathered before these six positions can be taken as a criterion for differentiating quasi-subgenotype B3. The preS1 and preS2 proteins have very strong antigenicity, which is why they are conserved within that genotype.
Codon bias refers to the unequal use of synonymous codons that encode the same amino acid within an organism. This phenomenon is related both to DNA as the carrier of genetic information and to the biological function of proteins, so it has important biological significance (30). Similar to the codon bias seen for amino acid codons, this study found that stop codon usage in genotype B shows a preference that differs among regions.
According to the existing literature, the stop codon of HBsAg is TAA or TGA (31). This study showed that the C-ORF and P-ORF have a TAG preference, and the X-ORF occasionally contains TAG.
Three aspects might explain this feature. First, research has shown that the usage frequency of synonymous codons in a gene is positively correlated with the abundance of the molecules that recognize them, and that codon bias increases along the length of genes (32). Second, tRNAs differ in their ability to recognize each termination codon, and codons that are recognized more efficiently are used preferentially (33). Last, under natural conditions, when the stop codon of HBsAg was TAA, TGA or TGA, the corresponding polymerase residue in the overlapping region was AA236N, AA236N or AA236D, respectively, which means that different stop codons of HBsAg might change the polarity of AA236. Amino acid 236 is known to be located in the catalytic center of the polymerase, so the polarity change might have a significant impact on catalytic activity.
The start codon mutation of the pre-S2 region reported herein is consistent with reports from Japan, Korea and Thailand (34, 35). Occurring in a variety of species, this mutation might be considered a new start codon for HBV. Several studies have reported GTG or TTG as the start codon in a few bacterial species (36-38). Kadowaki et al. showed that the mitochondrial ACG codon encodes protein (39). ATT and ATA have also been found as start codons in mitochondria and chloroplasts (40-42). Some studies indicated that this mutation may induce changes in the proteins encoded by the pre-S2 region or interfere with the binding of HBV to hepatic cells, thus causing a high risk of hepatic cancer (25, 42). These start codons are still translated as Met at the start of a protein, even if the codon otherwise encodes a different amino acid, because a separate transfer RNA (tRNA) is used for initiation (39, 40). Therefore, this start codon mutation of the pre-S2 region of HBV is probably a same-sense mutation that does not affect the survival and replication of HBV in host cells. Currently, there has been no experimental report on the possible influence of this mutation.
The YMDD mutation occurs in the polymerase gene of HBV. Until recently, most research on the YMDD mutation focused on the occurrence of lamivudine-related YMDD mutations and their impact on antiviral treatment (43). The YMDD mutation, also known as the M204V/I mutation, is the substitution of methionine by valine or isoleucine, giving the YVDD or YIDD variant (44, 45). Of the 150 sequences in our analysis, YVDD was found in one sequence each from Thailand and Indonesia, and YIDD was found in two sequences from Japan. In addition, the SMDD mutation (AP011093) was found in one sequence from Indonesia and the YMND mutation (DQ993695) was found in one sequence from Vietnam. Similar SMDD mutations have not been previously reported. These mutations may result from drug-induced tolerance or from natural variation.
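Screening for these variants amounts to classifying the four-residue motif once it has been located in the translated polymerase sequence. The sketch below assumes the motif has already been extracted (locating it requires an alignment, which is not shown); the function name and category labels are hypothetical.

```python
from collections import Counter


def classify_ymdd(motif: str) -> str:
    """Classify a four-residue polymerase motif relative to wild-type YMDD."""
    motif = motif.upper()
    if len(motif) != 4:
        raise ValueError("expected a four-residue motif")
    if motif == "YMDD":
        return "wild type"
    if motif in ("YVDD", "YIDD"):
        # Methionine replaced by valine or isoleucine (M204V or M204I).
        return "M204" + motif[1]
    return "other variant"


if __name__ == "__main__":
    # Toy input using the motif variants named in the text.
    motifs = ["YMDD", "YVDD", "YIDD", "SMDD", "YMND", "YMDD"]
    print(Counter(classify_ymdd(m) for m in motifs))
```

Motifs such as SMDD or YMND are reported under a generic label here because they do not involve the methionine position.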
The largest number of complete genome sequences of HBV in the nucleic acid database was from mainland China, accounting for about one-fourth of the total. Hong Kong and Japan were the second and third, followed by Africa, Europe and the US. The number of cases of HBV infection reported by the WHO for each district and country of the world was positively correlated with the number of HBV sequences in the database isolated from the corresponding district and country.
China is a country with a high incidence of hepatitis, and the high prevalence of chronic hepatitis B already threatens public health. In the NCBI database, nearly 1000 HBV sequences were submitted from the 14 countries neighboring China, accounting for one-fifth of the total. Using genome sequence data from China and the neighboring countries, and promoting clinical practice based on the results of genome analysis, is an integral part of basic HBV research.