Full Sequence Analysis and Characterization of Human Bocavirus Type 2 in South Korea

authors:

Bomina Paik 1, Sung-Geun Lee 2, Han-Gil Cho 3, Yu-Jung Won 1, Lae-Hyung Kang 1, Soon-Young Paik 1, Seung-Jin Hong 1, *

Department of Microbiology, College of Medicine, Catholic University of Korea, Seoul, South Korea
Korea Zoonosis Research Institute, Chonbuk National University, Iksan, South Korea
Division of Public Health Research, Gyeonggi Province Institute of Health and Environment, Suwon, South Korea

How To Cite: Paik B, Lee S, Cho H, Won Y, Kang L, et al. Full Sequence Analysis and Characterization of Human Bocavirus Type 2 in South Korea. Jundishapur J Microbiol. 2019;12(4):e79145. https://doi.org/10.5812/jjm.79145.

Abstract

Background:

Human bocavirus (HBoV) is found worldwide and can infect the respiratory and gastrointestinal tracts of infants and children.

Objectives:

The aim of the present study was to characterize the complete genome of a new HBoV2A strain isolated from a patient in Korea with gastroenteritis.

Methods:

Viral genomic DNA was extracted from an HBoV-positive stool specimen obtained from a 3-year-old female with gastroenteritis. In 2017, the entire coding sequences were analyzed using a newly designed set of primers targeting conserved regions.

Results:

The full-length genome was 5,107 bp long. Phylogenetic analysis based on the complete genome sequence, including the three open reading frames (ORFs), indicated that CUK18 belonged to the HBoV2A genotype. The CUK18 strain showed the highest similarity with strain Nsc10-N386 isolated in Russia. Analysis of the ORF3, which encodes the viral capsid proteins VP1 and VP2, found that amino acid sequences corresponding to the three-fold-symmetry-related monomer were frequently substituted, a distinguishing feature of this specific genotype.

Conclusions:

Results of this study may provide valuable information for HBoV epidemiology studies and vaccine development.

1. Background

Human bocavirus (HBoV) is a recently identified virus of the family Parvoviridae, subfamily Parvovirinae, genus Bocavirus (1, 2). Human bocavirus is a small, non-enveloped virus (3), and the genome is composed of a linear, single-stranded DNA molecule of approximately 5,300 bp (4). Three open-reading frames (ORFs) have been identified in the HBoV genomes. Open reading frame 1 encodes the non-structural (NS) 1 protein, a multifunctional protein that participates in DNA replication, apoptosis, and gene transactivation. Open reading frame 2 encodes an additional NS protein, nuclear phosphoprotein (NP) 1, which plays a role in the expression of viral capsid proteins (5) and is involved in viral DNA replication at the replication origin (6). Open reading frame 3 encodes the two structural viral-capsid proteins (VP) 1 and 2, which are generated through alternative splicing events.

Human bocaviruses are classified into four main genotypes: HBoV1 (7), HBoV2 (8), HBoV3 (2), and HBoV4 (9). HBoV2 can be further subdivided into two variants, HBoV2A and HBoV2B (10). Human bocavirus 1 is commonly detected in pediatric patients with respiratory tract infections, as well as in those with gastrointestinal symptoms. In contrast, the other three genotypes (HBoV2-4) have been isolated from fecal specimens (11, 12). Human bocavirus is difficult to culture in vitro, and animal models for HBoV infection have not been developed (13). In addition, HBoV infection frequently coexists with other viral or bacterial infections. Thus, sequence information from the HBoV genome may be useful for designing efficient diagnostic markers for HBoV infection.

Human bocaviruses are known to be distributed globally, with reports from Africa (14), America (15), Australia (10), Asia (16-20), and Europe (9, 17). In Korea, approximately 8% of acute lower respiratory tract infections are associated with HBoV1 (21), while HBoV2 has been detected in approximately 4% of gastroenteritis patients (21, 22). However, with the exception of one study, HBoV studies in Korea have been limited to partial ORF sequences (21-24).

2. Objectives

The aim of this study was to characterize a new HBoV strain isolated from a child with acute gastroenteritis using whole-genome sequencing.

3. Methods

3.1. Specimen Preparation and Viral DNA Extraction

An HBoV-positive stool specimen from a 3-year-old female with acute gastroenteritis in Gyeonggido, South Korea was collected in July 2013. The specimen was obtained through the Waterborne Virus Bank (Seoul, Korea). The use of this sample for the purpose of this study was reviewed and approved by the Songeui Campus Institutional Review Board of the Catholic University of Korea, Catholic Medical Center. The stool specimen was stored at -20°C until analysis. Viral genomic DNA was extracted from 140 µL of 10% fecal suspension using the QIAamp DNA Mini kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions.

3.2. Amplification of HBoV Genomic DNA

To analyze the complete HBoV genome, PCR was performed with 2 × Emerald Amp PCR Master Mix (TaKaRa, Shiga, Japan). Nine primer sets were newly designed for the amplification of HBoV (Table 1). Among these, four were for the amplification of ORF1, one for ORF2, and four for ORF3. Thermal cycles were as follows: denaturation at 94°C for 5 minutes; amplification for 25 cycles at 94°C for 30 seconds, 54 - 60°C for 30 seconds, and 72°C for 1 minute; and extension at 72°C for 7 minutes.

Table 1. Primers Used in This Study

Primer | Sequence (5’→3’) | Location^a | Size, bp
GSP1 | GCG TTC AAC GCG TCT GAG TAG | 757 - 777 |
GSP2 | GCT CCT CCA ACA AGT ATG TGA | 603 - 623 |
Nested GSP | CTA GTG AGC ATA AAA TAT CT | 444 - 464 |
ORF1-1F | GCG TGG TGA GTG ACA CTA TGG CC | 239 - 262 | 612
ORF1-1R | CAT GAA GGT CAC CTC GCT TGT CTC | 826 - 850 |
ORF1-2F | CAT GGT CAG GGC ACA CTG GTA AC | 782 - 805 | 797
ORF1-2R | RCG AAC GCC TTG RAC TAT GG | 1558 - 1578 |
ORF1-3F | CGC CAT CTG CTG TGT ACT TAA C | 1464 - 1486 | 830
ORF1-3R | GAT TAT CCA CGT TCG ATG CCT CC | 2270 - 2293 |
ORF1/ORF2-F | CAT TCA CAG GAC TAC ACG CTT C | 2038 - 2060 | 376
ORF1/ORF2-R | GGA GCT CAT CTT CGT CTC TAG G | 2391 - 2413 |
ORF2-F | CAA CCT AGA GAC GAA GAT GAG C | 2389 - 2411 | 617
ORF2-R | CTT CGT CTG TTA CCT CCT CTG | 2984 - 3005 |
ORF3-1F | CAG GAA TCA GAG GAG GTA ACA G | 2978 - 3000 | 737
ORF3-1R | AAT ATG ACC AHG GTG TGC TKA CG | 3691 - 3714 |
ORF3-2F | GAG GCA GCT AYT TYA CTG AYT C | 3559 - 3581 | 704
ORF3-2R | TGA TGC TGT GYT TCC GTG YTG TC | 4239 - 4262 |
ORF3-3F | GAY RTV ATG CCA GAA CTT CC | 3948 - 3968 | 634
ORF3-3R | TTC CTC TGT AGA GAG CTT GRT C | 4559 - 4581 |
ORF3-4F | CCA YCA YTR TCC ATG CTY AGA GAC | 4539 - 4563 | 615
ORF3-4R | CAT CGG ACT RTA GCC TCG AAC TY | 5156 - 5153 |
3′-Oligo(dT)-anchor-R | GTT CCT CTC CAA TGG ACA AGA GGA TTT TTT TTT TTT TT | 3′-end poly(A) tail |
3′-Anchor-R | GTT CCT CTC CAA TGG ACA AGA GGA | |
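
The product sizes listed in Table 1 appear to follow from the primer coordinates as the reverse-primer end minus the forward-primer start plus one. The minimal Python sketch below checks that arithmetic and reports how many base pairs consecutive amplicons share. The function and variable names are illustrative and are not taken from the study; ORF3-4 is left out because its reverse-primer location is unclear in the table.

```python
# Coordinates copied from Table 1: (forward start, reverse end, listed size in bp).
PRIMER_PAIRS = {
    "ORF1-1": (239, 850, 612),
    "ORF1-2": (782, 1578, 797),
    "ORF1-3": (1464, 2293, 830),
    "ORF1/ORF2": (2038, 2413, 376),
    "ORF2": (2389, 3005, 617),
    "ORF3-1": (2978, 3714, 737),
    "ORF3-2": (3559, 4262, 704),
    "ORF3-3": (3948, 4581, 634),
}


def amplicon_size(forward_start: int, reverse_end: int) -> int:
    """Inclusive length of the region spanned by a primer pair."""
    return reverse_end - forward_start + 1


def overlaps(pairs: dict) -> dict:
    """Overlap (bp) between each amplicon and the next one, in dict order."""
    names = list(pairs)
    result = {}
    for left, right in zip(names, names[1:]):
        left_end = pairs[left][1]
        right_start = pairs[right][0]
        result[f"{left} / {right}"] = left_end - right_start + 1
    return result


for name, (start, end, listed) in PRIMER_PAIRS.items():
    print(name, "computed:", amplicon_size(start, end), "listed:", listed)

for junction, shared in overlaps(PRIMER_PAIRS).items():
    print(junction, "overlap:", shared, "bp")
```

A positive overlap at every junction indicates that adjacent fragments can be joined into one contiguous sequence.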

3.3. Analysis of the 5’ and 3’-ends of the HBoV Genome

The 5’ and 3’ terminal sequences were determined by rapid amplification of complementary DNA ends (RACE) using the RACE version 2.0 kit (Invitrogen, Carlsbad, CA, USA). Three gene-specific primers (GSP1, GSP2, and nested GSP) for 5’ RACE PCR amplification were designed based on ORF1 (Table 1). The 3’ end of HBoV genomic DNA was determined by 3’ RACE, with a reverse transcription reaction performed using the 3’-Oligo(dT)-anchor-R primer. The second PCR amplification was performed with the VP1/VP2-F and 3’-Anchor-R primers.

3.4. Cloning and Sequencing of the Complete Genome

The PCR products were subjected to 1.5% agarose gel electrophoresis and visualized by ethidium bromide staining. The amplified fragments were purified with the HiYield Gel/PCR DNA Fragments Extraction kit (RBC, Taipei, Taiwan) and cloned into the pGEM-T Easy Vectors (Promega, Madison, WI, USA) according to the manufacturer’s instructions. Transformants were selected on Luria-Bertani agar plates (Duchefa, Haarlem, Netherlands) containing 40 ng/mL X-gal, 0.1 mM isopropyl-β-D-thiogalactoside, and 50 mg/mL ampicillin. Selected clones were incubated overnight at 37°C in a shaking incubator. Plasmid DNA was purified using the HiYield Plasmid Mini Kit (RBC, Taipei, Taiwan) and sequenced by Macrogen (Seoul, Korea).

3.5. Phylogenetic Analysis

The composite sequences of the nine fragments were aligned with DNAStar software (Madison, WI, USA). The complete genome was composed of three ORFs of 1,917 bp (ORF1), 648 bp (ORF2), and 2,004 bp (ORF3). The complete genomic sequences of 20 HBoV reference strains were obtained from the GenBank database (http://www.ncbi.nih.gov) (Table 2). Multiple sequence alignment was performed with the Muscle algorithm using molecular evolutionary genetics analysis (MEGA) version 7.0 (25). Dendrograms were plotted with the neighbor-joining method using MEGA version 7.0 (25). The isolated HBoV variant was designated as CUK18. Nucleotide sequence data have been deposited into GenBank (accession number: MG195156).

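Table 2. Nucleotide Sequence Similarities Between CUK18 and HBoV Reference Strains Using Full-Length NS1, NP1, and VP1/VP2 Sequences

Columns 4 to 7 give nucleotide sequence similarity (%); columns 8 to 10 give amino acid similarity (%).

Strain | Accession No. | Genotype | Nucleotide, Full-Length | Nucleotide, NS1 | Nucleotide, NP1 | Nucleotide, VP1/VP2 | Amino Acid, NS1 | Amino Acid, NP1 | Amino Acid, VP1/VP2
CBJ030 | KM464730 | HBoV1 | 76.3 | 74.2 | 76.4 | 79.0 | 77.6 | 69.8 | 79.9
HZ1402 | KP710212 | HBoV1 | 76.3 | 74.3 | 76.2 | 78.9 | 77.6 | 69.8 | 80.0
KU3 | JQ411251 | HBoV1 | 76.3 | 74.4 | 76.4 | 78.7 | 77.6 | 69.8 | 79.7
Mty1117 | KX373885 | HBoV1 | 76.3 | 74.4 | 76.4 | 78.8 | 77.6 | 69.8 | 79.7
P214 | KX373884 | HBoV1 | 76.2 | 74.4 | 76.4 | 78.5 | 77.6 | 69.8 | 79.9
Nsc10-N386 | JQ964116 | HBoV2A | 99.2 | 99.0 | 99.4 | 99.5 | 99.5 | 100 | 99.9
UK-648 | FJ170280 | HBoV2A | 98.9 | 98.7 | 99.4 | 99.0 | 99.5 | 100 | 99.4
W153 | EU082213 | HBoV2A | 98.9 | 98.8 | 99.4 | 99.0 | 99.5 | 100 | 99.6
CU54TH | GU048663 | HBoV2B | 97.5 | 98.1 | 99.7 | 96.2 | 99.2 | 100 | 97.5
LZFB080 | KM624025 | HBoV2B | 97.0 | 97.5 | 99.5 | 95.6 | 99.1 | 100 | 97.2
NI-327 | FJ973559 | HBoV2B | 96.0 | 97.6 | 96.6 | 94.2 | 99.2 | 99.1 | 96.4
NI-213 | FJ973560 | HBoV2B | 95.8 | 97.4 | 96.8 | 94.2 | 99.2 | 99.1 | 96.3
SH3 | FJ375129 | HBoV2B | 96.6 | 97.2 | 97.5 | 95.6 | 98.9 | 98.6 | 98.8
46-BJ07 | HM132056 | HBoV3 | 79.9 | 74.5 | 75.8 | 87.7 | 74.3 | 70.2 | 90.9
CU2139UK | GU048665 | HBoV3 | 80.0 | 74.5 | 76.2 | 87.6 | 74.3 | 70.2 | 90.6
TUA21007 | FJ973562 | HBoV3 | 80.0 | 74.5 | 76.1 | 87.7 | 74.3 | 69.8 | 90.9
W471 | NC012564 | HBoV3 | 80.0 | 74.5 | 76.2 | 87.7 | 74.3 | 70.2 | 90.7
W855 | FJ948861 | HBoV3 | 80.0 | 74.6 | 76.1 | 87.6 | 74.3 | 69.8 | 90.7
CMHS01111 | KC461233 | HBoV4 | 89.4 | 91.4 | 92.0 | 87.1 | 93.0 | 89.8 | 90.4
HBoV4-NI-385 | FJ973561 | HBoV4 | 88.6 | 90.4 | 87.8 | 87.6 | 90.6 | 83.3 | 90.3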
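
The values in Table 2 are pairwise similarities between aligned sequences. The study does not state the exact formula applied by the software, so the Python sketch below shows only one common definition, assumed here for illustration: the percentage of identical positions among aligned columns in which neither sequence has a gap. The two sequences are toy examples, not data from this study.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs must come from the same alignment and have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
query = "ATGCCTCCAATTAAG-CGA"
reference = "ATGCCACCAATTAAGTCGA"
print(round(percent_identity(query, reference), 1))
```

Other conventions, such as counting gap columns as mismatches, give slightly different percentages, so values are comparable only when computed the same way.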

4. Results

4.1. Phylogenetic and Similarity Analysis

The full-length genome of CUK18 was 5,107 bp long. Phylogenetic analysis was performed to evaluate the genetic relationships between the HBoV CUK18 strain and 20 reference strains. Following the analysis of the full-length sequences, CUK18 was found to cluster with three HBoV2A strains: Nsc10-N386, UK-648, and W153 (Figure 1). CUK18 exhibited the highest similarity to the HBoV2A strain Nsc10-N386 from Russia (Table 2). The nucleotide similarity between the two strains was 99.2%. The CUK18 strain was therefore classified into the HBoV2A genotype.
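
The assignment described above amounts to finding the reference strain with the highest full-length similarity to CUK18. The short Python sketch below does this for a subset of the full-length nucleotide values copied from Table 2; the dictionary and function names are illustrative only.

```python
# Full-length nucleotide similarity (%) to CUK18, copied from Table 2 (subset).
FULL_LENGTH_SIMILARITY = {
    ("CBJ030", "HBoV1"): 76.3,
    ("Nsc10-N386", "HBoV2A"): 99.2,
    ("UK-648", "HBoV2A"): 98.9,
    ("W153", "HBoV2A"): 98.9,
    ("CU54TH", "HBoV2B"): 97.5,
    ("W471", "HBoV3"): 80.0,
    ("CMHS01111", "HBoV4"): 89.4,
}


def closest_reference(similarities: dict) -> tuple:
    """Return (strain, genotype, similarity) for the most similar reference strain."""
    (strain, genotype), value = max(similarities.items(), key=lambda item: item[1])
    return strain, genotype, value


strain, genotype, value = closest_reference(FULL_LENGTH_SIMILARITY)
print(f"Closest reference: {strain} ({genotype}), {value}%")
```

A similarity ranking of this kind complements, but does not replace, the tree-based clustering shown in Figure 1.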

Figure 1. Phylogenetic analysis of the complete genome sequences of HBoV CUK18 and reference strains. Representative strains are referred to by “genotype strain country (detection year)”. The closed circle indicates the novel HBoV CUK18 strain analyzed in this study.

The nucleotide sequences of the three large ORFs (NS1, NP1, and VP1/VP2) of the HBoV strains were compared (Figure 2A). In the analysis of the NS1 sequence, CUK18 again clustered with the three HBoV2A reference strains, with similarities ranging from 98.7% to 99.0% (Table 2). In contrast, similarities between CUK18 and the other 17 reference strains ranged from 74.2% to 98.1%. In the analysis of the NP1 gene, several nucleotide substitutions were found between the HBoV2A and HBoV2B strains. The NP1 nucleotide sequences of the four HBoV2A strains and two HBoV2B strains (CU54TH and LZFB080) clustered together. When CUK18 was compared with the 20 reference strains, the nucleotide similarity of the NP1 gene ranged from 75.8% to 99.7% (Table 2). With respect to the ORF3 sequence, which encodes VP1 and VP2, CUK18 and the three HBoV2A reference strains (Nsc10-N386, UK-648, and W153) clustered together, with similarities ranging from 99.0% to 99.5% (Figure 2A). In contrast, the CUK18 strain exhibited low sequence similarity with the five HBoV1 strains (CBJ030, HZ1402, KU3, Mty1117, and P214), with values ranging from 78.5% to 79.0% (Table 2). The nucleotide sequence similarities of CUK18 with the other 12 reference strains ranged from 87.1% to 96.2% for ORF3.

Figure 2. Phylogenetic analysis of the nucleotide (A) and amino acid (B) sequences of NS1, NP1, and VP1 of HBoV CUK18 and reference strains. Representative strains are referred to by “genotype strain”. Closed circles indicate the novel HBoV CUK18 strain analyzed in this study.

The amino acid sequences of the three large ORFs (NS1, NP1, and VP1/VP2) of the HBoV strains were compared (Figure 2B). In the analysis of the NS1 sequence, CUK18 again clustered with the three HBoV2A reference strains, each with a similarity of 99.5% (Table 2). In contrast, similarities between CUK18 and the other 17 reference strains ranged from 74.3% to 99.2%. The NP1 amino acid sequences of the four HBoV2A strains and two HBoV2B strains (CU54TH and LZFB080) clustered together. When CUK18 was compared with the 20 reference strains, the amino acid similarity of NP1 ranged from 69.8% to 100% (Table 2). With respect to the ORF3 sequence, which encodes VP1 and VP2, CUK18 and the three HBoV2A reference strains (Nsc10-N386, UK-648, and W153) clustered together, with similarities ranging from 99.4% to 99.9% (Figure 2B). In contrast, the CUK18 strain exhibited low sequence similarity with the five HBoV1 strains (CBJ030, HZ1402, KU3, Mty1117, and P214), with values ranging from 79.7% to 80.0% (Table 2). The amino acid sequence similarities of CUK18 with the other 12 reference strains ranged from 90.3% to 98.8% for ORF3. Relative to CUK18, the three HBoV2A reference strains showed uniform similarity values for NP1 (100%) and NS1 (99.5%), whereas the similarity for VP1/VP2 varied among them (99.4% to 99.9%).

4.2. Open Reading Frame Analysis

The deduced protein sequences of the three ORFs were compared for the HBoV strains (Figure 2B). In the analysis of the NS1 protein sequence, HBoV2A strains clustered together and some minor amino acid sequence differences were revealed. Two amino acid substitutions were found in CUK18 at amino acid positions 378 and 612 of NS1, resulting in changes from asparagine (N) to aspartate (D) and glycine (G) to arginine (R), respectively. Analysis of the NP1 peptide sequence showed that the four HBoV2A strains and two HBoV2B strains (CU54TH and LZFB080) clustered together and exhibited identical amino acid sequences.

A comparison of the amino acid sequences of ORF3, encoding VP1 and VP2, revealed that CUK18 had the highest sequence similarity to the HBoV2A reference strain, Nsc10-N386, with 99.9% deduced protein sequence similarity for VP1. Comparing the CUK18 and Nsc10-N386 strains, a leucine (L) was changed to an arginine (R) at amino acid position 243 of VP1, which corresponded to amino acid position 111 of VP2. Among the four HBoV2A strains, three additional substitutions were identified at amino acid positions 29, 68, and 489 of VP1, reflecting substitutions from lysine (K) to glutamate (E), from aspartate (D) to asparagine (N), and from glutamate (E) to alanine (A), respectively.
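
The correspondence stated above (VP1 position 243 equals VP2 position 111) implies a constant offset of 132 residues between the two numbering schemes. The Python sketch below converts the other VP1 positions mentioned in this paragraph under the assumption that VP2 is an N-terminally truncated form of VP1 with otherwise identical sequence. The function name is hypothetical.

```python
from typing import Optional

# Offset implied by the text: VP1 position 243 corresponds to VP2 position 111.
VP1_TO_VP2_OFFSET = 243 - 111  # 132 residues assumed to be unique to the VP1 N-terminus


def vp1_to_vp2(vp1_position: int) -> Optional[int]:
    """Return the VP2 position for a VP1 position, or None if it lies in the VP1-only region."""
    vp2_position = vp1_position - VP1_TO_VP2_OFFSET
    return vp2_position if vp2_position >= 1 else None


for position in (29, 68, 243, 489):
    print("VP1", position, "-> VP2", vp1_to_vp2(position))
```

Under this assumption, positions 29 and 68 fall in the VP1-specific N-terminal region and have no VP2 counterpart, whereas position 489 maps to VP2 position 357.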

Of the three ORFs, ORF3 showed the highest amino acid variation among the HBoV2A strains. Thus, the amino acid sequences of ORF3 were compared among HBoV1, HBoV2A, HBoV2B, HBoV3, and HBoV4 genotypes (Figure 3), with two representative strains selected for each genotype. When the amino acid sequences were aligned, a total of five hyper-variable regions were observed at amino acid positions 143 - 151, 206 - 213, 407 - 421, 444 - 471, and 631 - 643. Three of the five hyper-variable regions overlapped with the variable regions (VR) known to be the binding sites of HBoV antibody (26). In particular, the amino acid sequences from 444 to 471, corresponding to the VR-5 region, showed genotype-specific substitutions that can distinguish the five HBoV genotypes.
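
The study does not describe how the boundaries of the hyper-variable regions were set. One simple approach, assumed here purely for illustration, is to flag alignment columns that contain more than one residue and then merge neighboring flagged columns into regions. The Python sketch below applies this to a toy alignment; the sequences and function names are hypothetical.

```python
def variable_columns(alignment: list) -> list:
    """1-based positions of columns that contain more than one distinct residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [i + 1 for i in range(length) if len({seq[i] for seq in alignment}) > 1]


def merge_runs(positions: list, max_gap: int = 1) -> list:
    """Group sorted positions into (start, end) runs.

    Positions at most max_gap apart are placed in the same run.
    """
    runs = []
    for pos in positions:
        if runs and pos - runs[-1][1] <= max_gap:
            runs[-1][1] = pos  # extend the current run
        else:
            runs.append([pos, pos])  # start a new run
    return [tuple(run) for run in runs]


# Toy protein alignment (hypothetical, for illustration only).
toy_alignment = [
    "MSDTNKAAQPLLG",
    "MSDSQKAAQPVLG",
    "MSDTNKAAQPILG",
]

positions = variable_columns(toy_alignment)
print(positions)
print(merge_runs(positions))
```

Raising `max_gap` merges variable columns that are separated by a few conserved residues, which changes where region boundaries fall.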

Figure 3. Comparison of amino acid substitutions in ORF3 among HBoV genotypes. Two representative strains were selected for each HBoV genotype. Red vertical arrows indicate sites of amino acid substitutions. Amino acid sequences that differ among viral strains are highlighted.

5. Discussion

Gastroenteritis in infants and children can be caused by various bacteria and viruses, such as Salmonella, Shigella, rotavirus, norovirus, and astrovirus (13, 21, 22, 27-29). Human bocavirus gastrointestinal infections account for approximately 6% of gastroenteritis cases (13). Human bocaviruses frequently occur in co-infection with other microorganisms; the average co-infection rate is 46% in patients with gastrointestinal infections (13). The study of HBoV is further complicated by the lack of in vitro cell culture systems and in vivo animal models (13). In addition, new variants and uncommon HBoV types have the potential to become dominant. Thus, a molecular diagnostic method is needed for the detection of HBoV infection. Although HBoV is not a dominant cause of gastroenteritis in Korea (21, 22), accurate diagnosis of HBoV infection is important for reducing the burden of HBoV-related disease and the corresponding hospital costs.

New HBoV strains are emerging rapidly due to a high mutation rate (19). Single-stranded DNA viruses like parvoviruses have mutation rates approaching those of RNA viruses (30). Human bocavirus 1 was initially isolated from respiratory specimens in 2005 (2). Human bocavirus 2, HBoV3, and HBoV4 were subsequently detected in stool specimens (2, 8-10, 13). In addition, recombination is frequently observed between HBoV strains. Previous studies have reported that recombination break points are located at the beginning of ORF2, encoding NP1, and ORF3, encoding VP1 and VP2 (10, 17). In this study, analysis of the full-length sequence of an HBoV isolate led to the identification of a new HBoV2A strain (CUK18). The ORF1 and ORF3 nucleotide sequences of CUK18 exhibited the highest similarities with those of strain Nsc10-N386, but the nucleotide sequence of ORF2, encoding NP1, was highly similar to those of both HBoV2A and HBoV2B strains. Previous studies on the molecular epidemiology of HBoVs have used partial sequences from one of the three ORFs (21-23). Our results indicate that at least two ORFs should be used to discriminate among HBoV strains. Full-length sequences are thus helpful for defining newly emerging HBoV strains.

Alignment analysis of VP1 showed frequent amino acid substitutions among HBoV genotypes (Figure 3). Of the five hyper-variable regions identified, three regions overlapped with known HBoV antibody binding sites (26). In particular, the VR-5 region, which corresponds to a three-fold-symmetry-related monomer, exhibited considerable amino acid differences among HBoV strains. Thus, the VR-5 regions should be further analyzed for evaluating genotype-specific pathogenesis and antigenicity.

To our knowledge, this is the second report of the full-length sequence of an HBoV2A variant isolated from a stool specimen in South Korea. CUK18 is very similar to the recently reported Korean HBoV isolate CUK-BC20 (MF680549). However, its similarity to CUK-BC20 was lower than its similarity to the Russian HBoV isolate Rus-Nsc10-N386, which was isolated in 2010 - 2011. Analysis of the NP1 amino acid sequence showed a high similarity (100%) with the Rus-Nsc10-N386 strain and a lower similarity (99.5%) with the CUK-BC20 strain. Similarly, more amino acid substitutions were seen in comparison with the CUK-BC20 strain than in comparison with the Rus-Nsc10-N386 strain. This difference may reflect the time of isolation rather than the region of isolation. However, this could not be analyzed further in the present study because too few full-length sequences of Korean strains are available.

We suggest that the full-length sequence of the CUK18 strain can be used as a standard for comparison with other HBoV strains. The complete genome sequence determined in this study provides valuable information for improving the diagnosis of HBoVs, which can cause gastroenteritis in infants and children. In addition, it may be helpful for predicting the emergence of HBoV variants in neighboring countries and developing an effective HBoV vaccine.

5.1. Conclusions

The full-length sequence of an HBoV variant isolated from a clinical sample in South Korea was determined. Phylogenetic analysis suggested that the newly isolated HBoV belonged to the HBoV2A genotype. Five hyper-variable regions distinguishing this HBoV genotype were found in the viral capsid protein.

References