The sequence alignments and phylogenetic trees (Figure 1), the predicted secondary and tertiary (3D) structures of the ligand binding sites of the distinct sequences for each year, and the percent similarity/divergence matrices (data not shown; available upon request) showed that from 2006 to 2008 the sequences were identical, with no change in the ligand binding sites or in the type and number of ligands.
Figure 1. Evolutionary relationships of individual coding regions from influenza A viruses sampled in Iran from 2006 to 2013. The length of each branch pair represents the evolutionary distance between the sequence pairs; the scale indicates the number of substitutions. Phylogram constructed with MegAlign™ (DNAStar, Madison, WI).
For 2009, the year of the H1N1 pandemic, examining the similarities and differences in the sequence alignments and the phylogenetic tree of the 2009 isolates (Figure 2) showed that all influenza isolates fell into two entirely distinct, separate branches, forming two groups that differ in amino acid properties.
Figure 2. Evolutionary relationships of individual coding regions from influenza A viruses sampled in Iran in 2009. The length of each branch pair represents the evolutionary distance between the sequence pairs; the scale indicates the number of substitutions. Phylogram constructed with MegAlign™ (DNAStar, Madison, WI).
Notably, the first group is similar, and in some cases identical, to the 2006 to 2008 isolates in amino acid sequence and gene positions, whereas the second group is similar, and in some cases identical, to the 2010 to 2013 isolates. The alignment analyses of the amino acid sequences and the structure of the phylogenetic trees for these two 2009 groups point to important changes in the genomic structure of the virus in that year. These changes may have arisen through mutations in the genome, likely frameshifts or gene shuffling, producing divergent sequences and a new group. This new group became dominant in the following years, carrying significant amino acid changes in HA that resulted in isolates with different bioinformatic and clinical properties. Moreover, the distinct 2009 sequence with the highest similarity (approximately 99%) to the other sequences of that year (ADJ18175) also had the highest similarity to the sequences of the following years' isolates (2010 to 2013), whereas the distinct sequence with the lowest similarity (approximately 77%) to the other 2009 sequences (ADG23241) had the highest similarity to the sequences of the previous years' isolates (2006 to 2008) (Figure 1). This indicates that the 2009 pandemic strain dominated from 2010 onwards.
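The similarity values above come from MegAlign. As a rough illustration of what a percent similarity matrix captures, the sketch below computes pairwise percent identity from sequences that are already aligned, then picks the sequence with the highest and lowest mean identity to the rest of the set, analogous to how ADJ18175 and ADG23241 were singled out. The gap-handling convention, the function names, and the toy sequences are assumptions for illustration; MegAlign's own calculation may differ.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a residue counts as a mismatch (one of several conventions).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of pairwise percent identities, keyed by (name, name)."""
    matrix = {(n, n): 100.0 for n in seqs}
    for p, q in combinations(seqs, 2):
        matrix[(p, q)] = matrix[(q, p)] = percent_identity(seqs[p], seqs[q])
    return matrix


def mean_identity(seqs: dict[str, str]) -> dict[str, float]:
    """Average identity of each sequence to all the other sequences in the set."""
    matrix = identity_matrix(seqs)
    others = len(seqs) - 1
    return {n: sum(matrix[(n, m)] for m in seqs if m != n) / others for n in seqs}


# Hypothetical toy alignment, not data from this study.
toy = {"iso_A": "MKAILVLLYT", "iso_B": "MKAILVLLCT", "iso_C": "MKVIFVLLCA"}
means = mean_identity(toy)
most_typical = max(means, key=means.get)   # "iso_B" (mean 80.0)
least_typical = min(means, key=means.get)  # "iso_C" (mean 65.0)
```

An outlier such as ADG23241 would show up as the sequence with the lowest mean identity to its own year's set.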
Predicting and comparing the secondary and tertiary structures of the 2009 amino acid sequences, and analyzing the regions and residues involved in ligand binding and in the active sites of the protein (the molecular interactions between ligand and receptor), made the differences between the two 2009 groups (the bifurcation of the phylogenetic tree) clearer. In ligand binding site 1, which has 12 ligands, amino acid positions 239, 240, and 241 changed from Asp-Arg-Gln to Asp-Gln-Glu between two distinct sequences of the 2009 isolates (Figure 3).
Figure 3. 3D visualization of the ligand binding site in the protein structure. Residues that form part of the binding site are colored blue; the ligands that formed the cluster used for the prediction are displayed as wireframes.
From 2010 to 2013, the sequences were identical, with no change in the ligand binding sites or in the type and number of ligands, indicating that the 2009 pandemic strain remained dominant in these years. From 2009 onwards, the ligand binding sites 2 and 3 of each year's distinct sequences changed compared with previous years: site 2 changed from Thr255 to Ile183-Lys256, and site 3 from Asp292-Asn303 to Asn293-Asn304 (Table 1). In 2012, there were further changes in the ligand binding sites and in the type and number of ligands compared with the previous and following years: site 1 changed from Asp239-Gln240-Glu241 to Lys236-Asp239-Arg240; site 2 from Ile183-Lys256 with 6 ligands to Ile183-Gly254-Lys256 with 7 ligands; and site 3 from Asn293-Asn304 with 4 ligands to Lys39-Asn40 with 10 ligands (Table 2).
Table 1. Ligand binding sites 2 and 3 by year of isolation

| Ligand binding site | 2006 - 2008 | 2009 - 2013 |
|---|---|---|
| 2 | THR255 | ILE183-LYS256 |
| 3 | ASP292-ASN303 | ASN293-ASN304 |

Table 2. Changes in ligand binding sites in 2012

| Ligand binding site | Changed from | To |
|---|---|---|
| 1 | ASP239-GLN240-GLU241 | LYS236-ASP239-ARG240 |
| 2 | ILE183-LYS256 with 6 ligands | ILE183-GLY254-LYS256 with 7 ligands |
| 3 | ASN293-ASN304 with 4 ligands | LYS39-ASN40 with 10 ligands |
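The residue lists in Tables 1 and 2 can be compared mechanically. The sketch below, which uses hypothetical helper names, parses strings such as "ASP239-GLN240-GLU241" and classifies each position as retained, substituted, lost, or gained. It assumes both lists use the same residue numbering. The one-residue shifts in Table 1 (for example ASN303 to ASN304) may reflect a numbering offset between lineages, as the "303/4"-style headings in Table 3 suggest, so such positions should be mapped to a common numbering before comparing.

```python
import re

# Three-letter residue name followed by its sequence number, e.g. "ASP239".
RESIDUE = re.compile(r"([A-Z]{3})(\d+)")


def parse_site(text: str) -> dict[int, str]:
    """'ASP239-GLN240-GLU241' -> {239: 'ASP', 240: 'GLN', 241: 'GLU'}."""
    return {int(num): name for name, num in RESIDUE.findall(text.upper())}


def compare_site(before: str, after: str) -> dict[str, list[str]]:
    """Classify binding-site positions as retained, substituted, lost, or gained.

    Assumes both residue lists use the same numbering scheme.
    """
    b, a = parse_site(before), parse_site(after)
    result = {"retained": [], "substituted": [], "lost": [], "gained": []}
    for pos in sorted(b.keys() | a.keys()):
        if pos in b and pos in a:
            if b[pos] == a[pos]:
                result["retained"].append(f"{b[pos]}{pos}")
            else:
                result["substituted"].append(f"{b[pos]}{pos}{a[pos]}")
        elif pos in b:
            result["lost"].append(f"{b[pos]}{pos}")
        else:
            result["gained"].append(f"{a[pos]}{pos}")
    return result


# Site 1, 2012 change (residue lists transcribed from Table 2).
site1 = compare_site("ASP239-GLN240-GLU241", "LYS236-ASP239-ARG240")
# {'retained': ['ASP239'], 'substituted': ['GLN240ARG'], 'lost': ['GLU241'], 'gained': ['LYS236']}
```

Applied to site 3 in Table 2, every residue would be reported as lost or gained, consistent with the site moving to a different region of the protein rather than undergoing a point substitution.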
In the next phase of this study, the sequences of the different isolates were examined for the position and number of glycosylation sites (Asn-X-Ser/Thr, where X can be any amino acid except proline). Glycosylation sites, which are commonly conserved, mask the protein surface from recognition by antibodies. The addition of glycans to HA is an important mechanism contributing to antigenic drift and therefore to the sustained circulation of influenza A virus in the human population (20, 21). Apart from some cases (Table 3), the sequences of the 2006 to 2008 isolates have 10 glycosylation sites, at positions 27, 28, 40, 71, 104, 142, 176, 303, 497, and 556. Apart from a few cases (Table 3) in which glycosylation sites were created or deleted, the sequences of the 2009 to 2013 isolates have 8 glycosylation sites, at positions 27, 28, 40, 104, 293, 304, 498, and 557. This implies that the number and location of glycosylation sites changed in the 2009 pandemic strain, which became dominant in the following years.
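The Asn-X-Ser/Thr rule above is easy to apply programmatically. The sketch below scans an ungapped protein sequence in one-letter code and reports the 1-based position of the Asn in every sequon; the function name and toy sequence are hypothetical. A lookahead pattern is used so that overlapping sequons are all reported. A sequon marks a potential site only; it does not show that the site is actually glycosylated.

```python
import re

# N-X-S/T with X != P. The zero-width lookahead lets overlapping sequons
# (e.g. "NNST", which contains both NNS and NST) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of the Asn in every N-X-S/T sequon (X != Pro).

    Assumes an ungapped sequence in one-letter code; alignment gaps ('-')
    should be removed first, since [^P] would otherwise match them.
    """
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(glycosylation_sites("MKNGTAANPSLNHT"))  # [3, 12]  (the NPS at position 8 is skipped)
```

The same rule explains the motif changes listed in Table 3: NHT to KHT removes the Asn, and NTS to NPS places a proline at X, so both abolish the site.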
Table 3. Glycosylation sites (Asn-X-Ser/Thr) in HA by year and position

| Year | 27 | 28 | 40 | 71 | 104 | 142 | 176/7 | 293 | 303/4 | 497/8 | 556/7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006 | C | C | C | C | C | NHT→KHT | C | - | C | C | C |
| 2007 | C | C | C | C | C | C | C | - | C | C | C |
| 2008 | C | C | C | C | C | NHT→KHT | C | - | C | C | C |
| 2009 | C | C | C | KCN→NCS | - | NHD→NHT | KLS→NLS | NTT→DAK | C | C | C |
| 2010 | C | C | C | - | NGT→KWT | - | - | C | C | C | C |
| 2011 | C | C | C | - | C | - | - | C | C | C | C |
| 2012 | C | C | C | - | C | - | - | C | C | C | C |
| 2013 | C | C | C | - | NGT→KGT | - | - | C | NTS→NPS | C | C |

C, conserved; -, no glycosylation site at this position.
Position 240, which is located in the receptor binding site, is one of the important positions in HA (22). This position was conserved in the sequences of the 2006 to 2008 isolates; from 2009 onwards, however, the Q240R mutation occurred in some isolates. This mutation has been found to greatly increase the infectivity of the virus without affecting its antigenicity.
Positions that may form the receptor binding site of HA, and thus serve as the active site for binding to cell receptors, are listed in Table 4 (23, 24). The conservation and variability of these positions were determined by analyzing each year's sequence alignments (Table 4). Positions S153/4, N155/6, T168/9, H209/10, E211/12, and S274/5 were conserved from 2006 to 2008; from 2009 onwards, they changed to S154P, N156A, T169V, H210Q, E212A, and S275E, respectively. In other words, these substitutions appeared in 2009 and remained conserved from 2010 to 2013, although some 2009 isolates did not carry them.
Table 4. Conservation of putative receptor binding site positions in HA by year

| Year | Y115 | G147/8 | S153/4 | N155/6 | W166/7 | T168/9 | L207/8 | Y208/9 | H209/10 | E211/12 | S274/5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2006 | C | C | C | C | C | C | C | C | C | C | C |
| 2007 | C | C | C | C | C | C | C | C | C | C | C |
| 2008 | C | C | C | C | C | C | C | C | C | C | C |
| 2009 | C | C | 154P | 156A | C | 169V | C | C | 210Q | 212A | 275E |
| 2010 | C | C | 154P | 156A | C | 169V | C | C | 210Q | 212A | 275E |
| 2011 | C | C | 154P | 156A | C | 169V | C | C | 210Q | 212A | 275E |
| 2012 | C | C | 154P | 156A | C | 169V | C | C | 210Q | 212A | 275E |
| 2013 | C | C | 154P | 156A | C | 169V | C | C | 210Q | 212A | 275E |

C, conserved.
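Table 4 records, for each putative receptor binding position, either C or the new residue with its number in the later numbering (for example 154P). The "153/4"-style headings indicate that the two lineages number the same aligned residue differently. A minimal sketch of how such a row could be derived from a gapped pairwise alignment of a reference (for example a 2006 to 2008 sequence) and a query is shown below; the function names and toy sequences are hypothetical, and the alignment is assumed to have been produced beforehand.

```python
def residue_to_column(aligned: str) -> dict[int, int]:
    """Map 1-based residue numbers of a gapped sequence to 0-based alignment columns."""
    mapping, residue = {}, 0
    for col, aa in enumerate(aligned):
        if aa != "-":
            residue += 1
            mapping[residue] = col
    return mapping


def column_to_residue(aligned: str) -> dict[int, int]:
    """Inverse of residue_to_column: alignment column -> residue number."""
    return {col: res for res, col in residue_to_column(aligned).items()}


def conservation_row(ref_aln: str, query_aln: str, ref_positions: list[int]) -> list[str]:
    """For each reference residue number, return:
    'C'  if the aligned query residue is identical,
    '-'  if the query has a gap in that column,
    otherwise the query's own residue number plus its residue (e.g. '154P').
    """
    ref_cols = residue_to_column(ref_aln)
    query_res = column_to_residue(query_aln)
    row = []
    for pos in ref_positions:
        col = ref_cols[pos]
        q = query_aln[col]
        if q == "-":
            row.append("-")
        elif q == ref_aln[col]:
            row.append("C")
        else:
            row.append(f"{query_res[col]}{q}")
    return row


# Hypothetical toy alignment: the query carries a one-residue insertion,
# so reference residue 3 is query residue 4 (compare the "153/4" headings).
ref = "M-KSNT"
query = "MAKPAT"
print(conservation_row(ref, query, [1, 3, 4, 5]))  # ['C', '4P', '5A', 'C']
```

Mapping positions through alignment columns, rather than comparing raw residue numbers, avoids mislabeling positions that differ only by a numbering offset.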
Another conserved region investigated was the fusion peptide. This region is critical for viral fusion and is composed of 23 amino acid residues, including several large hydrophobic residues and several glycines interspersed throughout the sequence. The fusion peptide sequence GLFGAIAGFIEGGWTGMVDGWYG is extremely well conserved in H1N1 viruses (24). The fusion peptide was well conserved in the sequences of the 2006 to 2013 Iranian isolates.
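Conservation of the fusion peptide can be checked by sliding the 23-residue motif along each HA sequence and reporting the best-matching window and its number of mismatches. The sketch below does this with a simple ungapped (Hamming) scan; it assumes no insertions or deletions within the motif, and the function name and toy flanking residues are hypothetical.

```python
FUSION_PEPTIDE = "GLFGAIAGFIEGGWTGMVDGWYG"  # 23 residues, as quoted in the text


def best_motif_match(protein: str, motif: str = FUSION_PEPTIDE) -> tuple[int, int]:
    """Slide `motif` along an ungapped protein sequence.

    Returns (1-based start, mismatch count) of the window with the fewest
    mismatches; the earliest such window wins ties. No indels are allowed.
    """
    if len(protein) < len(motif):
        raise ValueError("sequence is shorter than the motif")
    best_start, best_mm = 0, len(motif) + 1
    for i in range(len(protein) - len(motif) + 1):
        window = protein[i:i + len(motif)]
        mm = sum(1 for x, y in zip(window, motif) if x != y)
        if mm < best_mm:
            best_start, best_mm = i + 1, mm
    return best_start, best_mm


# Hypothetical flanking residues around the motif, with one V -> I substitution introduced.
toy = "MKAIL" + FUSION_PEPTIDE.replace("MV", "MI") + "QNAIN"
print(best_motif_match(toy))  # (6, 1)
```

A fully conserved fusion peptide would return zero mismatches for every isolate.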
Another important position, detected in pandemic 2009 H1N1 viruses but not in previous human H1N1 viruses, is V169, which is associated with the receptor binding specificity of HA (24). This position was conserved from 2009 to 2013, and a V169T change may be acquired in the human population in the near future; in 2009, this mutation was observed in Iran. The D239G substitution in the HA of the pandemic 2009 H1N1 virus has been found to increase disease severity and is associated with a change in receptor binding affinity (25). In the 2009 to 2013 isolates, this position was almost entirely conserved, except for a few cases each year in which the substitution occurred.
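Substitutions such as D239G, Q240R, and V169T use the notation wild-type residue, position, mutant residue. Counting how many isolates carry a given substitution in each year is straightforward once that notation is parsed. The sketch below assumes ungapped sequences whose residue numbering matches the scheme used in the substitution code; HA numbering differs between studies (for example, depending on whether the signal peptide is counted). The function names and toy isolates are hypothetical.

```python
import re
from collections import defaultdict

# Single-letter substitution notation, e.g. "D239G" or "Q240R".
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """'D239G' -> ('D', 239, 'G')."""
    m = SUBSTITUTION.fullmatch(code)
    if m is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def substitution_counts(isolates, code: str) -> dict[int, tuple[int, int]]:
    """Count, per year, the isolates carrying the mutant residue of `code`.

    `isolates` is an iterable of (year, ungapped protein sequence) pairs.
    Returns {year: (isolates carrying the mutant, isolates long enough to check)}.
    """
    _wild, pos, mutant = parse_substitution(code)
    counts = defaultdict(lambda: [0, 0])
    for year, seq in isolates:
        if len(seq) < pos:
            continue  # sequence does not cover this position
        counts[year][1] += 1
        if seq[pos - 1] == mutant:
            counts[year][0] += 1
    return {year: (c[0], c[1]) for year, c in sorted(counts.items())}


# Hypothetical toy isolates; "D3G" stands in for a real code such as D239G.
toy = [(2009, "MKDLT"), (2009, "MKGLT"), (2010, "MKDLT")]
print(substitution_counts(toy, "D3G"))  # {2009: (1, 2), 2010: (0, 1)}
```

A position that is "almost conserved" would show a small first count relative to the second in each year.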
Molecular and phylogenetic analyses were also performed to relate the changes in each year's isolates to the vaccine strains (e.g., A/California/07/2009 (H1N1), the standard vaccine strain from 2010 onwards). All changes observed in each year were also present in that year's standard vaccine strains, and the seasonal isolates were closely related to the corresponding vaccine strains. This means that identifying the antigenic changes of the virus each year could be used to predict future changes.