4.1. Homology Modeling Results
Reference sequence alignment of the spike protein with the NCBI Reference Sequence YP_009724390.1 and the sequence of the PDB file 7kj2 indicated that missing loops and residues are present in these structures. Here, homology modeling was applied using MODELLER v9.25 (
http://salilab.org/modeller/) to find the complete 3D structure of the trimeric viral spike protein. The sequence of the spike protein (YP_009724390.1) was used as the primary sequence, and the spike structure in the 7kj2 PDB file was used as the template for MODELLER. Moreover, the structure of the ACE2 protein was extracted from the PDB file of 7kj2.
The structure of the spike protein of SARS-CoV-2 (PDB: 7kj2) was superimposed on the MODELLER model of the complete 3D structure of the trimeric viral spike protein in Chimera using the MatchMaker structural-alignment command tool. The root mean square deviations (RMSD) between 946 pruned atom pairs was 0.567 angstroms (across all 973 pairs: 0.897) (Appendix 1a). These results indicated that some of the turns and loops were deleted in the crystallographic structure of the SARS-CoV-2 spike protein.
To investigate the 3D structure of the UK SARS-CoV-2 spike protein with multiple mutations, including deletion 69 - 70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H, MODELLER v9.25 was used.
The structure of the non-mutated spike of SARS-CoV-2 was superimposed on the MODELLER model of the UK SARS-CoV-2 spike protein with the same multiple mutations in Chimera using the MatchMaker structural-alignment command tool. The RMSD between 1,104 pruned atom pairs was 0.260 angstroms (across all 1,117 pairs: 0.540) (Appendix 1b).
The 3D structure of the wild-type and UK SARS-CoV-2 spike protein with multiple mutations, including deletion 144, A570D, T716I, D614G, N501Y, P681H, deletion 69-70, S982A, and D1118H, is shown using VMD in Appendix 1c.
4.2. Residues Selection
Using LigPlot software, critically essential residues involved in interactions between the SARS-CoV-2 spike protein and the ACE2 protein were examined. Results indicated that residues such as Tyr449, Gly496, Gly502, Asn487, Ala475, Lys417, Thr500, Tyr489, and Phe486 of the SARS-CoV-2 CTD spike interact with residues such as Asp38, Lys353, Tyr41, Gln24, Tyr83, Asp30, Thr27, and His34 of ACE2. The complete results are shown in Appendix 1d.
To investigate the interaction between the mutated and wild-type SARS-CoV-2 CTD spike with ACE2, the two proteins were docked into the specific sites of residues such as Tyr449, Gly496, Gly502, Asn487, Ala475, Lys417, Thr500, Tyr489, and Phe486 of the SARS-CoV-2 CTD spike with residues such as Asp38, Lys353, Tyr41, Gln24, Tyr83, Asp30, Thr27, and His34 of ACE2 using HADDOCK.
Residues involved in interactions between the spike protein and ACE2 protein in the HADDOCK results were examined using LigPlot software (Appendix 1e). The results indicated that the residues involved in spike-ACE2 interaction in the wild-type form of the docked results are very similar to those in the 7kj2 structure. The complete results are shown in Appendix 1e.
4.3. Molecular Dynamics Simulation of Docked Complexes
The MD simulation of two forms of monomeric wild-type and mutated SARS-CoV-2 spike proteins was carried out in 50,000,000 steps (100 ns). To analyze the stability and behavior of the two different forms of the monomeric SARS-CoV-2 spike protein complex with ACE2, a MD simulation was conducted over the course of 50,000,000 steps (100 ns). To compare the two structures, various analyses were conducted. Using LigPlot software, critically essential residues involved in interactions between the SARS-CoV-2 spike protein and ACE2 protein were examined in each 10 ns of the MD simulation results for both the wild-type and mutated SARS-CoV-2 spike-ACE2 complexes (Appendix 2). The results indicated that hydrogen bonds, hydrophobic interactions, and double/triple bonds were more prevalent in the mutated spike-ACE2 complex.
4.4. Root Mean Square Deviations
An important parameter for assessing the conformational stability of proteins and protein-protein complexes in a dynamic state during simulation is the RMSDs between C-α atoms on the protein backbone. RMSDs were calculated by comparing each frame to the initial conformation, and plots of RMSD against simulation time were used to analyze overall system behavior. Consistent fluctuations with low levels of RMSD throughout the simulation suggest that the system has reached equilibrium and stabilization. Conversely, higher fluctuations in RMSD indicate lower stability of the system. Additionally, significant deviations in RMSD graphs may suggest major conformational changes occurring within the protein as it attempts to achieve a stable configuration with its receptor site.
The RMSDs for the wild-type spike protein backbone Cα atoms, the mutant spike protein backbone Cα atoms, and the wild-type spike- and mutant spike-ACE2 complexes were calculated and produced as a graph depicted in Appendix 3. Appendix 3 revealed significant discrepancies in the RMSD observed between the wild-type monomeric spike protein and its mutated counterpart (a), the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex (b), ACE2 in the wild-type spike-ACE2 complex and ACE2 in the mutant spike-ACE2 complex (c), and the active site of the wild-type spike and the active site of the mutant spike in the spike-ACE2 complex (d).
The RMSD value of the monomeric spike protein with multiple mutations, including deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H, increased gradually from 20 ns to 100 ns. This suggests that there was evidence of instability in the protein due to mutations in its spike. In contrast, the graph showed consistent and stable patterns without significant fluctuations observed across different time intervals among the trajectories. These results imply that the protein backbone underwent major structural perturbations in the mutant form of the spike. The RMSD of the wild-type monomeric spike model increased suddenly and stabilized at 80 ns. Fluctuations were observed during the simulation and then suddenly increased at 32, 60, 82, and 100 ns to 2.4, 2.7, 2.9, and 3.2 nm in the mutant form of the spike protein, whereas for the wild-type monomeric spike, fluctuations were observed at 31 ns and 45 ns. Based on these findings, it can be concluded that the wild-type monomeric spike protein maintained a consistent conformation throughout the simulation, with only minimal changes in structure occurring (Appendix 3a).
The root mean square deviations were computed for the protein backbone Cα atoms of both the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex, which includes multiple mutations such as deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H in the spike protein. These calculations resulted in a graph shown in Appendix 3b, illustrating the range of RMSD values obtained. The magnitude of these RMSDs varied greatly, with a minimum value of 0.0000047 nm and a maximum value of 3.29 nm when comparing the interactions between the wild-type spike and ACE2 versus the mutant spike and ACE2, respectively.
In the complexes of wild-type spike-ACE2 and mutant spike-ACE2, the data showed stable trajectories with small fluctuations. This suggests that there were minor changes in the structural configurations of these complexes. The mutant spike-ACE2 complex exhibited lower RMSD values after 10 ns until 100 ns compared to the wild-type spike-ACE2 complex. This was a strong indication of conformational stability in the mutant spike with deletions and mutations in complex with the ACE2 ligand, relative to the wild-type spike-ACE2.
Moreover, noticeable differences in the paths taken by the wild-type spike-ACE2 and mutant spike-ACE2 complexes were observed at different time intervals. Specifically, after 10 ns, the RMSD fluctuation of the wild-type and mutant spike-ACE2 complexes became much more prominent, indicating substantial conformational and structural changes. The variations in movement were initially detected at 10 ns, with the RMSD of the two complexes demonstrating a gradual increase for the wild-type spike-ACE2 and a subsequent decrease for the mutant spike-ACE2. The fluctuations ranged from 1.1 nm upwards and gradually decreased accordingly.
The experimental findings showed that the spike-ACE2 complex with the mutations of deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H maintained a consistent conformation throughout the simulation, experiencing minimal changes in structure. Appendix 3c illustrates the range of RMSD values for ACE2 in both the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex, indicating that there were stable trajectories with small fluctuations.
The RMSD of the backbone Cα atoms for the active site of the wild-type spike and the active site of the mutant spike in the spike-ACE2 complex indicated that the variations in movement were initially detected at 30 ns. The RMSD values of the two complexes demonstrated a gradual increase for the active site of the mutant spike in the spike-ACE2 complex and a subsequent decrease for the active site of the wild-type spike in the spike-ACE2 complex (Appendix 3d).
4.5. Root Mean Square Fluctuation
Every individual amino acid in the protein is crucial for attaining a stable structure of both the protein itself and the receptor-ligand complex. The extent of stability can be assessed by analyzing the root mean square fluctuation (RMSF) of the Cα atom. By plotting RMSF against residue number, we can determine the average fluctuation experienced by each amino acid throughout the MD simulation.
The graph of RMSF shows the fluctuation values on the y-axis against the residues depicted on the x-axis. The plot demonstrates different levels of flexibility for each residue in the spike protein. Higher RMSF fluctuations indicate increased flexibility, suggesting a greater potential for interaction. Similarly, residues of the spike protein with low flexibility indicate their decreased potential to interact with the ACE2 receptor molecule. The RMSFs for the residues of both wild-type and mutant spike proteins, as well as the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex, were computed. These calculations yielded a graph presented in Appendix 4, depicting notable fluctuations in flexible residues with prominent peaks.
For the monomeric wild-type spike and mutant spike protein with multiple mutations, including deletion 69 - 70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H, it is evident that the fluctuation of the mutant spike protein structure is greater compared to the wild-type spike structure in the RMSF graph during the MD simulation. The protein residues involved in ligand binding, specifically those ranging from residues 370 to 510 in the mutant spike, exhibit higher RMSF values, implying greater fluctuation in these protein regions.
In the mutant spike protein, fluctuations were observed in regions from residue indices 282 - 484, 500 - 540, 547 - 606, 608 - 692, 708 - 733, 756 - 778, 849 - 898, 934 - 977, 996 - 1024, and 1037 - 1117. The highest fluctuations were noted between residues 313 - 325, 409 - 424, 443 - 480, 501 - 520, 667 - 689, 939 - 967, and 1044 - 1117, suggesting these regions could be affected by multiple mutations. Specific residues such as THR316, ASP376, VAL416, GLN469, and LYS1057 in the mutant spike protein displayed pronounced fluctuations. In contrast, the wild-type spike exhibited more fluctuations in residues SER433 and LEU815. It is clear that there is a higher fluctuation in the mutant spike protein compared to the wild-type spike. In other words, as observed from the RMSF plot, the mutant spike protein structure is less stable compared to the wild-type spike structure (Appendix 4a).
Appendix 4b illustrates the analysis of RMSF trajectories for the spike in both the wild-type ACE2 complex and the mutant spike-ACE2 complex. In the case of the spike-ACE2 complex, the RMSF analysis revealed that certain residues of the spike protein deviated from their average structure upon interaction with ACE2 receptors. In the wild-type spike-ACE2 complex, fluctuations were observed in regions from residue indices 313-564 in the wild-type spike protein. These fluctuations imply that these residues could be involved in protein-receptor binding. On the other hand, in the mutant spike-ACE2 complex, fluctuations were low in the RMSF plot for the spike protein with multiple mutations including deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H.
Additionally, the RMSF of the backbone Cα atoms of ACE2 in both the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex are shown in Appendix 4c. In the case of the mutant spike-ACE2 complex, fluctuations were also low in the RMSF plot for ACE2. These results indicate that the mutant spike-ACE2 complex is more stable than the wild-type spike-ACE2 complex.
4.6. Radius of Gyration
In order to examine the impact of multiple mutations (deletion 69 - 70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H) on the compactness of spike proteins and their complex with the ACE2 receptor, we conducted calculations for the radius of gyration (Rg). Radius of gyration is a metric that indicates the distance between atoms in a system relative to its center or axis. By plotting Rg over simulation time, it provides insights into the compactness of protein structures influenced by various combinations of secondary structures. The stability of protein folding can be inferred from a relatively consistent Rg value throughout; however, in unfolded proteins, the Rg will change over time.
In addition to the analysis of the compactness and folding of the unbound protein, the Rg parameter was also used to analyze the compactness of the protein-ligand complexes. Weak intermolecular bonds promote a loss of compactness in the system and affect the stability of the complex. When the interactions between the ligand and protein are weaker, the compactness of the protein-ligand complex decreases, resulting in a higher Rg value for the complex.
Figure 1A illustrates how the Rg of the mutant monomeric spike changes over time compared to the wild-type monomeric spike. The graph depicting the Rg values over time clearly demonstrates that the mutant monomeric spike consistently remains in an unstable compact (unfolded) state throughout the 100 ns duration. This suggests that the protein's Rg is highly volatile in the case of the mutant variant. The Rg value in the complex ranges between 3.8 and 4.9 nm, and it gradually balances at 55 ns.
The Rg trajectories of the wild-type and mutant spike-ACE2 complexes were also investigated. As shown in
Figure 1B, it is evident that the mutant spike-ACE2 complex is more compact compared to the wild-type spike-ACE2 complex. From the beginning of the simulation process, it can be concluded from
Figure 1 that the mutant spike-ACE2 complex has lesser compactness compared to the wild type, and at the end of the simulation, higher fluctuations indicate a loss of compactness.
Radius of gyration (Rg) of the backbone Cα atoms of wild-type spike and mutant spike protein (A) and the wild-type spike-ACE2 complex and mutant spike-angiotensin-converting enzyme 2 (ACE2) complex (B).
4.7. Solvent Accessible Surface Area Analysis
The solvation free energy of a protein is determined by the way polar and non-polar residues interact with each other. The solvent molecule's probe monitors the surface area of the protein to find and assess the vdW surface of the protein. The solvent-accessible surface area (SASA) has been used for predicting changes in solvent-accessible area induced by residue changes or ligand interactions. Among all residues, hydrophobic residues are primarily responsible for increasing the SASA value. The residue SASA value, an important parameter in SASA analysis, provides an understanding of protein conformational changes per residue contribution. Indeed, the interaction of the protein and protein-ligand complex with various solvents, or the interaction of the protein alone with both various solvents and ligands, primarily depends on their surface properties.
The last 100 ns trajectory was used for the calculation of the SASA value, as plotted in
Figure 2. Results indicated that the values fluctuate more in the mutant spike protein and wild-type spike-ACE2 complex than in the wild-type spike protein and mutant spike-ACE2 complex, respectively, indicating greater compactness of the latter. Hence, we analyzed the average SASA value for all the systems. The average SASA values for the wild-type spike protein, mutant spike protein, wild-type spike-ACE2 complex, and mutant spike-ACE2 complex were 562.09, 564.39, 827.19, and 821.78 nm², respectively. The findings from this study suggest that the simultaneous occurrence of multiple mutations, including deletions at positions 69 - 70 and 144, along with N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H on the spike protein, has a minimal impact on the exposed surface area of its structure.
The SASA of the four structures wild-type spike protein, mutant spike protein, wild-type spike- angiotensin-converting enzyme 2 (ACE2) complex and mutant spike-ACE2 complex.
4.8. Hydrogen Bond Analysis
The formation of hydrogen bonds between the ligand and the protein is a crucial factor in substrate recognition and in maintaining the stability of the ligand-protein complex. A hydrogen bond occurs when there is an angle of less than 30° between the hydrogen donor and acceptor, with a distance shorter than 0.3500 nm. To enhance our visual analysis of the hydrogen bonding between the spike protein and ACE2 protein during the simulation, we examined the number of observed hydrogen bonds, as depicted in
Figure 3. The number of hydrogen bonds formed between the spike protein and the ACE2 protein was analyzed in both the wild-type and mutant spike-ACE2 complexes. The results show that during the simulation period of 8000-9500 ns, the wild-type complex exhibited a range of 4 to 13 hydrogen bonds, with an average of approximately 8 hydrogen bonds. On the other hand, in the mutant complex, at least 3 hydrogen bonds were observed, with a maximum of 13 bonds. The average number of hydrogen bonds decreased to approximately 7 during the 8000 - 9500 ns simulation in the mutant spike-ACE2 complex. The hydrogen bond formation between the spike and ACE2 proteins emphasized that they had a strong and stable binding before the multiple mutations of deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H occurred in the spike protein.
H-bond formed by spike and angiotensin-converting enzyme 2 (ACE2) in wild-type spike-ACE2 complex and mutant spike-ACE2 complex.
4.9. Principal Component Analysis
Using principal component analysis (PCA), an analysis of eigenvectors and eigenvalues was performed. In MD simulations, PCA obtains the associated motion of the protein and protein-ligand complex to define the protein's function. In the calculation of the principal components (PCs) or essential dynamics (ED), the conformational space of the protein, which contains a few degrees of freedom despite the protein's motion, was identified.
The total movement in the system is equal to the combined values of the eigenvalues, making it a useful tool for assessing the protein's flexibility under various conditions. GROMACS offers a feature that allows for calculating eigenvectors to analyze protein motions before and after applying multiple mutations (deletion 69-70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H) in the spike protein-ACE2 complex. The principal motion was used to characterize protein motion by eigenvector and eigenvalues. Hence, in this study, PCA was performed to analyze the atomic motions of wild-type spike protein, mutant spike protein, wild-type spike-ACE2 complex, and mutant spike-ACE2 complex using Cartesian coordinates of the Cα atoms. We plotted the eigenvalues with the eigenvectors for the 1500 eigenvectors during the 8000 - 9500 ns of the simulation in
Figure 4.
The principal component analysis (PCA) of wild type monomeric spike (A), mutated monomeric spike (B), wild type spike- angiotensin-converting enzyme 2 (ACE2) compelexes (C) and mutated spike-ACE2 complexes (D).
Due to the clear depiction of the calculation of eigenvalue vs. eigenvector, the result was plotted in
Figure 4A, selecting only the first 40 eigenvectors for wild-type spike protein, mutant spike protein, wild-type spike-ACE2 complex, and mutant spike-ACE2 complex. The mutant spike-ACE2 complex showed very high motion compared to the wild-type spike-ACE2 complex, while the wild-type spike protein and mutant spike protein exhibited significantly less motion. This indicates that the multiple mutations—deletion 69 - 70, deletion 144, N501Y, A570D, T716I, D614G, P681H, S982A, and D1118H—in the spike-ACE2 complex induce structural changes and motions in the protein. Furthermore, both the wild-type spike-ACE2 complex and mutant spike-ACE2 complex demonstrated substantial motions compared to the wild-type spike protein and mutant spike protein.
To clearly understand the motions induced by the multiple mutations, the correlated motions (averaged) were calculated for 1500 eigenvectors during the 8000 - 9500 ns of the simulation. The wild-type spike protein, mutant spike protein, wild-type spike-ACE2 complex, and mutant spike-ACE2 complex showed 4%, 3%, 11%, and 8% correlated motions, respectively.
4.10. Binding Free Energy Calculation by g-mm-pbsa
The GROMACS molecular mechanics Poisson-Boltzmann surface area (G-MM-PBSA) method was used to calculate the binding free energy of the wild-type spike-ACE2 complex and the mutant spike-ACE2 complex during the simulation. The results are shown in
Table 1. The highest interaction energy was associated with the mutant spike-ACE2 complex.
| Variables | VdW (kj/mol) | Elec (kj/mol) | Total (kj/mol) | Hbond |
|---|
| Wild-type spike-ACE2 complex | -277.49 | -893.97 | -1171.47 | 8.37 |
| Mutant spike-ACE2 complex | -300.98 | -1352.22 | -1653.20 | 7.38 |
Abbreviation: ACE2, angiotensin-converting enzyme 2.