The sugar residues decorating the spike (S) protein of SARS-CoV-2 are essential for the binding of the virus to its receptor. A new study explores the glycosylation of the S protein and its functional implications, especially for the rapid transmission of the virus. The study was posted to the preprint server bioRxiv* in December 2020.
The S protein exists as a homotrimer whose S1 and S2 subunits mediate, respectively, viral attachment to the host receptor and fusion of the viral and cell membranes. Earlier research identified 22 potential N-glycosylation sites in the spike protein of SARS-CoV and 23 in that of MERS-CoV. The current study sought to identify the corresponding sites in SARS-CoV-2.
The researchers obtained 1,169 representative sequences from the NCBI database on April 15, 2020, and used the NetNGlyc 1.0 Server to predict N-glycosylation sites. They then analyzed 49 representative sequences from seasonal as well as highly pathogenic coronaviruses in more detail to explore their phylogenetic relationships.
Although several complete genome sequences are available, the peripheral elements of the spike protein structure are less clear. The investigators therefore built a homology model from one selected spike sequence: a 3D structure with 99.9% similarity to a reported S protein, which they then refined using molecular dynamics (MD) simulations.
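Homology modeling starts from a template whose sequence closely matches the target. The article reports 99.9% similarity but does not say how it was measured. As a minimal sketch, assuming the two sequences are already aligned and that similarity means percent identity over ungapped columns (one common convention among several), the calculation looks like this; the sequences are toy strings, not spike fragments.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes both strings come from the same pairwise alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Toy aligned strings (invented for illustration):
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKL"))  # 100.0
print(percent_identity("ACD-FGHIKL", "ACEEFGHIKL"))  # 8 of 9 ungapped columns match
```

Other tools report "similarity" by also counting chemically similar substitutions, so numbers from different programs are not always comparable.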
They also screened candidate glycan ligands for the spike protein by docking analysis. The ligands included monosaccharides such as mannose, glucose, galactose, N-acetyl-β-D-galactosamine, and sialic acid, which occur as terminal structures of N-glycans, as well as disaccharides, blood group B antigens, and others. The glycosylated ACE2-spike complex was then subjected to MD simulations as well.
SARS-CoV-2 is genomically very similar to the bat coronavirus (CoV), with more than 99% similarity, and more than 90 viral genomes had been uploaded to the NCBI database. The S protein of SARS-CoV-2 is 99.9% similar to that of the bat virus. The S protein is 1,273 residues long and is encoded in a single-stranded RNA genome of about 30,000 nucleotides.
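The protein and genome sizes above can be related with simple codon arithmetic. This sketch assumes the standard genetic code (three nucleotides per residue plus one stop codon) and uses the rounded genome size of about 30,000, so the resulting percentage is only approximate.

```python
# Rough arithmetic relating the spike protein to the genome (approximate values).
SPIKE_RESIDUES = 1273   # residues in the S protein, as stated above
GENOME_NT = 30_000      # rounded genome length; the real genome is slightly shorter

# One three-nucleotide codon per residue, plus one stop codon.
spike_cds_nt = (SPIKE_RESIDUES + 1) * 3
fraction_of_genome = spike_cds_nt / GENOME_NT

print(spike_cds_nt)                 # 3822
print(f"{fraction_of_genome:.1%}")  # 12.7%
```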
They found 22 potential N-linked glycosylation sites on the SARS-CoV-2 spike protein. N-glycosylation occurs at the N-X-S/T sequon, where X can be any amino acid except proline. A 14-sugar precursor glycan is first attached at this motif in the endoplasmic reticulum and is then processed as the protein folds.
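The sequon rule can be checked directly on a protein sequence. Note that the NetNGlyc 1.0 server used in the study relies on neural networks that weigh the surrounding sequence context, so it does more than match the motif. The sketch below is a simpler, swapped-in technique: it only finds N-X-S/T sequons with a regular expression, and a sequon by itself does not guarantee that the site is actually glycosylated.

```python
import re

# N-X-S/T sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead makes each match zero-width, so overlapping sequons
# (for example "NNST", which contains both "NNS" and "NST") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy sequence (made up for illustration); "NPS" is skipped because X is Pro.
print(find_sequons("ANGTANPSNNST"))  # [2, 9, 10]
```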
NTD Abundant in N-Glycosylation Sites
The N-terminal domain (NTD) houses 8 of these sites, which suggests that it is a prime target exposed to host immune surveillance and that its N-glycans could shield epitopes. Elsewhere on the spike, N343 sits atop the receptor-binding domain (RBD), N331 lies at the RBD-SD1 linker, and another site, conserved across all the viruses examined, lies near the S1/S2 protease cleavage site. N801 may help shield the cleavage site, whereas many of the remaining sites are at the bottom of the S protein.
Crystal structures of the RBD-ACE2 complex have already been reported. The current study showed that 8 tyrosine residues in the RBD interact non-covalently with the N-terminal helix of the ACE2 receptor. Most of these tyrosine residues are highly conserved within the coronavirus subgenus to which SARS-CoV-2 belongs.
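One simple way to express how conserved a residue is: the fraction of sequences in a multiple alignment that carry it at the same column. The article does not say which conservation measure the authors used. This sketch assumes plain identity frequency over a precomputed alignment, and the four aligned sequences are invented, not real coronavirus data.

```python
def residue_conservation(alignment: list[str], column: int, residue: str = "Y") -> float:
    """Fraction of aligned sequences carrying `residue` at a 0-based alignment column."""
    if not alignment:
        raise ValueError("alignment is empty")
    if any(len(seq) != len(alignment[0]) for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = sum(1 for seq in alignment if seq[column] == residue)
    return hits / len(alignment)


# Hypothetical four-sequence alignment (invented for illustration).
toy_alignment = ["GYQPY", "GYQPY", "GFQPY", "GYRPY"]
for col in (1, 4):
    print(col, residue_conservation(toy_alignment, col))
# 1 0.75
# 4 1.0
```

Entropy-based scores are a common alternative when substitutions between similar residues should count as partial conservation.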
Glycan-Protein Recognition in Viral Invasion
Glycan-protein recognition is a common mechanism underlying viral invasion. For instance, the hemagglutinin of influenza A virus binds to sialic acid receptors on host cells. The researchers therefore carried out a docking analysis of glycans against the S protein.
Comparing the binding energies of five different spike proteins with a set of glycans, they found that NL63 bound 3-sialyl lactose most strongly. The weakest binding was observed for SARS-CoV-2-X4X4X4X and NL63-Fuc, at -4.2 kcal/mol, compared with -9.4 kcal/mol for NL63-3-sialyl lactose. Overall, sialoglycans and galactose-related glycans bound best.
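To get a feel for what the gap between -9.4 and -4.2 kcal/mol means, a binding free energy can be converted to a dissociation constant with ΔG = RT ln(Kd). This is a rough sketch under two assumptions: that the docking scores can be read as binding free energies (docking scores are only approximations of these), and a 1 M standard state at 298.15 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dissociation_constant(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Kd (in M) implied by a binding free energy via dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


strong = dissociation_constant(-9.4)  # strongest pair reported above
weak = dissociation_constant(-4.2)    # weakest pairs reported above
print(f"{strong:.1e} M vs {weak:.1e} M, ratio ~{weak / strong:.0f}x")
```

Under these assumptions, the 5.2 kcal/mol difference corresponds to a several-thousand-fold difference in Kd.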
The investigators concluded that there could be three potential glycan-recognized domains (GCDs) with lower (more favorable) binding energies, located close to the NTD. These are formed by peripheral elements such as small helices, β-sheets, or loops. The first of these, on the β-sandwich core, shows a stronger affinity for sialic acid (SA)-related and galactose (Gal)-related glycans. It comprises a small helix, two small β-sheets (one 5-stranded and one 6-stranded), and long loops.
The most important residues on the lower β-sheets are E155, V157, K115, and V113, while residues Y146 and W138 on Helix140 form hydrogen bonds with the SA and galactose residues of the ligands in the binding pocket.
The second GCD lies at the outermost corners of the trimer and has a high affinity for sialoglycans. It comprises Loop10, Loop250, and Helix150, and its key receptor-ligand interactions are mediated by residues V16, S254, and E154.
The third GCD, located between the other two, has a high affinity for some sialoglycans. Its key residues, T109, D111, K113, and R457, are unique to SARS-CoV-2. Several N-glycosylation sites are located at the edges of the GCDs.
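Whether a glycosylation site sits at the edge of a GCD can be checked geometrically. The article does not describe the authors' criterion, so this sketch assumes a simple distance cutoff (10 Å, an arbitrary choice) between residue coordinates. All labels and coordinates are hypothetical placeholders, not values from the modeled spike structure.

```python
import math

# Hypothetical C-alpha coordinates in angstroms (placeholders, not study data).
GCD_RESIDUES = {"gcd_res_1": (10.0, 4.0, 2.0), "gcd_res_2": (12.5, 5.0, 1.0)}
GLYCO_SITES = {"site_1": (14.0, 9.0, 3.0), "site_2": (40.0, 22.0, 15.0)}


def sites_near_domain(sites, domain, cutoff=10.0):
    """Map each glycosylation site within `cutoff` of any domain residue to its closest distance."""
    near = {}
    for site, xyz in sites.items():
        closest = min(math.dist(xyz, res_xyz) for res_xyz in domain.values())
        if closest <= cutoff:
            near[site] = round(closest, 1)
    return near


print(sites_near_domain(GLYCO_SITES, GCD_RESIDUES))  # {'site_1': 4.7}
```

In practice one would use all atoms of the glycan and domain rather than a single point per residue.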
N-Glycans and RBD-ACE2 Interactions
ACE2 is highly glycosylated, with six N-glycosylation sites as well as potential O-glycosylation sites. Five of its N-glycosylation sites may lie around the interface with the viral RBD. Altogether, there are 14 such sites in this region, including these five, two in the RBD, and eight in the adjacent spike monomer. Of these, the sites closest to the center of the RBD are N322, N90, and N122 in ACE2, and N165 on the spike.
The N-glycans of the SARS-CoV-2 spike may be high-mannose (oligomannose), hybrid, or complex types. Oligomannose glycans occur most commonly at the lower end of the S protein and at N234, which is adjacent to the RBD. Of the glycans at the 22 N-linked glycosylation sites, 16% carry one or more sialic acid residues, whereas almost half carry fucose.
Most of the N-glycosylation sites are distributed over the outer surface of the spike protein, and none lie in the buried α-helices. The high degree of conservation of these N-glycans suggests that they shield vital regions. Mutations at the RBD or the cleavage sites, on the other hand, could facilitate zoonotic spread or change the type of host cell the virus binds.
“By analyzing their location in S-protein, the functions of N-glycosylation sites mainly involve: 1) Protein folding, 2) Protecting the antigen sites, 3) Protecting the S1/S2 cleavage site, 4) Protecting the C-terminal tail, 5) Affecting the receptor-ligand interaction.”
N-Glycans may Mediate Viral Invasion
The complex forms as previously observed, with the N-terminal helix of ACE2 lying on the RBD and oriented toward the adjacent NTD. This places the N90 N-glycan of ACE2 over the adjacent NTD, where its terminal residues interact directly with the first GCD.
In a simulation of the glycosylated ACE2-S complex, with complex N-glycans attached to the N-glycosylation sites on both the spike and the receptor, the researchers found that the N-glycan lay close to one of the GCDs. During the simulation, the N-glycans swung around their attachment sites, and their terminal residues were the most flexible.
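Flexibility in an MD trajectory is commonly quantified as the root-mean-square fluctuation (RMSF) of each atom about its average position. The article does not say which metric the authors used. This is a minimal pure-Python sketch, assuming the frames have already been superposed on a reference structure, with a made-up three-frame trajectory.

```python
import math

Frame = list[tuple[float, float, float]]  # one (x, y, z) per atom


def rmsf(trajectory: list[Frame]) -> list[float]:
    """Root-mean-square fluctuation of each atom about its mean position.

    Assumes frames are already superposed, so overall rotation and
    translation of the molecule have been removed.
    """
    n_frames = len(trajectory)
    result = []
    for atom in range(len(trajectory[0])):
        coords = [frame[atom] for frame in trajectory]
        mean = tuple(sum(c[k] for c in coords) / n_frames for k in range(3))
        msd = sum(math.dist(c, mean) ** 2 for c in coords) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 barely moves, atom 1 (like a terminal sugar) swings.
toy = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (5.0, 2.0, 0.0)],
    [(-0.1, 0.0, 0.0), (5.0, -2.0, 0.0)],
]
print([round(v, 3) for v in rmsf(toy)])
```

A higher RMSF for the terminal residues than for the residues near the attachment site would match the behavior described above.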
When the RBD of this complex was superposed onto the corresponding domain of the S trimer, in either the vertical or the horizontal conformation, the N-glycans at N90 came into frequent contact with the GCD. This suggests that ACE2-RBD binding may involve two distinct mechanisms. The multiple tyrosines may strengthen the non-bonded interactions, while the galactose or sialic acid residues on the N-glycan may further enhance binding affinity through additional, albeit weaker, interactions.
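The frequent contact described above can be quantified as a contact frequency: the fraction of simulation frames in which the glycan and the GCD come within some distance cutoff. The 4 Å cutoff and the toy coordinates below are assumptions for illustration; the article does not state the authors' criterion.

```python
import math

Point = tuple[float, float, float]


def contact_frequency(frames: list[tuple[list[Point], list[Point]]],
                      cutoff: float = 4.0) -> float:
    """Fraction of frames in which any atom of group A lies within `cutoff` of any atom of group B."""
    in_contact = 0
    for group_a, group_b in frames:
        if any(math.dist(a, b) <= cutoff for a in group_a for b in group_b):
            in_contact += 1
    return in_contact / len(frames)


# Toy frames: (glycan atoms, GCD atoms) per frame, coordinates invented.
toy_frames = [
    ([(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]),                   # 3.0 A apart: contact
    ([(0.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)]),                   # 6.0 A apart: no contact
    ([(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)], [(4.0, 1.0, 0.0)]),  # second atom 3.0 A away: contact
    ([(0.0, 0.0, 0.0)], [(0.0, 5.0, 0.0)]),                   # 5.0 A apart: no contact
]
print(contact_frequency(toy_frames))  # 0.5
```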
The researchers suggested that Galβ1-3GlcNAc structures, which accumulate slowly in chronic conditions such as type 2 diabetes and hypertension, may contribute to the higher risk of SARS-CoV-2 infection among these patients. This could reflect the higher binding affinity of one of the GCDs for Gal-related glycans such as these.
They also suggest that some mutations at glycosylation sites near the RBD could expose a larger area of the RBD, promoting its binding to ACE2 and thus facilitating viral invasion.
What are the Implications?
The researchers have mapped the distribution and functions of the N-glycosylation sites in SARS-CoV-2 and identified potential GCDs in its spike protein. Given the high density of N-glycans around the RBD-ACE2 interface, they propose a possible dual binding mechanism: protein-protein interaction between the RBD and ACE2, and glycan-protein interaction between the ACE2 N-glycan and the first GCD (GCD1).
“During invasion, the RBD of S-protein bond to the N-terminal helix of ACE2, meanwhile, the N-glycan terminal (especially the SA and GalNAc residues) at N90 of ACE2 is apt to bind the GCD1 in S-protein.”
This may trigger a conformational switch in the spike protein. If confirmed and elucidated by further studies, this mechanism could help explain the rapid and extensive spread of the virus.
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.
- Chen, W. et al. (2020). The N-glycosylation sites and Glycan-binding ability of S-protein in SARS-CoV-2 Coronavirus. bioRxiv preprint. doi: https://doi.org/10.1101/2020.12.01.406025. https://www.biorxiv.org/content/10.1101/2020.12.01.406025v1