Why COVID-19 is man made


SUBMITTED BY: anun21

DATE: March 31, 2020, 8:54 p.m.


  2. Antiviral Research
  3. journal homepage: www.elsevier.com/locate/antiviral
  4. The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-
  5. like cleavage site absent in CoV of the same clade
  6. B. Coutard (a), C. Valle (b), X. de Lamballerie (a), B. Canard (b), N.G. Seidah (c), E. Decroly (b,∗)
  7. a Unité des Virus Émergents (UVE: Aix-Marseille Univ – IRD 190 – Inserm 1207 – IHU Méditerranée Infection), Marseille, France
  8. b Aix Marseille Université, CNRS, AFMB UMR 7257, Marseille, France
  9. c Laboratory of Biochemical Neuroendocrinology, Montreal Clinical Research Institute (IRCM, Affiliated to the University of Montreal), 110 Pine Ave West, Montreal, QC,
  10. H2W1R7, Canada
  11. ARTICLE INFO
  12. Keywords:
  13. 2019-nCoV
  14. SARS-CoV
  15. Spike protein
  16. Maturation protease
  17. Furin
  18. Antivirals
  19. ABSTRACT
  20. In 2019, a new coronavirus (2019-nCoV) infecting humans emerged in Wuhan, China. Its genome has been
  21. sequenced and the genomic information promptly released. Despite a high similarity with the genome sequence
  22. of SARS-CoV and SARS-like CoVs, we identified a peculiar furin-like cleavage site in the Spike protein of the
  23. 2019-nCoV, lacking in the other SARS-like CoVs. In this article, we discuss the possible functional consequences
  24. of this cleavage site in the viral cycle, pathogenicity and its potential implication in the development of anti-
  25. virals.
  26. Human coronaviruses (CoV) are enveloped positive-stranded RNA
  27. viruses belonging to the order Nidovirales, and are mostly responsible
  28. for upper respiratory and digestive tract infections. Among them, SARS-
  29. CoV and MERS-CoV, which spread in 2002 and 2013 respectively, have
  30. been associated with severe human illnesses, such as severe pneumonia
  31. and bronchiolitis, and even meningitis in more vulnerable populations
  32. (de Wit et al., 2016). In December 2019, a new CoV (2019-nCoV) was
  33. detected in the city of Wuhan, and this emerging viral infection
  34. was associated with severe human respiratory disease with a ~2–3%
  35. fatality rate (Li et al., 2020). The virus is presumed to have initially
  36. been transmitted from an animal reservoir to humans, possibly via
  37. an amplifying host. However, human-to-human transmission has been
  38. reported, leading to a sustained epidemic spread with > 31,000 con-
  39. firmed human infections, including > 640 deaths, reported by the
  40. WHO in early February 2020. The estimated effective reproductive
  41. number (R) value of ~2.90 (95%: 2.32–3.63) at the beginning of the
  42. outbreak raises the possibility of a pandemic (Zhao et al., 2020). This
  43. prompted WHO to declare it a Public Health Emergency of International
  44. Concern. This is especially relevant because so far there are no
  45. specific antiviral treatments or vaccines available. Based on its genome
  46. sequence, 2019-nCoV belongs to lineage b of Betacoronavirus (Fig. 1A),
  47. which also includes the SARS-CoV and bat CoV ZXC21, the latter and
  48. CoV ZC45 being the closest to 2019-nCoV. 2019-nCoV shares ~76%
  49. amino acid sequence identity in the Spike (S)-protein sequence with
  50. SARS-CoV and 80% with CoV ZXC21 (Chan et al., 2020). In this article,
  51. we focus on a specific furin-like protease recognition pattern present in
  52. the vicinity of one of the maturation sites of the S protein (Fig. 1B) that
  53. may have significant functional implications for virus entry.
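As a rough illustration of what a figure such as "~76% amino acid sequence identity" measures, the sketch below computes percent identity over two sequences that have already been aligned. The function name, the toy sequences and the choice of denominator (all alignment columns) are assumptions made for illustration only; published identity values depend on the alignment tool used and on how gap columns are counted.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as `gap`).
    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy example (hypothetical sequences, not real spike protein data).
print(percent_identity("AAAA", "AAAT"))
```

Other conventions divide by the length of the shorter sequence or by the number of gap-free columns, which is one reason identity figures from different tools are not always directly comparable.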
  54. The proprotein convertases (PCs; genes PCSKs) constitute a family
  55. of nine serine secretory proteases that regulate various biological pro-
  56. cesses in both healthy and disease states (Seidah and Prat, 2012). By
  57. proteolysis, PCs are responsible for the activation of a wide variety of
  58. precursor proteins, such as growth factors, hormones, receptors and
  59. adhesion molecules, as well as cell surface glycoproteins of infectious
  60. viruses (Seidah and Chretien, 1999) (Table 1). Seven PCs cleave pre-
  61. cursor proteins at specific single or paired basic amino acids (aa) within
  62. the motif (R/K)-(2X)n-(R/K)↓, where n = 0, 1, 2, or 3 spacer aa (Seidah
  63. and Chretien, 1999). Because of their role in the processing of many
  64. critical cell surface proteins, PCs, especially furin, have been implicated
  65. in viral infections. They have the potential to cleave specifically viral
  66. envelope glycoproteins, thereby enhancing viral fusion with host cell
  67. membranes (Izaguirre, 2019; Moulard and Decroly, 2000). In the case
  68. of human-infecting coronaviruses such as HCoV-OC43 (Le Coupanec
  69. et al., 2015), MERS-CoV (Millet and Whittaker, 2014), and HKU1 (Chan
  70. et al., 2008) the spike protein has been demonstrated to be cleaved at
  71. an S1/S2 cleavage site (Fig. 2) generating the S1 and S2 subunits. The
  72. above three viruses display the canonical (R/K)-(2X)n-(R/K)↓ motif
  73. (Table 1). Additionally, it has been demonstrated that variation around
  74. the viral envelope glycoprotein cleavage site plays a role in cellular
  75. tropism and pathogenesis. For instance, the pathogenesis of some CoV
  76. https://doi.org/10.1016/j.antiviral.2020.104742
  77. Received 3 February 2020; Received in revised form 7 February 2020; Accepted 8 February 2020
  78. ∗ Corresponding author.
  79. E-mail address: etienne.decroly@afmb.univ-mrs.fr (E. Decroly).
  80. Antiviral Research 176 (2020) 104742
  81. Available online 10 February 2020
  82. 0166-3542/ © 2020 Elsevier B.V. All rights reserved.
  84. has been previously related to the presence of a furin-like cleavage site
  85. in the S-protein sequence. For example, the insertion of a similar
  86. cleavage site in the infectious bronchitis virus (IBV) S-protein results in
  87. higher pathogenicity, pronounced neural symptoms and neurotropism
  88. in infected chickens (Cheng et al., 2019).
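The minimal proprotein convertase motif quoted above, (R/K)-(2X)n-(R/K)↓ with n = 0, 1, 2 or 3, can be scanned for with a few lines of code. The sketch below is only an illustration under stated assumptions: the function name and return convention are hypothetical, the test fragments are the short S1/S2 sequences quoted elsewhere in this article (PRRAR↓SV, RSVR↓SV and SLLR-ST), and the minimal motif is deliberately permissive. A match does not by itself show that furin cleaves the site, which also depends on flanking residues and solvent exposure.

```python
BASIC = {"R", "K"}  # basic residues allowed at P1 and at the upstream position


def pc_cleavage_sites(seq, max_spacer_pairs=3):
    """Return candidate cleavage positions for the motif (R/K)-(2X)n-(R/K)↓.

    Each returned integer is the 0-based index of the residue immediately
    after the scissile bond (the P1' residue). A position qualifies when the
    residue before it (P1) is basic and a second basic residue sits at
    P2, P4, P6 or P8 (n = 0..max_spacer_pairs spacer pairs).
    """
    sites = []
    for i, aa in enumerate(seq):
        if aa not in BASIC:
            continue  # P1 must be R or K
        for n in range(max_spacer_pairs + 1):
            j = i - 1 - 2 * n  # index of the upstream basic residue
            if j < 0:
                break
            if seq[j] in BASIC:
                sites.append(i + 1)
                break
    return sites


for fragment in ("PRRARSV", "RSVRSV", "SLLRST"):
    print(fragment, pc_cleavage_sites(fragment))
```

On these fragments the scan reports two candidate positions in PRRARSV (after RR and after RRAR), one in RSVRSV and none in SLLRST, which matches the text's description of the SARS-CoV site as lacking a favourable furin-like motif.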
  89. Similarly, in the case of influenza virus, low-pathogenicity forms of
  90. influenza virus contain a single basic residue at the cleavage site, which
  91. is cleaved by trypsin-like proteases and the tissue distribution of the
  92. activating protease(s) typically restricts infections to the respiratory
  93. and/or intestinal organs (Sun et al., 2010). Conversely, the highly pa-
  94. thogenic forms of influenza have a furin-like cleavage site cleaved by
  95. different cellular proteases, including furin, which are expressed in a
  96. wide variety of cell types allowing a widening of the cell tropism of the
  97. virus (Kido et al., 2012). Furthermore the insertion of a multibasic motif
  98. RERRRKKR↓GL at the H5N1 hemagglutinin HA cleavage site was likely
  99. associated with the hyper-virulence of the virus during the Hong Kong
  100. 1997 outbreak (Claas et al., 1998). This motif exhibits the critical Arg at
  101. P1 and basic residues at P2 and P4, as well as P6 and P8 and an ali-
  102. phatic Leu at P2’ positions (Table 1) (Schechter and Berger nomen-
  103. clature (Schechter and Berger, 1968)), typical of a furin-like cleavage
  104. specificity (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and Prat,
  105. 2012).
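The Schechter and Berger nomenclature used above numbers residues outward from the scissile bond: P1, P2, ... towards the N-terminus and P1', P2', ... towards the C-terminus. A minimal sketch of that labelling follows; the helper name and the use of "↓" as the bond marker are conventions assumed here for illustration.

```python
def label_positions(motif, marker="↓"):
    """Map Schechter-Berger position labels to residues around a cleavage site.

    `motif` must contain exactly one `marker` showing the scissile bond,
    for example "RERRRKKR↓GL".
    """
    left, right = motif.split(marker)  # raises ValueError unless exactly one marker
    labels = {}
    # Residues N-terminal to the bond: P1 is the closest, numbering increases outward.
    for k, aa in enumerate(reversed(left), start=1):
        labels[f"P{k}"] = aa
    # Residues C-terminal to the bond: P1' is the closest.
    for k, aa in enumerate(right, start=1):
        labels[f"P{k}'"] = aa
    return labels


positions = label_positions("RERRRKKR↓GL")
for name in ("P8", "P6", "P4", "P2", "P1", "P2'"):
    print(name, positions[name])
```

Applied to the H5N1 motif RERRRKKR↓GL, this reproduces the assignments given in the text: Arg at P1, basic residues at P2, P4, P6 and P8, and Leu at P2'.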
  106. The coronavirus S-protein is the structural protein responsible for
  107. the crown-like shape of the CoV viral particles, from which the original
  108. name “coronavirus” was coined. The ~1200 aa long S-protein belongs
  109. to class-I viral fusion proteins and contributes to the cell receptor
  110. binding, tissue tropism and pathogenesis (Lu et al., 2015; Millet and
  111. Whittaker, 2014). It contains several conserved domains and motifs
  112. Fig. 1. Characterization of an nCoV-peculiar se-
  113. quence at the S1/S2 cleavage site in the S-protein
  114. sequence, compared with SARS-like CoVs. (A)
  115. Phylogenetic tree of selected coronaviruses from
  116. genera alphacoronavirus (α-Cov) and betacor-
  117. onavirus (β-CoV), lineages a, b, c and d: 2019-nCoV
  118. (NC_045512.2), CoV-ZXC21 (MG772934), SARS-
  119. CoV (NC_004718.3), SARS-like BM4821
  120. (MG772934), HCoV-OC43 (AY391777), HKU9-1
  121. (EF065513), HCoV-NL63 (KF530114.1), HCoV229E
  122. (KF514433.1), MERS-CoV (NC019843.3), HKU1
  123. (NC_006577.2). The phylogenetic tree was obtained
  124. on the Orf1ab amino acid sequence using the
  125. Maximum Likelihood method by Mega X software.
  126. Red asterisks indicate the presence of a canonical
  127. furin-like cleavage motif at site 1; (B) Alignment of
  128. the coding and amino acid sequences of the S-pro-
  129. tein from CoV-ZXC21 and 2019-nCoV at the S1/S2
  130. site. The 2019-nCoV-specific sequence is in bold.
  131. The sequence of CoV-ZXC21 S-protein at this posi-
  132. tion is representative of the sequence of the other
  133. betacoronaviruses belonging to lineage b, except the
  134. one of 2019-nCoV. (For interpretation of the refer-
  135. ences to colour in this figure legend, the reader is
  136. referred to the Web version of this article.)
  137. Table 1
  138. Comparative sequences of envelope protein cleavage site(s) in coronaviruses (above) and in other RNA viruses (below). Empty boxes: no consensus motif detected.
  141. (Fig. 2). The trimeric S-protein is processed at the S1/S2 cleavage site
  142. by host cell proteases, during infection. Following cleavage, also known
  143. as priming, the protein is divided into an N-terminal S1-ectodomain
  144. that recognises a cognate cell surface receptor and a C-terminal S2-
  145. membrane-anchored protein involved in viral entry. The SARS-CoV S1-
  146. protein contains a conserved Receptor Binding Domain (RBD), which
  147. recognises the angiotensin-converting enzyme 2 (ACE2) (Li et al.,
  148. 2003). The SARS-CoV binds to both bat and human cells, and the virus
  149. can infect both organisms (Ge et al., 2013; Kuhn et al., 2004). The RBD
  150. surface of S1/ACE2 implicates 14 aa in the S1 of SARS-CoV (Li et al.,
  151. 2005). Among them, 8 residues are strictly conserved in 2019-nCoV,
  152. supporting the hypothesis that ACE2 is also the receptor of the newly
  153. emerged nCoV (Wan et al., 2020). The S2-protein contains the fusion
  154. peptide (FP), a second proteolytic site (S2′), followed by an internal
  155. fusion peptide (IFP) and two heptad-repeat domains preceding the
  156. transmembrane domain (TM) (Fig. 2). Notably, the IFPs of the 2019-
  157. nCoV and SARS-CoV are identical, displaying characteristics of viral
  158. fusion peptides (Fig. 2). While the molecular mechanism involved in
  159. cell entry is not yet fully understood, it is likely that both FP and IFP
  160. participate in the viral entry process (Lu et al., 2015) and thus the S-
  161. protein must likely be cleaved at both S1/S2 and S2′ cleavage sites for
  162. virus entry. The furin-like S2′ cleavage site at KR↓SF with P1 and P2
  163. basic residues and a P2′ hydrophobic Phe (Seidah and Prat, 2012),
  164. downstream of the IFP is identical between the 2019-nCoV and SARS-
  165. CoV (Fig. 2). In the MERS-CoV and HCoV-OC43 the S1/S2 site is re-
  166. placed by RXXR↓SA, with P1 and P4 basic residues, and an Ala (not
  167. aliphatic) at P2′, suggesting a somewhat less favourable cleavage by
  168. furin. However, in the other less pathogenic circulating human CoV, the
  169. S2′ cleavage site only exhibits a monobasic R↓S sequence (Fig. 2) with
  170. no basic residues at P2 and/or P4 needed to allow furin cleavage,
  171. suggesting a less efficient cleavage or higher restriction at the entry step
  172. depending on the cognate proteases expressed by target cells. Even
  173. though processing at S2′ in 2019-nCoV is expected to be a key event for
  174. the final activation of the S-protein, the protease(s) involved in this
  175. process have not yet been conclusively identified. Based on the 2019-
  176. nCoV S2′ sequence and the above arguments, we propose that one or
  177. more furin-like enzymes would cleave the S2′ site at KR↓SF. In contrast
  178. to the S2′, the first cleavage between the RBD and the FP (S1/S2
  179. cleavage site, Fig. 2) has been extensively studied for many CoVs (Lu
  180. et al., 2015). Interestingly the S1/S2 processing site exhibits different
  181. motifs among coronaviruses (Fig. 2, site 1 & site 2), with many of them
  182. displaying cleavage after a basic residue. It is thus likely that the
  183. priming process is ensured by different host cell proteases depending on
  184. the sequence of the S1/S2 cleavage site. Accordingly the MERS-CoV S-
  185. protein, which contains a RSVR↓SV motif is cleaved during virus
  186. egress, probably by furin (Millet and Whittaker, 2014). Conversely, the S-
  187. protein of SARS-CoV remains largely uncleaved after biosynthesis,
  188. possibly due to the lack of a favourable furin-like cleavage site (SLLR-
  189. ST). In this case, it was reported that following receptor binding the S-
  190. protein is cleaved at a conserved sequence AYT↓M (located 10 aa
  191. downstream of SLLR-ST) by target cells’ proteases such as elastase,
  192. cathepsin L or TMPRSS2 (Bosch et al., 2008; Matsuyama et al., 2010,
  193. 2005; Millet and Whittaker, 2015). As the priming event is essential for
  194. virus entry, the efficacy and extent of this activation step by the pro-
  195. teases of the target cells should regulate cellular tropism and viral pa-
  196. thogenesis. In the case of the 2019-nCoV S-protein, the conserved site 2
  197. sequence AYT↓M may still be cleaved, possibly after the preferred furin-
  198. cleavage at the site 1 (Fig. 2).
  199. Since furin is highly expressed in lungs, an enveloped virus that
  200. infects the respiratory tract may successfully exploit this convertase to
  201. activate its surface glycoprotein (Bassi et al., 2017; Mbikay et al.,
  202. 1997). Before the emergence of the 2019-nCoV, this important feature
  203. was not observed in the lineage b of betacoronaviruses. However, it is
  204. shared by other CoV (HCoV-OC43, MERS-CoV, MHV-A59) harbouring
  205. furin-like cleavage sites in their S-protein (Fig. 2; Table 1), which were
  206. shown to be processed by furin experimentally (Le Coupanec et al.,
  207. Fig. 2. Schematic representation of the human 2019-nCoV S-protein with a focus on the putative maturation sites. The domains were previously characterized in
  208. SARS-CoV and MERS-CoV: Signal peptide (SP), N-terminal domain (NTD), receptor-binding domain (RBD), fusion peptide (FP), internal fusion peptide (IFP), heptad
  209. repeat 1/2 (HR1/2), and the transmembrane domain (TM). The SP, S1↓S2 and S2′ cleavage sites are indicated by arrows. The sequence of different CoV S1/S2 and S2′
  210. cleavage sites were aligned using Multalin webserver (http://multalin.toulouse.inra.fr/multalin/) with manual adjustments and the figure prepared using ESPript 3
  211. (http://espript.ibcp.fr/ESPript/ESPript/) presenting the secondary structure of SARS-CoV S-protein at the bottom of the alignment (PDB 5X58) (Yuan et al., 2017).
  212. Insertion of furin like cleavage site is surrounded by a black frame. Red asterisks indicate the presence of a canonical furin-like cleavage motif at the S1/S2 site. (For
  213. interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
  216. 2015; Millet and Whittaker, 2014). Strikingly, the 2019-nCoV S-protein
  217. sequence contains 12 additional nucleotides upstream of the single
  218. Arg↓ cleavage site 1 (Figs. 1B and 2) leading to a predicted solvent-
  219. exposed PRRAR↓SV sequence, which corresponds to a canonical furin-
  220. like cleavage site (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and
  221. Prat, 2012). This furin-like cleavage site is supposed to be cleaved
  222. during virus egress (Millet and Whittaker, 2014) for S-protein “priming”
  223. and may provide a gain-of-function to the 2019-nCoV for efficient
  224. spreading in the human population compared to other lineage b beta-
  225. coronaviruses. This possibly illustrates a convergent evolution pathway
  226. between unrelated CoVs. Interestingly, if this site is not processed, the
  227. S-protein is expected to be cleaved at site 2 during virus endocytosis, as
  228. observed for the SARS-CoV.
  229. Obviously, much more work is needed to demonstrate our assertion
  230. experimentally, but the inhibition of such processing enzyme(s) may
  231. represent a potential antiviral strategy. Indeed, it was recently shown
  232. that in an effort to limit viral infections, host cells that are infected by a
  233. number of viruses provoke an interferon response to inhibit the enzy-
  234. matic activity of furin-like enzymes. It was also demonstrated that HIV
  235. infection induces the expression of either the protease activated re-
  236. ceptor 1 (PAR1) (Kim et al., 2015) or guanylate binding proteins 2 and
  237. 5 (GBP2,5) (Braun and Sauter, 2019) that restrict the trafficking of furin
  238. to the trans Golgi network (PAR1) or to early Golgi compartments
  239. (GBP2,5) where the proprotein convertase remains inactive. Altogether,
  240. these observations suggest that inhibitors of furin-like enzymes may
  241. contribute to inhibiting virus propagation.
  242. A variety of approaches have been proposed to inhibit furin activity
  243. to limit tumour growth and viral and bacterial infections. For example, a variant of
  244. the naturally occurring serine protease inhibitor α-1 antitrypsin harbouring
  245. a consensus furin cleavage site, called α-1 antitrypsin Portland (α1-
  246. PDX), inhibits furin and prevents the processing of HIV-1 Env
  247. (Anderson et al., 1993). The addition of a chloromethylketone (CMK)
  248. moiety to the C-terminus of a polybasic cleavage motif and a decanoyl
  249. group at the N-terminus to favour cell penetration (dec-RVKR-cmk)
  250. irreversibly blocked the enzymatic activity of furin, PC5, PACE4
  251. and PC7 (Decroly et al., 1996; Garten et al., 1994). Finally, the eluci-
  252. dation of the crystal structure of furin resulted in the design of a 2,5-
  253. dideoxystreptamine-derived inhibitor, where two molecules of the in-
  254. hibitor form a complex with furin (Dahms et al., 2017). As furin-like
  255. enzymes are involved in a multitude of cellular processes, one im-
  256. portant issue would be to avoid systemic inhibition that may result in
  257. some toxicity. Accordingly, it is likely that such small molecule in-
  258. hibitors, or other more potent orally active ones, possibly delivered by
  259. inhalation and exhibiting a slow dissociation rate from furin to allow
  260. for sustained inhibition, deserve to be rapidly tested to assess their
  261. antiviral effect against 2019-nCoV.
  262. Acknowledgments
  263. This work was supported by a CIHR Foundation grant # 148363
  264. (NGS), a Canada Research Chair in Precursor Proteolysis (NGS; # 950-231335),
  265. and by the European Virus Archive Global (BCo; EVA
  266. GLOBAL) funded by the European Union's Horizon 2020 research and
  267. innovation programme under grant agreement No 871029.
  268. Appendix A. Supplementary data
  269. Supplementary data to this article can be found online at https://
  270. doi.org/10.1016/j.antiviral.2020.104742.
  271. References
  272. Anderson, E.D., Thomas, L., Hayflick, J.S., Thomas, G., 1993. Inhibition of HIV-1 gp160-
  273. dependent membrane fusion by a furin-directed α1-antitrypsin variant. J. Biol. Chem.
  274. 268, 24887–24891.
  275. Bassi, D.E., Zhang, J., Renner, C., Klein-Szanto, A.J., 2017. Targeting proprotein
  276. convertases in furin-rich lung cancer cells results in decreased in vitro and in vivo
  277. growth. Mol. Carcinog. 56, 1182–1188. https://doi.org/10.1002/mc.22550.
  278. Bosch, B.J., Bartelink, W., Rottier, P.J.M., 2008. Cathepsin L functionally cleaves the
  279. severe acute respiratory syndrome coronavirus class I fusion protein upstream of
  280. rather than adjacent to the fusion peptide. J. Virol. 82, 8887–8890. https://doi.org/
  281. 10.1128/jvi.00415-08.
  282. Braun, E., Sauter, D., 2019. Furin-mediated protein processing in infectious diseases and
  283. cancer. Clin. Transl. Immunol. 8, e1073. https://doi.org/10.1002/cti2.1073.
  284. Chan, C.M., Woo, P.C., Lau, S.K., Tse, H., Chen, H.L., Li, F., Zheng, B.J., Chen, L., Huang,
  285. J.D., Yuen, K.Y., 2008. Spike protein, S, of human coronavirus HKU1: role in viral life
  286. cycle and application in antibody detection. Exp. Biol. Med. 233, 1527–1536. https://
  287. doi.org/10.3181/0806-RM-197.
  288. Chan, J.F., Kok, K.H., Zhu, Z., Chu, H., To, K.K., Yuan, S., Yuen, K.Y., 2020. Genomic
  289. characterization of the 2019 novel human-pathogenic coronavirus isolated from a
  290. patient with atypical pneumonia after visiting Wuhan. Emerg. Microb. Infect. 9,
  291. 221–236. https://doi.org/10.1080/22221751.2020.1719902.
  292. Cheng, J., Zhao, Y., Xu, G., Zhang, K., Jia, W., Sun, Y., Zhao, J., Xue, J., Hu, Y., Zhang, G.,
  293. 2019. The S2 subunit of QX-type infectious bronchitis coronavirus spike protein is an
  294. essential determinant of neurotropism. Viruses 11. https://doi.org/10.3390/
  295. v11100972.
  296. Claas, E.C., Osterhaus, A.D., Van Beek, R., De Jong, J.C., Rimmelzwaan, G.F., Senne, D.A.,
  297. Krauss, S., Shortridge, K.F., Webster, R.G., 1998. Human influenza A H5N1 virus
  298. related to a highly pathogenic avian influenza virus. Lancet 351, 472–477. https://
  299. doi.org/10.1016/S0140-6736(97)11212-0.
  300. Dahms, S.O., Jiao, G.-S., Than, M.E., 2017. Structural studies revealed active site dis-
  301. tortions of human furin by a small molecule inhibitor. ACS Chem. Biol. 12, 2474.
  302. https://doi.org/10.1021/acschembio.7b00633.
  303. de Wit, E., van Doremalen, N., Falzarano, D., Munster, V.J., 2016. SARS and MERS: recent
  304. insights into emerging coronaviruses. Nat. Rev. Microbiol. https://doi.org/10.1038/
  305. nrmicro.2016.81.
  306. Decroly, E., Wouters, S., Di Bello, C., Lazure, C., Ruysschaert, J.-M., Seidah, N.G., 1996.
  307. Identification of the Paired Basic Convertases Implicated in HIV gp160 Processing
  308. Based on in Vitro Assays and Expression in CD4+ Cell Lines. J. Biol. Chem. 271,
  309. 30442–30450. https://doi.org/10.1074/jbc.271.48.30442.
  310. Garten, W., Hallenberger, S., Ortmann, D., Schäfer, W., Vey, M., Angliker, H., Shaw, E.,
  311. Klenk, H.D., 1994. Processing of viral glycoproteins by the subtilisin-like en-
  312. doprotease furin and its inhibition by specific peptidylchloroalkylketones. Biochimie
  313. 76, 217–225. https://doi.org/10.1016/0300-9084(94)90149-x.
  314. Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B.,
  315. Zhang, W., Peng, C., Zhang, Y.-J., Luo, C.-M., Tan, B., Wang, N., Zhu, Y., Crameri, G.,
  316. Zhang, S.-Y., Wang, L.-F., Daszak, P., Shi, Z.-L., 2013. Isolation and characterization
  317. of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538.
  318. https://doi.org/10.1038/nature12711.
  319. Izaguirre, G., 2019. The proteolytic regulation of virus cell entry by furin and other
  320. proprotein convertases. Viruses 11. https://doi.org/10.3390/v11090837.
  321. Kido, H., Okumura, Y., Takahashi, E., Pan, H.Y., Wang, S., Yao, D., Yao, M., Chida, J.,
  322. Yano, M., 2012. Role of host cellular proteases in the pathogenesis of influenza and
  323. influenza-induced multiple organ failure. Biochim. Biophys. Acta Proteins
  324. Proteomics. https://doi.org/10.1016/j.bbapap.2011.07.001.
  325. Kim, W., Zekas, E., Lodge, R., Susan-Resiga, D., Marcinkiewicz, E., Essalmani, R., Mihara,
  326. K., Ramachandran, R., Asahchop, E., Gelman, B., Cohen, É.A., Power, C., Hollenberg,
  327. M.D., Seidah, N.G., 2015. Neuroinflammation-Induced interactions between pro-
  328. tease-activated receptor 1 and proprotein convertases in HIV-associated neurocog-
  329. nitive disorder. Mol. Cell Biol. 35, 3684–3700. https://doi.org/10.1128/mcb.
  330. 00764-15.
  331. Kuhn, J.H., Li, W., Choe, H., Farzan, M., 2004. Angiotensin-converting enzyme 2: a
  332. functional receptor for SARS coronavirus. Cell. Mol. Life Sci. 61, 2738–2743. https://
  333. doi.org/10.1007/s00018-004-4242-5.
  334. Le Coupanec, A., Desforges, M., Meessen-Pinard, M., Dubé, M., Day, R., Seidah, N.G.,
  335. Talbot, P.J., 2015. Cleavage of a neuroinvasive human respiratory virus spike gly-
  336. coprotein by proprotein convertases modulates neurovirulence and virus spread
  337. within the central nervous system. PLoS Pathog. 11. https://doi.org/10.1371/
  338. journal.ppat.1005261.
  339. Li, F., Li, W., Farzan, M., Harrison, S.C., 2005. Structure of SARS coronavirus spike re-
  340. ceptor-binding domain complexed with receptor. Science 309, 1864–1868. https://
  341. doi.org/10.1126/science.1116480.
  342. Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., Ren, R., Leung, K.S.M., Lau, E.H.Y.,
  343. Wong, J.Y., Xing, X., Xiang, N., Wu, Y., Li, C., Chen, Q., Li, D., Liu, T., Zhao, J., Li, M.,
  344. Tu, W., Chen, C., Jin, L., Yang, R., Wang, Q., Zhou, S., Wang, R., Liu, H., Luo, Y., Liu,
  345. Y., Shao, G., Li, H., Tao, Z., Yang, Y., Deng, Z., Liu, B., Ma, Z., Zhang, Y., Shi, G., Lam,
  346. T.T.Y., Wu, J.T.K., Gao, G.F., Cowling, B.J., Yang, B., Leung, G.M., Feng, Z., 2020.
  347. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneu-
  348. monia. N. Engl. J. Med. NEJMoa2001316. https://doi.org/10.1056/
  349. NEJMoa2001316.
  350. Li, W., Moore, M.J., Vasilieva, N., Sui, J., Wong, S.K., Berne, M.A., Somasundaran, M.,
  351. Sullivan, J.L., Luzuriaga, K., Greenough, T.C., Choe, H., Farzan, M., 2003.
  352. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus.
  353. Nature 426, 450–454. https://doi.org/10.1038/nature02145.
  354. Lu, G., Wang, Q., Gao, G.F., 2015. Bat-to-human: spike features determining “host jump”
  355. of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. https://doi.
  356. org/10.1016/j.tim.2015.06.003.
  357. Matsuyama, S., Nagata, N., Shirato, K., Kawase, M., Takeda, M., Taguchi, F., 2010.
  358. Efficient activation of the severe acute respiratory syndrome coronavirus spike pro-
  359. tein by the transmembrane protease TMPRSS2. J. Virol. 84, 12658–12664. https://
  360. doi.org/10.1128/JVI.01542-10.
  361. Matsuyama, S., Ujike, M., Morikawa, S., Tashiro, M., Taguchi, F., 2005. Protease-
  364. mediated enhancement of severe acute respiratory syndrome coronavirus infection.
  365. Proc. Natl. Acad. Sci. U.S.A. 102, 12543–12547. https://doi.org/10.1073/pnas.
  366. 0503203102.
  367. Mbikay, M., Sirois, F., Yao, J., Seidah, N.G., Chrétien, M., 1997. Comparative analysis of
  368. expression of the proprotein convertases furin, PACE4, PC1 and PC2 in human lung
  369. tumours. Br. J. Canc. 75, 1509–1514. https://doi.org/10.1038/bjc.1997.258.
  373. Millet, J.K., Whittaker, G.R., 2015. Host cell proteases: critical determinants of cor-
  374. onavirus tropism and pathogenesis. Virus Res. 202, 120–134. https://doi.org/10.
  375. 1016/j.virusres.2014.11.021.
  376. Millet, J.K., Whittaker, G.R., 2014. Host cell entry of Middle East respiratory syndrome
  377. coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl.
  378. Acad. Sci. U.S.A. 111, 15214–15219. https://doi.org/10.1073/pnas.1407087111.
  379. Moulard, M., Decroly, E., 2000. Maturation of HIV envelope glycoprotein precursors by
  380. cellular endoproteases. Biochim. Biophys. Acta Rev. Biomembr. https://doi.org/10.
  381. 1016/S0304-4157(00)00014-9.
  382. Schechter, I., Berger, A., 1968. On the active site of proteases. 3. Mapping the active site
  383. of papain; specific peptide inhibitors of papain. Biochem. Biophys. Res. Commun. 32,
  384. 898–902. https://doi.org/10.1016/0006-291x(68)90326-4.
  385. Seidah, N.G., Chretien, M., 1999. Proprotein and prohormone convertases: a family of
  386. subtilases generating diverse bioactive polypeptides. Brain Res. 848, 45–62. https://
  387. doi.org/10.1016/S0006-8993(99)01909-5.
  388. Seidah, N.G., Prat, A., 2012. The biology and therapeutic targeting of the proprotein
  389. convertases. Nat. Rev. Drug Discov. https://doi.org/10.1038/nrd3699.
  390. Sun, X., Tse, L.V., Ferguson, A.D., Whittaker, G.R., 2010. Modifications to the he-
  391. magglutinin cleavage site control the virulence of a neurotropic H1N1 influenza
  392. virus. J. Virol. 84, 8683–8690. https://doi.org/10.1128/JVI.00797-10.
  393. Wan, Y., Shang, J., Graham, R., Baric, R.S., Li, F., 2020. Receptor recognition by novel
  394. coronavirus from Wuhan: an analysis based on decade-long structural studies of
  395. SARS. J. Virol. https://doi.org/10.1128/JVI.00127-20.
  396. Yuan, Y., Cao, D., Zhang, Y., Ma, J., Qi, J., Wang, Q., Lu, G., Wu, Y., Yan, J., Shi, Y.,
  397. Zhang, X., Gao, G.F., 2017. Cryo-EM structures of MERS-CoV and SARS-CoV spike
  398. glycoproteins reveal the dynamic receptor binding domains. Nat. Commun. https://
  399. doi.org/10.1038/ncomms15092.
  400. Zhao, S., Ran, J., Musa, S.S., Yang, G., Wang, W., Lou, Y., Gao, D., Yang, L., He, D., Wang,
  401. M.H., 2020. Preliminary estimation of the basic reproduction number of novel cor-
  402. onavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the
  403. early phase of the outbreak. Int J Infect Dis 30053–30059. https://doi.org/10.1016/j.
  404. ijid.2020.01.050.
  406. bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871. The copyright holder for this preprint (which was not peer-reviewed) is the
  407. Cases of mild to severe illness, and death from the infection have been reported from Wuhan. This
  408. outbreak has spread rapidly to distant nations including France, Australia and the USA, among others.
  409. The number of cases within and outside China is increasing steeply. Our current understanding
  410. is limited to the virus genome sequences and modest epidemiological and clinical data.
  411. Comprehensive analysis of the available 2019-nCoV sequences may provide important clues that
  412. may help advance our current understanding to manage the ongoing outbreak.
  413. The spike glycoprotein (S) of coronavirus is cleaved into two subunits (S1 and S2). The S1
  414. subunit helps in receptor binding and the S2 subunit facilitates membrane fusion (Bosch et al.,
  415. 2003; Li, 2016). The spike glycoproteins of coronaviruses are important determinants of tissue
  416. tropism and host range. In addition the spike glycoproteins are critical targets for vaccine
  417. development (Du et al., 2013). For this reason, the spike proteins are the most extensively
  418. studied proteins among coronaviruses. We therefore sought to investigate the spike glycoprotein of the
  419. 2019-nCoV to understand its evolution and its novel sequence and structural features using
  420. computational tools.
  421. Methodology
  422. Retrieval and alignment of nucleic acid and protein sequences
  423. We retrieved all the available coronavirus sequences (n=55) from the NCBI viral genome database
  424. (https://www.ncbi.nlm.nih.gov/) and we used GISAID (Elbe & Buckland-Merrett,
  425. 2017) [https://www.gisaid.org/] to retrieve all available full-length sequences (n=28) of
  426. 2019-nCoV as of 27 Jan 2020. Multiple sequence alignment of all coronavirus genomes was performed
  427. using MUSCLE software (Edgar, 2004) based on the neighbour joining method. Out of the 55
  428. coronavirus genomes, 32 representative genomes of all categories were used for phylogenetic tree
  429. development using MEGAX software (Kumar et al., 2018). The closest relative was found to be
  430. SARS CoV. The glycoprotein regions of SARS CoV and 2019-nCoV were aligned and visualized
  431. using Multalin software (Corpet, 1988). The identified amino acid and nucleotide sequences were
  432. aligned against the whole viral genome database using BLASTp and BLASTn. The conservation of the
  433. nucleotide and amino acid motifs in 28 clinical variants of the 2019-nCoV genome was shown by
  434. performing multiple sequence alignment using MEGAX software. The three-dimensional structure
  435. of the 2019-nCoV glycoprotein was generated using the SWISS-MODEL online server (Biasini et al.,
  436. 2014) and the structure was marked and visualized using PyMol (DeLano, 2002).
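The pairwise comparison step described above can be illustrated with a minimal sketch. This is not the MUSCLE or Multalin procedure used in this study: it is a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, linear gap -2), and the function names and the two toy sequences are hypothetical. It shows how runs of residues that are present in a query but gapped in a reference can be read off as candidate insertions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return the two gapped strings."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def insertions(query_aln, ref_aln):
    """List (start, residues) for runs present in the query but gapped in
    the reference; start is the 1-based position in the ungapped query."""
    found, run, pos, start = [], [], 0, 0
    for q, r in zip(query_aln, ref_aln):
        if q != "-":
            pos += 1
        if r == "-" and q != "-":
            if not run:
                start = pos
            run.append(q)
        elif run:
            found.append((start, "".join(run)))
            run = []
    if run:
        found.append((start, "".join(run)))
    return found


# Hypothetical toy sequences (not real spike proteins).
query_aln, ref_aln = needleman_wunsch("MKVLHKNAAGTWQ", "MKVLAAGTWQ")
print(query_aln)
print(ref_aln)
print(insertions(query_aln, ref_aln))
```

A linear gap penalty is used only to keep the sketch short; production aligners use affine gaps and substitution matrices, which can place gaps differently.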
  437. Results
  438. Uncanny similarity of novel inserts in the 2019-nCoV spike protein to HIV-1 gp120 and
  439. Gag
  440. Our phylogenetic tree of full-length coronaviruses suggests that 2019-nCoV is closely related to
  441. SARS CoV [Fig.1]. In addition, other recent studies have linked 2019-nCoV to SARS CoV.
  442. We therefore compared the spike glycoprotein sequence of 2019-nCoV to that of SARS
  443. CoV (NCBI Accession number: AY390556.1). On careful examination of the sequence
  444. alignment we found that the 2019-nCoV spike glycoprotein contains 4 insertions [Fig.2]. To
  445. investigate whether these inserts are present in any other coronavirus, we performed a multiple
  446. sequence alignment of the spike glycoprotein amino acid sequences of all available
  447. coronaviruses (n=55) [refer Table S.File1] in NCBI RefSeq (ncbi.nlm.nih.gov); this includes one
  448. sequence of 2019-nCoV [Fig.S1]. We found that these 4 insertions [inserts 1, 2, 3 and 4] are
  449. unique to 2019-nCoV and are not present in the other coronaviruses analyzed. Another group from
  450. China had documented three insertions when comparing fewer spike glycoprotein sequences of
  451. coronaviruses (Zhou et al., 2020).
  452. WITHDRAWN
  453. see manuscript DOI for details
  457. Figure 1: Maximum likelihood genealogy showing the evolution of 2019-nCoV. The evolutionary history
  458. was inferred by using the Maximum Likelihood method and JTT matrix-based model. The tree
  459. with the highest log likelihood (12458.88) is shown. Initial tree(s) for the heuristic search were
  460. obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise
  461. distances estimated using a JTT model, and then selecting the topology with superior log likelihood
  466. value. This analysis involved 5 amino acid sequences. There were a total of 1387 positions in the
  467. final dataset. Evolutionary analyses were conducted in MEGA X.
  468. Figure 2: Multiple sequence alignment between spike proteins of 2019-nCoV and SARS. The
  469. sequences of spike proteins of 2019-nCoV (Wuhan-HU-1, Accession NC_045512) and of SARS
  470. CoV (GZ02, Accession AY390556) were aligned using MultiAlin software. The sites of difference
  471. are highlighted in boxes.
  472. We then analyzed all available full-length sequences (n=28) of 2019-nCoV in GISAID (Elbe &
  473. Buckland-Merrett, 2017) as of January 27, 2020 for the presence of these inserts. As most of these
  474. sequences are not annotated, we compared the nucleotide sequences of the spike glycoprotein of
  475. all available 2019-nCoV sequences using BLASTp. Interestingly, all 4 insertions were
  476. fully (100%) conserved in all the available 2019-nCoV sequences analyzed [Fig.S2, Fig.S3].
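A conservation check of this kind can be sketched as an exact-substring test over a set of sequences. The function name and the toy isolate sequences below are hypothetical; only the two motifs are the insert 1 and insert 2 sequences listed in Table 1. Real data would first need the spike region extracted and translated.

```python
def conservation(motifs, sequences):
    """Return {motif: fraction of sequences containing it as an exact substring}."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return {
        motif: sum(motif in seq for seq in sequences.values()) / len(sequences)
        for motif in motifs
    }


# Hypothetical toy "isolates" (NOT real spike sequences).
isolates = {
    "isolate_A": "MFVTNGTKRFDNHKNNKSWM",
    "isolate_B": "MFVTNGTKRFDNHKNNKSWL",
    "bat_like": "MFVFDNWM",
}

for motif, fraction in conservation(["TNGTKR", "HKNNKS"], isolates).items():
    print(motif, f"{fraction:.0%}")
```

An exact-substring test reports a motif as absent after a single substitution, so a tolerant comparison (for example, on an alignment) would be needed to detect near-identical variants.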
  481. We then translated the aligned genome and found that these inserts are present in all Wuhan
  482. 2019-nCoV viruses except the 2019-nCoV virus with bat as a host [Fig.S4]. Intrigued by the 4 highly
  483. conserved inserts unique to 2019-nCoV, we wanted to understand their origin. For this purpose,
  484. we performed a local alignment with each 2019-nCoV insert as query against all virus genomes and
  485. considered hits with 100% sequence coverage. Surprisingly, each of the four inserts aligned with
  486. short segments of Human Immunodeficiency Virus-1 (HIV-1) proteins. The amino acid
  487. positions of the inserts in 2019-nCoV and the corresponding residues in HIV-1 gp120 and HIV-1
  488. Gag are shown in Table 1. The first 3 inserts (inserts 1, 2 and 3) aligned to short segments of amino
  489. acid residues in HIV-1 gp120. Insert 4 aligned to HIV-1 Gag. Insert 1 (6 amino acid
  490. residues) and insert 2 (6 amino acid residues) in the spike glycoprotein of 2019-nCoV are 100%
  491. identical to the residues mapped to HIV-1 gp120. Insert 3 (12 amino acid residues) in
  492. 2019-nCoV maps to HIV-1 gp120 with gaps [see Table 1]. Insert 4 (8 amino acid residues) maps to
  493. HIV-1 Gag with gaps.
  494. Although the 4 inserts represent discontiguous short stretches of amino acids in the spike glycoprotein
  495. of 2019-nCoV, the fact that all 4 of them share amino acid identity or similarity with HIV-1
  496. gp120 and HIV-1 Gag (among all annotated virus proteins) suggests that this is not a random
  497. fortuitous finding. In other words, one may sporadically expect a fortuitous match for a stretch of
  498. 6-12 contiguous amino acid residues in an unrelated protein. However, it is unlikely that all 4
  499. inserts in the 2019-nCoV spike glycoprotein fortuitously match 2 key structural proteins of
  500. an unrelated virus (HIV-1).
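How often a fortuitous match of a short peptide might be expected can be estimated with a back-of-the-envelope calculation. The sketch below assumes, as a simplification that is not part of the analysis above, that residues are independent and drawn uniformly from 20 amino acids, so a given k-residue peptide is expected to occur about N / 20^k times among N possible start positions. The function name is hypothetical and the database size is a placeholder, not a measured figure.

```python
def expected_chance_matches(k, n_positions, alphabet=20):
    """Expected exact occurrences of one k-residue peptide among
    n_positions start sites, under an independent, uniform residue
    model (a simplifying assumption)."""
    return n_positions / alphabet ** k


# Placeholder database size: a hypothetical number of residue start
# positions, NOT the size of any real database.
n_positions = 100_000_000

for k in (6, 8, 12):
    print(k, expected_chance_matches(k, n_positions))
```

Real protein databases have uneven residue frequencies and many related sequences, and allowing gaps enlarges the set of acceptable hits, so this uniform model gives only an order-of-magnitude guide.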
  501. The amino acid residues of inserts 1, 2 and 3 of the 2019-nCoV spike glycoprotein that mapped to
  502. HIV-1 were part of the V4, V5 and V1 domains, respectively, in gp120 [Table 1]. Since the
  503. 2019-nCoV inserts mapped to variable regions of HIV-1, they were not ubiquitous in HIV-1 gp120, but
  504. were limited to selected sequences of HIV-1 [refer S.File1], primarily from Asia and Africa.
  505. The HIV-1 Gag protein enables interaction of the virus with the negatively charged host surface
  506. (Murakami, 2008), and a high positive charge on the Gag protein is a key feature for the host-virus
  507. interaction. On analyzing the pI values for each of the 4 inserts in 2019-nCoV and the
  508. corresponding stretches of amino acid residues from HIV-1 proteins, we found that a) the pI values
  509. were very similar for each pair analyzed and b) most of these pI values were 10±2 [refer Table 1]. Of
  510. note, despite the gaps in inserts 3 and 4, the pI values were comparable. This uniformity in the pI
  511. values for all the 4 inserts merits further investigation.
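A pI estimate of the kind reported in Table 1 can be sketched as follows. The text does not state which tool or pKa scale was used, so the pKa values below (a set commonly quoted for EMBOSS) and the function names are assumptions; other published scales give somewhat different values. The sketch sums Henderson-Hasselbalch charges for the termini and the ionizable side chains, then finds the pH at which the net charge crosses zero by bisection.

```python
# Assumed pKa values (EMBOSS-style set); treat them as illustrative.
PKA_POS = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(peptide, ph):
    """Henderson-Hasselbalch net charge of a peptide at a given pH."""
    pos = [PKA_POS["N_term"]] + [PKA_POS[aa] for aa in peptide if aa in PKA_POS]
    neg = [PKA_NEG["C_term"]] + [PKA_NEG[aa] for aa in peptide if aa in PKA_NEG]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in pos)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in neg)
    return charge


def isoelectric_point(peptide, tol=1e-4):
    """Bisection on [0, 14] for the pH where the net charge crosses zero.
    The net charge decreases monotonically with pH, so bisection is safe."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(peptide, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Insert sequences as listed in Table 1 (gap characters removed).
for name, seq in [("insert 1", "TNGTKR"), ("insert 2", "HKNNKS")]:
    print(name, round(net_charge(seq, 7.0), 2), round(isoelectric_point(seq), 2))
```

For peptides this short the two terminal groups dominate the result, so the estimate is sensitive to the pKa values assumed for them.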
  512. As none of these 4 inserts is present in any other coronavirus, the genomic regions encoding these
  513. inserts represent ideal candidates for designing primers that can distinguish 2019-nCoV from other
  514. coronaviruses.
  519. Motif     Virus (protein)      Positions  Motif alignment                  HIV protein and  HIV genome source   Polar     Total   pI
  520.                                                                            variable region  (country/subtype)   residues  charge  value
  521. Insert 1  2019-nCoV (GP)       71-76      TNGTKR                                                                5         2       11
  522.           HIV1 (GP120)         404-409    TNGTKR                           gp120-V4         Thailand*/CRF01_AE  5         2       11
  523. Insert 2  2019-nCoV (GP)       145-150    HKNNKS                                                                6         2       10
  524.           HIV1 (GP120)         462-467    HKNNKS                           gp120-V5         Kenya*/G            6         2       10
  525. Insert 3  2019-nCoV (GP)       245-256    RSYL- - - -TPGDSSSG                                                   8         2       10.84
  526.           HIV1 (GP120)         136-150    RTYLFNETRGNSSSG                  gp120-V1         India*/C            10        1       8.75
  527. Insert 4  2019-nCoV (Poly P)   676-684    QTNS-----------------------PRRA                                       6         2       12.00
  528.           HIV1 (gag)           366-384    QTNSSILMQRSNFKG PRRA             Gag              India*/C            12        4       12.30
  609. Table 1: Aligned sequences of 2019-nCoV and gp120 protein of HIV-1 with their positions
  610. in primary sequence of protein. All the inserts have a high density of positively charged
  611. residues. The deleted fragments in insert 3 and 4 increase the positive charge to surface area
  612. ratio. *please see Supp. Table 1 for accession numbers
  613. The novel inserts are part of the receptor binding site of 2019-nCoV
  614. To gain structural insights and to understand the role of these insertions in the 2019-nCoV glycoprotein,
  615. we modelled its structure based on the available structure of the SARS spike glycoprotein (PDB:
  616. 6ACD.1.A). Comparison of the modelled structure reveals that although inserts 1, 2 and 3 are
  617. at non-contiguous locations in the protein primary sequence, they fold to constitute part of the
  618. glycoprotein binding site that recognizes the host receptor (Kirchdoerfer et al., 2016) (Figure 3).
  619. Insert 1 corresponds to the NTD (N-terminal domain) and inserts 2 and 3 correspond to
  620. the CTD (C-terminal domain) of the S1 subunit in the 2019-nCoV spike glycoprotein. Insert
  621. 4 is at the junction of the SD1 (sub domain 1) and SD2 (sub domain 2) of the S1 subunit (Ou et
  622. al., 2017). We speculate that these insertions provide additional flexibility to the glycoprotein
  623. binding site by forming a hydrophilic loop in the protein structure that may facilitate or enhance
  624. virus-host interactions.
  629. Figure 3. Modelled homo-trimer spike glycoprotein of 2019-nCoV. The inserts from the HIV
  630. envelope protein are shown with colored beads, present at the binding site of the protein.
  631. Evolutionary Analysis of 2019-nCoV
  632. It has been speculated that 2019-nCoV is a variant of coronavirus derived from an animal source
  633. which was transmitted to humans. Considering the change of host specificity, we decided to
  634. study the sequences of the spike glycoprotein (S protein) of the virus. S proteins are surface proteins
  635. that help the virus in host recognition and attachment. Thus, a change in these proteins can be
  636. reflected as a change of host specificity of the virus. To identify the alterations in the S protein gene of
  637. 2019-nCoV and their consequences for structural rearrangements, we performed in silico analysis of
  638. 2019-nCoV with respect to all other viruses. A multiple sequence alignment between the S protein
  639. amino acid sequences of 2019-nCoV, Bat-SARS-Like, SARS-GZ02 and MERS revealed that the S
  640. protein of 2019-nCoV is most closely related to that of SARS-GZ02 (Figure 1).
  641. Insertions in Spike protein region of 2019-nCoV
  642. Since the S protein of 2019-nCoV shares closest ancestry with SARS GZ02, the sequences coding
  643. for the spike proteins of these two viruses were compared using MultiAlin software. We found four
  644. new insertions in the protein of 2019-nCoV: “GTNGTKR” (IS1), “HKNNKS” (IS2), “GDSSSG”
  645. (IS3) and “QTNSPRRA” (IS4) (Figure 2). To our surprise, these sequence insertions were not only
  646. absent in the S protein of SARS but were also not observed in any other member of the Coronaviridae
  647. family (Supplementary figure). This is startling as it is quite unlikely for a virus to have acquired
  648. such unique insertions naturally in a short duration of time.
  653. Insertions share similarity to HIV
  654. The insertions were observed to be present in all the genomic sequences of 2019-nCoV
  655. available from the recent clinical isolates (Supplementary Figure 1). To find the source of these
  656. insertions in 2019-nCoV, a local alignment was done with BLASTp using these insertions as queries
  657. against all virus genomes. Unexpectedly, all the insertions aligned with Human Immunodeficiency
  658. Virus-1 (HIV-1). Further analysis revealed that the aligned sequences of HIV-1 were
  659. derived from the surface glycoprotein gp120 (amino acid sequence positions: 404-409, 462-467,
  660. 136-150) and from the Gag protein (amino acids 366-384) (Table 1). The Gag protein of HIV is involved in host
  661. membrane binding, packaging of the virus and the formation of virus-like particles. Gp120
  662. plays a crucial role in recognizing the host cell by binding to the primary receptor CD4. This binding
  663. induces structural rearrangements in gp120, creating a high affinity binding site for a chemokine
  664. co-receptor like CXCR4 and/or CCR5.
  665. Discussion
  666. The current outbreak of 2019-nCoV warrants a thorough investigation and understanding of its
  667. ability to infect human beings. Keeping in mind that there has been a clear change in host
  668. preference from previous coronaviruses to this virus, we studied the change in spike protein between
  669. 2019-nCoV and other viruses. We found four new insertions in the S protein of 2019-nCoV when
  670. compared to its nearest relative, SARS CoV. The genome sequences from the recent 28 clinical
  671. isolates showed that the sequences coding for these insertions are conserved amongst all these
  672. isolates. This indicates that these insertions have been preferably acquired by 2019-nCoV,
  673. providing it with an additional survival and infectivity advantage. Delving deeper, we found that these
  674. insertions were similar to HIV-1. Our results highlight an astonishing relation between the gp120
  675. and Gag proteins of HIV and the 2019-nCoV spike glycoprotein. These proteins are critical for the
  676. viruses to identify and latch on to their host cells and for viral assembly (Beniac et al., 2006).
  677. Since surface proteins are responsible for host tropism, changes in these proteins imply a change
  678. in host specificity of the virus. According to reports from China, there has been a gain of host
  679. specificity in the case of 2019-nCoV, as the virus was originally known to infect animals and not humans
  680. but, after the mutations, it has gained tropism to humans as well.
  681. Moving ahead, 3D modelling of the protein structure showed that these insertions are present at
  682. the binding site of 2019-nCoV. Due to the presence of gp120 motifs in the 2019-nCoV spike
  683. glycoprotein at its binding domain, we propose that these motif insertions could have provided an
  684. enhanced affinity towards host cell receptors. Further, this structural change might have also
  685. increased the range of host cells that 2019-nCoV can infect. To the best of our knowledge, the
  686. function of these motifs is still not clear in HIV and needs to be explored. The exchange of genetic
  687. material among viruses is well known, and such critical exchange highlights the risk and the
  688. need to investigate the relations between seemingly unrelated virus families.
  689. Conclusions
  690. Our analysis of the spike glycoprotein of 2019-nCoV revealed several interesting findings. First,
  691. we identified 4 unique inserts in the 2019-nCoV spike glycoprotein that are not present in any
  692. other coronavirus reported to date. To our surprise, all 4 inserts in 2019-nCoV mapped to
  697. short segments of amino acids in the HIV-1 gp120 and Gag among all annotated virus proteins in
  698. the NCBI database. This uncanny similarity of novel inserts in the 2019-nCoV spike protein to
  699. HIV-1 gp120 and Gag is unlikely to be fortuitous. Further, 3D modelling suggests that at least 3 of
  700. the unique inserts, which are non-contiguous in the primary protein sequence of the 2019-nCoV
  701. spike glycoprotein, converge to constitute the key components of the receptor binding site. Of note,
  702. all 4 inserts have pI values of around 10 that may facilitate virus-host interactions. Taken
  703. together, our findings suggest unconventional evolution of 2019-nCoV that warrants further
  704. investigation. Our work highlights novel evolutionary aspects of 2019-nCoV and has
  705. implications for the pathogenesis and diagnosis of this virus.
  706. References
  707. Beniac, D. R., Andonov, A., Grudeski, E., & Booth, T. F. (2006). Architecture of the SARS coronavirus
  708. prefusion spike. Nature Structural and Molecular Biology, 13(8), 751–752.
  709. https://doi.org/10.1038/nsmb1123
  710. Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T. G.,
  711. Bertoni, M., Bordoli, L., & Schwede, T. (2014). SWISS-MODEL: Modelling protein tertiary and
  712. quaternary structure using evolutionary information. Nucleic Acids Research.
  713. https://doi.org/10.1093/nar/gku340
  714. Bosch, B. J., van der Zee, R., de Haan, C. A. M., & Rottier, P. J. M. (2003). The Coronavirus Spike Protein Is
  715. a Class I Virus Fusion Protein: Structural and Functional Characterization of the Fusion Core
  716. Complex. Journal of Virology, 77(16), 8801–8811.
  717. https://doi.org/10.1128/jvi.77.16.8801-8811.2003
  718. Chan, J. F.-W., Kok, K.-H., Zhu, Z., Chu, H., To, K. K.-W., Yuan, S., & Yuen, K.-Y. (2020). Genomic
  719. characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with
  720. atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections, 9(1), 221–236.
  721. https://doi.org/10.1080/22221751.2020.1719902
  722. Chan, J. F. W., Lau, S. K. P., To, K. K. W., Cheng, V. C. C., Woo, P. C. Y., & Yuen, K.-Y. (2015). Middle East
  723. Respiratory Syndrome Coronavirus: Another Zoonotic Betacoronavirus Causing SARS-Like Disease.
  724. https://doi.org/10.1128/CMR.00102-14
  725. Chan, J., To, K., Tse, H., Jin, D., & Yuen, K. Y. (2013). Interspecies
  726. transmission and emergence of novel viruses: lessons from bats and birds. Trends in Microbiology.
  727. Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research.
  728. https://doi.org/10.1093/nar/16.22.10881
  729. DeLano, W. L. (2002). The PyMOL Molecular Graphics System, Version 1.1. Schrödinger LLC.
  730. https://doi.org/10.1038/hr.2014.17
  731. Du, L., Zhao, G., Kou, Z., Ma, C., Sun, S., Poon, V. K. M., Lu, L., Wang, L., Debnath, A. K., Zheng, B.-J., Zhou,
  732. Y., & Jiang, S. (2013). Identification of a Receptor-Binding Domain in the S Protein of the Novel
  733. Human Coronavirus Middle East Respiratory Syndrome Coronavirus as an Essential Target for
  734. Vaccine Development. Journal of Virology, 87(17), 9939–9942.
  735. https://doi.org/10.1128/jvi.01048-13
  740. Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput.
  741. Nucleic Acids Research. https://doi.org/10.1093/nar/gkh340
  742. Elbe, S., & Buckland-Merrett, G. (2017). Data, disease and diplomacy: GISAID’s innovative contribution
  743. to global health. Global Challenges. https://doi.org/10.1002/gch2.1018
  744. Kirchdoerfer, R. N., Cottrell, C. A., Wang, N., Pallesen, J., Yassine, H. M., Turner, H. L., Corbett, K. S.,
  745. Graham, B. S., McLellan, J. S., & Ward, A. B. (2016). Pre-fusion structure of a human coronavirus
  746. spike protein. Nature. https://doi.org/10.1038/nature17200
  747. Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics
  748. analysis across computing platforms. Molecular Biology and Evolution.
  749. https://doi.org/10.1093/molbev/msy096
  750. Li, F. (2016). Structure, Function, and Evolution of Coronavirus Spike Proteins. Annual Review of
  751. Virology, 3(1), 237–261. https://doi.org/10.1146/annurev-virology-110615-042301
  752. Murakami, T. (2008). Roles of the interactions between Env and Gag proteins in the HIV-1 replication
  753. cycle. Microbiology and Immunology, 52(5), 287–295.
  754. https://doi.org/10.1111/j.1348-0421.2008.00008.x
  755. Ou, X., Guan, H., Qin, B., Mu, Z., Wojdyla, J. A., Wang, M., Dominguez, S. R., Qian, Z., & Cui, S. (2017).
  756. Crystal structure of the receptor binding domain of the spike glycoprotein of human
  757. betacoronavirus HKU1. Nature Communications. https://doi.org/10.1038/ncomms15216
  758. Snijder, E. J., van der Meer, Y., Zevenhoven-Dobbe, J., Onderwater, J. J. M., van der Meulen, J., Koerten,
  759. H. K., & Mommaas, A. M. (2006). Ultrastructure and origin of membrane vesicles associated with
  760. the severe acute respiratory syndrome coronavirus replication complex. Journal of Virology,
  761. 80(12), 5927–5940. https://doi.org/10.1128/JVI.02501-05
  762. Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., Chen,
  763. H.-D., Chen, J., Luo, Y., Guo, H., Jiang, R.-D., Liu, M.-Q., Chen, Y., Shen, X.-R., Wang, X., … Shi, Z.-L.
  764. (2020). Discovery of a novel coronavirus associated with the recent pneumonia outbreak in
  765. humans and its potential bat origin. BioRxiv. https://doi.org/10.1101/2020.01.22.914952
  766. Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., Niu, P., Zhan, F.,
  767. Ma, X., Wang, D., Xu, W., Wu, G., Gao, G. F., & Tan, W. (2020). A Novel Coronavirus from Patients
  768. with Pneumonia in China, 2019. New England Journal of Medicine, NEJMoa2001017.
  769. https://doi.org/10.1056/NEJMoa2001017
  774. Fig.S1: Multiple sequence alignment of the glycoproteins of the Coronaviridae family, showing all
  775. four inserts.
  780. Fig.S2: All four inserts are present in the aligned 28 Wuhan 2019-nCoV virus genomes obtained
  781. from GISAID. The gap in the Bat-SARS Like CoV in the last row shows that inserts 1 and 4 are
  782. unique to Wuhan 2019-nCoV.
  787. Fig.S3: Phylogenetic tree of the genomes of 28 clinical isolates of 2019-nCoV, including one from bat as a host.
  792. Supplementary Fig 4. Genome alignment of the Coronaviridae family. The sequences highlighted in black are the
  793. inserts represented here.
