Antiviral Research
journal homepage: www.elsevier.com/locate/antiviral

The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-like cleavage site absent in CoV of the same clade

B. Coutard (a), C. Valle (b), X. de Lamballerie (a), B. Canard (b), N.G. Seidah (c), E. Decroly (b,∗)

(a) Unité des Virus Émergents (UVE: Aix-Marseille Univ – IRD 190 – Inserm 1207 – IHU Méditerranée Infection), Marseille, France
(b) Aix Marseille Université, CNRS, AFMB UMR 7257, Marseille, France
(c) Laboratory of Biochemical Neuroendocrinology, Montreal Clinical Research Institute (IRCM, Affiliated to the University of Montreal), 110 Pine Ave West, Montreal, QC, H2W1R7, Canada

ARTICLE INFO
Keywords: 2019-nCoV; SARS-CoV; Spike protein; Maturation protease; Furin; Antivirals

ABSTRACT
In 2019, a new coronavirus (2019-nCoV) infecting humans emerged in Wuhan, China. Its genome has been sequenced and the genomic information promptly released. Despite a high similarity with the genome sequence of SARS-CoV and SARS-like CoVs, we identified a peculiar furin-like cleavage site in the Spike protein of the 2019-nCoV, lacking in the other SARS-like CoVs. In this article, we discuss the possible functional consequences of this cleavage site in the viral cycle and pathogenicity, and its potential implication in the development of antivirals.

Human coronaviruses (CoV) are enveloped positive-stranded RNA viruses belonging to the order Nidovirales, and are mostly responsible for upper respiratory and digestive tract infections. Among them, SARS-CoV and MERS-CoV, which spread in 2002 and 2013 respectively, have been associated with severe human illnesses, such as severe pneumonia and bronchiolitis, and even meningitis in more vulnerable populations (de Wit et al., 2016).
In December 2019, a new CoV (2019-nCoV) was detected in the city of Wuhan, and this emerging viral infection was associated with severe human respiratory disease with a ~2–3% fatality rate (Li et al., 2020). The virus is presumed to have initially been transmitted from an animal reservoir to humans, possibly via an amplifying host. However, human-to-human transmission has been reported, leading to a sustained epidemic spread with > 31,000 confirmed human infections, including > 640 deaths, reported by the WHO in early February 2020. The estimated effective reproductive number (R) value of ~2.90 (95% CI: 2.32–3.63) at the beginning of the outbreak raises the possibility of a pandemic (Zhao et al., 2020). This prompted the WHO to declare it a Public Health Emergency of International Concern. This is especially relevant because so far no specific antiviral treatments or vaccines are available. Based on its genome sequence, 2019-nCoV belongs to lineage b of Betacoronavirus (Fig. 1A), which also includes the SARS-CoV and bat CoV ZXC21, the latter and CoV ZC45 being the closest to 2019-nCoV. 2019-nCoV shares ~76% amino acid sequence identity in the Spike (S)-protein sequence with SARS-CoV and 80% with CoV ZXC21 (Chan et al., 2020). In this article, we focus on a specific furin-like protease recognition pattern present in the vicinity of one of the maturation sites of the S protein (Fig. 1B) that may have significant functional implications for virus entry. The proprotein convertases (PCs; genes PCSKs) constitute a family of nine serine secretory proteases that regulate various biological processes in both healthy and disease states (Seidah and Prat, 2012). By proteolysis, PCs are responsible for the activation of a wide variety of precursor proteins, such as growth factors, hormones, receptors and adhesion molecules, as well as cell surface glycoproteins of infectious viruses (Seidah and Chretien, 1999) (Table 1).
https://doi.org/10.1016/j.antiviral.2020.104742
Received 3 February 2020; Received in revised form 7 February 2020; Accepted 8 February 2020; Available online 10 February 2020
∗ Corresponding author. E-mail address: etienne.decroly@afmb.univ-mrs.fr (E. Decroly).
Antiviral Research 176 (2020) 104742. 0166-3542/© 2020 Elsevier B.V. All rights reserved.

Seven PCs cleave precursor proteins at specific single or paired basic amino acids (aa) within the motif (R/K)-(2X)n-(R/K)↓, where n = 0, 1, 2, or 3 spacer aa (Seidah and Chretien, 1999). Because of their role in the processing of many critical cell surface proteins, PCs, especially furin, have been implicated in viral infections. They have the potential to specifically cleave viral envelope glycoproteins, thereby enhancing viral fusion with host cell membranes (Izaguirre, 2019; Moulard and Decroly, 2000). In the case of human-infecting coronaviruses such as HCoV-OC43 (Le Coupanec et al., 2015), MERS-CoV (Millet and Whittaker, 2014), and HKU1 (Chan et al., 2008), the spike protein has been demonstrated to be cleaved at an S1/S2 cleavage site (Fig. 2), generating the S1 and S2 subunits. The above three viruses display the canonical (R/K)-(2X)n-(R/K)↓ motif (Table 1). Additionally, it has been demonstrated that variation around the viral envelope glycoprotein cleavage site plays a role in cellular tropism and pathogenesis. For instance, the pathogenesis of some CoV has been previously related to the presence of a furin-like cleavage site in the S-protein sequence. For example, the insertion of a similar cleavage site in the infectious bronchitis virus (IBV) S-protein results in higher pathogenicity, pronounced neural symptoms and neurotropism in infected chickens (Cheng et al., 2019).
Similarly, in the case of influenza virus, low-pathogenicity forms contain a single basic residue at the cleavage site, which is cleaved by trypsin-like proteases, and the tissue distribution of the activating protease(s) typically restricts infections to the respiratory and/or intestinal organs (Sun et al., 2010). Conversely, the highly pathogenic forms of influenza have a furin-like cleavage site cleaved by different cellular proteases, including furin, which are expressed in a wide variety of cell types, allowing a widening of the cell tropism of the virus (Kido et al., 2012). Furthermore, the insertion of a multibasic motif RERRRKKR↓GL at the H5N1 hemagglutinin (HA) cleavage site was likely associated with the hyper-virulence of the virus during the Hong Kong 1997 outbreak (Claas et al., 1998). This motif exhibits the critical Arg at P1 and basic residues at P2 and P4, as well as P6 and P8, and an aliphatic Leu at the P2′ position (Table 1) (Schechter and Berger nomenclature (Schechter and Berger, 1968)), typical of a furin-like cleavage specificity (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and Prat, 2012).

Fig. 1. Characterization of an nCoV-peculiar sequence at the S1/S2 cleavage site in the S-protein sequence, compared with SARS-like CoV. (A) Phylogenetic tree of selected coronaviruses from genera alphacoronavirus (α-CoV) and betacoronavirus (β-CoV), lineages a, b, c and d: 2019-nCoV (NC_045512.2), CoV-ZXC21 (MG772934), SARS-CoV (NC_004718.3), SARS-like BM4821 (MG772934), HCoV-OC43 (AY391777), HKU9-1 (EF065513), HCoV-NL63 (KF530114.1), HCoV229E (KF514433.1), MERS-CoV (NC_019843.3), HKU1 (NC_006577.2). The phylogenetic tree was obtained on the Orf1ab amino acid sequence using the Maximum Likelihood method in Mega X software. Red asterisks indicate the presence of a canonical furin-like cleavage motif at site 1. (B) Alignment of the coding and amino acid sequences of the S-protein from CoV-ZXC21 and 2019-nCoV at the S1/S2 site. The 2019-nCoV-specific sequence is in bold. The sequence of CoV-ZXC21 S-protein at this position is representative of the sequence of the other betacoronaviruses belonging to lineage b, except that of 2019-nCoV. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Table 1. Comparative sequences of envelope protein cleavage site(s) in coronaviruses (above) and in other RNA viruses (below). Empty boxes: no consensus motif detected.

The coronavirus S-protein is the structural protein responsible for the crown-like shape of the CoV viral particles, from which the original name “coronavirus” was coined. The ~1200 aa long S-protein belongs to class-I viral fusion proteins and contributes to the cell receptor binding, tissue tropism and pathogenesis (Lu et al., 2015; Millet and Whittaker, 2014). It contains several conserved domains and motifs (Fig. 2). The trimeric S-protein is processed at the S1/S2 cleavage site by host cell proteases during infection. Following cleavage, also known as priming, the protein is divided into an N-terminal S1-ectodomain that recognises a cognate cell surface receptor and a C-terminal S2-membrane-anchored protein involved in viral entry. The SARS-CoV S1-protein contains a conserved Receptor Binding Domain (RBD), which recognises the angiotensin-converting enzyme 2 (ACE2) (Li et al., 2003). The SARS-CoV binds to both bat and human cells, and the virus can infect both organisms (Ge et al., 2013; Kuhn et al., 2004). The S1/ACE2 binding interface of the RBD involves 14 aa in the S1 of SARS-CoV (Li et al., 2005).
Among them, 8 residues are strictly conserved in 2019-nCoV, supporting the hypothesis that ACE2 is also the receptor of the newly emerged nCoV (Wan et al., 2020). The S2-protein contains the fusion peptide (FP), a second proteolytic site (S2′), followed by an internal fusion peptide (IFP) and two heptad-repeat domains preceding the transmembrane domain (TM) (Fig. 2). Notably, the IFPs of the 2019-nCoV and SARS-CoV are identical, displaying characteristics of viral fusion peptides (Fig. 2). While the molecular mechanism involved in cell entry is not yet fully understood, it is likely that both FP and IFP participate in the viral entry process (Lu et al., 2015), and thus the S-protein must likely be cleaved at both the S1/S2 and S2′ cleavage sites for virus entry. The furin-like S2′ cleavage site at KR↓SF, with P1 and P2 basic residues and a P2′ hydrophobic Phe (Seidah and Prat, 2012), downstream of the IFP, is identical between the 2019-nCoV and SARS-CoV (Fig. 2). In the MERS-CoV and HCoV-OC43 the S2′ site is replaced by RXXR↓SA, with P1 and P4 basic residues, and an Ala (not aliphatic) at P2′, suggesting a somewhat less favourable cleavage by furin. However, in the other less pathogenic circulating human CoV, the S2′ cleavage site only exhibits a monobasic R↓S sequence (Fig. 2), with none of the basic residues at P2 and/or P4 needed to allow furin cleavage, suggesting a less efficient cleavage or higher restriction at the entry step depending on the cognate proteases expressed by target cells. Even though processing at S2′ in 2019-nCoV is expected to be a key event for the final activation of the S-protein, the protease(s) involved in this process have not yet been conclusively identified. Based on the 2019-nCoV S2′ sequence and the above arguments, we propose that one or more furin-like enzymes would cleave the S2′ site at KR↓SF. In contrast to the S2′, the first cleavage between the RBD and the FP (S1/S2 cleavage site, Fig.
2) has been extensively studied for many CoVs (Lu et al., 2015). Interestingly, the S1/S2 processing site exhibits different motifs among coronaviruses (Fig. 2, site 1 & site 2), with many of them displaying cleavage after a basic residue. It is thus likely that the priming process is ensured by different host cell proteases depending on the sequence of the S1/S2 cleavage site. Accordingly, the MERS-CoV S-protein, which contains an RSVR↓SV motif, is cleaved during virus egress, probably by furin (Millet and Whittaker, 2014). Conversely, the S-protein of SARS-CoV remains largely uncleaved after biosynthesis, possibly due to the lack of a favourable furin-like cleavage site (SLLR-ST). In this case, it was reported that following receptor binding the S-protein is cleaved at a conserved sequence AYT↓M (located 10 aa downstream of SLLR-ST) by target cells’ proteases such as elastase, cathepsin L or TMPRSS2 (Bosch et al., 2008; Matsuyama et al., 2010, 2005; Millet and Whittaker, 2015). As the priming event is essential for virus entry, the efficacy and extent of this activation step by the proteases of the target cells should regulate cellular tropism and viral pathogenesis. In the case of the 2019-nCoV S-protein, the conserved site 2 sequence AYT↓M may still be cleaved, possibly after the preferred furin cleavage at site 1 (Fig. 2). Since furin is highly expressed in lungs, an enveloped virus that infects the respiratory tract may successfully exploit this convertase to activate its surface glycoprotein (Bassi et al., 2017; Mbikay et al., 1997). Before the emergence of the 2019-nCoV, this important feature was not observed in lineage b of betacoronaviruses. However, it is shared by other CoV (HCoV-OC43, MERS-CoV, MHV-A59) harbouring furin-like cleavage sites in their S-protein (Fig. 2; Table 1), which were shown experimentally to be processed by furin (Le Coupanec et al., 2015; Millet and Whittaker, 2014).

Fig. 2. Schematic representation of the human 2019-nCoV S-protein with a focus on the putative maturation sites. The domains were previously characterized in SARS-CoV and MERS-CoV: signal peptide (SP), N-terminal domain (NTD), receptor-binding domain (RBD), fusion peptide (FP), internal fusion peptide (IFP), heptad repeat 1/2 (HR1/2), and the transmembrane domain (TM). The SP, S1↓S2 and S2′ cleavage sites are indicated by arrows. The sequences of different CoV S1/S2 and S2′ cleavage sites were aligned using the Multalin webserver (http://multalin.toulouse.inra.fr/multalin/) with manual adjustments, and the figure was prepared using ESPript 3 (http://espript.ibcp.fr/ESPript/ESPript/), presenting the secondary structure of the SARS-CoV S-protein at the bottom of the alignment (PDB 5X58) (Yuan et al., 2017). The insertion of the furin-like cleavage site is surrounded by a black frame. Red asterisks indicate the presence of a canonical furin-like cleavage motif at the S1/S2 site. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Strikingly, the 2019-nCoV S-protein sequence contains 12 additional nucleotides upstream of the single Arg↓ cleavage site 1 (Figs. 1B and 2), leading to a predicted solvent-exposed PRRAR↓SV sequence, which corresponds to a canonical furin-like cleavage site (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and Prat, 2012). This furin-like cleavage site is supposed to be cleaved during virus egress (Millet and Whittaker, 2014) for S-protein “priming” and may provide a gain-of-function to the 2019-nCoV for efficient spreading in the human population compared to other lineage b betacoronaviruses. This possibly illustrates a convergent evolution pathway between unrelated CoVs.
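The motif logic used in this discussion can be made concrete with a short sketch. The code below is only an illustration under stated assumptions: the function names `find_pc_motifs` and `label_positions` are hypothetical, the motif is taken literally as (R/K)-(2X)n-(R/K)↓ with n = 0 to 3, and the peptides are the short stretches quoted in the text (PRRAR↓SV, RSVR↓SV, SLLR-ST, RERRRKKR↓GL). A pattern match of this kind is at most a necessary condition; it says nothing about solvent exposure or actual processing by furin.

```python
BASIC = "RK"  # Arg and Lys, the basic residues allowed by the motif


def find_pc_motifs(seq):
    """Scan a peptide for (R/K)-(2X)n-(R/K) with n = 0..3.

    Returns a list of (upstream_basic_index, p1_index, n) tuples, using
    0-based indices. Cleavage would occur right after p1_index.
    """
    hits = []
    for p1 in range(len(seq)):
        if seq[p1] not in BASIC:
            continue
        for n in range(4):
            # 2*n spacer residues sit between the two basic residues
            upstream = p1 - 1 - 2 * n
            if upstream >= 0 and seq[upstream] in BASIC:
                hits.append((upstream, p1, n))
    return hits


def label_positions(site, arrow="↓"):
    """Label residues around a cleavage arrow in Schechter and Berger style.

    Residues N-terminal to the scissile bond are P1, P2, ... (counting
    away from the bond); C-terminal residues are P1', P2', ...
    """
    upstream, downstream = site.split(arrow)
    labels = {}
    for i, aa in enumerate(reversed(upstream), start=1):
        labels[f"P{i}"] = aa
    for i, aa in enumerate(downstream, start=1):
        labels[f"P{i}'"] = aa
    return labels


if __name__ == "__main__":
    for name, peptide in [("2019-nCoV", "PRRARSV"),
                          ("MERS-CoV", "RSVRSV"),
                          ("SARS-CoV", "SLLRST")]:
        print(name, peptide, find_pc_motifs(peptide))
    print(label_positions("PRRAR↓SV"))
    print(label_positions("RERRRKKR↓GL"))
```

Under these assumptions the SARS-CoV stretch SLLRST yields no hit, whereas the 2019-nCoV and MERS-CoV stretches each contain a basic residue at P4 relative to the Arg preceding the arrow, consistent with the distinction drawn in the text.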
Interestingly, if this site is not processed, the S-protein is expected to be cleaved at site 2 during virus endocytosis, as observed for the SARS-CoV. Obviously, much more work is needed to demonstrate our assertion experimentally, but the inhibition of such processing enzyme(s) may represent a potential antiviral strategy. Indeed, it was recently shown that, in an effort to limit viral infections, host cells that are infected by a number of viruses provoke an interferon response to inhibit the enzymatic activity of furin-like enzymes. It was also demonstrated that HIV infection induces the expression of either the protease-activated receptor 1 (PAR1) (Kim et al., 2015) or guanylate binding proteins 2 and 5 (GBP2,5) (Braun and Sauter, 2019), which restrict the trafficking of furin to the trans Golgi network (PAR1) or to early Golgi compartments (GBP2,5), where the proprotein convertase remains inactive. Altogether, these observations suggest that inhibitors of furin-like enzymes may contribute to inhibiting virus propagation. A variety of approaches have been proposed to inhibit furin activity in order to limit tumour growth and viral and bacterial infection. Thus, a variant of the naturally occurring serine protease inhibitor α-1 antitrypsin harbouring a consensus furin cleavage site, called α-1 antitrypsin Portland (α1-PDX), inhibits furin and prevents the processing of HIV-1 Env (Anderson et al., 1993). The addition of a chloromethylketone (CMK) moiety to the C-terminus of a polybasic cleavage motif and a decanoyl group at the N-terminus to favour cell penetration (dec-RVKR-cmk) irreversibly blocked the enzymatic activity of furin, PC7, PC5 and PACE4 (Decroly et al., 1996; Garten et al., 1994). Finally, the elucidation of the crystal structure of furin resulted in the design of a 2,5-dideoxystreptamine-derived inhibitor, where two molecules of the inhibitor form a complex with furin (Dahms et al., 2017).
As furin-like enzymes are involved in a multitude of cellular processes, one important issue would be to avoid systemic inhibition that may result in some toxicity. Accordingly, it is likely that such small molecule inhibitors, or other more potent orally active ones, possibly delivered by inhalation and exhibiting a slow dissociation rate from furin to allow for sustained inhibition, deserve to be rapidly tested to assess their antiviral effect against 2019-nCoV.

Acknowledgments
This work was supported by a CIHR Foundation grant # 148363 (NGS), a Canada Research Chair in Precursor Proteolysis (NGS; # 950-231335), and by the European Virus Archive Global (BCo; EVA GLOBAL) funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No 871029.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.antiviral.2020.104742.

References
Anderson, E.D., Thomas, L., Hayflick, J.S., Thomas, G., 1993. Inhibition of HIV-1 gp160-dependent membrane fusion by a furin-directed α1-antitrypsin variant. J. Biol. Chem. 268, 24887–24891.
Bassi, D.E., Zhang, J., Renner, C., Klein-Szanto, A.J., 2017. Targeting proprotein convertases in furin-rich lung cancer cells results in decreased in vitro and in vivo growth. Mol. Carcinog. 56, 1182–1188. https://doi.org/10.1002/mc.22550.
Bosch, B.J., Bartelink, W., Rottier, P.J.M., 2008. Cathepsin L functionally cleaves the severe acute respiratory syndrome coronavirus class I fusion protein upstream of rather than adjacent to the fusion peptide. J. Virol. 82, 8887–8890. https://doi.org/10.1128/jvi.00415-08.
Braun, E., Sauter, D., 2019. Furin-mediated protein processing in infectious diseases and cancer. Clin. Transl. Immunol. 8, e1073. https://doi.org/10.1002/cti2.1073.
Chan, C.M., Woo, P.C., Lau, S.K., Tse, H., Chen, H.L., Li, F., Zheng, B.J., Chen, L., Huang, J.D., Yuen, K.Y., 2008. Spike protein, S, of human coronavirus HKU1: role in viral life cycle and application in antibody detection. Exp. Biol. Med. 233, 1527–1536. https://doi.org/10.3181/0806-RM-197.
Chan, J.F., Kok, K.H., Zhu, Z., Chu, H., To, K.K., Yuan, S., Yuen, K.Y., 2020. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg. Microb. Infect. 9, 221–236. https://doi.org/10.1080/22221751.2020.1719902.
Cheng, J., Zhao, Y., Xu, G., Zhang, K., Jia, W., Sun, Y., Zhao, J., Xue, J., Hu, Y., Zhang, G., 2019. The S2 subunit of QX-type infectious bronchitis coronavirus spike protein is an essential determinant of neurotropism. Viruses 11. https://doi.org/10.3390/v11100972.
Claas, E.C., Osterhaus, A.D., Van Beek, R., De Jong, J.C., Rimmelzwaan, G.F., Senne, D.A., Krauss, S., Shortridge, K.F., Webster, R.G., 1998. Human influenza A H5N1 virus related to a highly pathogenic avian influenza virus. Lancet 351, 472–477. https://doi.org/10.1016/S0140-6736(97)11212-0.
Dahms, S.O., Jiao, G.-S., Than, M.E., 2017. Structural studies revealed active site distortions of human furin by a small molecule inhibitor. ACS Chem. Biol. 12, 2474. https://doi.org/10.1021/acschembio.7b00633.
de Wit, E., van Doremalen, N., Falzarano, D., Munster, V.J., 2016. SARS and MERS: recent insights into emerging coronaviruses. Nat. Rev. Microbiol. https://doi.org/10.1038/nrmicro.2016.81.
Decroly, E., Wouters, S., Di Bello, C., Lazure, C., Ruysschaert, J.-M., Seidah, N.G., 1996. Identification of the paired basic convertases implicated in HIV gp160 processing based on in vitro assays and expression in CD4+ cell lines. J. Biol. Chem. 271, 30442–30450. https://doi.org/10.1074/jbc.271.48.30442.
Garten, W., Hallenberger, S., Ortmann, D., Schäfer, W., Vey, M., Angliker, H., Shaw, E., Klenk, H.D., 1994. Processing of viral glycoproteins by the subtilisin-like endoprotease furin and its inhibition by specific peptidylchloroalkylketones. Biochimie 76, 217–225. https://doi.org/10.1016/0300-9084(94)90149-x.
Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B., Zhang, W., Peng, C., Zhang, Y.-J., Luo, C.-M., Tan, B., Wang, N., Zhu, Y., Crameri, G., Zhang, S.-Y., Wang, L.-F., Daszak, P., Shi, Z.-L., 2013. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538. https://doi.org/10.1038/nature12711.
Izaguirre, G., 2019. The proteolytic regulation of virus cell entry by furin and other proprotein convertases. Viruses 11. https://doi.org/10.3390/v11090837.
Kido, H., Okumura, Y., Takahashi, E., Pan, H.Y., Wang, S., Yao, D., Yao, M., Chida, J., Yano, M., 2012. Role of host cellular proteases in the pathogenesis of influenza and influenza-induced multiple organ failure. Biochim. Biophys. Acta Proteins Proteomics. https://doi.org/10.1016/j.bbapap.2011.07.001.
Kim, W., Zekas, E., Lodge, R., Susan-Resiga, D., Marcinkiewicz, E., Essalmani, R., Mihara, K., Ramachandran, R., Asahchop, E., Gelman, B., Cohen, É.A., Power, C., Hollenberg, M.D., Seidah, N.G., 2015. Neuroinflammation-induced interactions between protease-activated receptor 1 and proprotein convertases in HIV-associated neurocognitive disorder. Mol. Cell Biol. 35, 3684–3700. https://doi.org/10.1128/mcb.00764-15.
Kuhn, J.H., Li, W., Choe, H., Farzan, M., 2004. Angiotensin-converting enzyme 2: a functional receptor for SARS coronavirus. Cell. Mol. Life Sci. 61, 2738–2743. https://doi.org/10.1007/s00018-004-4242-5.
Le Coupanec, A., Desforges, M., Meessen-Pinard, M., Dubé, M., Day, R., Seidah, N.G., Talbot, P.J., 2015. Cleavage of a neuroinvasive human respiratory virus spike glycoprotein by proprotein convertases modulates neurovirulence and virus spread within the central nervous system. PLoS Pathog. 11. https://doi.org/10.1371/journal.ppat.1005261.
Li, F., Li, W., Farzan, M., Harrison, S.C., 2005. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 309, 1864–1868. https://doi.org/10.1126/science.1116480.
Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., Ren, R., Leung, K.S.M., Lau, E.H.Y., Wong, J.Y., Xing, X., Xiang, N., Wu, Y., Li, C., Chen, Q., Li, D., Liu, T., Zhao, J., Li, M., Tu, W., Chen, C., Jin, L., Yang, R., Wang, Q., Zhou, S., Wang, R., Liu, H., Luo, Y., Liu, Y., Shao, G., Li, H., Tao, Z., Yang, Y., Deng, Z., Liu, B., Ma, Z., Zhang, Y., Shi, G., Lam, T.T.Y., Wu, J.T.K., Gao, G.F., Cowling, B.J., Yang, B., Leung, G.M., Feng, Z., 2020. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. N. Engl. J. Med. NEJMoa2001316. https://doi.org/10.1056/NEJMoa2001316.
Li, W., Moore, M.J., Vasilieva, N., Sui, J., Wong, S.K., Berne, M.A., Somasundaran, M., Sullivan, J.L., Luzuriaga, K., Greenough, T.C., Choe, H., Farzan, M., 2003. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature 426, 450–454. https://doi.org/10.1038/nature02145.
Lu, G., Wang, Q., Gao, G.F., 2015. Bat-to-human: spike features determining “host jump” of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. https://doi.org/10.1016/j.tim.2015.06.003.
Matsuyama, S., Nagata, N., Shirato, K., Kawase, M., Takeda, M., Taguchi, F., 2010. Efficient activation of the severe acute respiratory syndrome coronavirus spike protein by the transmembrane protease TMPRSS2. J. Virol. 84, 12658–12664. https://doi.org/10.1128/JVI.01542-10.
Matsuyama, S., Ujike, M., Morikawa, S., Tashiro, M., Taguchi, F., 2005. Protease-mediated enhancement of severe acute respiratory syndrome coronavirus infection. Proc. Natl. Acad. Sci. U.S.A. 102, 12543–12547. https://doi.org/10.1073/pnas.0503203102.
Mbikay, M., Sirois, F., Yao, J., Seidah, N.G., Chrétien, M., 1997. Comparative analysis of expression of the proprotein convertases furin, PACE4, PC1 and PC2 in human lung tumours. Br. J. Canc. 75, 1509–1514. https://doi.org/10.1038/bjc.1997.258.
Millet, J.K., Whittaker, G.R., 2014. Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl. Acad. Sci. U.S.A. 111, 15214–15219. https://doi.org/10.1073/pnas.1407087111.
Millet, J.K., Whittaker, G.R., 2015. Host cell proteases: critical determinants of coronavirus tropism and pathogenesis. Virus Res. 202, 120–134. https://doi.org/10.1016/j.virusres.2014.11.021.
Moulard, M., Decroly, E., 2000. Maturation of HIV envelope glycoprotein precursors by cellular endoproteases. Biochim. Biophys. Acta Rev. Biomembr. https://doi.org/10.1016/S0304-4157(00)00014-9.
Schechter, I., Berger, A., 1968. On the active site of proteases. 3. Mapping the active site of papain; specific peptide inhibitors of papain. Biochem. Biophys. Res. Commun. 32, 898–902. https://doi.org/10.1016/0006-291x(68)90326-4.
Seidah, N.G., Chretien, M., 1999. Proprotein and prohormone convertases: a family of subtilases generating diverse bioactive polypeptides. Brain Res. 848, 45–62. https://doi.org/10.1016/S0006-8993(99)01909-5.
Seidah, N.G., Prat, A., 2012. The biology and therapeutic targeting of the proprotein convertases. Nat. Rev. Drug Discov. https://doi.org/10.1038/nrd3699.
Sun, X., Tse, L.V., Ferguson, A.D., Whittaker, G.R., 2010. Modifications to the hemagglutinin cleavage site control the virulence of a neurotropic H1N1 influenza virus. J. Virol. 84, 8683–8690. https://doi.org/10.1128/JVI.00797-10.
Wan, Y., Shang, J., Graham, R., Baric, R.S., Li, F., 2020. Receptor recognition by novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS. J. Virol. https://doi.org/10.1128/JVI.00127-20.
Yuan, Y., Cao, D., Zhang, Y., Ma, J., Qi, J., Wang, Q., Lu, G., Wu, Y., Yan, J., Shi, Y., Zhang, X., Gao, G.F., 2017. Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat. Commun. https://doi.org/10.1038/ncomms15092.
Zhao, S., Ran, J., Musa, S.S., Yang, G., Wang, W., Lou, Y., Gao, D., Yang, L., He, D., Wang, M.H., 2020. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the early phase of the outbreak. Int. J. Infect. Dis. https://doi.org/10.1016/j.ijid.2020.01.050.

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder.

Cases of mild to severe illness, and death from the infection, have been reported from Wuhan. This outbreak has spread rapidly to distant nations including France, Australia and the USA, among others. The number of cases within and outside China is increasing steeply. Our current understanding is limited to the virus genome sequences and modest epidemiological and clinical data. Comprehensive analysis of the available 2019-nCoV sequences may provide important clues that may help advance our current understanding to manage the ongoing outbreak. The spike glycoprotein (S) of coronavirus is cleaved into two subunits (S1 and S2). The S1 subunit helps in receptor binding and the S2 subunit facilitates membrane fusion (Bosch et al., 2003; Li, 2016). The spike glycoproteins of coronaviruses are important determinants of tissue tropism and host range. In addition, the spike glycoproteins are critical targets for vaccine development (Du et al., 2013).
For this reason, the spike proteins are the most extensively studied proteins among coronaviruses. We therefore sought to investigate the spike glycoprotein of the 2019-nCoV to understand its evolution and its novel sequence and structural features using computational tools.

Methodology

Retrieval and alignment of nucleic acid and protein sequences
We retrieved all the available coronavirus sequences (n=55) from the NCBI viral genome database (https://www.ncbi.nlm.nih.gov/), and we used GISAID (Elbe & Buckland-Merrett, 2017) [https://www.gisaid.org/] to retrieve all available full-length sequences (n=28) of 2019-nCoV as of 27 Jan 2020. Multiple sequence alignment of all coronavirus genomes was performed by using MUSCLE software (Edgar, 2004) based on the neighbour joining method. Out of the 55 coronavirus genomes, 32 representative genomes of all categories were used for phylogenetic tree development using MEGAX software (Kumar et al., 2018). The closest relative was found to be SARS CoV. The glycoprotein regions of SARS CoV and 2019-nCoV were aligned and visualized using Multalin software (Corpet, 1988). The identified amino acid and nucleotide sequences were aligned against the whole viral genome database using BLASTp and BLASTn. The conservation of the nucleotide and amino acid motifs in 28 clinical variants of the 2019-nCoV genome was presented by performing multiple sequence alignment using MEGAX software. The three-dimensional structure of the 2019-nCoV glycoprotein was generated by using the SWISS-MODEL online server (Biasini et al., 2014), and the structure was marked and visualized by using PyMol (DeLano, 2002).

Results

Uncanny similarity of novel inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
Our phylogenetic tree of full-length coronaviruses suggests that 2019-nCoV is closely related to SARS CoV [Fig1]. In addition, other recent studies have linked the 2019-nCoV to SARS CoV.
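The alignment step in the Methodology produces gapped sequence rows, and candidate insertions are the runs of columns where the reference row has a gap and the query row has a residue. The sketch below shows that bookkeeping only; it is not the authors' pipeline. The function name `find_insertions` is hypothetical, and the two aligned rows in the usage example are invented toy strings, not real spike sequences.

```python
def find_insertions(ref_aln, query_aln, min_len=1):
    """List query-specific insertions in a gapped pairwise alignment.

    ref_aln and query_aln are two rows of the same alignment ('-' marks
    a gap). Returns (query_start, inserted_residues) tuples, where
    query_start is the 1-based position of the first inserted residue
    in the ungapped query sequence.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have the same length")

    inserts = []
    run, run_start = [], None
    query_pos = 0  # residues of the ungapped query seen so far

    for ref_char, query_char in zip(ref_aln, query_aln):
        if query_char != "-":
            query_pos += 1
        if ref_char == "-" and query_char != "-":
            # query residue with no counterpart in the reference
            if not run:
                run_start = query_pos
            run.append(query_char)
        else:
            if run and len(run) >= min_len:
                inserts.append((run_start, "".join(run)))
            run = []

    # flush an insertion that reaches the end of the alignment
    if run and len(run) >= min_len:
        inserts.append((run_start, "".join(run)))
    return inserts


if __name__ == "__main__":
    # Toy rows (hypothetical), chosen only to exercise the function.
    reference = "MKT--LLV-A"
    query = "MKTGGLLVSA"
    print(find_insertions(reference, query))
    print(find_insertions(reference, query, min_len=2))
```

A `min_len` threshold is included because single-residue gaps are common in alignments of divergent proteins; which threshold is appropriate is a judgment call that the text does not specify.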
We therefore compared the spike glycoprotein sequences of the 2019-nCoV to that of the SARS CoV (NCBI Accession number: AY390556.1). On careful examination of the sequence alignment we found that the 2019-nCoV spike glycoprotein contains 4 insertions [Fig.2]. To further investigate if these inserts are present in any other coronavirus, we performed a multiple sequence alignment of the spike glycoprotein amino acid sequences of all available coronaviruses (n=55) [refer Table S.File1] in NCBI refseq (ncbi.nlm.nih.gov); this includes one sequence of 2019-nCoV [Fig.S1]. We found that these 4 insertions [inserts 1, 2, 3 and 4] are unique to 2019-nCoV and are not present in the other coronaviruses analyzed. Another group from China had documented three insertions comparing fewer spike glycoprotein sequences of coronaviruses (Zhou et al., 2020).

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871. WITHDRAWN: see manuscript DOI for details. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Figure 1: Maximum likelihood genealogy showing the evolution of 2019-nCoV. The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model. The tree with the highest log likelihood (12458.88) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood
This analysis involved 5 amino acid sequences. There were a total of 1387 positions in the final dataset. Evolutionary analyses were conducted in MEGA X.

Figure 2: Multiple sequence alignment between the spike proteins of 2019-nCoV and SARS CoV. The sequences of the spike proteins of 2019-nCoV (Wuhan-HU-1, Accession NC_045512) and of SARS CoV (GZ02, Accession AY390556) were aligned using Multalin software. The sites of difference are highlighted in boxes.

We then analyzed all available full-length sequences (n=28) of 2019-nCoV in GISAID (Elbe & Buckland-Merrett, 2017) as of January 27, 2020 for the presence of these inserts. As most of these sequences are not annotated, we compared the nucleotide sequences of the spike glycoprotein of all available 2019-nCoV sequences using BLASTn. Interestingly, all 4 insertions were absolutely (100%) conserved in all the available 2019-nCoV sequences analyzed [Fig.S2, Fig.S3]. We then translated the aligned genomes and found that these inserts are present in all Wuhan 2019-nCoV viruses except the 2019-nCoV virus with bat as a host [Fig.S4].

Intrigued by the 4 highly conserved inserts unique to 2019-nCoV, we wanted to understand their origin. For this purpose, we ran a local alignment with each 2019-nCoV insert as a query against all virus genomes and considered hits with 100% sequence coverage. Surprisingly, each of the four inserts aligned with short segments of Human immunodeficiency virus-1 (HIV-1) proteins. The amino acid positions of the inserts in 2019-nCoV and the corresponding residues in HIV-1 gp120 and HIV-1 Gag are shown in Table 1. The first 3 inserts (inserts 1, 2 and 3) aligned to short segments of amino acid residues in HIV-1 gp120.
The insert 4 aligned to HIV-1 Gag. Insert 1 (6 amino acid residues) and insert 2 (6 amino acid residues) in the spike glycoprotein of 2019-nCoV are 100% identical to the residues mapped to HIV-1 gp120. Insert 3 (12 amino acid residues) in 2019-nCoV maps to HIV-1 gp120 with gaps [see Table 1]. Insert 4 (8 amino acid residues) maps to HIV-1 Gag with gaps.

Although the 4 inserts represent discontiguous short stretches of amino acids in the spike glycoprotein of 2019-nCoV, the fact that all four of them share amino acid identity or similarity with HIV-1 gp120 and HIV-1 Gag (among all annotated virus proteins) suggests that this is not a random fortuitous finding. In other words, one may sporadically expect a fortuitous match for a stretch of 6-12 contiguous amino acid residues in an unrelated protein. However, it is unlikely that all 4 inserts in the 2019-nCoV spike glycoprotein fortuitously match 2 key structural proteins of an unrelated virus (HIV-1).

The amino acid residues of inserts 1, 2 and 3 of the 2019-nCoV spike glycoprotein that mapped to HIV-1 were part of the V4, V5 and V1 domains, respectively, in gp120 [Table 1]. Since the 2019-nCoV inserts mapped to variable regions of HIV-1, they were not ubiquitous in HIV-1 gp120 but were limited to selected sequences of HIV-1 [refer S.File1], primarily from Asia and Africa.

The HIV-1 Gag protein enables interaction of the virus with the negatively charged host surface (Murakami, 2008), and a high positive charge on the Gag protein is a key feature of the host-virus interaction. On analyzing the pI values for each of the 4 inserts in 2019-nCoV and the corresponding stretches of amino acid residues from HIV-1 proteins, we found that a) the pI values were very similar for each pair analyzed and b) most of these pI values were 10±2 [refer Table 1]. Of note, despite the gaps in inserts 3 and 4, the pI values were comparable. This uniformity in the pI values for all 4 inserts merits further investigation.
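The charge and pI comparison described above can be illustrated with a minimal Python sketch. It is not the tool used to produce Table 1, which the text does not name. The pKa values, the set of residues counted as polar, and the function names `formal_charge`, `polar_count`, `charge_at_ph` and `estimate_pi` are all assumptions made for illustration, so the numbers it produces need not reproduce the table.

```python
# Assumed, illustrative pKa values; published scales differ and shift the result.
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6

# Assumed definition of "polar" residues (charged plus uncharged polar side chains).
POLAR = set("STNQYCHKRDE")


def formal_charge(peptide: str) -> int:
    """Count K/R as +1 and D/E as -1 (H treated as neutral near pH 7)."""
    return sum((aa in "KR") - (aa in "DE") for aa in peptide)


def polar_count(peptide: str) -> int:
    """Number of residues that belong to the assumed POLAR set."""
    return sum(aa in POLAR for aa in peptide)


def charge_at_ph(peptide: str, ph: float) -> float:
    """Net charge of a free peptide at a given pH (Henderson-Hasselbalch)."""
    # Basic groups carry +1 when protonated; acidic groups carry -1 when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in peptide:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def estimate_pi(peptide: str, tol: float = 1e-4) -> float:
    """Bisection on pH for zero net charge (charge falls monotonically with pH)."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if charge_at_ph(peptide, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Motif sequences taken from Table 1 (inserts 1 and 2).
    for motif in ("TNGTKR", "HKNNKS"):
        print(motif, polar_count(motif), formal_charge(motif), round(estimate_pi(motif), 2))
```

Because the estimate treats each motif as a free peptide with charged termini, it is only a rough guide to how the same residues behave inside a folded protein.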
As none of these 4 inserts is present in any other coronavirus, the genomic regions encoding these inserts represent ideal candidates for designing primers that can distinguish 2019-nCoV from other coronaviruses.

| Motif | Virus glycoprotein | Start | Motif alignment | End | HIV protein and variable region | HIV genome source country/subtype | Number of polar residues | Total charge | pI value |
|---|---|---|---|---|---|---|---|---|---|
| Insert 1 | 2019-nCoV (GP) | 71 | TNGTKR | 76 | | | 5 | 2 | 11 |
| | HIV1 (GP120) | 404 | TNGTKR | 409 | gp120-V4 | Thailand*/CRF01_AE | 5 | 2 | 11 |
| Insert 2 | 2019-nCoV (GP) | 145 | HKNNKS | 150 | | | 6 | 2 | 10 |
| | HIV1 (GP120) | 462 | HKNNKS | 467 | gp120-V5 | Kenya*/G | 6 | 2 | 10 |
| Insert 3 | 2019-nCoV (GP) | 245 | RSYL----TPGDSSSG | 256 | | | 8 | 2 | 10.84 |
| | HIV1 (GP120) | 136 | RTYLFNETRGNSSSG | 150 | gp120-V1 | India*/C | 10 | 1 | 8.75 |
| Insert 4 | 2019-nCoV (Poly P) | 676 | QTNS-----------------------PRRA | 684 | | | 6 | 2 | 12.00 |
| | HIV1 (gag) | 366 | QTNSSILMQRSNFKGPRRA | 384 | Gag | India*/C | 12 | 4 | 12.30 |

Table 1: Aligned sequences of 2019-nCoV and the gp120 protein of HIV-1 with their positions in the primary sequence of the protein. All the inserts have a high density of positively charged residues. The deleted fragments in inserts 3 and 4 increase the positive charge to surface area ratio. *Please see Supp. Table 1 for accession numbers.

The novel inserts are part of the receptor binding site of 2019-nCoV

To gain structural insights and to understand the role of these insertions in the 2019-nCoV glycoprotein, we modelled its structure based on the available structure of the SARS spike glycoprotein (PDB: 6ACD.1.A). The modelled structure reveals that although inserts 1, 2 and 3 are at non-contiguous locations in the protein primary sequence, they fold to constitute part of the glycoprotein binding site that recognizes the host receptor (Kirchdoerfer et al., 2016) (Figure 3).
Insert 1 corresponds to the NTD (N-terminal domain), and inserts 2 and 3 correspond to the CTD (C-terminal domain) of the S1 subunit in the 2019-nCoV spike glycoprotein. Insert 4 is at the junction of SD1 (subdomain 1) and SD2 (subdomain 2) of the S1 subunit (Ou et al., 2017). We speculate that these insertions provide additional flexibility to the glycoprotein binding site by forming a hydrophilic loop in the protein structure that may facilitate or enhance virus-host interactions.

Figure 3: Modelled homo-trimer spike glycoprotein of 2019-nCoV. The inserts from the HIV envelope protein are shown as colored beads, present at the binding site of the protein.

Evolutionary Analysis of 2019-nCoV

It has been speculated that 2019-nCoV is a coronavirus variant derived from an animal source that was transmitted to humans. Considering the change in host specificity, we decided to study the sequences of the spike glycoprotein (S protein) of the virus. S proteins are surface proteins that help the virus in host recognition and attachment. Thus, a change in these proteins can be reflected as a change in the host specificity of the virus. To identify the alterations in the S protein gene of 2019-nCoV and their consequences for structural rearrangements, we performed an in silico analysis of 2019-nCoV with respect to all other viruses. A multiple sequence alignment of the S protein amino acid sequences of 2019-nCoV, Bat-SARS-Like, SARS-GZ02 and MERS revealed that the S protein has evolved with closest significant diversity from SARS-GZ02 (Figure 1).
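One simple way to rank relatives from a multiple sequence alignment such as the one described above is pairwise percent identity over ungapped columns. The Python sketch below assumes the sequences are already aligned to equal length with `-` as the gap character. The function names and the toy sequences are hypothetical, and the analysis in the text relied on a maximum likelihood tree rather than this measure.

```python
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gapped columns are excluded from the comparison
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def closest_relative(query: str, alignment: Dict[str, str]) -> Tuple[str, float]:
    """Return the name and identity of the aligned sequence most similar to `query`."""
    scores = {
        name: percent_identity(alignment[query], seq)
        for name, seq in alignment.items()
        if name != query
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical toy alignment, not real spike sequences.
    toy = {"query": "MKTLLV", "relative_1": "MKTLIV", "relative_2": "MRSLIV"}
    print(closest_relative("query", toy))
```

Excluding gapped columns keeps insertions from lowering the score, so identity computed this way describes the shared backbone only and says nothing about the inserts themselves.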
Insertions in Spike protein region of 2019-nCoV

Since the S protein of 2019-nCoV shares its closest ancestry with SARS GZ02, the sequences coding for the spike proteins of these two viruses were compared using Multalin software. We found four new insertions in the protein of 2019-nCoV: “GTNGTKR” (IS1), “HKNNKS” (IS2), “GDSSSG” (IS3) and “QTNSPRRA” (IS4) (Figure 2). To our surprise, these sequence insertions were not only absent in the S protein of SARS CoV but were also not observed in any other member of the Coronaviridae family (Supplementary figure). This is startling, as it is quite unlikely for a virus to have acquired such unique insertions naturally in a short duration of time.

Insertions share similarity to HIV

The insertions were observed to be present in all the genomic sequences of 2019-nCoV available from the recent clinical isolates (Supplementary Figure 1). To identify the source of these insertions in 2019-nCoV, a local alignment was performed with BLASTp using these insertions as queries against all virus genomes. Unexpectedly, all the insertions aligned with Human immunodeficiency virus-1 (HIV-1). Further analysis revealed that the aligned sequences of HIV-1 were derived from the surface glycoprotein gp120 (amino acid sequence positions 404-409, 462-467 and 136-150) and from the Gag protein (amino acids 366-384) (Table 1). The Gag protein of HIV is involved in host membrane binding, packaging of the virus and the formation of virus-like particles. The gp120 protein plays a crucial role in recognizing the host cell by binding to the primary receptor CD4. This binding induces structural rearrangements in gp120, creating a high-affinity binding site for a chemokine co-receptor such as CXCR4 and/or CCR5.
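The two steps described in this section, reading insertions off a pairwise alignment and then locating each insert in another sequence, can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the authors' pipeline: exact substring matching stands in for the scored local alignment that BLASTp performs, the function names `find_insertions` and `locate_motif` are hypothetical, and the flanking residues in the toy sequences are invented (only the motifs TNGTKR and HKNNKS come from Table 1).

```python
from typing import List, Tuple


def find_insertions(aligned_query: str, aligned_ref: str,
                    min_len: int = 4) -> List[Tuple[int, int, str]]:
    """List runs where the reference is gapped, in 1-based ungapped query coordinates."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    found = []
    pos = 0        # 1-based position in the ungapped query
    start = None   # start of the current insertion run, if any
    run = []       # residues collected for the current run
    for q, r in zip(aligned_query, aligned_ref):
        if q != "-":
            pos += 1
        if q != "-" and r == "-":
            if start is None:
                start = pos
            run.append(q)
            continue
        # Any other column ends the current run.
        if len(run) >= min_len:
            found.append((start, start + len(run) - 1, "".join(run)))
        start, run = None, []
    if len(run) >= min_len:
        found.append((start, start + len(run) - 1, "".join(run)))
    return found


def locate_motif(motif: str, sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) of every exact occurrence of motif in sequence."""
    hits = []
    i = sequence.find(motif)
    while i != -1:
        hits.append((i + 1, i + len(motif)))
        i = sequence.find(motif, i + 1)
    return hits


if __name__ == "__main__":
    # Toy alignment with invented flanking residues around a Table 1 motif.
    query = "MFVTNGTKRFDN"
    ref = "MFV------FDN"
    print(find_insertions(query, ref))         # [(4, 9, 'TNGTKR')]
    print(locate_motif("HKNNKS", "AAHKNNKSAA"))  # [(3, 8)]
```

The `min_len` threshold is an arbitrary choice that suppresses single-residue gaps. A real search over a protein database would also need a scoring matrix and a significance estimate, which exact matching does not provide.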
Discussion

The current outbreak of 2019-nCoV warrants a thorough investigation and understanding of its ability to infect human beings. Keeping in mind that there has been a clear change in host preference from previous coronaviruses to this virus, we studied the changes in the spike protein between 2019-nCoV and other viruses. We found four new insertions in the S protein of 2019-nCoV when compared to its nearest relative, SARS CoV. The genome sequences of the 28 recent clinical isolates showed that the sequences coding for these insertions are conserved among all these isolates. This indicates that these insertions have been preferentially acquired by 2019-nCoV, providing it with an additional survival and infectivity advantage. Delving deeper, we found that these insertions were similar to HIV-1 sequences. Our results highlight an astonishing relation between the gp120 and Gag proteins of HIV and the 2019-nCoV spike glycoprotein. These proteins are critical for the viruses to identify and latch on to their host cells and for viral assembly (Beniac et al., 2006). Since surface proteins are responsible for host tropism, changes in these proteins imply a change in the host specificity of the virus. According to reports from China, there has been a gain of host specificity in the case of 2019-nCoV: the virus was originally known to infect animals and not humans, but after the mutations it has gained tropism to humans as well.

Moving ahead, 3D modelling of the protein structure showed that these insertions are present at the binding site of 2019-nCoV. Due to the presence of gp120 motifs in the 2019-nCoV spike glycoprotein at its binding domain, we propose that these motif insertions could have provided an enhanced affinity towards host cell receptors. Further, this structural change might also have increased the range of host cells that 2019-nCoV can infect. To the best of our knowledge, the function of these motifs in HIV is still not clear and needs to be explored.
The exchange of genetic material among viruses is well known, and such critical exchange highlights the risk and the need to investigate the relations between seemingly unrelated virus families.

Conclusions

Our analysis of the spike glycoprotein of 2019-nCoV revealed several interesting findings. First, we identified 4 unique inserts in the 2019-nCoV spike glycoprotein that are not present in any other coronavirus reported to date. To our surprise, all 4 inserts in 2019-nCoV mapped to short segments of amino acids in HIV-1 gp120 and Gag among all annotated virus proteins in the NCBI database. This uncanny similarity of novel inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag is unlikely to be fortuitous. Further, 3D modelling suggests that at least 3 of the unique inserts, which are non-contiguous in the primary protein sequence of the 2019-nCoV spike glycoprotein, converge to constitute the key components of the receptor binding site. Of note, all 4 inserts have pI values of around 10 that may facilitate virus-host interactions. Taken together, our findings suggest an unconventional evolution of 2019-nCoV that warrants further investigation. Our work highlights novel evolutionary aspects of 2019-nCoV and has implications for the pathogenesis and diagnosis of this virus.

References

Beniac, D. R., Andonov, A., Grudeski, E., & Booth, T. F. (2006). Architecture of the SARS coronavirus prefusion spike. Nature Structural and Molecular Biology, 13(8), 751–752. https://doi.org/10.1038/nsmb1123
Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T. G., Bertoni, M., Bordoli, L., & Schwede, T. (2014).
SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research. https://doi.org/10.1093/nar/gku340
Bosch, B. J., van der Zee, R., de Haan, C. A. M., & Rottier, P. J. M. (2003). The Coronavirus Spike Protein Is a Class I Virus Fusion Protein: Structural and Functional Characterization of the Fusion Core Complex. Journal of Virology, 77(16), 8801–8811. https://doi.org/10.1128/jvi.77.16.8801-8811.2003
Chan, J. F.-W., Kok, K.-H., Zhu, Z., Chu, H., To, K. K.-W., Yuan, S., & Yuen, K.-Y. (2020). Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections, 9(1), 221–236. https://doi.org/10.1080/22221751.2020.1719902
Chan, J. F. W., Lau, S. K. P., To, K. K. W., Cheng, V. C. C., Woo, P. C. Y., & Yuen, K.-Y. (2015). Middle East Respiratory Syndrome Coronavirus: Another Zoonotic Betacoronavirus Causing SARS-Like Disease. Clinical Microbiology Reviews. https://doi.org/10.1128/CMR.00102-14
Chan, J., To, K., Tse, H., Jin, D., & Yuen, K.-Y. (2013). Interspecies transmission and emergence of novel viruses: lessons from bats and birds. Trends in Microbiology.
Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research. https://doi.org/10.1093/nar/16.22.10881
DeLano, W. L. (2002). The PyMOL Molecular Graphics System, Version 1.1. Schrödinger LLC. https://doi.org/10.1038/hr.2014.17
Du, L., Zhao, G., Kou, Z., Ma, C., Sun, S., Poon, V. K. M., Lu, L., Wang, L., Debnath, A. K., Zheng, B.-J., Zhou, Y., & Jiang, S. (2013). Identification of a Receptor-Binding Domain in the S Protein of the Novel Human Coronavirus Middle East Respiratory Syndrome Coronavirus as an Essential Target for Vaccine Development. Journal of Virology, 87(17), 9939–9942. https://doi.org/10.1128/jvi.01048-13
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. https://doi.org/10.1093/nar/gkh340
Elbe, S., & Buckland-Merrett, G. (2017). Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges. https://doi.org/10.1002/gch2.1018
Kirchdoerfer, R. N., Cottrell, C. A., Wang, N., Pallesen, J., Yassine, H. M., Turner, H. L., Corbett, K. S., Graham, B. S., McLellan, J. S., & Ward, A. B. (2016). Pre-fusion structure of a human coronavirus spike protein. Nature. https://doi.org/10.1038/nature17200
Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution. https://doi.org/10.1093/molbev/msy096
Li, F. (2016). Structure, Function, and Evolution of Coronavirus Spike Proteins. Annual Review of Virology, 3(1), 237–261. https://doi.org/10.1146/annurev-virology-110615-042301
Murakami, T. (2008). Roles of the interactions between Env and Gag proteins in the HIV-1 replication cycle. Microbiology and Immunology, 52(5), 287–295. https://doi.org/10.1111/j.1348-0421.2008.00008.x
Ou, X., Guan, H., Qin, B., Mu, Z., Wojdyla, J. A., Wang, M., Dominguez, S. R., Qian, Z., & Cui, S. (2017). Crystal structure of the receptor binding domain of the spike glycoprotein of human betacoronavirus HKU1. Nature Communications. https://doi.org/10.1038/ncomms15216
Snijder, E. J., van der Meer, Y., Zevenhoven-Dobbe, J., Onderwater, J. J. M., van der Meulen, J., Koerten, H. K., & Mommaas, A. M. (2006). Ultrastructure and origin of membrane vesicles associated with the severe acute respiratory syndrome coronavirus replication complex. Journal of Virology, 80(12), 5927–5940.
https://doi.org/10.1128/JVI.02501-05
Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., Chen, H.-D., Chen, J., Luo, Y., Guo, H., Jiang, R.-D., Liu, M.-Q., Chen, Y., Shen, X.-R., Wang, X., … Shi, Z.-L. (2020). Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin. BioRxiv. https://doi.org/10.1101/2020.01.22.914952
Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., Niu, P., Zhan, F., Ma, X., Wang, D., Xu, W., Wu, G., Gao, G. F., & Tan, W. (2020). A Novel Coronavirus from Patients with Pneumonia in China, 2019. New England Journal of Medicine, NEJMoa2001017. https://doi.org/10.1056/NEJMoa2001017

Fig.S1: Multiple sequence alignment of the glycoproteins of the Coronaviridae family, showing all four inserts.

Fig.S2: All four inserts are present in the aligned 28 Wuhan 2019-nCoV virus genomes obtained from GISAID. The gap in the Bat-SARS Like CoV in the last row shows that inserts 1 and 4 are unique to Wuhan 2019-nCoV.

Fig.S3: Phylogenetic tree of the genomes of 28 clinical isolates of 2019-nCoV, including one with bat as a host.
Supplementary Fig 4. Genome alignment of the Coronaviridae family. The sequences highlighted in black are the inserts.