Contents lists available at ScienceDirect
Antiviral Research
journal homepage: www.elsevier.com/locate/antiviral
The spike glycoprotein of the new coronavirus 2019-nCoV contains a furin-
like cleavage site absent in CoV of the same clade
B. Coutard a, C. Valle b, X. de Lamballerie a, B. Canard b, N.G. Seidah c, E. Decroly b,∗
a Unité des Virus Émergents (UVE: Aix-Marseille Univ – IRD 190 – Inserm 1207 – IHU Méditerranée Infection), Marseille, France
b Aix Marseille Université, CNRS, AFMB UMR 7257, Marseille, France
c Laboratory of Biochemical Neuroendocrinology, Montreal Clinical Research Institute (IRCM, Affiliated to the University of Montreal), 110 Pine Ave West, Montreal, QC,
H2W1R7, Canada
ARTICLE INFO
Keywords:
2019-nCoV
SARS-CoV
Spike protein
Maturation protease
Furin
Antivirals
ABSTRACT
In 2019, a new coronavirus (2019-nCoV) infecting humans emerged in Wuhan, China. Its genome has been
sequenced and the genomic information promptly released. Despite a high similarity with the genome sequence
of SARS-CoV and SARS-like CoVs, we identified a peculiar furin-like cleavage site in the Spike protein of the
2019-nCoV, lacking in the other SARS-like CoVs. In this article, we discuss the possible functional consequences
of this cleavage site in the viral cycle, pathogenicity and its potential implication in the development of anti-
virals.
Human coronaviruses (CoV) are enveloped positive-stranded RNA
viruses belonging to the order Nidovirales, and are mostly responsible
for upper respiratory and digestive tract infections. Among them, SARS-CoV
and MERS-CoV, which spread in 2002 and 2013 respectively, have
been associated with severe human illnesses, such as severe pneumonia
and bronchiolitis, and even meningitis in more vulnerable populations
(de Wit et al., 2016). In December 2019, a new CoV (2019-nCoV) was
detected in the city of Wuhan, and this emerging viral infection
was associated with severe human respiratory disease with a ~2–3%
fatality rate (Li et al., 2020). The virus is presumed to have initially
been transmitted from an animal reservoir to humans, possibly via
an amplifying host. However, human-to-human transmission has been
reported, leading to a sustained epidemic spread with > 31,000 confirmed
human infections, including > 640 deaths, reported by the
WHO in early February 2020. The estimated effective reproductive
number (R) value of ~2.90 (95% CI: 2.32–3.63) at the beginning of the
outbreak raises the possibility of a pandemic (Zhao et al., 2020). This
prompted the WHO to declare it a Public Health Emergency of International
Concern. This is especially relevant because so far no specific
antiviral treatments or vaccines are available. Based on its genome
sequence, 2019-nCoV belongs to lineage b of Betacoronavirus (Fig. 1A),
which also includes the SARS-CoV and bat CoV ZXC21, the latter and
CoV ZC45 being the closest to 2019-nCoV. 2019-nCoV shares ~76%
amino acid sequence identity in the Spike (S)-protein sequence with
SARS-CoV and 80% with CoV ZXC21 (Chan et al., 2020). In this article,
we focus on a specific furin-like protease recognition pattern present in
the vicinity of one of the maturation sites of the S protein (Fig. 1B) that
may have significant functional implications for virus entry.
The proprotein convertases (PCs; genes PCSKs) constitute a family
of nine serine secretory proteases that regulate various biological pro-
cesses in both healthy and disease states (Seidah and Prat, 2012). By
proteolysis, PCs are responsible for the activation of a wide variety of
precursor proteins, such as growth factors, hormones, receptors and
adhesion molecules, as well as cell surface glycoproteins of infectious
viruses (Seidah and Chretien, 1999) (Table 1). Seven PCs cleave pre-
cursor proteins at specific single or paired basic amino acids (aa) within
the motif (R/K)-(2X)n-(R/K)↓, where n = 0, 1, 2, or 3 spacer aa (Seidah
and Chretien, 1999). Because of their role in the processing of many
critical cell surface proteins, PCs, especially furin, have been implicated
in viral infections. They have the potential to specifically cleave viral
envelope glycoproteins, thereby enhancing viral fusion with host cell
membranes (Izaguirre, 2019; Moulard and Decroly, 2000). In the case
of human-infecting coronaviruses such as HCoV-OC43 (Le Coupanec
et al., 2015), MERS-CoV (Millet and Whittaker, 2014), and HKU1 (Chan
et al., 2008) the spike protein has been demonstrated to be cleaved at
an S1/S2 cleavage site (Fig. 2) generating the S1 and S2 subunits. The
above three viruses display the canonical (R/K)-(2X)n-(R/K)↓ motif
(Table 1). Additionally, it has been demonstrated that variation around
the viral envelope glycoprotein cleavage site plays a role in cellular
tropism and pathogenesis. For instance, the pathogenesis of some CoV
has been previously related to the presence of a furin-like cleavage site
in the S-protein sequence. For example, the insertion of a similar
cleavage site in the infectious bronchitis virus (IBV) S-protein results in
higher pathogenicity, pronounced neural symptoms and neurotropism
in infected chickens (Cheng et al., 2019).
https://doi.org/10.1016/j.antiviral.2020.104742
Received 3 February 2020; Received in revised form 7 February 2020; Accepted 8 February 2020
∗ Corresponding author.
E-mail address: etienne.decroly@afmb.univ-mrs.fr (E. Decroly).
Antiviral Research 176 (2020) 104742
Available online 10 February 2020
0166-3542/ © 2020 Elsevier B.V. All rights reserved.
Similarly, in the case of influenza virus, low-pathogenicity forms of
influenza virus contain a single basic residue at the cleavage site, which
is cleaved by trypsin-like proteases and the tissue distribution of the
activating protease(s) typically restricts infections to the respiratory
and/or intestinal organs (Sun et al., 2010). Conversely, the highly pa-
thogenic forms of influenza have a furin-like cleavage site cleaved by
different cellular proteases, including furin, which are expressed in a
wide variety of cell types allowing a widening of the cell tropism of the
virus (Kido et al., 2012). Furthermore, the insertion of a multibasic motif
RERRRKKR↓GL at the H5N1 hemagglutinin (HA) cleavage site was likely
associated with the hyper-virulence of the virus during the Hong Kong
1997 outbreak (Claas et al., 1998). This motif exhibits the critical Arg at
P1 and basic residues at P2 and P4, as well as P6 and P8, and an aliphatic
Leu at the P2′ position (Table 1; nomenclature of Schechter and Berger,
1968), typical of a furin-like cleavage
specificity (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and Prat,
2012).
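The position labels used above can be made concrete with a short script. The following is a minimal Python sketch, not part of the original analysis: the helper name `label_positions` is hypothetical, and the only input is the H5N1 motif RERRRKKR↓GL quoted in the text. It numbers residues outward from the scissile bond (P1, P2, ... on the N-terminal side; P1′, P2′, ... on the C-terminal side), following the Schechter and Berger convention.

```python
def label_positions(site: str) -> dict[str, str]:
    """Label residues around a cleavage site written as 'NNNN↓CC'.

    Hypothetical helper for illustration. Primed positions use an ASCII
    apostrophe in the dictionary keys (P1', P2', ...).
    """
    n_side, c_side = site.split("↓")
    labels = {}
    # N-terminal side: P1 is the residue immediately before the scissile bond.
    for i, residue in enumerate(reversed(n_side), start=1):
        labels[f"P{i}"] = residue
    # C-terminal side: P1' is the residue immediately after the scissile bond.
    for i, residue in enumerate(c_side, start=1):
        labels[f"P{i}'"] = residue
    return labels


h5n1 = label_positions("RERRRKKR↓GL")

# Positions described in the text as basic (Arg or Lys).
basic_positions = [p for p in ("P1", "P2", "P4", "P6", "P8") if h5n1[p] in "RK"]

print(h5n1)
print("Basic positions:", basic_positions)
print("P2' residue:", h5n1["P2'"])
```

Running it labels P1, P2, P4, P6 and P8 as Arg or Lys and P2′ as Leu, in line with the description above.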
The coronavirus S-protein is the structural protein responsible for
the crown-like shape of the CoV viral particles, from which the original
name “coronavirus” was coined. The ~1200 aa long S-protein belongs
to class-I viral fusion proteins and contributes to the cell receptor
binding, tissue tropism and pathogenesis (Lu et al., 2015; Millet and
Whittaker, 2014). It contains several conserved domains and motifs
Fig. 1. Characterization of an nCoV-peculiar sequence at the S1/S2 cleavage site in the S-protein sequence, compared to SARS-like CoV. (A) Phylogenetic tree of selected coronaviruses from
genera alphacoronavirus (α-CoV) and betacoronavirus (β-CoV), lineages a, b, c and d: 2019-nCoV (NC_045512.2), CoV-ZXC21 (MG772934), SARS-CoV (NC_004718.3), SARS-like BM4821
(MG772934), HCoV-OC43 (AY391777), HKU9-1 (EF065513), HCoV-NL63 (KF530114.1), HCoV-229E (KF514433.1), MERS-CoV (NC_019843.3), HKU1 (NC_006577.2). The phylogenetic tree
was obtained on the Orf1ab amino acid sequence using the Maximum Likelihood method by Mega X software. Red asterisks indicate the presence of a canonical furin-like cleavage motif
at site 1; (B) Alignment of the coding and amino acid sequences of the S-protein from CoV-ZXC21 and 2019-nCoV at the S1/S2 site. The 2019-nCoV-specific sequence is in bold.
The sequence of CoV-ZXC21 S-protein at this position is representative of the sequence of the other betacoronaviruses belonging to lineage b, except the one of 2019-nCoV. (For
interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Table 1
Comparative sequences of envelope protein cleavage site(s) in coronaviruses (above) and in other RNA viruses (below). Empty boxes: no consensus motif detected.
(Fig. 2). The trimeric S-protein is processed at the S1/S2 cleavage site
by host cell proteases during infection. Following cleavage, also known
as priming, the protein is divided into an N-terminal S1-ectodomain
that recognises a cognate cell surface receptor and a C-terminal S2-
membrane-anchored protein involved in viral entry. The SARS-CoV S1-
protein contains a conserved Receptor Binding Domain (RBD), which
recognises the angiotensin-converting enzyme 2 (ACE2) (Li et al.,
2003). The SARS-CoV binds to both bat and human cells, and the virus
can infect both organisms (Ge et al., 2013; Kuhn et al., 2004). The RBD
surface of S1/ACE2 implicates 14 aa in the S1 of SARS-CoV (Li et al.,
2005). Among them, 8 residues are strictly conserved in 2019-nCoV,
supporting the hypothesis that ACE2 is also the receptor of the newly
emerged nCoV (Wan et al., 2020). The S2-protein contains the fusion
peptide (FP), a second proteolytic site (S2′), followed by an internal
fusion peptide (IFP) and two heptad-repeat domains preceding the
transmembrane domain (TM) (Fig. 2). Notably, the IFPs of the 2019-
nCoV and SARS-CoV are identical, displaying characteristics of viral
fusion peptides (Fig. 2). While the molecular mechanism involved in
cell entry is not yet fully understood, it is likely that both FP and IFP
participate in the viral entry process (Lu et al., 2015) and thus the S-
protein must likely be cleaved at both S1/S2 and S2′ cleavage sites for
virus entry. The furin-like S2′ cleavage site at KR↓SF with P1 and P2
basic residues and a P2′ hydrophobic Phe (Seidah and Prat, 2012),
downstream of the IFP is identical between the 2019-nCoV and SARS-
CoV (Fig. 2). In the MERS-CoV and HCoV-OC43 the S1/S2 site is re-
placed by RXXR↓SA, with P1 and P4 basic residues, and an Ala (not
aliphatic) at P2′, suggesting a somewhat less favourable cleavage by
furin. However, in the other less pathogenic circulating human CoV, the
S2′ cleavage site only exhibits a monobasic R↓S sequence (Fig. 2) with
no basic residues at P2 and/or P4, which are needed to allow furin cleavage,
suggesting a less efficient cleavage or higher restriction at the entry step
depending on the cognate proteases expressed by target cells. Even
though processing at S2′ in 2019-nCoV is expected to be a key event for
the final activation of the S-protein, the protease(s) involved in this
process have not yet been conclusively identified. Based on the 2019-
nCoV S2′ sequence and the above arguments, we propose that one or
more furin-like enzymes would cleave the S2′ site at KR↓SF. In contrast
to the S2′, the first cleavage between the RBD and the FP (S1/S2
cleavage site, Fig. 2) has been extensively studied for many CoVs (Lu
et al., 2015). Interestingly, the S1/S2 processing site exhibits different
motifs among coronaviruses (Fig. 2, site 1 & site 2), with many of them
displaying cleavage after a basic residue. It is thus likely that the
priming process is ensured by different host cell proteases depending on
the sequence of the S1/S2 cleavage site. Accordingly, the MERS-CoV
S-protein, which contains an RSVR↓SV motif, is cleaved during virus
egress, probably by furin (Millet and Whittaker, 2014). Conversely, the
S-protein of SARS-CoV remains largely uncleaved after biosynthesis,
possibly due to the lack of a favourable furin-like cleavage site (SLLR-
ST). In this case, it was reported that following receptor binding the S-
protein is cleaved at a conserved sequence AYT↓M (located 10 aa
downstream of SLLR-ST) by target cells’ proteases such as elastase,
cathepsin L or TMPRSS2 (Bosch et al., 2008; Matsuyama et al., 2010,
2005; Millet and Whittaker, 2015). As the priming event is essential for
virus entry, the efficacy and extent of this activation step by the pro-
teases of the target cells should regulate cellular tropism and viral pa-
thogenesis. In the case of the 2019-nCoV S-protein, the conserved site 2
sequence AYT↓M may still be cleaved, possibly after the preferred furin-
cleavage at the site 1 (Fig. 2).
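To compare the sites discussed above against the (R/K)-(2X)n-(R/K)↓ consensus, a simple pattern match is sufficient. The sketch below is illustrative only: the function name `has_pc_motif` is hypothetical, the regular expression is one possible encoding of the consensus (n = 0 to 3 pairs of spacer residues), and a match indicates only that a sequence fits the minimal motif, not that it is cleaved in cells. Each input is the stretch of residues ending at P1, taken from the sequences quoted in the text; for SLLR-ST we assume that the Arg occupies P1.

```python
import re

# One possible encoding of (R/K)-(2X)n-(R/K)↓ with n = 0, 1, 2 or 3:
# a basic residue, then 0, 2, 4 or 6 spacer residues, then a basic residue at P1.
PC_MOTIF = re.compile(r"[RK](?:[A-Z]{2}){0,3}[RK]$")


def has_pc_motif(upstream: str) -> bool:
    """Return True if the residues ending at P1 fit the minimal PC consensus."""
    return PC_MOTIF.search(upstream) is not None


# Residues N-terminal to the scissile bond, as quoted in the text.
sites = {
    "2019-nCoV site 1 (PRRAR↓SV)": "PRRAR",
    "MERS-CoV S1/S2 (RSVR↓SV)": "RSVR",
    "SARS-CoV site 1 (SLLR-ST)": "SLLR",  # assumption: Arg taken as P1
    "2019-nCoV S2′ (KR↓SF)": "KR",
}

for name, upstream in sites.items():
    print(f"{name}: {has_pc_motif(upstream)}")
```

Under these assumptions, the SARS-CoV site 1 sequence is the only one of the four that does not fit the consensus, consistent with the observation that the SARS-CoV S-protein remains largely uncleaved after biosynthesis.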
Since furin is highly expressed in lungs, an enveloped virus that
infects the respiratory tract may successfully exploit this convertase to
activate its surface glycoprotein (Bassi et al., 2017; Mbikay et al.,
1997). Before the emergence of the 2019-nCoV, this important feature
was not observed in the lineage b of betacoronaviruses. However, it is
shared by other CoV (HCoV-OC43, MERS-CoV, MHV-A59) harbouring
furin-like cleavage sites in their S-protein (Fig. 2; Table 1), which were
shown to be processed by furin experimentally (Le Coupanec et al.,
Fig. 2. Schematic representation of the human 2019-nCoV S-protein with a focus on the putative maturation sites. The domains were previously characterized in
SARS-CoV and MERS-CoV: Signal peptide (SP), N-terminal domain (NTD), receptor-binding domain (RBD), fusion peptide (FP), internal fusion peptide (IFP), heptad
repeat 1/2 (HR1/2), and the transmembrane domain (TM). The SP, S1↓S2 and S2′ cleavage sites are indicated by arrows. The sequences of different CoV S1/S2 and S2′
cleavage sites were aligned using the Multalin webserver (http://multalin.toulouse.inra.fr/multalin/) with manual adjustments and the figure was prepared using ESPript 3
(http://espript.ibcp.fr/ESPript/ESPript/) presenting the secondary structure of SARS-CoV S-protein at the bottom of the alignment (PDB 5X58) (Yuan et al., 2017).
The insertion of the furin-like cleavage site is surrounded by a black frame. Red asterisks indicate the presence of a canonical furin-like cleavage motif at the S1/S2 site. (For
interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
2015; Millet and Whittaker, 2014). Strikingly, the 2019-nCoV S-protein
sequence contains 12 additional nucleotides upstream of the single
Arg↓ cleavage site 1 (Figs. 1B and 2), leading to a predicted
solvent-exposed PRRAR↓SV sequence, which corresponds to a canonical
furin-like cleavage site (Braun and Sauter, 2019; Izaguirre, 2019; Seidah and
Prat, 2012). This furin-like cleavage site is presumed to be cleaved
during virus egress (Millet and Whittaker, 2014) for S-protein “priming”
and may provide a gain-of-function to the 2019-nCoV for efficient
spreading in the human population compared to other lineage b beta-
coronaviruses. This possibly illustrates a convergent evolution pathway
between unrelated CoVs. Interestingly, if this site is not processed, the
S-protein is expected to be cleaved at site 2 during virus endocytosis, as
observed for the SARS-CoV.
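The relationship between the 12 additional nucleotides and the resulting PRRAR↓SV sequence can be checked with a few lines of code. The sketch below is illustrative: the nucleotide string is an assumption taken from the public reference genome (NC_045512.2) rather than from the text and should be compared with Fig. 1B, the codon table is a three-entry subset of the standard genetic code, and the names `INSERT_NT` and `translate` are hypothetical.

```python
# Subset of the standard genetic code covering only the codons used below.
CODONS = {"CCT": "P", "CGG": "R", "GCA": "A"}

# Assumption: 12-nt insert as found in the reference genome NC_045512.2;
# this string is not given in the text and should be checked against Fig. 1B.
INSERT_NT = "CCTCGGCGGGCA"


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide string using the CODONS subset."""
    if len(nt) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))


insert_aa = translate(INSERT_NT)

# 12 nucleotides = 4 codons, so the insertion keeps the reading frame.
print(len(INSERT_NT) // 3, "codons ->", insert_aa)

# The inserted residues sit immediately upstream of the pre-existing Arg at P1.
print("Residues ending at P1:", insert_aa + "R")
```

Twelve nucleotides correspond to four in-frame codons, so the insertion adds four residues (PRRA under the assumption above) immediately upstream of the pre-existing Arg, giving the PRRAR sequence described in the text.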
Obviously, much more work is needed to demonstrate our assertion
experimentally, but the inhibition of such processing enzyme(s) may
represent a potential antiviral strategy. Indeed, it was recently shown
that in an effort to limit viral infections, host cells that are infected by a
number of viruses provoke an interferon response to inhibit the enzy-
matic activity of furin-like enzymes. It was also demonstrated that HIV
infection induces the expression of either the protease activated re-
ceptor 1 (PAR1) (Kim et al., 2015) or guanylate binding proteins 2 and
5 (GBP2,5) (Braun and Sauter, 2019) that restrict the trafficking of furin
to the trans Golgi network (PAR1) or to early Golgi compartments
(GBP2,5) where the proprotein convertase remains inactive. Altogether,
these observations suggest that inhibitors of furin-like enzymes may
contribute to inhibiting virus propagation.
A variety of approaches have been proposed to inhibit furin activity
to limit tumour growth, viral and bacterial infection. Thus, a variant of
the naturally occurring serine protease inhibitor α-1 antitrypsin har-
bouring a consensus furin cleavage, called α-1 antitrypsin Portland (α1-
PDX), inhibits furin and prevents the processing of HIV-1 Env
(Anderson et al., 1993). The addition of a chloromethylketone (CMK)
moiety to the C-terminus of a polybasic cleavage motif and a decanoyl
group at the N-terminus to favour cell penetration (dec-RVKR-cmk)
irreversibly blocked the enzymatic activity of furin, PC5, PACE4
and PC7 (Decroly et al., 1996; Garten et al., 1994). Finally, the
elucidation of the crystal structure of furin resulted in the design of a
2,5-dideoxystreptamine-derived inhibitor, where two molecules of the
inhibitor form a complex with furin (Dahms et al., 2017). As furin-like
enzymes are involved in a multitude of cellular processes, one im-
portant issue would be to avoid systemic inhibition that may result in
some toxicity. Accordingly, it is likely that such small molecule in-
hibitors, or other more potent orally active ones, possibly delivered by
inhalation and exhibiting a slow dissociation rate from furin to allow
for sustained inhibition, deserve to be rapidly tested to assess their
antiviral effect against 2019-nCoV.
Acknowledgments
This work was supported by a CIHR Foundation grant # 148363
(NGS), a Canada Research Chairs in Precursor Proteolysis (NGS; # 950-
231335), and by the European Virus Archive Global (BCo; EVA
GLOBAL) funded by the European Union's Horizon 2020 research and
innovation programme under grant agreement No 871029.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://
doi.org/10.1016/j.antiviral.2020.104742.
References
Anderson, E.D., Thomas, L., Hayflick, J.S., Thomas, G., 1993. Inhibition of HIV-1 gp160-
dependent membrane fusion by a furin-directed α1-antitrypsin variant. J. Biol. Chem.
268, 24887–24891.
Bassi, D.E., Zhang, J., Renner, C., Klein-Szanto, A.J., 2017. Targeting proprotein
convertases in furin-rich lung cancer cells results in decreased in vitro and in vivo
growth. Mol. Carcinog. 56, 1182–1188. https://doi.org/10.1002/mc.22550.
Bosch, B.J., Bartelink, W., Rottier, P.J.M., 2008. Cathepsin L functionally cleaves the
severe acute respiratory syndrome coronavirus class I fusion protein upstream of
rather than adjacent to the fusion peptide. J. Virol. 82, 8887–8890. https://doi.org/
10.1128/jvi.00415-08.
Braun, E., Sauter, D., 2019. Furin-mediated protein processing in infectious diseases and
cancer. Clin. Transl. Immunol. 8, e1073. https://doi.org/10.1002/cti2.1073.
Chan, C.M., Woo, P.C., Lau, S.K., Tse, H., Chen, H.L., Li, F., Zheng, B.J., Chen, L., Huang,
J.D., Yuen, K.Y., 2008. Spike protein, S, of human coronavirus HKU1: role in viral life
cycle and application in antibody detection. Exp. Biol. Med. 233, 1527–1536. https://
doi.org/10.3181/0806-RM-197.
Chan, J.F., Kok, K.H., Zhu, Z., Chu, H., To, K.K., Yuan, S., Yuen, K.Y., 2020. Genomic
characterization of the 2019 novel human-pathogenic coronavirus isolated from a
patient with atypical pneumonia after visiting Wuhan. Emerg. Microb. Infect. 9,
221–236. https://doi.org/10.1080/22221751.2020.1719902.
Cheng, J., Zhao, Y., Xu, G., Zhang, K., Jia, W., Sun, Y., Zhao, J., Xue, J., Hu, Y., Zhang, G.,
2019. The S2 subunit of QX-type infectious bronchitis coronavirus spike protein is an
essential determinant of neurotropism. Viruses 11. https://doi.org/10.3390/
v11100972.
Claas, E.C., Osterhaus, A.D., Van Beek, R., De Jong, J.C., Rimmelzwaan, G.F., Senne, D.A.,
Krauss, S., Shortridge, K.F., Webster, R.G., 1998. Human influenza A H5N1 virus
related to a highly pathogenic avian influenza virus. Lancet 351, 472–477. https://
doi.org/10.1016/S0140-6736(97)11212-0.
Dahms, S.O., Jiao, G.-S., Than, M.E., 2017. Structural studies revealed active site dis-
tortions of human furin by a small molecule inhibitor. ACS Chem. Biol. 12, 2474.
https://doi.org/10.1021/acschembio.7b00633.
de Wit, E., van Doremalen, N., Falzarano, D., Munster, V.J., 2016. SARS and MERS: recent
insights into emerging coronaviruses. Nat. Publ. Gr. https://doi.org/10.1038/
nrmicro.2016.81.
Decroly, E., Wouters, S., Di Bello, C., Lazure, C., Ruysschaert, J.-M., Seidah, N.G., 1996.
Identification of the Paired Basic Convertases Implicated in HIV gp160 Processing
Based on in Vitro Assays and Expression in CD4+ Cell Lines. J. Biol. Chem. 271,
30442–30450. https://doi.org/10.1074/jbc.271.48.30442.
Garten, W., Hallenberger, S., Ortmann, D., Schäfer, W., Vey, M., Angliker, H., Shaw, E.,
Klenk, H.D., 1994. Processing of viral glycoproteins by the subtilisin-like en-
doprotease furin and its inhibition by specific peptidylchloroalkylketones. Biochimie
76, 217–225. https://doi.org/10.1016/0300-9084(94)90149-x.
Ge, X.-Y., Li, J.-L., Yang, X.-L., Chmura, A.A., Zhu, G., Epstein, J.H., Mazet, J.K., Hu, B.,
Zhang, W., Peng, C., Zhang, Y.-J., Luo, C.-M., Tan, B., Wang, N., Zhu, Y., Crameri, G.,
Zhang, S.-Y., Wang, L.-F., Daszak, P., Shi, Z.-L., 2013. Isolation and characterization
of a bat SARS-like coronavirus that uses the ACE2 receptor. Nature 503, 535–538.
https://doi.org/10.1038/nature12711.
Izaguirre, G., 2019. The proteolytic regulation of virus cell entry by furin and other
proprotein convertases. Viruses 11. https://doi.org/10.3390/v11090837.
Kido, H., Okumura, Y., Takahashi, E., Pan, H.Y., Wang, S., Yao, D., Yao, M., Chida, J.,
Yano, M., 2012. Role of host cellular proteases in the pathogenesis of influenza and
influenza-induced multiple organ failure. Biochim. Biophys. Acta Proteins
Proteomics. https://doi.org/10.1016/j.bbapap.2011.07.001.
Kim, W., Zekas, E., Lodge, R., Susan-Resiga, D., Marcinkiewicz, E., Essalmani, R., Mihara,
K., Ramachandran, R., Asahchop, E., Gelman, B., Cohen, É.A., Power, C., Hollenberg,
M.D., Seidah, N.G., 2015. Neuroinflammation-Induced interactions between pro-
tease-activated receptor 1 and proprotein convertases in HIV-associated neurocog-
nitive disorder. Mol. Cell Biol. 35, 3684–3700. https://doi.org/10.1128/mcb.
00764-15.
Kuhn, J.H., Li, W., Choe, H., Farzan, M., 2004. Angiotensin-converting enzyme 2: a
functional receptor for SARS coronavirus. Cell. Mol. Life Sci. 61, 2738–2743. https://
doi.org/10.1007/s00018-004-4242-5.
Le Coupanec, A., Desforges, M., Meessen-Pinard, M., Dubé, M., Day, R., Seidah, N.G.,
Talbot, P.J., 2015. Cleavage of a neuroinvasive human respiratory virus spike gly-
coprotein by proprotein convertases modulates neurovirulence and virus spread
within the central nervous system. PLoS Pathog. 11. https://doi.org/10.1371/
journal.ppat.1005261.
Li, F., Li, W., Farzan, M., Harrison, S.C., 2005. Structure of SARS coronavirus spike re-
ceptor-binding domain complexed with receptor. Science 309, 1864–1868. https://
doi.org/10.1126/science.1116480.
Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., Ren, R., Leung, K.S.M., Lau, E.H.Y.,
Wong, J.Y., Xing, X., Xiang, N., Wu, Y., Li, C., Chen, Q., Li, D., Liu, T., Zhao, J., Li, M.,
Tu, W., Chen, C., Jin, L., Yang, R., Wang, Q., Zhou, S., Wang, R., Liu, H., Luo, Y., Liu,
Y., Shao, G., Li, H., Tao, Z., Yang, Y., Deng, Z., Liu, B., Ma, Z., Zhang, Y., Shi, G., Lam,
T.T.Y., Wu, J.T.K., Gao, G.F., Cowling, B.J., Yang, B., Leung, G.M., Feng, Z., 2020.
Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneu-
monia. N. Engl. J. Med. NEJMoa2001316. https://doi.org/10.1056/
NEJMoa2001316.
Li, W., Moore, M.J., Vasilieva, N., Sui, J., Wong, S.K., Berne, M.A., Somasundaran, M.,
Sullivan, J.L., Luzuriaga, K., Greenough, T.C., Choe, H., Farzan, M., 2003.
Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus.
Nature 426, 450–454. https://doi.org/10.1038/nature02145.
Lu, G., Wang, Q., Gao, G.F., 2015. Bat-to-human: spike features determining “host jump”
of coronaviruses SARS-CoV, MERS-CoV, and beyond. Trends Microbiol. https://doi.
org/10.1016/j.tim.2015.06.003.
Matsuyama, S., Nagata, N., Shirato, K., Kawase, M., Takeda, M., Taguchi, F., 2010.
Efficient activation of the severe acute respiratory syndrome coronavirus spike pro-
tein by the transmembrane protease TMPRSS2. J. Virol. 84, 12658–12664. https://
doi.org/10.1128/JVI.01542-10.
Matsuyama, S., Ujike, M., Morikawa, S., Tashiro, M., Taguchi, F., 2005. Protease-
mediated enhancement of severe acute respiratory syndrome coronavirus infection.
Proc. Natl. Acad. Sci. U.S.A. 102, 12543–12547. https://doi.org/10.1073/pnas.
0503203102.
Mbikay, M., Sirois, F., Yao, J., Seidah, N.G., Chrétien, M., 1997. Comparative analysis of
expression of the proprotein convertases furin, PACE4, PC1 and PC2 in human lung
tumours. Br. J. Canc. 75, 1509–1514. https://doi.org/10.1038/bjc.1997.258.
Millet, J.K., Whittaker, G.R., 2015. Host cell proteases: critical determinants of cor-
onavirus tropism and pathogenesis. Virus Res. 202, 120–134. https://doi.org/10.
1016/j.virusres.2014.11.021.
Millet, J.K., Whittaker, G.R., 2014. Host cell entry of Middle East respiratory syndrome
coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl.
Acad. Sci. U.S.A. 111, 15214–15219. https://doi.org/10.1073/pnas.1407087111.
Moulard, M., Decroly, E., 2000. Maturation of HIV envelope glycoprotein precursors by
cellular endoproteases. Biochim. Biophys. Acta Rev. Biomembr. https://doi.org/10.
1016/S0304-4157(00)00014-9.
Schechter, I., Berger, A., 1968. On the active site of proteases. 3. Mapping the active site
of papain; specific peptide inhibitors of papain. Biochem. Biophys. Res. Commun. 32,
898–902. https://doi.org/10.1016/0006-291x(68)90326-4.
Seidah, N.G., Chretien, M., 1999. Proprotein and prohormone convertases: a family of
subtilases generating diverse bioactive polypeptides. Brain Res. 848, 45–62. https://
doi.org/10.1016/S0006-8993(99)01909-5.
Seidah, N.G., Prat, A., 2012. The biology and therapeutic targeting of the proprotein
convertases. Nat. Rev. Drug Discov. https://doi.org/10.1038/nrd3699.
Sun, X., Tse, L.V., Ferguson, A.D., Whittaker, G.R., 2010. Modifications to the he-
magglutinin cleavage site control the virulence of a neurotropic H1N1 influenza
virus. J. Virol. 84, 8683–8690. https://doi.org/10.1128/JVI.00797-10.
Wan, Y., Shang, J., Graham, R., Baric, R.S., Li, F., 2020. Receptor recognition by novel
coronavirus from Wuhan: an analysis based on decade-long structural studies of
SARS. J. Virol. https://doi.org/10.1128/JVI.00127-20.
Yuan, Y., Cao, D., Zhang, Y., Ma, J., Qi, J., Wang, Q., Lu, G., Wu, Y., Yan, J., Shi, Y.,
Zhang, X., Gao, G.F., 2017. Cryo-EM structures of MERS-CoV and SARS-CoV spike
glycoproteins reveal the dynamic receptor binding domains. Nat. Commun. https://
doi.org/10.1038/ncomms15092.
Zhao, S., Ran, J., Musa, S.S., Yang, G., Wang, W., Lou, Y., Gao, D., Yang, L., He, D., Wang,
M.H., 2020. Preliminary estimation of the basic reproduction number of novel cor-
onavirus (2019-nCoV) in China, from 2019 to 2020: a data-driven analysis in the
early phase of the outbreak. Int J Infect Dis 30053–30059. https://doi.org/10.1016/j.
ijid.2020.01.050.
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.30.927871. The copyright holder for this preprint (which was not peer-reviewed) is the
Cases of mild to severe illness, and death from the infection have been reported from Wuhan. This
outbreak has spread rapidly to distant nations including France, Australia and the USA, among others.
The number of cases within and outside China is increasing steeply. Our current understanding
is limited to the virus genome sequences and modest epidemiological and clinical data.
Comprehensive analysis of the available 2019-nCoV sequences may provide important clues that
may help advance our current understanding to manage the ongoing outbreak.
The spike glycoprotein (S) of coronavirus is cleaved into two subunits (S1 and S2). The S1
subunit helps in receptor binding and the S2 subunit facilitates membrane fusion (Bosch et al.,
2003; Li, 2016). The spike glycoproteins of coronaviruses are important determinants of tissue
tropism and host range. In addition, the spike glycoproteins are critical targets for vaccine
development (Du et al., 2013). For this reason, the spike proteins are the most extensively
studied proteins among coronaviruses. We therefore sought to investigate the spike glycoprotein of the
2019-nCoV to understand its evolution and its novel sequence and structural features using
computational tools.
Methodology
Retrieval and alignment of nucleic acid and protein sequences
We retrieved all the available coronavirus sequences (n=55) from the NCBI viral genome database
(https://www.ncbi.nlm.nih.gov/) and used GISAID (Elbe & Buckland-Merrett,
2017) [https://www.gisaid.org/] to retrieve all available full-length sequences (n=28) of 2019-nCoV
as of 27 Jan 2020. Multiple sequence alignment of all coronavirus genomes was performed
using MUSCLE software (Edgar, 2004) based on the neighbour-joining method. Of the 55
coronavirus genomes, 32 representative genomes of all categories were used for phylogenetic tree
development using MEGA X software (Kumar et al., 2018). The closest relative was found to be
SARS CoV. The glycoprotein regions of SARS CoV and 2019-nCoV were aligned and visualized
using Multalin software (Corpet, 1988). The identified amino acid and nucleotide sequences were
aligned against the whole viral genome database using BLASTp and BLASTn. The conservation of the
nucleotide and amino acid motifs in 28 clinical variants of the 2019-nCoV genome was presented by
performing multiple sequence alignment using MEGA X software. The three-dimensional structure
of the 2019-nCoV glycoprotein was generated using the SWISS-MODEL online server (Biasini et al.,
2014) and the structure was marked and visualized using PyMOL (DeLano, 2002).
Results
Uncanny similarity of novel inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
Our phylogenetic tree of full-length coronaviruses suggests that 2019-nCoV is closely related to
SARS CoV [Fig. 1]. In addition, other recent studies have linked the 2019-nCoV to SARS CoV.
We therefore compared the spike glycoprotein sequence of the 2019-nCoV to that of the SARS
CoV (NCBI Accession number: AY390556.1). On careful examination of the sequence
alignment, we found that the 2019-nCoV spike glycoprotein contains 4 insertions [Fig. 2]. To
further investigate whether these inserts are present in any other coronavirus, we performed a multiple
sequence alignment of the spike glycoprotein amino acid sequences of all available
coronaviruses (n=55) [refer Table S.File1] in NCBI RefSeq (ncbi.nlm.nih.gov); this includes one
sequence of 2019-nCoV [Fig. S1]. We found that these 4 insertions [inserts 1, 2, 3 and 4] are
unique to 2019-nCoV and are not present in the other coronaviruses analyzed. Another group from
China had documented three insertions when comparing fewer spike glycoprotein sequences of
coronaviruses (Zhou et al., 2020).
WITHDRAWN: see manuscript DOI for details. The copyright holder is the author/funder; the preprint is made available under a CC-BY-NC-ND 4.0 International license.
Figure 1: Maximum likelihood genealogy show the evolution of 2019- nCoV: The evolutionary history
was inferred by using the Maximum Likelihood method and JTT matrix-based model. The tree
with the highest log likelihood (12458.88) is shown. Initial tree(s) for the heuristic search were
obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise
distances estimated using a JTT model, and then selecting the topology with superior log likelihood
value. This analysis involved 5 amino acid sequences. There were a total of 1387 positions in the
final dataset. Evolutionary analyses were conducted in MEGA X.
Figure 2: Multiple sequence alignment between spike proteins of 2019-nCoV and SARS. The
sequences of spike proteins of 2019-nCoV (Wuhan-HU-1, Accession NC_045512) and of SARS
CoV (GZ02, Accession AY390556) were aligned using MultiAlin software. The sites of difference
are highlighted in boxes.
We then analyzed all available full-length sequences (n=28) of 2019-nCoV in GISAID (Elbe &
Buckland-Merrett, 2017) as of January 27, 2020 for the presence of these inserts. As most of these
sequences are not annotated, we compared the nucleotide sequences of the spike glycoprotein of
all available 2019-nCoV sequences using BLASTp. Notably, all 4 insertions were
100% conserved in all the available 2019-nCoV sequences analyzed [Fig. S2, Fig. S3].
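A conservation check of this kind can be sketched in a few lines of Python. The motif strings below are taken from Table 1 with gap characters removed; the function name `motif_conservation` and the toy isolate sequences are hypothetical and stand in for the GISAID data, which are not reproduced here.

```python
# Insert motifs as listed in Table 1 (gap characters removed).
INSERT_MOTIFS = {
    "insert 1": "TNGTKR",
    "insert 2": "HKNNKS",
    "insert 3": "RSYLTPGDSSSG",
    "insert 4": "QTNSPRRA",
}


def motif_conservation(sequences, motifs):
    """Return, for each motif, the fraction of sequences containing it exactly."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    total = len(sequences)
    return {
        name: sum(motif in seq for seq in sequences.values()) / total
        for name, motif in motifs.items()
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, not real isolates.
    toy_isolates = {
        "isolate_A": "AAATNGTKRCCCHKNNKSDDD",
        "isolate_B": "GGTNGTKRGGHKNNKSGG",
        "bat_like": "AAACCCHKNNKSDDD",
    }
    for name, fraction in motif_conservation(toy_isolates, INSERT_MOTIFS).items():
        print(f"{name}: present in {fraction:.0%} of sequences")
```

Exact substring matching is the strictest possible criterion; a single substitution within a motif would count as absence, so this is best read as a lower bound on conservation.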
We then translated the aligned genomes and found that these inserts are present in all Wuhan
2019-nCoV viruses except the virus isolated from a bat host [Fig. S4]. Intrigued by the 4 highly
conserved inserts unique to 2019-nCoV, we wanted to understand their origin. For this purpose,
we ran a local alignment with each 2019-nCoV insert as the query against all virus genomes and
considered hits with 100% sequence coverage. Surprisingly, each of the four inserts aligned with
short segments of human immunodeficiency virus-1 (HIV-1) proteins. The amino acid
positions of the inserts in 2019-nCoV and the corresponding residues in HIV-1 gp120 and HIV-1
Gag are shown in Table 1. The first 3 inserts (inserts 1, 2 and 3) aligned to short segments of amino
acid residues in HIV-1 gp120. Insert 4 aligned to HIV-1 Gag. Insert 1 (6 amino acid
residues) and insert 2 (6 amino acid residues) in the spike glycoprotein of 2019-nCoV are 100%
identical to the residues mapped to HIV-1 gp120. Insert 3 (12 amino acid residues) in
2019-nCoV maps to HIV-1 gp120 with gaps [see Table 1]. Insert 4 (8 amino acid residues) maps to
HIV-1 Gag with gaps.
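To make the notions of query coverage and identity concrete, the sketch below slides a short peptide along each database sequence and reports the best ungapped match over the full query length. This is a deliberately simplified stand-in for a BLASTp search, not the search performed here: it allows no gaps, uses no substitution matrix and computes no E-values. The function names and the toy database entries are hypothetical.

```python
def best_ungapped_hit(query, subject):
    """Return (identity, start, window) for the best full-length ungapped match.

    identity is the fraction of identical residues, start is 1-based, and
    None is returned when the subject is shorter than the query.
    """
    if not query or len(subject) < len(query):
        return None
    best = None
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        identity = sum(q == s for q, s in zip(query, window)) / len(query)
        if best is None or identity > best[0]:
            best = (identity, offset + 1, window)
    return best


def search_database(query, database, min_identity=1.0):
    """Return (name, identity, start, window) hits, best identity first."""
    hits = []
    for name, sequence in database.items():
        hit = best_ungapped_hit(query, sequence)
        if hit is not None and hit[0] >= min_identity:
            hits.append((name, *hit))
    return sorted(hits, key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database, not real viral proteins.
    toy_database = {
        "toy_protein_1": "MAAATNGTKRLLL",
        "toy_protein_2": "MSTNGAKRQQ",
    }
    for hit in search_database("TNGTKR", toy_database, min_identity=0.8):
        print(hit)
```

Because every window covers the whole query, coverage is 100% by construction; the gapped matches reported for inserts 3 and 4 would require a gapped local alignment instead.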
Although the 4 inserts represent discontiguous short stretches of amino acids in the spike glycoprotein
of 2019-nCoV, the fact that all four of them share amino acid identity or similarity with HIV-1
gp120 and HIV-1 Gag (among all annotated virus proteins) suggests that this is not a
fortuitous finding. In other words, one may sporadically expect a fortuitous match for a stretch of
6-12 contiguous amino acid residues in an unrelated protein. However, it is unlikely that all 4
inserts in the 2019-nCoV spike glycoprotein fortuitously match 2 key structural proteins of
an unrelated virus (HIV-1).
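How often a fortuitous match should be expected can be estimated with a simple back-of-the-envelope model. The sketch below assumes independent, uniformly distributed residues over a 20-letter alphabet, which real proteins do not follow, and treats the number of searchable positions as a free parameter; the database size in the usage example is purely illustrative and not a measured value.

```python
import math


def expected_chance_matches(motif_length, searchable_positions, alphabet_size=20):
    """Expected number of exact matches to one motif under a uniform random model."""
    if motif_length < 1 or searchable_positions < 0:
        raise ValueError("motif_length must be >= 1 and searchable_positions >= 0")
    # Each position matches with probability alphabet_size ** -motif_length.
    return searchable_positions / alphabet_size ** motif_length


def probability_of_at_least_one(expected):
    """Poisson approximation to the chance of observing one or more matches."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    hypothetical_positions = 10 ** 8  # illustrative database size (assumption)
    for length in (6, 8, 12):
        expected = expected_chance_matches(length, hypothetical_positions)
        print(length, expected, probability_of_at_least_one(expected))
```

Under this model the expectation falls by a factor of 20 for every additional residue, so the conclusion drawn for a 6-residue motif can differ sharply from that for a 12-residue motif, and it depends directly on how large the searched database is and on whether gaps or mismatches are permitted.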
The amino acid residues of inserts 1, 2 and 3 of the 2019-nCoV spike glycoprotein that mapped to
HIV-1 were part of the V4, V5 and V1 domains, respectively, of gp120 [Table 1]. Since the
2019-nCoV inserts mapped to variable regions of HIV-1, they were not ubiquitous in HIV-1 gp120 but
were limited to selected sequences of HIV-1 [see S.File1], primarily from Asia and Africa.
The HIV-1 Gag protein enables interaction of the virus with the negatively charged host surface
(Murakami, 2008), and a high positive charge on the Gag protein is a key feature of the host-virus
interaction. On analyzing the pI values for each of the 4 inserts in 2019-nCoV and the
corresponding stretches of amino acid residues from HIV-1 proteins, we found that a) the pI values
were very similar for each pair analyzed and b) most of these pI values were 10±2 [see Table 1]. Of
note, despite the gaps in inserts 3 and 4, the pI values were comparable. This uniformity in the pI
values for all 4 inserts merits further investigation.
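For readers who wish to see how a pI estimate is obtained, the following sketch computes the net charge of a peptide from the Henderson-Hasselbalch equation and finds the pH of zero net charge by bisection. The pKa values are one approximate, assumed set; published scales differ, and the tool and scale behind the values in Table 1 are not stated here, so the numbers produced by this sketch need not reproduce the table.

```python
# Approximate pKa values (an assumed, illustrative set; published scales differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERMINUS = 8.6
PKA_C_TERMINUS = 3.6


def net_charge(peptide, ph):
    """Net charge of a peptide at a given pH (gap characters are ignored)."""
    residues = peptide.replace("-", "").upper()
    positive = [PKA_N_TERMINUS] + [PKA_POSITIVE[r] for r in residues if r in PKA_POSITIVE]
    negative = [PKA_C_TERMINUS] + [PKA_NEGATIVE[r] for r in residues if r in PKA_NEGATIVE]
    charge = sum(1.0 / (1.0 + 10 ** (ph - pka)) for pka in positive)
    charge -= sum(1.0 / (1.0 + 10 ** (pka - ph)) for pka in negative)
    return charge


def isoelectric_point(peptide, tolerance=1e-4):
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        # Net charge decreases monotonically with pH.
        if net_charge(peptide, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for motif in ("TNGTKR", "HKNNKS"):  # insert 1 and insert 2 from Table 1
        print(motif, round(isoelectric_point(motif), 2))
```

For peptides this short, the terminal groups contribute a large share of the charge, so the estimate is sensitive both to the pKa scale and to whether the segment is treated as a free peptide or as part of a longer chain.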
As none of these 4 inserts is present in any other coronavirus, the genomic region encoding these
inserts represents an ideal candidate for designing primers that can distinguish 2019-nCoV from other
coronaviruses.
Insert  Protein             Positions  Motif alignment      Polar residues  Total charge  pI value  HIV protein/region  HIV source country/subtype
1       2019-nCoV (GP)      71-76      TNGTKR               5               2             11
1       HIV-1 (gp120)       404-409    TNGTKR               5               2             11        gp120-V4            Thailand*/CRF01_AE
2       2019-nCoV (GP)      145-150    HKNNKS               6               2             10
2       HIV-1 (gp120)       462-467    HKNNKS               6               2             10        gp120-V5            Kenya*/G
3       2019-nCoV (GP)      245-256    RSYL----TPGDSSSG     8               2             10.84
3       HIV-1 (gp120)       136-150    RTYLFNETRGNSSSG      10              1             8.75      gp120-V1            India*/C
4       2019-nCoV (Poly P)  676-684    QTNS-----------PRRA  6               2             12.00
4       HIV-1 (Gag)         366-384    QTNSSILMQRSNFKGPRRA  12              4             12.30     Gag                 India*/C
Table 1: Aligned sequences of the 2019-nCoV spike glycoprotein and the HIV-1 gp120 and Gag proteins,
with their positions in the primary protein sequence. All the inserts have a high density of positively
charged residues. The deleted fragments in inserts 3 and 4 increase the positive charge to surface area
ratio. *See Supp. Table 1 for accession numbers.
The novel inserts are part of the receptor binding site of 2019-nCoV
To gain structural insights and to understand the role of these insertions in the 2019-nCoV glycoprotein,
we modelled its structure based on the available structure of the SARS spike glycoprotein (PDB:
6ACD.1.A). The modelled structure reveals that although inserts 1, 2 and 3 are
at non-contiguous locations in the protein primary sequence, they fold to constitute part of the
glycoprotein binding site that recognizes the host receptor (Kirchdoerfer et al., 2016) (Figure 3).
Insert 1 corresponds to the NTD (N-terminal domain) and inserts 2 and 3 correspond to
the CTD (C-terminal domain) of the S1 subunit in the 2019-nCoV spike glycoprotein. Insert
4 is at the junction of SD1 (subdomain 1) and SD2 (subdomain 2) of the S1 subunit (Ou et
al., 2017). We speculate that these insertions provide additional flexibility to the glycoprotein
binding site by forming a hydrophilic loop in the protein structure that may facilitate or enhance
virus-host interactions.
Figure 3. Modelled homotrimeric spike glycoprotein of 2019-nCoV. The inserts matching the HIV
envelope protein are shown as colored beads and are located at the binding site of the protein.
Evolutionary Analysis of 2019-nCoV
It has been speculated that 2019-nCoV is a coronavirus variant derived from an animal source
that was transmitted to humans. Considering this change in host specificity, we decided to
study the sequence of the spike glycoprotein (S protein) of the virus. S proteins are surface proteins
that help the virus in host recognition and attachment. Thus, a change in these proteins can be
reflected as a change in the host specificity of the virus. To identify alterations in the S protein gene of
2019-nCoV and their consequences for structural rearrangements, we performed an in silico analysis of
2019-nCoV with respect to all other viruses. A multiple sequence alignment of the S protein
amino acid sequences of 2019-nCoV, Bat-SARS-Like, SARS-GZ02 and MERS revealed that the 2019-nCoV
S protein shares its closest ancestry with, yet has diverged significantly from, SARS-GZ02 (Figure 1).
Insertions in Spike protein region of 2019-nCoV
Since the S protein of 2019-nCoV shares closest ancestry with SARS GZ02, the sequences coding
for the spike proteins of these two viruses were compared using MultiAlin software. We found four
new insertions in the protein of 2019-nCoV: “GTNGTKR” (IS1), “HKNNKS” (IS2), “GDSSSG”
(IS3) and “QTNSPRRA” (IS4) (Figure 2). To our surprise, these sequence insertions were not only
absent in the S protein of SARS but were also not observed in any other member of the Coronaviridae
family (Supplementary figure). This is startling, as it is quite unlikely for a virus to have acquired
such unique insertions naturally in a short duration of time.
Insertions share similarity to HIV
The insertions were observed to be present in all the genomic sequences of 2019-nCoV
available from the recent clinical isolates (Supplementary Figure 1). To identify the source of these
insertions in 2019-nCoV, a local alignment was performed with BLASTp using these insertions as queries
against all virus genomes. Unexpectedly, all the insertions aligned with human immunodeficiency
virus-1 (HIV-1). Further analysis revealed that the HIV-1 sequences aligned with 2019-nCoV were
derived from the surface glycoprotein gp120 (amino acid sequence positions: 404-409, 462-467,
136-150) and from the Gag protein (amino acids 366-384) (Table 1). The Gag protein of HIV is involved in host
membrane binding, packaging of the virus and the formation of virus-like particles. Gp120
plays a crucial role in recognizing the host cell by binding to the primary receptor CD4. This binding
induces structural rearrangements in gp120, creating a high-affinity binding site for a chemokine
co-receptor such as CXCR4 and/or CCR5.
Discussion
The current outbreak of 2019-nCoV warrants a thorough investigation and understanding of its
ability to infect human beings. Keeping in mind that there has been a clear change in host
preference from previous coronaviruses to this virus, we studied the changes in the spike protein between
2019-nCoV and other viruses. We found four new insertions in the S protein of 2019-nCoV when
compared to its nearest relative, SARS CoV. The genome sequences of the 28 recent clinical
isolates showed that the sequences coding for these insertions are conserved among all these
isolates. This indicates that these insertions have been preferentially acquired by 2019-nCoV,
providing it with an additional survival and infectivity advantage. Delving deeper, we found that these
insertions were similar to HIV-1 sequences. Our results highlight an astonishing relation between the gp120
and Gag proteins of HIV and the 2019-nCoV spike glycoprotein. These proteins are critical for the
viruses to identify and latch on to their host cells and for viral assembly (Beniac et al., 2006).
Since surface proteins are responsible for host tropism, changes in these proteins imply a change
in the host specificity of the virus. According to reports from China, there has been a gain of host
specificity in the case of 2019-nCoV, as the virus was originally known to infect animals and not humans
but, after the mutations, has gained tropism for humans as well.
Furthermore, 3D modelling of the protein structure showed that these insertions are present at
the binding site of 2019-nCoV. Due to the presence of gp120 motifs in the 2019-nCoV spike
glycoprotein at its binding domain, we propose that these motif insertions could have provided an
enhanced affinity towards host cell receptors. Further, this structural change might also have
increased the range of host cells that 2019-nCoV can infect. To the best of our knowledge, the
function of these motifs is still not clear in HIV and needs to be explored. The exchange of genetic
material among viruses is well known, and such a critical exchange highlights the risk and the
need to investigate the relations between seemingly unrelated virus families.
Conclusions
Our analysis of the spike glycoprotein of 2019-nCoV revealed several interesting findings. First,
we identified 4 unique inserts in the 2019-nCoV spike glycoprotein that are not present in any
other coronavirus reported to date. To our surprise, all 4 inserts in 2019-nCoV mapped to
short segments of amino acids in HIV-1 gp120 and Gag among all annotated virus proteins in
the NCBI database. This uncanny similarity of novel inserts in the 2019-nCoV spike protein to
HIV-1 gp120 and Gag is unlikely to be fortuitous. Further, 3D modelling suggests that at least 3 of
the unique inserts, which are non-contiguous in the primary protein sequence of the 2019-nCoV
spike glycoprotein, converge to constitute key components of the receptor binding site. Of note,
all 4 inserts have pI values of around 10, which may facilitate virus-host interactions. Taken
together, our findings suggest unconventional evolution of 2019-nCoV that warrants further
investigation. Our work highlights novel evolutionary aspects of 2019-nCoV and has
implications for the pathogenesis and diagnosis of this virus.
References
Beniac, D. R., Andonov, A., Grudeski, E., & Booth, T. F. (2006). Architecture of the SARS coronavirus
prefusion spike. Nature Structural and Molecular Biology, 13(8), 751–752.
https://doi.org/10.1038/nsmb1123
Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T. G.,
Bertoni, M., Bordoli, L., & Schwede, T. (2014). SWISS-MODEL: Modelling protein tertiary and
quaternary structure using evolutionary information. Nucleic Acids Research.
https://doi.org/10.1093/nar/gku340
Bosch, B. J., van der Zee, R., de Haan, C. A. M., & Rottier, P. J. M. (2003). The Coronavirus Spike Protein Is
a Class I Virus Fusion Protein: Structural and Functional Characterization of the Fusion Core
Complex. Journal of Virology, 77(16), 8801–8811. https://doi.org/10.1128/jvi.77.16.8801-8811.2003
Chan, J. F.-W., Kok, K.-H., Zhu, Z., Chu, H., To, K. K.-W., Yuan, S., & Yuen, K.-Y. (2020). Genomic
characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with
atypical pneumonia after visiting Wuhan. Emerging Microbes & Infections, 9(1), 221–236.
https://doi.org/10.1080/22221751.2020.1719902
Chan, J. F. W., Lau, S. K. P., To, K. K. W., Cheng, V. C. C., Woo, P. C. Y., & Yuen, K.-Y. (2015). Middle East
Respiratory Syndrome Coronavirus: Another Zoonotic Betacoronavirus Causing SARS-Like Disease. Clinical Microbiology Reviews.
https://doi.org/10.1128/CMR.00102-14
Chan, J., To, K., Tse, H., Jin, D., & Yuen, K. Y. (2013). Interspecies
transmission and emergence of novel viruses: lessons from bats and birds. Trends in Microbiology.
Corpet, F. (1988). Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research.
https://doi.org/10.1093/nar/16.22.10881
DeLano, W. L. (2002). The PyMOL Molecular Graphics System, Version 1.1. Schrödinger LLC.
https://doi.org/10.1038/hr.2014.17
Du, L., Zhao, G., Kou, Z., Ma, C., Sun, S., Poon, V. K. M., Lu, L., Wang, L., Debnath, A. K., Zheng, B.-J., Zhou,
Y., & Jiang, S. (2013). Identification of a Receptor-Binding Domain in the S Protein of the Novel
Human Coronavirus Middle East Respiratory Syndrome Coronavirus as an Essential Target for
Vaccine Development. Journal of Virology, 87(17), 9939–9942. https://doi.org/10.1128/jvi.01048-13
Edgar, R. C. (2004). MUSCLE: Multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Research. https://doi.org/10.1093/nar/gkh340
Elbe, S., & Buckland-Merrett, G. (2017). Data, disease and diplomacy: GISAID’s innovative contribution
to global health. Global Challenges. https://doi.org/10.1002/gch2.1018
Kirchdoerfer, R. N., Cottrell, C. A., Wang, N., Pallesen, J., Yassine, H. M., Turner, H. L., Corbett, K. S.,
Graham, B. S., McLellan, J. S., & Ward, A. B. (2016). Pre-fusion structure of a human coronavirus
spike protein. Nature. https://doi.org/10.1038/nature17200
Kumar, S., Stecher, G., Li, M., Knyaz, C., & Tamura, K. (2018). MEGA X: Molecular evolutionary genetics
analysis across computing platforms. Molecular Biology and Evolution.
https://doi.org/10.1093/molbev/msy096
Li, F. (2016). Structure, Function, and Evolution of Coronavirus Spike Proteins. Annual Review of
Virology, 3(1), 237–261. https://doi.org/10.1146/annurev-virology-110615-042301
Murakami, T. (2008). Roles of the interactions between Env and Gag proteins in the HIV-1 replication
cycle. Microbiology and Immunology, 52(5), 287–295. https://doi.org/10.1111/j.1348-0421.2008.00008.x
Ou, X., Guan, H., Qin, B., Mu, Z., Wojdyla, J. A., Wang, M., Dominguez, S. R., Qian, Z., & Cui, S. (2017).
Crystal structure of the receptor binding domain of the spike glycoprotein of human
betacoronavirus HKU1. Nature Communications. https://doi.org/10.1038/ncomms15216
Snijder, E. J., van der Meer, Y., Zevenhoven-Dobbe, J., Onderwater, J. J. M., van der Meulen, J., Koerten,
H. K., & Mommaas, A. M. (2006). Ultrastructure and origin of membrane vesicles associated with
the severe acute respiratory syndrome coronavirus replication complex. Journal of Virology,
80(12), 5927–5940. https://doi.org/10.1128/JVI.02501-05
Zhou, P., Yang, X.-L., Wang, X.-G., Hu, B., Zhang, L., Zhang, W., Si, H.-R., Zhu, Y., Li, B., Huang, C.-L., Chen,
H.-D., Chen, J., Luo, Y., Guo, H., Jiang, R.-D., Liu, M.-Q., Chen, Y., Shen, X.-R., Wang, X., … Shi, Z.-L.
(2020). Discovery of a novel coronavirus associated with the recent pneumonia outbreak in
humans and its potential bat origin. BioRxiv. https://doi.org/10.1101/2020.01.22.914952
Zhu, N., Zhang, D., Wang, W., Li, X., Yang, B., Song, J., Zhao, X., Huang, B., Shi, W., Lu, R., Niu, P., Zhan, F.,
Ma, X., Wang, D., Xu, W., Wu, G., Gao, G. F., & Tan, W. (2020). A Novel Coronavirus from Patients
with Pneumonia in China, 2019. New England Journal of Medicine, NEJMoa2001017.
https://doi.org/10.1056/NEJMoa2001017
Fig. S1: Multiple sequence alignment of the spike glycoproteins of the Coronaviridae family, showing all
four inserts.
Fig. S2: All four inserts are present in the 28 aligned Wuhan 2019-nCoV genomes obtained
from GISAID. The gaps in the Bat-SARS-Like CoV in the last row show that inserts 1 and 4 are
unique to Wuhan 2019-nCoV.
Fig. S3: Phylogenetic tree of the genomes of 28 clinical isolates of 2019-nCoV, including one from a bat host.
Supplementary Fig. 4: Genome alignment of the Coronaviridae family. The sequences highlighted in black are the
inserts.