cov-lineages/pangolin - GitHub A phylogenetic treeusing RAxML v8.2.8 (ref. Figure 1 (top) shows the distribution of all identified breakpoints (using 3SEQs exhaustive triplet search) by the number of candidate recombinant sequences supporting them. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68sequences are available in Supplementary Table 4. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. Avian influenza a virus (H7N7) epidemic in The Netherlands in 2003: course of the epidemic and effectiveness of control measures. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination. 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 1015% divergent throughout the entire Sprotein (excluding the N-terminal domain). We call this approach breakpoint-conservative, but note that this has the opposite effect to the construction of NRR1 in that this approach is the most likely to allow breakpoints to remain inside putative non-recombining regions. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. Using these breakpoints, the longest putative non-recombining segment (nt1,88521,753) is 9.9kb long, and we call this region NRR2. 4). P.L. 5 Comparisons of GC content across taxa. Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. You are using a browser version with limited support for CSS. For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus3. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). PubMed If stopping an outbreak in its early stages is not possibleas was the case for the COVID-19 epidemic in Hubeiidentification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. NTD, N-terminal domain; CTD, C-terminal domain. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Patino-Galindo, J. Curr. 23, 18911901 (2006). Genetics 176, 10351047 (2007). 2). 84, 31343146 (2010). Python 379 102 pangoLEARN Public Store of the trained model for pangolin to access. MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. Microbiol. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. Frontiers | Novel Highly Divergent SARS-CoV-2 Lineage With the Spike The Artic Network receives funding from the Wellcome Trust through project no. matics program called Pangolin was developed. Nucleotide positions for phylogenetic inference are 147695, 9621,686 (first tree), 3,6259,150 (second tree, also BFR B), 9,26111,795 (third tree, also BFR C), 12,44319,638 (fourth tree) and 23,63124,633, 24,79525,847, 27,70228,843 and 29,57430,650 (fifth tree). These shy, quirky but cute mammals are one of the most heavily trafficked yet least understood animals in the world. Press, 2009). Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? Cell 181, 223227 (2020). Gorbalenya, A. E. et al. 3) clusters with viruses from provinces in the centre, east and northeast of China. Extensive diversity of coronaviruses in bats from China. 30, 21962203 (2020). 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. 6, 8391 (2015). This leaves the insertion of polybasic. Liu, P. et al. This boundary appears to be rarely crossed. 4 TMRCAs for SARS-CoV and SARS-CoV-2. A single 3SEQ run on the genome alignment resulted in 67 out of 68sequences supporting some recombination in the past, with multiple candidate breakpoint ranges listed for each putative recombinant. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. The shaded region corresponds to the Sprotein. J. Virol. Correspondence to Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. Anyone you share the following link with will be able to read this content: Sorry, a shareable link is not currently available for this article. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. Removal of five sequences that appear to be recombinants and two small subregions of BFRA was necessary to ensure that there were no phylogenetic incongruence signals among or within the three BFRs. Instead, similarity in codon usage metrics between the SARS-CoV-2 and eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. with an alignment on which an initial recombination analysis was done. Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. Thank you for visiting nature.com. Sci. Wang, L. et al. Researchers have found that SARS-CoV-2 in humans shares about 90.3% of its genome sequence with a coronavirus found in pangolins (Cyranoski, 2020). Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome. Phylogenies of subregions of NRR1 depict an appreciable degree of spatial structuring of the bat sarbecovirus population across different regions (Fig. Evol. PubMed Central Alexandre Hassanin, Vuong Tan Tu, Gabor Csorba, Nicola F. Mller, Kathryn E. Kistler & Trevor Bedford, Jack M. Crook, Ivana Murphy, Diana Bell, Simon Pollett, Matthew A. Conte, Irina Maljkovic Berry, Yatish Turakhia, Bryan Thornlow, Russell Corbett-Detig, Nature Microbiology A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the Spike protein. Influenza viruses reassort17 but they do not undergo homologous recombination within RNA segments18,19, meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenzas eight RNA segments. First, we took an approach that relies on identification of mosaic regions (via 3SEQ14 v.1.7) that are also supported by PI signals19. It allows a user to assign a SARS-CoV-2 genome sequence the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Sequences were aligned by MAFTT58 v.7.310, with a final alignment length of 30,927, and used in the analyses below. M.F.B. SARS-like WIV1-CoV poised for human emergence. "This is an extremely interesting . Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. Hu, B. et al. Nature 583, 282285 (2020). J. Gen. Virol. PubMed Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA, Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium, Department of Biological Sciences, Xian Jiaotong-Liverpool University, Suzhou, China, State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China, Department of Biology, University of Texas Arlington, Arlington, TX, USA, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK, MRC-University of Glasgow Centre for Virus Research, Glasgow, UK, You can also search for this author in Emergence of SARS-CoV-2 through recombination and strong purifying selection. Biol. T.L. Phylogenetic trees and exact breakpoints for all ten BFRs are shown in Supplementary Figs. 88, 70707082 (2014). Yres, D. L. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist . The Bat, the Pangolin and the City: A Tale of COVID-19 Nature 579, 270273 (2020). Intragenomic rearrangements involving 5-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, Crystal structure of the CoV-Y domain of SARS-CoV-2 nonstructural protein 3, Association of underlying comorbidities and progression of COVID-19 infection amongst 2586 patients hospitalised in the National Capital Region of India: a retrospective cohort study, Molecular characterization of horse nettle virus A, a new member of subgroup B of the genus Nepovirus, Molecular phylogeny of coronaviruses and host receptors among domestic and close-contact animals reveals subgenome-level conservation, crossover, and divergence. Evol. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic14,15,16. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. 1c). SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. These are in general agreement with estimates using NRR2 and NRA3, which result in divergence times of 1982 (19482009) and 1948 (18791999), respectively, for SARS-CoV-2, and estimates of 1952 (19061989) and 1970 (19321996), respectively, for the divergence time of SARS-CoV from its closest known bat relative. Centre for Genomic Pathogen Surveillance. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. The coronavirus genome that these researchers had assembled, from pangolin lung-tissue samples, contained some gene regions that were ninety-nine per cent similar to equivalent parts of the SARS . Nat. Overview of the SARS-CoV-2 genotypes circulating in Latin America Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign query SARS-CoV-2 the most likely lineage sequence. These residues are also in the Pangolin Guangdong 2019 sequence. Except for specifying that sequences are linear, all settings were kept to their defaults. Slider with three articles shown per slide. Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humansa polybasic cleavage site insertion in the Sproteinhas not yet been seen in another close bat relative of the SARS-CoV-2 virus. We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). A second breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints. Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n=27 sequences spread over 4 years (MERS-CoV) and n=27 sequences spread over 49 years (HCoV-OC43). There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2 including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. 56, 152179 (1992). CAS These rate priors are subsequently used in the Bayesian inference of posterior rates for NRR1, NRR2, and NRA3 as indicated by the solid arrows. Bryant, D. & Moulton, V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks. Hon, C. et al. Zhou, H. et al. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. 82, 18191826 (2008). 25, 3548 (2017). Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. 21, 255265 (2004). Coronavirus origins: genome analysis suggests two viruses may have combined This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses22,33; specifying all past events in their correct temporal order34 is challenging and not shown here. Sarbecovirus, HCoV-OC43 and SARS-CoV data were assembled from GenBank to be as complete as possible, with sampling year as an inclusion criterion. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nat. Proc. MC_UU_1201412). Trafficked pangolins can carry coronaviruses closely related to The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus . Lancet 383, 541548 (2013). Dudas, G., Carvalho, L. M., Rambaut, A. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. 1) and thus likely to be the product of recombination, acquiring a divergent variable loop from a hitherto unsampled bat sarbecovirus28. R. Soc. There is a 90% DNA match between SARS CoV 2 and a coronavirus in pangolins. Softw. A new coronavirus associated with human respiratory disease in China. Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (regionB of SC2018 in Fig. These differences reflect the fact that rate estimates can vary considerably with the timescale of measurement, a frequently observed phenomenon in viruses known as time-dependent evolutionary rates41,43,44. Microbes Infect. D.L.R. While there is involvement of other mammalian speciesspecifically pangolins for SARS-CoV-2as a plausible conduit for transmission to humans, there is no evidence that pangolins are facilitating adaptation to humans. Phylogenetic classification of the whole-genome sequences of SARS-CoV-2 A tag already exists with the provided branch name. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. The command line tool is open source software available under the GNU General Public License v3.0. PLoS ONE 5, e10434 (2010). eLife 7, e31257 (2018). Sliding window analysis of changes in the patterns of sequence similarity between human SARS-CoV-2, and pangolin and bat coronaviruses as described further in Fig. Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission. The origins we present in Fig. Sequencing from Malayan pangolins collected during anti-smuggling operations in southern China detected coronavirus lineages related to SARS-CoV-2. PubMed Central In regionA, we removed subregion A1 (ntpositions 3,8724,716 within regionA) and subregion A4 (nt1,6422,113) because both showed PI signals with other subregions of regionA. The existing diversity and dynamic process of recombination amongst lineages in the bat reservoir demonstrate how difficult it will be to identify viruses with potential to cause major human outbreaks before they emerge. All four of these breakpoints were also identified with the tree-based recombination detection method GARD35. Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. J. Virol. 3) to examine the sensitivity of date estimates to this prior specification.