This presentation: bit.ly/sarscov2-selection

Galaxy-ELIXIR webinar series
Evolution of SARS-CoV-2
covid19.galaxyproject.org
github.com/veg/SARS-CoV-2
Sergei L Kosakovsky Pond (spond@temple.edu / @sergeilkp / lab.hyphy.org)

Natural Selection
• Mutation, recombination and other processes introduce variation into genomes of organisms
• The fitness of an organism describes how well it can survive/grow/function/replicate in a given environment, or how well it can pass on its genetic material to future generations
• Any particular mutation can be
  • Neutral: no or little change in fitness (the majority of genetic variation falls into this class according to the neutral theory)
  • Deleterious: reduced fitness
  • Adaptive: increased fitness
• The same mutation can have different fitness costs in different environments (fitness landscape), and different genetic backgrounds (epistasis)

What does selection in viruses look like?
• Necessary conditions
  • Selective pressure (immune, drug, other host factors)
  • Time
• Have we had those in SARS-CoV-2?
  • No clear evidence so far, which is not unexpected.
  • ~6 months?

What does selection in viruses look like?
• To detect selection we need
  • Sufficiently many sequences
  • Divergence/diversity
  • Repeated substitutions (diversifying)
  • Change in frequency (directional)

Note the scale
[Figure: phylogenetic tree covering 40 years of HA evolution in H3N2, showing branches with repeated selective events at a canonical antigenic site. Tips (e.g. A_MEMPHIS_101_1972_9, A_HONG_KONG_49_1974_, A_VICTORIA_3_1975_13, A_TEXAS_1_1977_37753) and internal nodes are labelled with the codon and amino acid at the site: the 1972-1974 isolates shown carry GAC (D), the 1975-1977 isolates carry AAC (N), with one AAT (N) in A_ROTTERDAM_8179_197. Annotation: fixation after a few years in human hosts for H3N2 HA.]
For SARS-CoV-2 we have…
Sufficiently many sequences
• Divergence/diversity
• Repeated substitutions (diversifying)
• Change in frequency (directional)
https://observablehq.com/@stevenweaver/case-vs-sequence-count

For SARS-CoV-2 we have…
Sufficiently many sequences
Divergence/diversity
• Repeated substitutions (diversifying)
• Change in frequency (directional)
Little divergence… Mean of ~7 genome-wide differences from reference
https://observablehq.com/@spond/current-state-of-sars-cov-2-evolution

For SARS-CoV-2 we have…
Little diversity… Mean of ~9 genome-wide pairwise differences in contemporaneous strains
https://observablehq.com/@spond/current-state-of-sars-cov-2-evolution

BUT…
There is EXTENSIVE apparent genome variation at population level
[Figure panels: Shared a/a variants; Any variants]
https://observablehq.com/@spond/summary-of-sars-cov-2-genomic-diversity

MOST VARIANTS ARE RARE
[Figure categories: ≤1%; >1%]
• [Some/extensive] Sequencing error
• [Some] RNA editing
• [Majority] Neutral/slightly deleterious intra-host variants
• [Minority] Important variation
https://observablehq.com/@spond/summary-of-sars-cov-2-genomic-diversity

Conservation
Measles, rinderpest, and peste-des-petits-ruminants viruses nucleoprotein.
[Figure panels: Nucleotides; Amino acids]

Diversification
An antigenic site in H3N2 IAV hemagglutinin
[Figure panels: Nucleotides; Amino acids]

Molecular signatures of selection
Because synonymous substitutions do not alter the protein, we often assume that they are neutral.
The rate of accumulation of synonymous substitutions (dS) can serve as the neutral background evolutionary rate.
We can compare the rate of accumulation of non-synonymous substitutions (dN), which alter the protein sequence, to dS and use their ratio to classify the nature of the evolutionary process.

dS ~ (number of fixed synonymous mutations) / (proportion of random mutations that are synonymous)
dN ~ (number of fixed non-synonymous mutations) / (proportion of random mutations that are non-synonymous)

dN/dS < 1 indicates purifying (negative) selection, dN/dS ≈ 1 is consistent with neutral evolution, and dN/dS > 1 indicates diversifying (positive) selection.

Molecular signatures of selection
Over the last 15 years, my lab and collaborators have been developing a collection of methods for estimating dN/dS and interpreting evidence for selection.
Methods are implemented in the HyPhy (hyphy.org) software package and also available in Galaxy.

Molecular signatures of selection
Also available on a standalone server (datamonkey.org), which people have been using to study SARS-CoV-2
https://observablehq.com/@stevenweaver/datamonkey-and-sars-cov-2-related-analyses

Standard dN/dS analyses
Sufficiently many sequences
Divergence/diversity
• Repeated substitutions (diversifying)
• Change in frequency (directional)
Won’t work too well at the moment…
• Will be influenced by “noise” and variation at the tips
  • Focus only on internal branches.
• Have very little signal to operate on
  • Even with 15,000+ sequences, total branch length (~power) is minuscule: ~0.1 substitution across the entire tree per site.
  • Power to infer selection on internal branches is low

What are the alternatives?
• Give up (wait until something more obvious happens)
• Go back to the 1980s (look at sequence data and count differences)
• Use several additional sources of information!
  • Intra-host variation
  • Evolution in other beta-CoV
  • Temporal and functional annotation data

Current “selective” state of SARS-CoV-2
All slides in this section: https://observablehq.com/@spond/natural-selection-analysis-of-sars-cov-2-covid-19

Eleven high priority sites

Positive selection based on dN/dS (FEL and/or MEME, internal branches only)

Multiple internal branches flagged as supporting selective episodes (MEME)

High frequency variants

Increasing frequency trend week over week

Same variants observed at intermediate frequencies in intra-host samples (variant calling from Illumina and ONT data)
RNA editing (or contamination or RNA degradation) is a thing…
And many of these variants are found in assembled genomes

Overlaps predicted CTL epitopes

SARS-CoV-2: non-neutral/differential evolution observed in related beta-CoV

SARS-CoV-2: “unexpected” residue (F) based on the evolution of related (and distant) beta-CoV

Prioritization
• At this point there is no compelling reason to argue that any particular residue is of functional importance
• However, we can identify sets of genomic positions where multiple lines of evidence suggest that evolution may be non-neutral.
  • dN/dS
  • Intra-host
  • Beta-CoV
  • Temporal trends
  • Functional annotation/prediction

https://observablehq.com/@spond/summary-of-sars-cov-2-genomic-diversity

Longer genes / ORFs: more variants
Consistent with largely neutral / noisy variants

Predicted HLA epitopes
Not an overwhelming factor for selection at the moment

Sequencing errors / neutral mutations
General trend for stronger purifying selection on more frequent variants

Summary
• There are ~10 interesting genomic positions
  • Mostly in non-structural proteins and ORF3a
• Most of these variants have been consistently identified since March (when we started these analyses)
• Our pipeline is public, generates machine-readable results, comes with visualization tools, and is run regularly on all public and semi-public (GISAID) data
covid19.galaxyproject.org
github.com/veg/SARS-CoV-2

Pipeline/workflow
Currently under development / release soon
0. Get GISAID sequences and metadata
1. QC and map (codon-aware), splitting into ORFs/genes
2. Collapse to unique sequences (comparative methods only use those)
3. Build a crude tree (was RAxML, now RapidNJ [G])
4. Compute pairwise distances (TN93 [G]) for divergence and diversity
5. Run selection analyses (HyPhy [G])
6. Collate results (Python [in progress])
7. Visualize results (D3, Vega-Lite, phylotree.js, Observable [viz module in progress])
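
Steps 2 and 4 of the workflow above can be illustrated on toy data. This is only a minimal sketch, not the pipeline's own code: the function names, the toy sequences, and the reference are hypothetical, and raw difference counts stand in for the TN93 distances that the actual workflow computes.

```python
from collections import Counter
from itertools import combinations


def collapse_to_unique(seqs):
    """Step 2 (sketch): map each distinct aligned sequence to its copy number."""
    return Counter(s.upper() for s in seqs)


def count_differences(a, b):
    """Count differing sites between two aligned sequences, ignoring gaps and N."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in skip and y not in skip)


def mean_divergence(unique_counts, reference):
    """Mean number of differences from the reference, weighted by copy number."""
    total = sum(unique_counts.values())
    diffs = sum(n * count_differences(s, reference)
                for s, n in unique_counts.items())
    return diffs / total


def mean_pairwise_differences(unique_counts):
    """Mean differences over all pairs of the original (non-collapsed) sequences."""
    pairs = 0
    diffs = 0
    for (a, na), (b, nb) in combinations(unique_counts.items(), 2):
        pairs += na * nb
        diffs += na * nb * count_differences(a, b)
    # Pairs of identical copies count as pairs but add zero differences.
    pairs += sum(n * (n - 1) // 2 for n in unique_counts.values())
    return diffs / pairs if pairs else 0.0


# Hypothetical toy alignment: three codons, five sampled genomes.
reference = "ATGGCTAAC"
sequences = ["ATGGCTAAC", "ATGGCTAAC", "ATGGCTGAC", "ATGACTGAC", "atggctaac"]

unique = collapse_to_unique(sequences)
divergence = mean_divergence(unique, reference)
diversity = mean_pairwise_differences(unique)

print(f"{len(unique)} unique sequences out of {len(sequences)}")
print(f"mean divergence from reference: {divergence}")
print(f"mean pairwise diversity: {diversity}")
```

Working on unique sequences with copy numbers keeps the number of comparisons small when most genomes are identical, which is the situation the divergence and diversity slides describe.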