Mapping evolutionary pathways of HIV-1 drug resistance using conditional selection pressure. Christopher Lee, UCLA



Stalemate: We React to Them, They React to Us
E.g., a virus attacks us, so we develop a drug, so the virus evolves drug resistance mutations. Each step is a reaction to a past attack, based on limited, local information, without considering how the enemy will respond to our actions in the future. What if we had a global picture of HIV's possible evolutionary responses?

Rapid Evolution of Drug Resistance in the Global AIDS Epidemic
40 million cases, doubling every 10 years. 60% of patients fail first-line treatment due to drug resistance; complex multi-drug therapies (HAART) are required. After introduction of a new drug, resistant HIV strains appear within weeks. Worldwide, at least 10^14 new mutations arise per day.

HIV-1 Protease and RT: Anti-Retroviral Drug Targets
[Figure panels: protease; RT.]
Protease: responsible for the posttranslational processing of the viral polyproteins to yield the structural proteins and enzymes of the virus.
Reverse transcriptase (RT): responsible for DNA polymerization.

Automated Mutation Analysis
Previously developed Bayesian method for measuring evidence for mutations (single nucleotide polymorphisms) from chromatogram data, considering peak height, separation, context-sensitive sequencing error rates, uncertainty in alignment, uncertainty in allele frequency, etc. Completely automated.
Irizarry et al., Nature Genet. 26, 233 (2000); Hu et al., Pharmacogenomics J. 2, 236 (2002)

HIV Has Performed a Saturating Mutagenesis Experiment for Us
2.3 million high-confidence mutations from 60,000+ AIDS patient samples (Specialty Labs Inc.). 31.96 mutations per kb (compared with ~1 per kb in human). Detected 5.5 out of 9 possible SNPs per codon, throughout protease + RT. Each mutation was detected in an average of 360 different samples.

Selection Pressure Mapping: Build an Atlas of HIV Evolution
Selection pressure measures whether an amino acid mutation is selected for (Ka/Ks > 1) or against (Ka/Ks < 1) by evolution, relative to synonymous mutations.
Dataset: sequencing of 60,000 HIV clinical samples by Specialty Labs Inc., with 30-fold higher density of polymorphism information than human sequences.
Goal: construct a selection pressure map of how HIV is evolving, and where the virus is going, to evade our drugs.

Use HIV Evolution to Identify Natural Selection for Amino Acid Mutants
Negative selection: amino acid changes reduce fitness, and are less frequent than expected. This is the normal situation for a stable gene.
Positive selection: amino acid changes improve fitness, and are more frequent than expected. Very uncommon; indicates evolutionary change.
Ka/Ks: standard metric for positive selection (Ka/Ks > 1), measured for a gene. Li, W.H., J. Mol. Evol. 36, 96 (1993)

Calculating Ka / Ks per Residue
\[ \frac{K_a}{K_s} = \frac{N_a / N_s}{\left( f_t n_{a,t} + f_v n_{a,v} \right) / \left( f_t n_{s,t} + f_v n_{s,v} \right)} \]
Numerator: observed number of amino acid changes vs. synonymous mutations at this codon.
Denominator: expected number of amino acid changes vs. synonymous mutations at this codon.
Confidence that Ka/Ks > 1 is statistically significant, as a log-odds (LOD) score:
\[ \mathrm{LOD} = -\log_{10} p\left( i \ge N_a \mid N, q, K_a/K_s = 1 \right) = -\log_{10} \sum_{i=N_a}^{N} \binom{N}{i} q^{i} (1-q)^{N-i} \]
LOD 2 means 99% confidence; LOD 3 means 99.9% confidence.
Maximum likelihood estimation of Ka/Ks using PAML takes into account inferred phylogeny, the f_t/f_v ratio, mutation rates, etc.
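A minimal Python sketch of the two formulas on this slide may make the calculation concrete. It assumes that N = Na + Ns and that q is the probability, under neutrality, that a mutation at the codon changes the amino acid; the slide does not define either symbol, so both are assumptions, and the function names and example counts are hypothetical.

```python
from math import comb, log10

def ka_ks(n_a, n_s, f_t, f_v, n_at, n_av, n_st, n_sv):
    """Per-codon Ka/Ks: observed Na/Ns divided by the expected ratio.

    n_a, n_s   : observed amino acid-changing / synonymous mutation counts
    f_t, f_v   : transition / transversion frequencies
    n_at, n_av : possible amino acid-changing transitions / transversions
    n_st, n_sv : possible synonymous transitions / transversions
    """
    observed = n_a / n_s
    expected = (f_t * n_at + f_v * n_av) / (f_t * n_st + f_v * n_sv)
    return observed / expected

def lod(n_a, n, q):
    """LOD = -log10 of the binomial upper tail P(i >= n_a | n, q)."""
    tail = sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(n_a, n + 1))
    # For very large counts the tail can underflow to 0.0 in floating point.
    return float("inf") if tail == 0.0 else -log10(tail)

# Hypothetical codon: 3 possible transitions (2 amino acid-changing, 1 synonymous)
# and 6 possible transversions (4 amino acid-changing, 2 synonymous).
example_ratio = ka_ks(n_a=30, n_s=10, f_t=2.0, f_v=1.0,
                      n_at=2, n_av=4, n_st=1, n_sv=2)
example_lod = lod(n_a=10, n=10, q=0.5)
```

With these made-up inputs the observed ratio is 3 and the expected ratio is 2, so Ka/Ks is 1.5. The LOD example is the case where all 10 mutations are amino acid-changing under q = 0.5, which gives a LOD just above 3.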

HIV Protease Positive Selection
Chen, Perlina & Lee, J. Virol. (2004)
[Figure: Ka/Ks (0 to 50) plotted against codon position (1 to 99) for HIV protease.]
Positive selection mapping automatically discovers causes of drug resistance!

Selection Pressure Varies Among Different Amino Acid Mutations at One Codon

Whole codon:
Position   Ka/Ks   LOD
20         0.49    10.64
47         1.03    0.29
85         1.21    2.13

Individual amino acid mutation:
Mutation   Ka/Ks   LOD
K20T       2.44    >300
K20M       2.07    >300
K20R       0.79    0.00
K20Q       0.01    66.37
K20N       0.00    139.36
I47V       3.52    142.44
I47R       0.10    0.00
I47T       0.01    0.00
I47M       0.01    0.00
I85V       3.08    171.05
I85L       0.09    10.08
I85S       0.04    11.53

K20T, I47V, I85V: known to be associated with drug resistance.

Positive Selection Mapping Identifies Drug Resistance Mutations
Correctly identified 19 of 22 known drug resistance mutation positions. Compared with the multi-year research process (clinical, biochemical, genetic) that was previously required, this analysis is completely automatic and works directly on output from sequencing machines.

Our Positive Selection Results Match Experimental Phenotypic Fitness Data
Loeb et al. constructed ~50% of all possible point mutants of HIV-1 subtype B protease and assayed their biochemical activity. Nature 340, 397 (1989)
[Figure: counts of mutations by activity class (active, intermediate, inactive), for all mutations tested vs. positively selected mutations; bar labels shown: 160, 104, 68, 4, 28.]
P-value of this match is 10^-16.6.

Build a Reaction Rate Diagram of HIV's Global Evolution
A network diagram of the rates of transition between all possible genotypes. Ka/Ks is proportional to the rate of increase of a mutation. The diagram shows the speeds of all possible paths of evolution the viral population will follow under the pressure of current drug treatments.

Selection Pressure is like an Evolutionary Velocity
For wildtype, synonymous mutant, and amino acid mutant allele frequencies $f_o = 1$, $f_s = 0$, $f_a = 0$ initially, equal amino acid and synonymous mutation frequency $\lambda$, and reproduction rates $r_o$, $r_a$, after one unit of time:
\[ f_o \to r_o f_o (1 - 2\lambda); \quad f_s \to r_o \lambda f_o; \quad f_a \to r_a \lambda f_o \]
Assuming $\lambda \ll 1$, Ka/Ks and the normalized $f_a$ will be:
\[ \frac{K_a}{K_s} = \frac{f_a}{f_s} = \frac{r_a}{r_o}; \quad \Delta f_a = \frac{f_a}{f_o + f_s + f_a} \approx \lambda \frac{r_a}{r_o} = \lambda \frac{K_a}{K_s} \]
So initially the rate of change of the amino acid mutant allele frequency, $df_a/dt$, is proportional to the selection pressure Ka/Ks.
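The approximation on this slide can be checked numerically. The sketch below applies the one-step update from the slide and compares the normalized amino acid mutant frequency with λ·(Ka/Ks); the function name and the parameter values are illustrative assumptions, not taken from the talk.

```python
def one_time_step(r_o, r_a, lam):
    """Apply the slide's update, starting from f_o = 1, f_s = 0, f_a = 0."""
    f_o = r_o * (1 - 2 * lam)   # wildtype that did not mutate
    f_s = r_o * lam             # new synonymous mutants (reproduce like wildtype)
    f_a = r_a * lam             # new amino acid mutants (reproduce at rate r_a)
    ka_ks = f_a / f_s           # equals r_a / r_o
    normalized_f_a = f_a / (f_o + f_s + f_a)
    return ka_ks, normalized_f_a

# Illustrative values: the amino acid mutant reproduces 3x faster than wildtype.
lam = 1e-4
ka_ks, normalized_f_a = one_time_step(r_o=1.0, r_a=3.0, lam=lam)
relative_error = abs(normalized_f_a - lam * ka_ks) / (lam * ka_ks)
```

For small λ the relative error is of order λ, which is why the slide treats the initial rate of change of the mutant frequency as proportional to Ka/Ks.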

What does Ka/Ks really calculate?
Ka/Ks calculates selection pressure for mutation X, assuming wildtype (WT) as the starting genotype. So we should write Ka/Ks as being conditioned on WT as the starting point:
\[ (K_a/K_s)_X = (K_a/K_s)_{X \mid WT} \]
[Figure: graph with WT at the center and edges to single mutants at codons 10, 30, 45, 82, and 90; edge labels shown: 0.5, 6.3, 16.3, 3.5, 0.3. Legend: WT = wildtype protein sequence; 30 = a single mutant at codon 30.]
The Ka/Ks graph gives the rates of flow of the HIV population to different mutations.

Ka/Ks Ignores Interactions
Because different sites interact, the effect of an amino acid mutation at one site can depend on the amino acid at other sites. But Ka/Ks does not consider these interactions; in effect, it assumes that all mutations take place in the wildtype sequence as a completely fixed background.

Conditional Ka/Ks Reveals Complete Mutation Network
We can generalize Ka/Ks to measure the selection pressure for a mutation Y conditioned on the presence of a previous mutation X. We define this as the conditional Ka/Ks:
\[ (K_a/K_s)_{Y \mid X} \]
[Figure: network of WT and protease codon nodes (including 10, 30, 45, 48, 71, 82, 84, 88, 90) connected by edges labeled with conditional Ka/Ks values.]
Each edge represents one conditional Ka/Ks value.
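As a rough illustration of conditioning, the sketch below restricts the counts at codon Y to samples that already carry an amino acid mutation at codon X, then normalizes by the expected ratio. The slides do not spell out the estimator, so this is only one plausible reading; the data layout, the function name, and the toy samples are all hypothetical.

```python
def conditional_ka_ks(samples, x, y, expected_ratio):
    """Ka/Ks at codon y among samples with an amino acid mutation at codon x.

    samples: list of dicts mapping codon number -> "a" (amino acid mutation)
             or "s" (synonymous mutation); wildtype codons are simply absent.
    expected_ratio: expected Na/Ns at codon y under neutrality.
    """
    with_x = [s for s in samples if s.get(x) == "a"]
    n_a = sum(1 for s in with_x if s.get(y) == "a")
    n_s = sum(1 for s in with_x if s.get(y) == "s")
    if n_s == 0:
        return float("inf") if n_a else float("nan")
    return (n_a / n_s) / expected_ratio

# Toy data: codon 90 plays the role of X, codon 10 the role of Y.
toy_samples = (
    [{90: "a", 10: "a"}] * 6 +   # X present, amino acid mutation at Y
    [{90: "a", 10: "s"}] * 2 +   # X present, synonymous mutation at Y
    [{10: "a"}] * 3 +            # Y mutated without X: ignored by conditioning
    [{}]                         # wildtype at both codons
)
toy_value = conditional_ka_ks(toy_samples, x=90, y=10, expected_ratio=1.5)
```

Swapping x and y generally gives a different value, which is what makes each edge in the network directional.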

Fast Mutation Paths of HIV Protease

Different Paths to the Same Genotype Can Differ in Speed
Pathway to double mutant: simplest possible subgraph of the conditional Ka/Ks network. Chen & Lee, Biol. Dir. (2006)
Conditional Ka/Ks rate on each edge: WT -> 90: 11.69; WT -> 10: 0.52; 90 -> 10/90: 5.37; 10 -> 10/90: 34.50.
Mutations at codon 90 are known to directly cause drug resistance but lower stability; codon 10 is a site of compensatory mutations that improve stability. Our analysis distinguishes them: the faster path first introduces the drug resistance mutation, then the stabilizing mutation.
NB: the speed of a multistep path is generally controlled by its slowest step.
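The "slowest step" rule can be written down directly. The sketch below scores each path by its minimum edge value, reading the flattened figure as WT -> 90 = 11.69, WT -> 10 = 0.52, 90 -> 10/90 = 5.37, and 10 -> 10/90 = 34.50; that edge assignment is an interpretation of the slide layout, and taking the bottleneck as the path speed is the slide's stated rule of thumb rather than an exact kinetic model.

```python
# Conditional Ka/Ks on each directed edge (assignment read from the slide layout).
EDGES = {
    ("WT", "90"): 11.69,
    ("WT", "10"): 0.52,
    ("90", "10/90"): 5.37,
    ("10", "10/90"): 34.50,
}

def path_speed(path, edges=EDGES):
    """Approximate a multistep path's speed by its slowest (minimum) edge."""
    return min(edges[(a, b)] for a, b in zip(path, path[1:]))

resistance_first = path_speed(["WT", "90", "10/90"])   # limited by 90 -> 10/90
compensatory_first = path_speed(["WT", "10", "10/90"])  # limited by WT -> 10
speedup = resistance_first / compensatory_first
```

Under this reading the resistance-first path is limited by its second step and the compensatory-first path by its first step, so the resistance-first path comes out roughly ten-fold faster.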

Reproducible Results in Independent Datasets

Dataset              WT -> 90   WT -> 10   90 -> 10/90   10 -> 10/90
Specialty            11.69      0.52       5.37          34.50
Stanford-Treated     10.81      0.88       4.66          13.32
Stanford-Untreated   0.23       0.21       1.97          2.35

The results are highly reproducible in independent studies of different patients. They indicate real patterns of drug-associated selection pressure within the HIV population in the wild.

Quantitative Reproducibility
Conditional Ka/Ks: Specialty vs. Treated. Chen & Lee, Biol. Dir. (2006)
[Figure: conditional Ka/Ks in the Treated dataset plotted against the Specialty dataset (both axes 0 to 40). Legend: primary drug resistance; accessory drug resistance; function unknown.]
For codons with sufficient counts (N_Xa > 400), the Treated results match the Specialty results surprisingly well.

How can we distinguish Selection Pressure from Linkage Disequilibrium?
Statistical correlations between mutations could reflect functional selection, or linkage disequilibrium between mutations. Ordinarily, there is no easy way to distinguish these two models, because they reflect different assumptions.

Co-occurrence of mutations may just reflect their evolutionary history
When mutations occur infrequently during evolution, they will initially be strongly linked to pre-existing mutations on that branch (e.g. AR).
[Figure: phylogenetic tree with branches labeled by the mutations they carry (A, R, W, AR, S, SW, SI, I).]
Two forces weaken linkage: high mutation rate (if the same mutation occurs many times independently during evolution) and high recombination rate ("shuffles the deck").

Tools for Separating Selection from Linkage using Synonymous Mutations
Synonymous mutations should not be under selection, so they should give a direct readout of pure linkage. So, compare pairwise correlations of amino acid mutations (A) and synonymous mutations (S): AA, AS, SS.

AA Covariation in HIV is due to Selection, not background LD (AS, SS)
[Figure: panels AA, AS, SS; Specialty dataset.]

Reproducible Result in Independent Stanford-Treated Dataset
[Figure: panels AA, AS, SS; Stanford-Treated dataset.]

Evidence of AA Selection Pressure Vanishes in Absence of Drug Treatment
[Figure: Stanford-Untreated dataset.]

Phylogeny-based Estimation of Mutation Events vs. Star Topology
[Figure: samples carrying A or T at one site, with consensus A, arranged as a star topology and as a phylogenetic tree.]
Number of T mutations based on star topology: N = 2.
Number of T mutations based on phylogeny: N_tree = 1.

Phylip Neighbor-joining Tree for Specialty HIV 2002 Data (2500 seqs)
Fits star topology.

[Figure: Na vs. Na_tree and Ns vs. Ns_tree for year 1999 (~6,000 samples, zoomed in). Panels: all codons; excluding codons that have 4-fold degenerate sites.]

Distribution of Slope in 1000 Bootstrap Samples
[Figure: distributions of the slopes Na/Na_tree, Ns/Ns_tree, and (Na/Ns)/(Na_tree/Ns_tree); a second set of panels shows Na/Na_tree and Ns/Ns_tree excluding codons that have 4-fold degenerate sites.]

Mixture Detection
Alternative method: use detection of mixtures in the sequencing chromatogram to catch mutation in the act (instead of inferring mutations from a phylogenetic tree). This makes $(K_a/K_s)_{Y \mid X}$ truly a directed edge, in that we now measure it by counting $Y_{0a}$ mixtures, conditioned on $X_a$ unmixed.

Mixture-based Conditional Ka/Ks
Asymmetric definition of conditional Ka/Ks based on observing a mixture ("mutation in progress") at site Y, conditioning on presence (100%, not a mixture) of the mutation at site X.

Na/Ns vs. mna/mns for Each Amino Acid Mutation
[Figure: log(mna/mns) plotted against log(Na/Ns).]

Experiment: Which Mutation Happened First in Patients? (Shafer, Stanford)
Take multiple samples at different time points during each patient's treatment. Identify pairs of mutations that co-occur, then re-examine previous samples to see which mutation occurred first.

Different Paths to the Same Genotype Can Differ in Speed
Conditional Ka/Ks on each edge: WT -> 88D: 0.16; WT -> 30N: 1.05; 88D -> 30N/88D: 876; 30N -> 30N/88D: 57.
The slow step is the first step of each path: the 30N path is about six-fold faster.
Pan et al., Nucleic Acids Res. 35: D371-D375 (2007) http://www.bioinformatics.ucla.edu/hiv

88D / 30N Comparison with Longitudinal Studies
Conditional Ka/Ks on each edge: WT -> 88D: 0.16; WT -> 30N: 1.05; 88D -> 30N/88D: 876; 30N -> 30N/88D: 57.
Predicted: v(pathX) : v(pathY) = 1.05 : 0.16 = 6.6
Longitudinal data: PathX (30N -> 30N/88D), 13 patients; PathY (88D -> 30N/88D), 2 patients; n(pathX) : n(pathY) = 13 : 2

Different Paths to the Same Genotype Can Differ in Speed
Conditional Ka/Ks on each edge: WT -> 90M: 171; WT -> 73S: 0.37; 90M -> 73S/90M: 21; 73S -> 73S/90M: >300.
90M mutations are known to directly cause drug resistance but reduce protein stability; 73S mutations stabilize the mutant protein. Our analysis distinguishes them: the faster path first introduces the drug resistance mutation, then the stabilizing mutation.

90M / 73S Comparison with Longitudinal Studies
Conditional Ka/Ks on each edge: WT -> 90M: 171; WT -> 73S: 0.37; 90M -> 73S/90M: 21; 73S -> 73S/90M: >300.
Predicted: v(pathX) : v(pathY) = 0.37 : 21
Longitudinal data: PathX (73S -> 73S/90M), 3 patients; PathY (90M -> 73S/90M), 31 patients; n(pathX) : n(pathY) = 3 : 31

Shafer Longitudinal Results
For 23 mutation pairs with a preferred path predicted by our conditional Ka/Ks data (p = 0.01), 20 match the preferred kinetic pathway observed in Shafer's longitudinal patient studies. This is a statistically significant match, p = 0.0002.
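The slide does not say which test produced p = 0.0002. One natural candidate is a one-sided binomial (sign) test asking how often at least 20 of 23 pairs would match if each pair matched by chance with probability 0.5; the sketch below computes that tail. Both the choice of test and the 0.5 null probability are assumptions made for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 20 of 23 predicted preferred paths matched the longitudinal data.
p_value = binomial_upper_tail(20, 23)
```

Under these assumptions the tail probability is 2048 / 2^23, about 0.00024, which is consistent with the p = 0.0002 quoted on the slide.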

Danger: Fast Paths to Multi-Drug Resistance
Multiple resistance: the combination of three mutations, at codons 82 (V82A/T/S), 84 (I84V), and 90 (L90M), is resistant to most available protease inhibitors. Rapid evolution of this triple mutant is a serious threat to individual treatment and to control of the global AIDS epidemic. Our map shows where the fast paths to this combination are. Don't want to go there!

Reveals Accelerated Paths to Multi-Drug Resistance
The path that includes mutations at codons 74 and 71 is 3 times faster than the direct path, and 7 times faster in its first step.
[Figure: conditional Ka/Ks network from WT through codons 71, 74, 82, 84, and 90; edge labels shown: 2.12, 49.30, 5.16, 16.28, 12.42, 9.04, 15.26, 7.26.]
Use the order in which drugs are given (which in turn can select for one mutation over another) to pick a slower path!

Amino Acid Pairs Showing Strong Negative Selection Are Neighbors
[Figure: atomic distance (Å, 0 to 40) plotted against selection (1/100, 1/10, 1).]

Mutation Associations (Protease, AA)
Wang & Lee, PLoS ONE (2007)
[Figure: pairwise amino acid mutation associations in protease.]

Correlated Synonymous Mutation Pairs Reveal RNA Secondary Structure
Nine strongly significant pairs of synonymous mutations identified by covariance (p < 10^-10) form diagonals A and B, suggesting RNA stem-loop secondary structure.
[Figure: covariance score map (scale 1 to >=10) with diagonals labeled A and B.]

All Nine Mutation Pairs Preserve Watson-Crick Base Pairing

Observed counts, nucleotide 219 (columns) vs. nucleotide 321 (rows):
321 \ 219   A      G      C      U(wt)
G           0      0      50     166
U           4      1      0      3
C           0      4      0      3
A(wt)       42     76     93     20584

Observed counts, nucleotide 222 (columns) vs. nucleotide 318 (rows):
318 \ 222   G      U      C      A(wt)
A           0      7      0      50
G           0      0      7      56
C           12     0      0      26
U(wt)       54     6      13     20798

Strong evidence for RNA secondary structure (p < 2.3 x 10^-28).
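The claim can be checked against the two count tables on this slide. The sketch below takes the double-mutant cells (both positions differ from wildtype) and counts how many samples still form a Watson-Crick pair. The cell values are read from the flattened tables, with the row labeled T treated as U, and the helper name is hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Double-mutant cells only, keyed as (first position, second position).
# Wildtype pairs are U219/A321 and A222/U318.
PAIR_219_321 = {
    ("A", "G"): 0, ("G", "G"): 0, ("C", "G"): 50,
    ("A", "U"): 4, ("G", "U"): 1, ("C", "U"): 0,
    ("A", "C"): 0, ("G", "C"): 4, ("C", "C"): 0,
}
PAIR_222_318 = {
    ("G", "A"): 0, ("U", "A"): 7, ("C", "A"): 0,
    ("G", "G"): 0, ("U", "G"): 0, ("C", "G"): 7,
    ("G", "C"): 12, ("U", "C"): 0, ("C", "C"): 0,
}

def watson_crick_counts(double_mutants):
    """Return (samples forming a Watson-Crick pair, all double-mutant samples)."""
    paired = sum(n for pair, n in double_mutants.items() if pair in WATSON_CRICK)
    total = sum(double_mutants.values())
    return paired, total

counts_219_321 = watson_crick_counts(PAIR_219_321)
counts_222_318 = watson_crick_counts(PAIR_222_318)
```

In this reading, 58 of 59 double mutants at 219/321 and all 26 at 222/318 are Watson-Crick pairs; the single exception is a G-U combination, which is a wobble pair rather than a mismatch.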

Covariance Pairs Exactly Match RNA Stem-Loop Structure Predicted by mfold, RNAfold, and RNAz
[Figure: predicted stem-loop structure spanning nucleotides 220 to 320, with Stem A and Stem B labeled and the covarying base pairs (G-C, U-A, C-G, A-U) marked.]
This structure is conserved over all members of the HIV Group M family. This newly discovered structure may help explain a recombination hotspot previously reported in this region of the HIV genome.
Wang & Lee, submitted.

The Future of SNP Analysis?
It should be emphasized that the only input to this analysis was single nucleotide polymorphism discovery from chromatograms. No information about drug treatment, time, etc. Yet conditional Ka/Ks yields drug-resistance mutations, kinetics and temporal order, and pathways.

A New Level of Strategic Intelligence
A global picture of how HIV will respond in the future to our drug treatments. Ka/Ks velocities tell us where the HIV population is going, and are detectable even while mutations are still rare. Moreover, since these selection pressures are due to our actions (drugs), they are manipulable. Even slowing drug resistance (DR) evolution two-fold could make a big difference for control of the epidemic.

HIV Positive Selection Database
http://www.bioinformatics.ucla.edu/hiv
Atlases of HIV drug resistance evolution from the Specialty dataset. Analysis tools (snpindex).

Acknowledgements
Lamei Chen: Ka/Ks analysis
Qi Wang: linkage analysis
Calvin Pan: new analyses of cond. Ka/Ks
Alexander Alekseyenko: Nested List
Specialty Laboratories, Inc.: Alla Perlina, Beatrisa Boyadzhyan
UCLA Collaborators: Christina Kitchen, Paul Krogstad
http://www.bioinformatics.ucla.edu/hiv