Mapping evolutionary pathways of HIV-1 drug resistance using conditional selection pressure Christopher Lee, UCLA
Stalemate: We React to Them, They React to Us. E.g. a virus attacks us, so we develop a drug, so the virus evolves drug resistance mutations. Each step is a reaction to a past attack, based on limited, local information, without considering how the enemy will respond to our actions in the future. What if we had a global picture of HIV's possible evolutionary responses?
Rapid Evolution of Drug Resistance in the Global AIDS Epidemic 40 million cases, doubling every 10 years. 60% of patients fail first-line treatment due to drug resistance; complex multi-drug therapies (HAART) are required. After introduction of a new drug, resistant HIV strains appear within weeks. Worldwide, at least 10^14 new mutations per day.
HIV-1 Protease and RT: Anti-Retroviral Drug Targets [Figure: protease and RT.] Protease: responsible for the post-translational processing of the viral polyproteins to yield the structural proteins and enzymes of the virus. Reverse transcriptase (RT): responsible for DNA polymerization.
Automated Mutation Analysis Previously developed Bayesian method for measuring evidence for mutations (single nucleotide polymorphisms) from chromatogram data, considering peak height, separation, context-sensitive sequencing error rates, uncertainty in alignment, uncertainty in allele frequency, etc. Completely automated. Irizarry et al., Nature Genet. 26, 233 (2000); Hu et al., Pharmacogenomics J. 2, 236 (2002)
HIV Has Performed a Saturating Mutagenesis Experiment for Us 2.3 million high-confidence mutations, from 60,000+ AIDS patient samples (Specialty Labs Inc). 31.96 mutations per kb (compared with ~1 per kb in human). Detected 5.5 out of 9 possible SNPs per codon, throughout protease + RT. Each mutation detected in an average of 360 different samples.
Selection Pressure Mapping: Build an Atlas of HIV Evolution Selection pressure measures whether an amino acid mutation is selected for (Ka/Ks>1) or against (Ka/Ks<1) by evolution, relative to synonymous mutations. Dataset: sequencing of 60,000 HIV clinical samples by Specialty Labs Inc. 30-fold higher density of polymorphism information than human sequences. Goal: construct a selection pressure map of how HIV is evolving, and where the virus is going, to evade our drugs.
Use HIV Evolution to Identify Natural Selection for Amino Acid Mutants Negative selection: amino acid changes reduce fitness, and are less frequent than expected. This is the normal situation for a stable gene. Positive selection: amino acid changes improve fitness, and are more frequent than expected. Very uncommon; indicates evolutionary change. Ka / Ks: standard metric for positive selection (Ka / Ks > 1), measured for a gene. Li, W.H. J.Mol.Evol. 36, 96 (1993)
Calculating Ka / Ks per Residue

$$\frac{K_a}{K_s} = \frac{N_a / N_s}{\left(n_{a,t} f_t + n_{a,v} f_v\right) / \left(n_{s,t} f_t + n_{s,v} f_v\right)}$$

Numerator: observed # amino acid changes vs. synonymous mutations at this codon. Denominator: expected # amino acid changes vs. synonymous mutations at this codon.

Confidence that Ka/Ks>1 is statistically significant: log-odds score

$$\mathrm{LOD} = -\log_{10} p\left(i \ge N_a \mid N, q, K_a/K_s = 1\right) = -\log_{10} \sum_{i=N_a}^{N} \binom{N}{i} q^{i} (1-q)^{N-i}$$

LOD 2 means 99% confidence, LOD 3 means 99.9% confidence. Maximum likelihood estimation of Ka/Ks using PAML takes into account inferred phylogeny, f_t/f_v ratio, mutation rates, etc.
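The per-codon calculation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes Ka/Ks is the observed ratio N_a/N_s divided by an expected ratio built from transition (t) and transversion (v) site counts and frequencies, and that the LOD is the upper binomial tail under Ka/Ks = 1. The function names, the definition of q as the expected fraction of amino acid changes under neutrality, and all example numbers are hypothetical.

```python
import math


def expected_weights(n_at, n_av, n_st, n_sv, f_t, f_v):
    """Expected weights of amino-acid-changing and synonymous mutations at a codon."""
    expected_a = n_at * f_t + n_av * f_v
    expected_s = n_st * f_t + n_sv * f_v
    return expected_a, expected_s


def ka_ks(n_a, n_s, n_at, n_av, n_st, n_sv, f_t, f_v):
    """Observed amino-acid/synonymous ratio divided by the expected ratio."""
    expected_a, expected_s = expected_weights(n_at, n_av, n_st, n_sv, f_t, f_v)
    return (n_a / n_s) / (expected_a / expected_s)


def null_fraction(n_at, n_av, n_st, n_sv, f_t, f_v):
    """Assumed q: expected fraction of amino acid changes if Ka/Ks = 1."""
    expected_a, expected_s = expected_weights(n_at, n_av, n_st, n_sv, f_t, f_v)
    return expected_a / (expected_a + expected_s)


def lod_score(n_a, n_total, q):
    """-log10 of P(i >= n_a) for i ~ Binomial(n_total, q), summed in log space."""
    log_terms = [
        math.lgamma(n_total + 1) - math.lgamma(i + 1) - math.lgamma(n_total - i + 1)
        + i * math.log(q) + (n_total - i) * math.log(1.0 - q)
        for i in range(n_a, n_total + 1)
    ]
    biggest = max(log_terms)
    log_tail = biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))
    return -log_tail / math.log(10.0)


# Hypothetical codon: 2 nonsynonymous transitions, 4 nonsynonymous transversions,
# 1 synonymous transition, 2 synonymous transversions, equal f_t and f_v.
site = dict(n_at=2, n_av=4, n_st=1, n_sv=2, f_t=0.5, f_v=0.5)
ratio = ka_ks(30, 10, **site)      # 30 amino acid changes, 10 synonymous
q = null_fraction(**site)
print(ratio, lod_score(30, 40, q))
```

Working in log space keeps the tail sum stable for the large counts typical of this dataset; a LOD of 2 corresponds to a tail probability of 0.01.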
HIV Protease Positive Selection Chen, Perlina & Lee, J. Virol. (2004) [Figure: Ka/Ks (y-axis, 0 to 50) plotted against codon position (x-axis) across HIV protease.] Positive selection mapping automatically discovers causes of drug resistance!
Selection Pressure Varies Among Different Amino Acid Mutations at One Codon

Whole codon:
Position  Ka/Ks  LOD
20        0.49   10.64
47        1.03   0.29
85        1.21   2.13

Individual amino acid mutation:
Mutation  Ka/Ks  LOD
K20T      2.44   >300
K20M      2.07   >300
K20R      0.79   0.00
K20Q      0.01   66.37
K20N      0.00   139.36
I47V      3.52   142.44
I47R      0.10   0.00
I47T      0.01   0.00
I47M      0.01   0.00
I85V      3.08   171.05
I85L      0.09   10.08
I85S      0.04   11.53

K20T, I47V, I85V: known to be associated with drug resistance.
Positive Selection Mapping Identifies Drug Resistance Mutations Correctly identified 19 of 22 known drug resistance mutation positions. Compared with multi-year research process (clinical, biochemical, genetics) that was previously required, this analysis is completely automatic; works directly on output from sequencing machines.
Our Positive Selection Results Match Experimental Phenotypic Fitness Data Loeb et al. constructed ~50% of all possible point mutants of HIV-1 subtype B protease and assayed their biochemical activity. Nature 340, 397 (1989) [Figure: all mutations tested vs. positively selected mutations, classified as active, intermediate or inactive. Counts as extracted: 160, 104, 68, 4, 28.] P-value of this match is 10^-16.6.
Build a Reaction Rate Diagram of HIV's Global Evolution A network diagram of the rates of transition between all possible genotypes. Ka/Ks is proportional to the rate of increase of a mutation. Shows the speeds of all possible paths of evolution the viral population will follow, under the pressure of current drug treatments.
Selection Pressure is like an Evolutionary Velocity

For wildtype, synonymous mutant, and amino acid mutant allele frequencies $f_o = 1$, $f_s = 0$, $f_a = 0$ initially, equal amino acid and synonymous mutation frequency $\lambda$, and reproduction rates $r_o$, $r_a$, after one unit of time:

$$f_o \to r_o f_o (1 - 2\lambda); \quad f_s \to r_o \lambda f_o; \quad f_a \to r_a \lambda f_o$$

Assuming $\lambda \ll 1$, Ka/Ks and the normalized amino acid mutant frequency $\hat{f}_a$ will be:

$$\frac{K_a}{K_s} = \frac{f_a}{f_s} = \frac{r_a}{r_o}; \quad \hat{f}_a = \frac{f_a}{f_o + f_s + f_a} \approx \lambda \frac{r_a}{r_o} = \lambda \frac{K_a}{K_s}$$

So initially the rate of change of the amino acid mutant allele frequency $df_a/dt$ is proportional to the selection pressure Ka/Ks.
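The one-step argument on this slide can be checked numerically. The sketch below assumes the same toy model (all of the population starts as wildtype, and a fraction λ mutates to each of the synonymous and amino acid classes in one time unit); the function name and the parameter values are illustrative only.

```python
def one_time_step(r_o, r_a, lam):
    """Advance the toy model one unit of time, starting from f_o = 1, f_s = f_a = 0."""
    f_o_start = 1.0
    f_o = r_o * f_o_start * (1.0 - 2.0 * lam)  # wildtype that did not mutate
    f_s = r_o * lam * f_o_start                # synonymous mutants reproduce like wildtype
    f_a = r_a * lam * f_o_start                # amino acid mutants reproduce at rate r_a
    total = f_o + f_s + f_a
    return {"ka_ks": f_a / f_s, "normalized_f_a": f_a / total}


# Hypothetical values: the amino acid mutant reproduces 3x faster than wildtype.
result = one_time_step(r_o=1.0, r_a=3.0, lam=1e-6)
print(result["ka_ks"])                 # equals r_a / r_o
print(result["normalized_f_a"] / 1e-6) # close to r_a / r_o when lam << 1
```

The normalized frequency differs from λ·Ka/Ks only by terms of order λ, which is why the approximation requires λ ≪ 1.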
What does Ka/Ks really calculate? [Figure: WT (wildtype protein sequence) at the center, with edges to single mutants at codons 10, 30, 45, 82 and 90; e.g. node 30 is a single mutant at codon 30. Edge Ka/Ks values as extracted: 0.5, 6.3, 16.3, 3.5, 0.3.] Ka/Ks calculates selection pressure for mutation X, assuming wildtype (WT) as the starting genotype. So we should write Ka/Ks as being conditioned on WT as the starting point: $(K_a/K_s)_X = (K_a/K_s)_{X|WT}$. The Ka/Ks graph gives rates of flow of the HIV population to different mutations.
Ka/Ks Ignores Interactions Because different sites interact, the effect of an amino acid mutation at one site can depend on the amino acid at other sites. But Ka/Ks does not consider these interactions; in effect, it assumes that all mutations take place in the wildtype sequence as a completely fixed background.
Conditional Ka/Ks Reveals Complete Mutation Network We can generalize Ka/Ks to measure the selection pressure for a mutation Y conditioned on the presence of a previous mutation X. We define this as the conditional Ka/Ks: $(K_a/K_s)_{Y|X}$. [Figure: network of single and multiple mutants around WT; each edge represents one conditional Ka/Ks value.]
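One plausible way to compute a conditional Ka/Ks is to restrict the dataset to samples that already carry an amino acid mutation at X and then form the usual observed/expected ratio at Y within that subset. The sketch below follows that reading; it is an assumption, not the published procedure. The sample encoding (`"a"` for an amino acid mutation, `"s"` for a synonymous mutation, absent key for wildtype), the function name, and the `expected_a_over_s` argument are all hypothetical.

```python
def conditional_ka_ks(samples, x, y, expected_a_over_s):
    """Ka/Ks at codon y, counted only in samples with an amino acid mutation at codon x.

    samples: list of dicts mapping codon number -> "a" or "s".
    expected_a_over_s: expected amino-acid/synonymous ratio at codon y under neutrality.
    Returns None when no synonymous mutations at y are seen in the X background.
    """
    background = [s for s in samples if s.get(x) == "a"]
    n_a = sum(1 for s in background if s.get(y) == "a")
    n_s = sum(1 for s in background if s.get(y) == "s")
    if n_s == 0:
        return None
    return (n_a / n_s) / expected_a_over_s


# Toy data: codon 90 mutated in 8 samples, of which 6 also carry an
# amino acid change at codon 10 and 2 carry a synonymous change there.
toy = (
    [{90: "a", 10: "a"}] * 6
    + [{90: "a", 10: "s"}] * 2
    + [{10: "a"}] * 1
    + [{10: "s"}] * 4
)
print(conditional_ka_ks(toy, x=90, y=10, expected_a_over_s=1.5))
```

Because the conditioning set changes with X, the value for Y given X need not equal the value for X given Y, which is what makes the edges of the network directional.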
Fast Mutation Paths of HIV Protease
Different Paths to the Same Genotype Can Differ in Speed
Pathway to double mutant: simplest possible subgraph of the conditional Ka/Ks network. Chen & Lee, Biol. Dir. (2006)
[Figure: conditional Ka/Ks rate shown on each edge.]
WT -> 90: 11.69
WT -> 10: 0.52
90 -> 10/90: 5.37
10 -> 10/90: 34.50
90 mutations are known to directly cause drug resistance but lower stability; 10 is the site of compensatory mutations that improve stability. Our analysis distinguishes them: the faster path first introduces the drug resistance mutation, then the stabilizing mutation. NB: the speed of a multistep path is generally controlled by its slowest step.
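The "slowest step" rule can be applied mechanically to the four-node subgraph on this slide. The sketch below assumes the edge assignment read from the diagram (WT to 90 = 11.69, WT to 10 = 0.52, 90 to 10/90 = 5.37, 10 to 10/90 = 34.50) and scores each path by its smallest conditional Ka/Ks; the function names are illustrative.

```python
# Conditional Ka/Ks on each directed edge, as read from the slide's diagram.
EDGES = {
    ("WT", "90"): 11.69,
    ("WT", "10"): 0.52,
    ("90", "10/90"): 5.37,
    ("10", "10/90"): 34.50,
}


def all_paths(edges, start, goal):
    """Enumerate every simple path from start to goal by depth-first search."""
    found = []

    def extend(path):
        node = path[-1]
        if node == goal:
            found.append(path)
            return
        for src, dst in edges:
            if src == node and dst not in path:
                extend(path + [dst])

    extend([start])
    return found


def bottleneck(edges, path):
    """Speed of a path, taken as the rate of its slowest step."""
    return min(edges[(a, b)] for a, b in zip(path, path[1:]))


def rank_paths(edges, start, goal):
    """Paths sorted from fastest to slowest bottleneck."""
    return sorted(all_paths(edges, start, goal),
                  key=lambda p: bottleneck(edges, p), reverse=True)


for path in rank_paths(EDGES, "WT", "10/90"):
    print(" -> ".join(path), bottleneck(EDGES, path))
```

With these values the path through 90 has a bottleneck of 5.37 and the path through 10 a bottleneck of 0.52, matching the slide's conclusion that the resistance mutation comes first on the faster path.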
Reproducible Results in Independent Datasets

Conditional Ka/Ks on each edge of the WT, 10, 90, 10/90 pathway:
Dataset             WT->90  WT->10  90->10/90  10->10/90
Specialty           11.69   0.52    5.37       34.50
Stanford-Treated    10.81   0.88    4.66       13.32
Stanford-Untreated  0.23    0.21    1.97       2.35

Highly reproducible in independent studies of different patients. The results indicate real patterns of drug-associated selection pressure within the HIV population in the wild.
Quantitative Reproducibility Conditional Ka/Ks: Specialty vs. Treated Chen & Lee, Biol. Dir. (2006) [Figure: scatter plot of conditional Ka/Ks in the Specialty dataset (x-axis, 0 to 40) vs. the Stanford-Treated dataset (y-axis, 0 to 40); points are classified as primary drug resistance, accessory drug resistance, or function unknown.] For codons with sufficient counts (N_Xa > 400), the Treated results match the Specialty results surprisingly well.
How can we distinguish Selection Pressure from Linkage Disequilibrium? Statistical correlations between mutations could reflect functional selection, or linkage disequilibrium between mutations. Ordinarily, no easy way to distinguish these two models, because they reflect different assumptions.
Co-occurrence of mutations may just reflect their evolutionary history When mutations occur infrequently during evolution, they will initially be strongly linked to pre-existing mutations on that branch (e.g. AR). [Figure: tree with branches carrying mutation combinations such as A, R, W, AR, S, SW, SI.] Two forces weaken linkage: high mutation rate (if the same mutation occurs many times independently during evolution); high recombination rate ("shuffles the deck").
Tools for Separating Selection from Linkage using Synonymous Mutations Synonymous mutations should not be under selection, so they should give a direct readout of pure linkage. So, compare pairwise correlations of amino acid mutations (A) and synonymous mutations (S): AA, AS, SS
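The comparison of AA, AS and SS pairs can be sketched as follows. The slides do not say which correlation statistic was used, so this example substitutes the phi coefficient for binary presence/absence vectors as a stand-in; the function names and the toy columns are hypothetical.

```python
import math
from itertools import combinations


def phi_coefficient(x, y):
    """Phi correlation between two equal-length 0/1 vectors (0.0 if undefined)."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def pair_class(kind_x, kind_y):
    """Label a pair as 'AA', 'AS' or 'SS' from the kinds ('A' or 'S') of its members."""
    return "".join(sorted((kind_x, kind_y)))


def mean_abs_phi_by_class(columns):
    """columns: dict name -> (kind, 0/1 vector). Returns mean |phi| per pair class."""
    sums, counts = {}, {}
    for (_, (kx, vx)), (_, (ky, vy)) in combinations(columns.items(), 2):
        label = pair_class(kx, ky)
        sums[label] = sums.get(label, 0.0) + abs(phi_coefficient(vx, vy))
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy presence/absence vectors across eight samples (illustrative only).
toy_columns = {
    "L90M": ("A", [1, 1, 1, 1, 0, 0, 0, 0]),
    "L10I": ("A", [1, 1, 1, 0, 0, 0, 0, 0]),
    "syn_35": ("S", [1, 0, 1, 0, 1, 0, 1, 0]),
    "syn_60": ("S", [0, 1, 1, 0, 0, 1, 1, 0]),
}
print(mean_abs_phi_by_class(toy_columns))
```

If covariation were mostly background linkage, the AS and SS classes would show correlations comparable to AA; a strong AA signal with weak AS and SS is what points to selection.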
AA Covariation in HIV is due to Selection, not Background LD (AS, SS) [Figure: covariation for AA, AS and SS pairs, Specialty dataset.]
Reproducible Result in Independent Stanford-Treated Dataset [Figure: covariation for AA, AS and SS pairs, Stanford-Treated dataset.]
Evidence of AA Selection Pressure Vanishes in Absence of Drug Treatment [Figure: Stanford-Untreated dataset.]
Phylogeny-based Estimation of Mutation Events vs. Star Topology [Figure: samples carrying A or T at one site, with consensus A, shown on a star topology and on a phylogenetic tree.] Number of T mutations based on star topology: N = 2. Number of T mutations based on phylogeny: N_tree = 1.
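The difference between the two counts can be illustrated with a small example. Under a star topology every mutant sample is treated as an independent event, whereas on a tree two mutant samples that share a branch need only one event. The sketch below uses Fitch parsimony on a binary tree written as nested tuples to get the tree-based count; the slides do not specify the authors' counting method, and the tree shown is hypothetical.

```python
def leaves_of(tree):
    """Flatten a nested-tuple binary tree into its list of leaf states."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves_of(left) + leaves_of(right)


def star_count(tree, mutant):
    """Star topology: each sample carrying the mutant base counts as one event."""
    return sum(1 for base in leaves_of(tree) if base == mutant)


def fitch(tree):
    """Fitch parsimony: return (candidate state set, minimum number of changes)."""
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    common = left_states & right_states
    if common:
        return common, left_changes + right_changes
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical tree: the two T samples are sisters, the other three samples carry A.
tree = ((("T", "T"), "A"), ("A", "A"))
print(star_count(tree, "T"))  # events counted under the star topology
print(fitch(tree)[1])         # events counted on the tree
```

For this tree the star count is 2 and the parsimony count is 1, the same contrast as N = 2 vs. N_tree = 1 on the slide.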
Phylip Neighbor-joining Tree for Specialty HIV 2002 Data (2500 seqs) Fits star topology
Year 1999, ~6,000 samples (zoomed in) [Figure: Na and Ns plotted against Na_tree and Ns_tree; one panel for all codons, one panel excluding codons that have 4-fold degenerate sites.]
Distribution of Slope in 1000 Bootstrap Samples [Figure: distributions of the slopes Na/Na_tree, Ns/Ns_tree and (Na/Ns)/(Na_tree/Ns_tree); a second panel shows Na/Na_tree and Ns/Ns_tree excluding codons that have 4-fold degenerate sites.]
Mixture Detection Alternative method: use detection of mixtures in the sequencing chromatogram to catch mutation "in the act" (instead of inferring mutations from a phylogenetic tree). Makes $(K_a/K_s)_{Y|X}$ truly a directed edge, in that now we measure it by counting $Y_{0a}$ mixtures, conditioned on $X_a$ unmixed.
Mixture-based Conditional Ka/Ks Asymmetric definition of conditional Ka/Ks based on observing a mixture ("mutation in progress") at site Y, conditioning on presence (100%, not a mixture) of a mutation at site X.
Na/Ns - mna/mns for each amino acid mutation [Figure: log(mna/mns) plotted against log(Na/Ns).]
Experiment: Which Mutation Happened First in Patients? (Shafer, Stanford) Take multiple samples at different time points during each patient's treatment. Identified pairs of mutations that co-occur, then re-examined previous samples to see which mutation occurred first.
Different Paths to the Same Genotype Can Differ in Speed [Figure: conditional Ka/Ks on each edge. WT -> 88D: 0.16; WT -> 30N: 1.05; 88D -> 30N/88D: 876; 30N -> 30N/88D: 57.] Slow step is the first step of each path: the 30N path is about six-fold faster. Pan et al., Nucleic Acids Res. 35: D371-D375 (2007) http://www.bioinformatics.ucla.edu/hiv
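88D / 30N Comparison with Longitudinal Studies [Figure: conditional Ka/Ks on each edge. WT -> 88D: 0.16; WT -> 30N: 1.05; 88D -> 30N/88D: 876; 30N -> 30N/88D: 57.] Predicted: v(path X) : v(path Y) = 1.05 : 0.16 = 6.6. Longitudinal data: path X (30N -> 30N/88D), 13 patients; path Y (88D -> 30N/88D), 2 patients; n(path X) : n(path Y) = 13 : 2 = 6.5.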
Different Paths to the Same Genotype Can Differ in Speed [Figure: conditional Ka/Ks on each edge. WT -> 90M: 171; WT -> 73S: 0.37; 90M -> 73S/90M: 21; 73S -> 73S/90M: >300.] 90M mutations are known to directly cause drug resistance but reduce protein stability; 73S mutations stabilize the mutant protein. Our analysis distinguishes them: the faster path first introduces the drug resistance mutation, then the stabilizing mutation.
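90M / 73S Comparison with Longitudinal Studies [Figure: conditional Ka/Ks on each edge. WT -> 90M: 171; WT -> 73S: 0.37; 90M -> 73S/90M: 21; 73S -> 73S/90M: >300.] Predicted: v(path X) : v(path Y) = 0.37 : 21. Longitudinal data: path X (73S -> 73S/90M), 3 patients; path Y (90M -> 73S/90M), 31 patients; n(path X) : n(path Y) = 3 : 31.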
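The two longitudinal comparisons can be reproduced with a few lines of arithmetic. The sketch below scores each path by its slowest step and compares the predicted speed ratio with the ratio of patient counts. The step values follow the diagrams as read from the slides; the ">300" edge is entered as 300, a lower bound, which does not affect the result because it is never the slowest step. Function and key names are illustrative.

```python
def bottleneck(steps):
    """Speed of a multistep path, taken as the rate of its slowest step."""
    return min(steps)


def compare(path_x_steps, path_y_steps, patients_x, patients_y):
    """Predicted speed ratio vs. observed patient-count ratio for two paths."""
    predicted = bottleneck(path_x_steps) / bottleneck(path_y_steps)
    observed = patients_x / patients_y
    return {
        "predicted": predicted,
        "observed": observed,
        "same_direction": (predicted > 1) == (observed > 1),
    }


# 30N/88D: path X acquires 30N first, path Y acquires 88D first.
pair_30n_88d = compare([1.05, 57], [0.16, 876], patients_x=13, patients_y=2)

# 73S/90M: path X acquires 73S first, path Y acquires 90M first.
pair_73s_90m = compare([0.37, 300], [171, 21], patients_x=3, patients_y=31)

print(pair_30n_88d)
print(pair_73s_90m)
```

For 30N/88D the predicted and observed ratios are nearly equal (about 6.6 vs. 6.5); for 73S/90M both ratios favor the 90M-first path, though the magnitudes differ.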
Shafer Longitudinal Results For 23 mutation pairs with a preferred path predicted by our conditional Ka/Ks data (p=0.01), 20 match the preferred kinetic pathway observed in Shafer's longitudinal patient studies. Statistically significant match, p=0.0002.
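The slide does not state which test produced p=0.0002. As an illustration only, a one-sided binomial (sign) test, which asks how often 20 or more of 23 pairs would agree if each prediction were a coin flip, gives a value of the same size. The function name and the choice of test are assumptions.

```python
import math


def one_sided_binomial_p(successes, trials, p_null=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        math.comb(trials, k) * p_null ** k * (1.0 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_value = one_sided_binomial_p(20, 23)
print(f"{p_value:.6f}")
```

With 20 matches out of 23 this evaluates to 2048 / 2^23, about 0.00024.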
Danger: Fast Paths to Multi-Drug Resistance Multiple resistance: the combination of three mutations (at codons 82 (V82A/T/S), 84 (I84V), and 90 (L90M)) is resistant to most available protease inhibitors. Rapid evolution of this triple mutant is a serious threat to individual treatment and to control of the global AIDS epidemic. Our map shows where the fast paths to this combination are. Don't want to go there!
Reveals Accelerated Paths to Multi-Drug Resistance The path that includes mutations at codons 74 and 71 is 3 times faster than the direct path, and 7 times faster in its first step. [Figure: network from WT to the 82/84/90 triple mutant through codons 84, 90, 82, 74 and 71, with a conditional Ka/Ks value on each edge. Values as extracted: 2.12, 49.30, 5.16, 16.28, 12.42, 9.04, 15.26, 7.26.] Use the order in which drugs are given (which in turn can select for one mutation over another) to pick a slower path!
Amino Acid Pairs Showing Strong Negative Selection Are Neighbors [Figure: atomic distance (Å, axis 0 to 40) plotted against selection (axis 1/100 to 1).]
Mutation Associations (Protease, AA) Wang & Lee, PLoS ONE (2007) [Figure: pairwise association map of amino acid mutations in protease.]
Correlated Synonymous Mutation Pairs Reveal RNA Secondary Structure Nine strongly significant pairs of synonymous mutations identified by covariance (p < 10^-10) form diagonals A and B, suggesting RNA stem-loop secondary structure. [Figure: covariance score map (color scale 1 to >=10) with diagonals labeled A and B.]
All Nine Mutation Pairs Preserve Watson-Crick Base Pairing

Observed counts, nucleotide 219 (columns) vs. nucleotide 321 (rows):
321 \ 219   A    G    C    U(wt)
G           0    0    50   166
T           4    1    0    3
C           0    4    0    3
A(wt)       42   76   93   20584

Observed counts, nucleotide 222 (columns) vs. nucleotide 318 (rows):
318 \ 222   G    U    C    A(wt)
A           0    7    0    50
G           0    0    7    56
C           12   0    0    26
U(wt)       54   6    13   20798

Strong evidence for RNA secondary structure (p < 2.3 x 10^-28).
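The claim that double mutants preserve base pairing can be checked directly from the two count tables on this slide. The sketch below counts, among samples mutated at both positions, how many carry a Watson-Crick pair. It treats the row label T as U, which is an assumption about the slide's labeling; the function names are illustrative.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def to_rna(base):
    """Treat a DNA-style T label as U."""
    return "U" if base == "T" else base


def build_table(col_bases, rows):
    """rows: dict row_base -> list of counts ordered like col_bases."""
    return {
        (col, row): count
        for row, counts in rows.items()
        for col, count in zip(col_bases, counts)
    }


def double_mutant_pairing(table, wt_col, wt_row):
    """Return (Watson-Crick count, total count) among samples mutated at both sites."""
    paired = total = 0
    for (col, row), count in table.items():
        if col == wt_col or row == wt_row:
            continue  # skip wildtype and single-mutant cells
        total += count
        if (to_rna(col), to_rna(row)) in WATSON_CRICK:
            paired += count
    return paired, total


# Nucleotide 219 (columns) vs. nucleotide 321 (rows), counts from the slide.
table_219_321 = build_table(
    ["A", "G", "C", "U"],
    {"G": [0, 0, 50, 166], "T": [4, 1, 0, 3], "C": [0, 4, 0, 3], "A": [42, 76, 93, 20584]},
)
# Nucleotide 222 (columns) vs. nucleotide 318 (rows), counts from the slide.
table_222_318 = build_table(
    ["G", "U", "C", "A"],
    {"A": [0, 7, 0, 50], "G": [0, 0, 7, 56], "C": [12, 0, 0, 26], "U": [54, 6, 13, 20798]},
)

print(double_mutant_pairing(table_219_321, wt_col="U", wt_row="A"))
print(double_mutant_pairing(table_222_318, wt_col="A", wt_row="U"))
```

For 219/321 the only double mutant that is not a Watson-Crick pair is a single G-U sample, which is a wobble pair; for 222/318 every double mutant is a Watson-Crick pair.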
Covariance Pairs Exactly Match RNA Stem-Loop Structure Predicted by mfold, RNAfold, and RNAz [Figure: predicted stem-loop spanning nucleotides 220 to 320, with Stem A and Stem B and their base pairs (G-C, U-A, C-G, A-U) marked.] This structure is conserved over all members of the HIV Group M family. This newly discovered structure may help explain a recombination hotspot previously reported in this region of the HIV genome. Wang & Lee, submitted.
The Future of SNP Analysis? It should be emphasized that the only input to this analysis was single nucleotide polymorphism discovery from chromatograms. No information about drug treatment, time, etc. Yet conditional Ka/Ks yields drug-resistance mutations; kinetics and temporal order; pathways.
A New Level of Strategic Intelligence A global picture of how HIV will respond in the future to our drug treatments. Ka/Ks velocities tell us where the HIV population is going, detectable even while mutations are still rare. Moreover, since these selection pressures are due to our actions (drugs), they are manipulable. Even slowing drug resistance (DR) evolution two-fold could make a big difference for control of the epidemic.
HIV Positive Selection Database http://www.bioinformatics.ucla.edu/hiv Atlases of HIV drug resistance evolution from Specialty dataset. Analysis tools (snpindex).
Acknowledgements Lamei Chen: Ka/Ks analysis Qi Wang: linkage analysis Calvin Pan: new analyses of cond. Ka/Ks Alexander Alekseyenko: Nested List Specialty Laboratories, Inc. Alla Perlina, Beatrisa Boyadzhyan UCLA Collaborators: Christina Kitchen, Paul Krogstad http://www.bioinformatics.ucla.edu/hiv