Phylogenetic Analysis of HIV Samples from a Single Host


Master Thesis
Rounak Vyas
November 20, 2011
Advisors: Prof. Niko Beerenwinkel, Dr. Osvaldo Zagordi
Computational Biology Group, ETH Zürich

Contents

Introduction
    AIDS
    HIV
    Longitudinal Studies
    HIV Sequencing
    Recent Studies of HIV-1
Materials and Methods
    Patient History
    Data Pre-processing
    Entropy Analysis
    Recombination Analysis
    Molecular Clock Estimation
    Poisson Fitter
    Sliding MinPD
    Rate of synonymous and nonsynonymous substitutions
Results
    Entropy Analysis
    Molecular Clock Rate Estimation and Phylogenetic Tree Construction
    Founder Virus Analysis
    Demographic Reconstruction
    Sliding MinPD
Conclusions
Bibliography

Chapter 1

Introduction

The earliest well-documented incident of AIDS dates back to the 1980s. Since then, more than 25 million people have died from AIDS [4]. The World Health Organization has declared AIDS a pandemic, and efforts are underway around the world to identify an effective cure for this condition.

1.1 AIDS

As the name suggests, Acquired Immunodeficiency Syndrome (AIDS) is a condition in which the patient's immune system becomes severely compromised, enabling opportunistic infections such as pneumonia, tuberculosis, herpes, and others. These infections ultimately result in the death of the patient. Clinically, AIDS is described as an advanced stage of an infection caused by Human Immunodeficiency Virus (HIV) in which the CD4+ cell count drops below the critical level. CD4+ cells are a special class of white blood cells which play a major role in recognizing foreign antigens (such as bacteria and viruses) within the body [12]. In their absence, the immune system is not able to recognize and clear these foreign agents, leading to sustained infections.

HIV infection can only be contracted from another infected individual through exchange of body fluids such as blood, genital fluids and breast milk. Such exchange typically takes place when individuals share injection needles or engage in unprotected sexual intercourse. The infection cannot be acquired by ingestion of the virus. An individual may remain HIV positive for several years before becoming an AIDS patient. After the onset of AIDS, the patient's life span shortens to 8 to 12 months [19].

Current therapies are able to significantly increase the life span of infected individuals by delaying the onset of AIDS. Most of these

therapies act by interfering with one of the crucial steps in replication, entry or release of the virus from the infected cell. However, owing to its unusually high mutation rate, HIV is able to quickly develop resistance against these therapies and flourish again. Hence there is presently no cure for AIDS [29]. AIDS is not a disease that targets a specific organ, but a condition characterized by progressive immune failure leading to infections in several organs.

To develop an effective therapy, it is imperative to gain insight into the processes through which the virus evolves to establish a persistent population within the host. Of particular interest are the evolutionary patterns the virus follows while subjected to the selective pressures of the host immune system, and the development of viral drug resistance. A detailed understanding of these processes may offer insight into the development of effective treatment strategies.

1.2 HIV

The Human Immunodeficiency Virus belongs to the family of retroviruses [3], RNA viruses that use reverse transcriptase to encode their genetic material into DNA only within a host cell. It is known to cause Acquired Immunodeficiency Syndrome [34]. There are two prominent types, known as HIV-1 and HIV-2, differing in their virulence, infectivity, and prevalence [15]. These originated from the Simian Immunodeficiency Virus subtypes cpz and smm, which infect chimpanzees and Old World monkeys respectively. In this report we focus on HIV-1, as this is the virus type with which both patients were infected.

Figure 1.1: Diagram of Human Immunodeficiency Virus [1]

The virus carries two identical copies of its complete genome, encoded on two positive single-stranded RNA molecules, and the genome consists of nine genes encoding 19 viral proteins. The viral core is composed of the Capsid protein (CA, p24), the Matrix protein (MA, p17) and P6. Following reverse transcription of the viral genome by reverse transcriptase subsequent to host infection, the newly produced DNA is incorporated into the host genome by viral integrase [25]. The pre-proteins encoded by the viral genome are converted to fully functioning HIV proteins by protease. Following host infection, RNAse H breaks down the retroviral genome.

The HIV genome codes for a series of proteins serving structural and regulatory functions. Structural proteins include gp120, which lies outside the virus particle, and gp41, which lies just inside the membrane and serves as a membrane anchor for gp120. Tat (transactivator) is a regulatory gene that accelerates the production of viral progeny and is known to be crucial for HIV replication. Rev stimulates the production of HIV proteins but suppresses the expression of other regulatory genes of HIV. Nef (negative replication factor gene) encodes proteins that are exposed to the cytoplasm of the host cell and are necessary for viral spread and disease progression by down-regulating the CD4 count. Vif encodes the Viral Infectivity Factor, which is found inside the virus and is responsible for the rapid spread of the virus. Vpr (Viral protein R) accelerates the production of HIV proteins and interferes with the host cell cycle, thus inhibiting cell division. Vpu (Viral protein U) helps in assembling new virus particles and in their budding out from the host cell, and accelerates the degradation of CD4 proteins.

Figure 1.2: HIV-1 genome HXB2 strain [17]

HIV Life Cycle

HIV enters lymphocytes by binding to the chemokine and CD4 receptors present on the cell surface [10, 40]. This binding is facilitated by the viral gp160 protein (gp120 and gp41 proteins) [10, 40]. Following binding, the viral envelope fuses with the cell membrane and releases the HIV capsid into the cell.
Besides CD4+ cells, HIV can also infect macrophages and dendritic cells [10, 40]. Once the viral capsid enters the cell, the viral RNA is reverse transcribed to a cDNA molecule [25]. This process is facilitated by the reverse transcriptase enzyme, which is extremely error prone and also lacks proof-reading capacity, leading to a misincorporation rate of $10^{-4}$ to $10^{-5}$ per base, or approximately one misincorporation per genome per replication cycle [38]. The cDNA and its complement form a double-stranded viral DNA, which is then transported into the cell nucleus, where it is integrated into the host genome with the help of the viral integrase enzyme [25]. Once incorporated, the viral DNA requires cellular transcription factors to be expressed as viral proteins.

During viral replication, the pro-viral DNA is transcribed into mRNA, which is spliced and then transported to the cytoplasm, where it is translated into viral proteins (mainly the Tat and Rev proteins). Rev protein accumulates in the nucleus and inhibits mRNA splicing, and the unspliced full-length mRNA leaves the nucleus to enter the cytoplasm [32]. The full-length mRNA is in fact the viral genome, which binds to Gag protein and is packaged into new viral particles. After processing by the host endoplasmic reticulum, gp160 is transported to the plasma membrane, where gp41 anchors gp120 to the membrane of the infected host cell. The viral capsid then assembles and buds out of the cell to infect other cells [25].

The high rate of mutation in HIV is due to the high error rate of the reverse transcriptase enzyme while transcribing the viral RNA genome into a DNA sequence that can be incorporated into the host cell genome for coding viral proteins [5]. Along with a high misincorporation rate of approximately one base per replication, reverse transcriptase also lacks proof-reading activity, rendering it unable to check and rectify copy mistakes. This often results in several slightly different copies of HIV within a single patient. Distinct viral populations are referred to as quasispecies, and each genetically distinct individual is referred to as a haplotype.
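As a rough consistency check on the two figures quoted in this section, the expected number of misincorporations per genome per replication cycle is the genome length multiplied by the per-base error rate. The genome length $L \approx 10^{4}$ nucleotides used here is an assumption (a round figure for the HIV-1 genome, which is a little under 10 kb); it is not stated in the text, and $\varepsilon$ is simply a label for the per-base misincorporation rate.

```latex
\[
  E[\text{misincorporations per genome per cycle}] = L\,\varepsilon,
  \qquad
  10^{4} \times 10^{-5} = 0.1 \;\le\; L\,\varepsilon \;\le\; 10^{4} \times 10^{-4} = 1 .
\]
```

Under this assumption, the upper end of the quoted range corresponds to roughly one misincorporation per genome per replication cycle.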
1.3 Longitudinal Studies

Due to the prohibitively expensive nature of prospective HIV screening, HIV studies are generally only performed on high-risk population groups, such as within a prison. Combined with additional ethical considerations, the result is that most studies enroll patients who are already symptomatic. In such populations the viral load is already established, so these studies offer little insight into the dynamics of the pre-seroconversion phase, i.e. before any viral antibody production has taken place.

To develop insight into the temporal evolution of the virus within a host, longitudinal studies of patient populations are of critical importance. Longitudinal studies combine data collected at multiple examinations, at intervals between minutes and years, to afford a more comprehensive insight into the viral dynamics than is possible through examination at a single time point.

Figure 1.3: HIV Life Cycle [6]

While longitudinal studies are clearly desirable, they also present technical challenges, such as censoring of events due to the relocation, death or disenrollment of cohort members. Additionally, changing patient habits and lifestyle choices can complicate the analysis. Lastly, longitudinal studies are necessarily more involved and therefore more expensive than single time-point studies.

1.4 HIV Sequencing

Traditional Sanger-based sequencing methods only read the consensus genomic sequence of heterogeneous viral populations [36], obscuring the genomic variability present in the population, which is of potential importance for identifying gradually fixating resistance mutations. Next Generation Sequencing (NGS) technologies represent an improvement over Sanger sequencing and facilitate the sequencing of distinct haplotypes within a sample [30]. However, NGS reads are error prone and require sophisticated processing techniques for error-free haplotype reconstruction and frequency estimation. Currently, haplotypes with frequencies as low as 0.05% can be estimated with 99% confidence [24].

1.5 Recent Studies of HIV-1

In 1999, R. Shankarappa et al. conducted a landmark study investigating the evolution of HIV within an infected individual prior to the onset of AIDS [22]. They studied the evolution of C2-V5, a highly mutable region of the HIV-1 env gene, in nine patients over six to twelve years. They estimated the viral diversity within and between time points, identified mutations conferring a fitness advantage on the viral strain, and characterized three distinct phases: an early phase with a linear increase in diversity and in divergence from the founder virus strain, an intermediate phase with a linear increase in divergence but stabilization or decline in diversity, and a late phase with stabilization in divergence and a continued decrease in diversity.

More recently, Poon et al. used longitudinal deep sequencing data with coalescent analysis to estimate the date of HIV infection [26]. The time of infection was estimated using the time to most recent common ancestor (TMRCA) of a time-calibrated phylogenetic tree relating sequences from all time points. This is justified by the argument that most HIV infections are established by a single viral strain due to bottlenecks during transmission [39, 13].
Nineteen HIV-positive individuals were followed and seven genomic regions were analyzed. The authors compared the time since infection estimated by experimental methods to TMRCA estimates obtained with the BEAST software library. They observed a stronger correlation between clinical and computational estimates of the TMRCA in highly variable regions of the HIV genome (such as env) than in conserved regions such as pol. The reduced correlation in the conserved regions is thought to be due to a possible overestimation of time scales caused by the increased sensitivity of coalescent-based methods to the sampled genetic variation. Consequently, sequences with high divergence were found to be ideal for calibrating the evolutionary clock. In the case of a multiple founder virus infection, this method was found to

overestimate the infection time.

In the same month another study was published by Suzanne English et al. [23]. It discussed the reconstruction of the transmission history of HIV-1 infected individuals using phylogenetic methods. It showed that diversity is fairly limited in the early phase of the infection and is even compatible with the transmission of a single viral variant. It also provided evidence to support the idea that a single donor can in principle transmit two distinct variants to two different individuals within a few hours. The transmission history was constructed using Bayesian and maximum likelihood approaches, based on the inter-host variation observed in the env, gag and pol regions. The predicted inter-host genetic diversity varied with the extent of conservation of the region used for the prediction: the highest diversity was predicted using the env gene (least conserved), followed by the pol gene and then the gag gene. BEAST was used to carry out this analysis.

A temporal study on HIV-1 was undertaken by G. Achaz et al. and published in 2004 [11]. This study used gag-pol sequence data collected over time from two chronically infected individuals to estimate the population structure and the neutral mutation rate of this region per site per generation. Neutral coalescent models were used for the analysis. For the genealogy construction, a coalescent approach identical to the one proposed by Felsenstein in 1999 was used. Nineteen time points collected over a period of 4 years were used for the analysis, which compensated for the low mutation rate in the sequences.

A longitudinal study of viral evolution in early acute Hepatitis C Virus infection was carried out by Bull RA, Luciani F, McElroy K, Gaudieri S, Pham ST, et al. and published in 2011 [9]. We discuss this paper in greater detail because of its parallels with our study. The study aimed at identifying genetic variants with frequencies as low as 0.1% and quantifying them over the course of infection. The authors also identified two sequential bottlenecks that occurred early in infection. BEAST was used to estimate the changes in the effective population size of the virus population over time, to construct ancestor-descendant relationships between the viral samples from different time points, and to estimate the rate of evolution of the virus. An in-depth analysis of nonsynonymous and synonymous substitutions was carried out to identify any visible pattern of change. Entropy changes were measured across the whole genome and across patients, and indicated non-uniform evolution of HCV across the genome and over time. The single founder virus hypothesis was tested using Poisson Fitter, a freely available tool on the HIV database.

Several other time-series analyses of data from HIV-positive patients have been carried out to identify and understand the order in which resistance mutations accumulate when the patient is placed under drug therapy. However, we do not discuss these, since our patients did not show any drug resistance even after the therapy was discontinued.

Chapter 2

Materials and Methods

2.1 Patient History

HIV samples were collected from two patients enrolled at the Department of Infectious Diseases at Universitätsspital Zürich. The protease coding region of HIV was deep-sequenced and analyzed. This region was chosen for the study since both patients were treated with a protease inhibitor drug.

Patient I.D. 123

Figure 2.1: Viral load in patient I.D. 123

This patient was part of the Primary HIV Infection study, which emphasizes beginning treatment in the early phase of infection and then discontinuing it. It is based on the assumption that the patient is likely to control the virus when the treatment is started early. However, most patients suffer from a viral rebound, as did patient 123. As can be seen in figure 2.1, four samples were sequenced from the patient over a period of 3.74 years after the patient tested HIV positive. These samples are marked in red. The regions marked as ART in figure 2.1 show the periods of treatment with Lopinavir, an anti-retroviral drug. The exact dates of sample collection can be seen in table 2.1.

Table 2.1: Sample collection time points, patient I.D. 123 (columns: Sr. No., Sample Name, Sample Collection Date)

Patient I.D. 181

Figure 2.2: Viral load in patient I.D. 181

The patient remained untreated until almost a year after testing HIV positive. During this phase, the viral load in blood plasma was regularly monitored. Samples from three time points, marked in red in figure 2.2, were deep-sequenced and analyzed. The exact dates of sample collection are given in table 2.2.

Table 2.2: Sample collection time points, patient I.D. 181 (columns: Sr. No., Sample Name, Sample Collection Date)

2.2 Data Pre-processing

Haplotype reconstruction and error correction were performed using ShoRAH [24]. The output file contained haplotype sequences in FASTA format. The header of each haplotype contained two numbers. The first number showed our confidence in the haplotype sequence on a scale of 0 to 1. The second number, known as the average read number of a haplotype, is the number of times the sequences constituting the haplotype were sequenced, and can be used to calculate the frequency of the haplotype in the sample population.

These files often contained over a hundred sequences, with only a few having a high read count and confidence. For a meaningful analysis, the files were filtered to select sequences with a confidence of over 0.9. This reduced the number of haplotypes to one quarter or less. The threshold was chosen to balance the number of sequences for analysis: too few sequences would not contain enough information, and too many would add noise to the result. This cutoff returned a reasonable number of haplotypes.

Since a functional protease is fundamental for HIV, any gaps present in the haplotype sequences were assumed to be sequencing errors. The haplotypes from a single run were first used to construct a consensus sequence using the EMBOSS tool Cons through Biopython. Any gaps present in the consensus sequence were filled using the HXB2 protease reference sequence. The consensus sequence was then used to fill the gaps present in the haplotype sequences. The reading frame of every haplotype was also checked to ensure that it starts at the first nucleotide position. A Python script was written to perform all of the above tasks.

2.3 Entropy Analysis

HIV constantly accumulates mutations to cope with the selective forces exerted by the immune system and drug treatments.
The nucleotide sites that accumulate these mutations are mainly responsible for rendering the virus resistant to different therapies. In order to improve our current methods, it would be fruitful to identify these sites and to gain insight into how they maintain diversity in the viral population. For this purpose we calculate the entropy for every time point dataset and try to identify any visible spatial or temporal patterns.

The entropy of a position in a sequence describes the uncertainty associated with the nucleotide present at that site. High entropy indicates that the site can have variable nucleotides. Let $X$ be a discrete random variable (bases when considering nucleotides, amino acids when considering proteins) taking a finite number of possible values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ such that $p_i \ge 0$, $i = 1, 2, \ldots, n$, and $\sum_{i=1}^{n} p_i = 1$. The entropy is then given by

$$H_n(p_1, p_2, \ldots, p_n) = -\sum_{i=1}^{n} p_i \log_b p_i$$

Here $b$ is the base of the logarithm. A simple Python script was written to calculate and plot the entropy at every position in the alignment. We used the natural logarithm for our calculations. When deep-sequencing data was submitted as input, the script could take into account the average number of reads while calculating the entropy at every position.

2.4 Recombination Analysis

Recombination plays a crucial part in the evolution of retroviruses and is more prevalent in conserved regions [7]. Since we use a fairly conserved HIV region for our analysis, we performed a recombination detection study. If an alignment contains recombinant sequences, then the relationships between different segments of the alignment cannot be described using a single phylogenetic tree. To uncover the true evolutionary relationships, it is imperative to identify the recombination break points, partition the alignment into the observed recombinant sets, and then depict the evolutionary relationships in each of these partitions using a separate phylogenetic tree.
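For the per-position entropy defined in Section 2.3, a minimal sketch of such a calculation is given here. It is not the script used in this thesis: the function names are hypothetical, the optional `weights` argument is an assumption about how read counts might enter the calculation, and plotting is left out.

```python
import math
from collections import Counter


def column_entropy(column, weights=None, base=math.e):
    """Shannon entropy of a single alignment column.

    column  -- characters observed at one position, one per haplotype
    weights -- optional read count per haplotype (assumed; defaults to 1 each)
    base    -- base b of the logarithm (natural logarithm by default)
    """
    if weights is None:
        weights = [1.0] * len(column)
    totals = Counter()
    for symbol, weight in zip(column, weights):
        totals[symbol] += weight
    n = sum(totals.values())
    entropy = 0.0
    for count in totals.values():
        p = count / n
        if p > 0:
            entropy -= p * math.log(p, base)
    return entropy


def alignment_entropy(sequences, weights=None, base=math.e):
    """Entropy at every position of a set of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return [column_entropy(col, weights, base) for col in zip(*sequences)]


if __name__ == "__main__":
    # Toy alignment: position 1 is conserved, position 2 is variable.
    print(alignment_entropy(["AC", "AG", "AG"], weights=[10, 5, 5]))
```

Weighting each haplotype by its read count makes the probabilities reflect estimated haplotype frequencies rather than the number of distinct haplotypes.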
If recombination events are not taken into account during a phylogenetic analysis, then the results are most likely to be meaningless. We used the Recombination Identification program [37], which has been developed specifically to detect recombinants in HIV-1 nucleotide sequences. It accepts as input a set of nucleotide sequences from a single viral genomic region collected from a single patient. The program requires a background sequence, which is essentially the consensus sequence of the genomic region that is to be analyzed. This can be selected from the list available in the program; alternatively, the user is free to submit a consensus sequence with the nucleotide data. In the latter case, the consensus should be aligned to the rest of the sequences.

The program detects recombinants by sliding a window of pre-specified length along the alignment and calculating the Hamming distance of the query sequence from all other sequences. The best match within every window is identified, and the confidence in each match is calculated using a z-test. If two neighboring windows on the same sequence have best matches with different sequences, then the query sequence is considered a recombinant. The program implicitly assumes each site to be evolving independently but according to the same process. It also approximates the binomial distribution of the Hamming distances by a normal distribution.

2.5 Molecular Clock Estimation

This section closely follows "The Evolutionary Analysis of Measurably Evolving Populations using Serially Sampled Gene Sequences" by Allen Rodrigo et al. [21] and "Estimating Divergence Times" by J.L. Thorne and H. Kishino [28].

Our interest in estimating the rate of evolution comes from our desire to construct rooted, time-scaled phylogenetic trees using serially sampled data. A phylogenetic tree using only contemporary sequences can be constructed using standard approaches such as maximum parsimony and the N-J method, which assume that all the input sequences belong to a single time point and are therefore equally distant from the root of the tree [35]. This is not the case with serially sampled sequences, and care must be taken to scale the branches according to their time of sampling. The rate of mutation is required for this scaling of branches. In the standard tree construction techniques the branch lengths are calculated using a composite parameter µt, where µ is the substitution rate and t is the sampling time. With serially sampled data these two parameters can be decoupled into time and substitution rate, and the tree branches can be expressed in units of either.
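To illustrate how serial sampling separates rate from time: under a strict clock, the expected genetic distance from the root to a tip sampled at time t is µ(t − t_root), so a straight-line fit of root-to-tip distance against sampling time gives µ as the slope and the root age as the intercept with the time axis. The sketch below is an ordinary least-squares version of this idea with made-up numbers. It is only an illustration under these assumptions and is not the Bayesian procedure used in this thesis; the function name and the data are hypothetical.

```python
def root_to_tip_regression(times, distances):
    """Least-squares fit of distance = rate * (time - root_time).

    times     -- sampling time of each sequence (e.g. in years)
    distances -- root-to-tip genetic distance of each sequence
                 (substitutions per site), assumed to be already known
    Returns (rate, root_time); root_time is None if the fitted rate is zero.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("at least two distinct sampling times are required")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    rate = sxy / sxx                      # slope: substitutions/site/time unit
    intercept = mean_d - rate * mean_t    # distance at time zero
    root_time = -intercept / rate if rate != 0 else None
    return rate, root_time


if __name__ == "__main__":
    # Hypothetical data: four samples taken one year apart.
    rate, root_time = root_to_tip_regression(
        [0.0, 1.0, 2.0, 3.0], [0.002, 0.004, 0.006, 0.008]
    )
    print(rate, root_time)
```

With sequences from a single time point the slope cannot be estimated, which is why only the product µt is identifiable for contemporaneous data.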
The rate of molecular evolution is the outcome of a complex interplay between biological systems and their surroundings. Since these systems and their surroundings change over time, their evolutionary rates can be expected to fluctuate as well. These fluctuations in rates over different periods are best described by a relaxed molecular clock. In the case of HIV, the rate of evolution is influenced by the rate of mutation, the generation length, and the probability of fixation of a mutation in the viral population. All these factors depend intricately on the biology as well as the population size of HIV. When the population size fluctuates, so does the fixation probability of a mutation, resulting in a change of selection pressure on the virus. Hence changes in the population size must be taken into account while deciphering phylogenetic relationships. This is done using coalescent-based models, which use sequence data to determine the population genetic parameters (e.g. population size) which in turn determine the shape of the genealogy. Coalescent theory describes the dependence of a phylogenetic tree that represents the shared ancestry of sampled genes (i.e. a genealogy) on changes in population size and structure [33].

BEAST implements a variable population size coalescent model which allows the past population dynamics to be determined. This option is known as the Bayesian skyline plot. It is a non-parametric model which makes use of time-calibrated sequence data to estimate demographic model parameters using Bayesian methods [14]. It can estimate the evolutionary rate, substitution model parameters, phylogeny and ancestral population dynamics within a single run. It then plots the past population size over time. The plot begins from the estimated root age of the phylogenetic tree.

It can also be argued that, depending on the period of observation, the evolutionary rates can be assumed to be approximately constant, implying a strict molecular clock. Such an assumption facilitates evolutionary studies, but one should always keep in mind the scenarios in which the weakness of this assumption outweighs its convenience.

The molecular clock model selection and rate estimation were performed using a Java-based tool, BEAST [27]. It was the natural choice for the analysis since it implements substitution models, insertion-deletion models and demographic models for performing a series of coherent analyses. It can also explicitly model the rate of molecular evolution on every branch of the phylogenetic tree. This rate can be constrained to be constant over all branches or can be allowed to vary freely along different lineages.
This molecular clock model can be readily combined with other models that allow the rate of substitution to vary along the alignment while sharing some common parameters, such as the rate of transition or transversion. Since several models can be combined, many unnecessary simplifying assumptions can be avoided.

BEAST provides a Bayesian framework for testing hypotheses on biological data. Its three main types of analysis are constructing rooted and time-measured phylogenies, estimating population change over time using coalescent-based models, and demogeographic sequence analysis. We constructed time-calibrated phylogenies after estimating the clock rate, as well as population evolution plots, hence we discuss these two methods in detail. Demogeographic analysis uses the location of sample collection and includes this information while drawing statistical inferences.

BEAST is one of the few available platforms which can deal with time-stamped data and make use of relaxed or strict molecular clock models to construct rooted trees and calibrate internal node ages on absolute time scales. It makes use of the Metropolis-Hastings Markov Chain Monte Carlo algorithm to provide sample-based estimates of the posterior distributions of the evolutionary parameters given a set of sequence data. It facilitates analysis of multi-locus data, since the data can be appropriately partitioned and the evolutionary parameters can be linked or unlinked between partitions. This feature can be extremely helpful when dealing with viral sequences containing genes, e.g. pol and env, which have different rates of mutation. In such a situation, the demographic model parameters can be shared between partitions, assuming exponential or logistic growth, while the substitution model parameters can be unlinked across different partitions.

Model Summary

The model first estimates a phylogenetic tree to explain the relationship between $n$ contemporaneous sequences. This is the genealogy, denoted by $g$. The coalescent events are then assumed to occur only on internal nodes of the tree, i.e. there can be a maximum of $n - 1$ coalescent events on the tree. The population size might change or remain the same after the occurrence of a coalescent event. The indicator function $I_c(i)$ is used to denote whether the $i$-th event is a coalescent event. The times at which the coalescent events occur are denoted by the vector $u = (u_1, u_2, \ldots, u_{n-1})$. A period in which the population size remains unchanged is called an interval, and the vector denoting the number of coalescent events in each interval is $A = (a_1, a_2, \ldots, a_m)$. Here $m$ is the total number of such intervals, with $1 < m < n - 1$. The times at which the grouped intervals end are denoted by $w = (w_1, \ldots, w_m)$, and the vector of effective population sizes is denoted by $\Theta = (\theta_1, \theta_2, \ldots, \theta_m)$. The vector of effective population sizes, together with the genealogy $g$ and the vector $A$ of the number of coalescent events in each interval, constitutes the demographic and coalescent time parameters.
The probability of the genealogy can be easily calculated and is denoted by $f_G(g \mid \Theta, A)$. BEAST uses a fixed number of intervals $m$, since the resulting posterior demographic function is consistent over a large range of its values. The vector of effective population sizes is sampled using an MCMC algorithm. Each new population size is sampled from an exponential distribution with a mean equal to the previous population size. This formulation represents our belief that the population size is autocorrelated through time. The posterior distribution sampled is the product of the likelihood of the piecewise demographic model and the priors:

$$f_{het}(\Theta, A, \Omega, g, \mu \mid D) = \frac{1}{Z} \Pr(D \mid \mu, g)\, f_G(g \mid \Theta, A)\, f_\Theta(\Theta_1) \times f_A(A)\, f_\Omega(\Omega)\, f_\mu(\mu)$$

where $f_\Theta(\Theta_1)$ is the scale-invariant prior for the first effective population size, and the remaining sizes are drawn from an exponential distribution centered on the size of the previous population. $\Omega$ contains the parameters of the substitution model and $\mu$ is the mutation rate that scales the genealogies (phylogenetic trees) from units of mutations per site to units of time.

If the sampled substitution model parameters and mutation rates are ignored, then we get a list of states, each associated with a genealogy and demographic parameters. The demographic history can then be constructed as a piecewise function of time for each of the states. The marginal posterior distribution of the population size is calculated for each time point back to the time to most recent common ancestor, along with the 95% confidence interval that accounts for phylogenetic and demographic uncertainty. The population estimates are usually smooth due to the averaging effect of the sampling procedure in use.

2.6 Poisson Fitter

Poisson Fitter is freely available on the HIV database. Studies in [39, 13] showed that HIV undergoes genetic bottlenecks when the mode of transmission is sexual (horizontal transfer) or mother to child (vertical transfer). This primarily results in new infections being initiated by homogeneous viral strains. Once the infection is established by a single viral strain, it is expected to grow exponentially until the host immune system initiates a response. This is a case of neutral evolution, where the mutation counts are expected to follow a Poisson process. Once the host immune system triggers a response against the infection, or when the patient is placed under therapy, the virus population no longer grows exponentially and the accumulated mutations are no longer random, so the Poisson distribution cannot be used to describe the frequency distribution of pairwise Hamming distances.
Poisson Fitter [16] analyzes a set of HIV sequences, assumed to have been collected close to the time of infection, to estimate whether the infection was initiated by a single or by multiple founder viruses. It is based on a maximum likelihood approach which first tests the hypothesis that a single founder virus strain initiated the infection. If this condition is met, the time of infection is estimated with a 95% confidence interval, provided the sample was drawn before the virus population was subject to any selection pressure. Poisson Fitter can read deep-sequencing datasets and so was the natural choice for performing this analysis. Another reason for selecting this tool was that it is specifically designed for working with HIV and Hepatitis C virus datasets and makes use of their default substitution rates. It has been used in some other longitudinal studies that are discussed in the literature review section.
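The idea of fitting a Poisson distribution to Hamming distance counts can be sketched as follows. This is a minimal illustration under stated assumptions and not Poisson Fitter's actual implementation: it relies only on the standard fact that the maximum likelihood estimate of a Poisson rate is the sample mean, and the function names and the example counts are hypothetical.

```python
import math


def poisson_rate(counts):
    """Maximum likelihood Poisson rate: the mean Hamming distance over all pairs.

    `counts` maps a Hamming distance i to the number of pairs Y_i at that distance.
    """
    total_pairs = sum(counts.values())
    return sum(i * y for i, y in counts.items()) / total_pairs


def expected_counts(counts, rate):
    """Expected number of pairs at each distance 0..max under Poisson(rate)."""
    total_pairs = sum(counts.values())
    return {
        i: total_pairs * math.exp(-rate) * rate ** i / math.factorial(i)
        for i in range(max(counts) + 1)
    }


# Hypothetical observed counts: 1 pair at distance 0, 3 at distance 1, 2 at distance 2.
observed = {0: 1, 1: 3, 2: 2}
rate = poisson_rate(observed)
expected = expected_counts(observed, rate)
for i in sorted(expected):
    print(i, observed.get(i, 0), round(expected[i], 3))
```

Comparing the observed and expected columns gives an informal impression of the fit; a real analysis would use a proper goodness-of-fit test and account for the fact that pairwise distances are not independent.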

This tool compares the sample genetic diversity with the diversity expected under the neutral growth model, i.e. infection by a single viral strain accumulating random mutations, by performing statistical tests on the Hamming distances and fitting a Poisson distribution to them using the maximum likelihood method. It also tests whether the phylogenetic tree for the sequences shows a star topology. The Poisson distribution parameter is then found to be

$$\lambda = \frac{\sum_{i=0}^{n} i\, Y_i}{\sum_{i=0}^{n} Y_i} = E(Y)$$

where $Y = (Y_0, \ldots, Y_n)$ and $Y_i$ is the number of pairs of sequences that have a Hamming distance equal to the subscript $i$. The model assumes a generation time of 2 days and a mutation rate of per site per replication with a basic reproductive ratio $R_0 = 6$, based on the findings from [39, 13, 18]. When the sequence data shows a star phylogeny, one finds that $E(Y_i) = Y_i$. Once this condition is satisfied, the age of the root of the tree is the same as the time of HIV transmission to the patient. A low goodness of fit might indicate that the sample was collected after the onset of selection pressures or that the infection was initiated by multiple founder viruses. Deep sequencing data can also be analyzed using Poisson Fitter. The plots are then drawn on a log scale, since identical sequences are far more numerous than differing ones and this information would be masked on a linear scale.

2.7 Sliding MinPD

This section describes the methods in [8].

The traditional phylogenetic approaches treat all the sequence data as contemporaneous and deal with serially sampled data by merely scaling the tips of the leaves. These methods are also not able to account for recombination events. Furthermore, when the data is collected from quickly evolving viruses which exhibit complex substitution patterns, phylogenetic trees are not able to depict all the information.
In such a situation, an evolutionary network can be used to depict the ancestor-descendant relationships and recombination events. Sliding MinPD constructs an evolutionary network using serially sampled data and detects recombination events using a sliding window approach. It is based on the minimum pairwise distance approach combined with the sliding window method and recombination detection techniques. The algorithm consists of three phases. In the first phase, every sequence that does not belong to the first time point is treated as a query sequence and its pairwise distance is calculated against every sequence from the previous time point. In the second phase, the breakpoints in the recombinant sequences and their donor sequences are identified using the sliding window approach, where the best match is identified for every window along the alignment. In the final phase, potential ancestors from previous time points are identified. For the non-recombinant sequences these are the ones which had the shortest distance calculated in the first phase. The results of this program were found to be extremely sensitive to the specified window length. Hence the analysis was carried out for only a single patient.

2.8 Rate of synonymous and nonsynonymous substitutions

This section closely follows [31], the chapter Neutral and adaptive protein evolution by Ziheng Yang in [41] and the Hypothesis Testing for Phylogenies manual [20].

The rates of nonsynonymous and synonymous substitution provide insight into the type of selection pressure acting on the viral population. When the ratio of the rates of nonsynonymous and synonymous substitutions is greater than one for a genomic region, that region is said to be under positive selection. For example, when an HIV patient is placed under drug therapy, the virus shows concerted substitutions towards acquiring a particular residue, which eventually becomes fixed in the population and makes the virus drug resistant. This type of evolution is known as positive directional selection. Another kind of positive selection maintains the amino acid diversity at certain sites which are potential targets of the host immune system. This is commonly known as diversifying positive selection. When the genomic region accumulates synonymous and nonsynonymous substitutions at the same rate, it is said to be under neutral evolution. In negative selection the rate of nonsynonymous substitutions is much lower than that of synonymous substitutions, causing selective removal of alleles that are deleterious.
It is also commonly known as purifying selection. A substitution is synonymous or nonsynonymous depending on the codon in which it occurs and on the position within the codon. For example, GGX → GGY is always a synonymous substitution, whereas CAX → CAY is synonymous if X → Y is a transition and nonsynonymous otherwise. Hence, while dealing with coding sequences, it is always meaningful to use codons as the units for selection analysis. We used the MEGA software for calculating the rate of synonymous and nonsynonymous mutations, which is based on the method described by M. Nei and T. Gojobori. Here we describe the method in detail.

First the number of synonymous and nonsynonymous sites for each codon present in the sequence is computed. Let $S$ be the number of synonymous sites for a codon,

$$S = \sum_{i=1}^{3} f_i,$$

where $f_i$ is the fraction of synonymous changes at the $i$-th position in the codon. Then the number of nonsynonymous sites $N$ for the codon can be calculated as

$$N = 3 - S.$$

This can be understood by a simple example. In the case of TTA, which codes for leucine,

$$f_1 = \tfrac{1}{3}\ (T \to C), \quad f_2 = 0, \quad f_3 = \tfrac{1}{3}\ (A \to G), \quad \text{and so } S = \tfrac{2}{3},\ N = \tfrac{7}{3}.$$

The total number of synonymous and nonsynonymous sites in a sequence of $r$ codons is therefore given by $S = \sum_{j=1}^{r} s_j$ and $N = 3r - S$, where $s_j$ is the number of synonymous sites of the $j$-th codon.

The number of nonsynonymous and synonymous nucleotide differences between a pair of sequences is calculated by comparing the sequences codon by codon and counting the number of synonymous and nonsynonymous nucleotide differences for each pair of compared codons. This can be easily done when the codons differ at only a single position. When they differ at two nucleotide positions, there are two possible pathways through which this difference could have occurred. Both pathways are considered with equal probability, the number of synonymous and nonsynonymous substitutions is counted, and $S_d$ and $N_d$ are updated. For example, if the codon TTT is compared against GTA, the two pathways are

1. TTT(Phe) → GTT(Val) → GTA(Val), one synonymous and one nonsynonymous substitution
2. TTT(Phe) → TTA(Leu) → GTA(Val), two nonsynonymous substitutions

The value of $S_d$ becomes 0.5 and $N_d$ becomes 1.5. Similarly, when there are three nucleotide differences, six possible pathways between the codons, each with three mutational steps, are considered. The proportions of synonymous and nonsynonymous differences are then calculated using the equations

$$p_s = \frac{S_d}{S} \quad \text{and} \quad p_n = \frac{N_d}{N},$$

where $S$ and $N$ are the average numbers of synonymous and nonsynonymous sites for the two compared sequences.
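The counting steps described above can be illustrated with a short sketch that reproduces the two worked examples (TTA and TTT versus GTA). It is a simplified illustration, not the MEGA implementation: it assumes the standard genetic code, counts changes to stop codons as nonsynonymous, and does not exclude pathways that pass through stop codons. All function names are hypothetical.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites S of one codon (sum of f_i over 3 positions)."""
    s = 0.0
    for pos in range(3):
        alternatives = [
            codon[:pos] + b + codon[pos + 1:] for b in BASES if b != codon[pos]
        ]
        # f_i: fraction of the three possible changes that keep the amino acid.
        s += sum(CODE[alt] == CODE[codon] for alt in alternatives) / 3
    return s


def diff_counts(codon1, codon2):
    """Synonymous and nonsynonymous differences (S_d, N_d) between two codons,
    averaged over all orderings of the differing positions."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = non = 0
    for order in paths:
        current = codon1
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
    return syn / len(paths), non / len(paths)


s_tta = syn_sites("TTA")
print(s_tta, 3 - s_tta)            # S and N for TTA
print(diff_counts("TTT", "GTA"))   # S_d and N_d for TTT vs GTA
```

Summing `syn_sites` over all codons of a sequence gives the sequence-level $S$, and summing `diff_counts` over aligned codon pairs gives the totals $S_d$ and $N_d$ used in $p_s$ and $p_n$.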
Further, the per-site substitutions are calculated using the Jukes and Cantor (1969) formula [31]:

$$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\, p\right)$$

where $p$ is $p_s$ or $p_n$ for synonymous and nonsynonymous substitutions respectively. This method gives approximate estimates, and the formula is not applicable to two- and threefold degenerate sites. The program used by us for this analysis makes use of this method for calculating the rate of synonymous and nonsynonymous changes.
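As a small worked illustration of the Jukes and Cantor correction used in this section, the sketch below converts a proportion of differences $p$ into a per-site distance $d$. The function name and the input value are hypothetical; the formula itself is the one given in the text.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        # The logarithm is undefined once p reaches the saturation value 3/4.
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical proportion of differences between two sequences.
p_example = 0.1
d_example = jukes_cantor(p_example)
print(round(d_example, 4))
```

The corrected distance is always at least as large as $p$, because the correction accounts for multiple substitutions at the same site.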

Chapter 3

Results

3.1 Entropy Analysis

Patient I.D. 123

The entropy was calculated and plotted for every position in the alignment for all four datasets, as shown in figures 3.1, 3.2, 3.3 and 3.4. A set of persistent peaks can be observed around positions 50 and 290 in all four plots. The sequence regions around these sites were explored and are summarized in table 3.1. The base number gives the nucleotide position whose neighboring sequence is being viewed. The following four columns show the entropy of the base at the different time points. The last column shows the neighboring sequence of the base. Consistently high entropies were found in the homopolymeric regions. These were most likely sequencing errors, since the 454 sequencing technique (used in our analysis) is known to suffer from a high base mis-incorporation rate in homopolymeric regions. These regions of the alignment were manually curated to remove the anomalies.

Figure 3.1: Patient I.D. 123: Entropy plot for samples collected in

Figure 3.2: Patient I.D. 123: Entropy plot for samples collected in 2005

Figure 3.3: Patient I.D. 123: Entropy plot for samples collected in 2006

Figure 3.4: Patient I.D. 123: Entropy plot for samples collected in
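A per-position entropy profile of the kind plotted in these figures can be computed along the following lines. This is only a sketch under assumptions: it uses Shannon entropy in bits and treats gap characters as ordinary symbols, which may differ from the exact definition given in the methods chapter. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(seqs):
    """Entropy for every position of an alignment of equal-length sequences."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment (hypothetical): position 1 is conserved, position 2 is split evenly.
toy_alignment = ["AC", "AG", "AC", "AG"]
profile = alignment_entropy(toy_alignment)
print(profile)
```

Positions with persistently high entropy across all time points, as observed here near homopolymeric runs, would then stand out as peaks in the resulting profile.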

Table 3.1: Patient I.D. 123: Neighboring sequences of high entropy sites
Base No. | Time1 | Time2 | Time3 | Time4 | Sequence preceding the site
 | | | | | GGGGGG (43-48)
 | | | | | GGGGGGC (43-49)
 | | | | | TTTAAATTTT ( )

In general, sequences from the first time point showed the highest entropy, which gradually decreased over time.

Patient I.D. 181

High entropy was measured in several regions, including a few homopolymeric regions. Regions suspected of containing sequencing errors are listed in table 3.2. These regions were manually corrected to remove the sequencing errors. No spatial or temporal trend was observed in the change in entropy.

Table 3.2: Patient I.D. 181: Neighboring sequences of high entropy sites
Base No. | Time1 | Time2 | Time3 | Sequence preceding the site
 | | | | AAACCAAAAA ( )
 | | | | AAACCAAAAA ( )
 | | | | TTTAAATTTT ( )

3.2 Molecular Clock Rate Estimation and Phylogenetic Tree Construction

Sequences from all the time points were used to simultaneously estimate the substitution rate and construct the phylogenetic tree depicting the ancestor-descendant relationships between sequences from all time points. The BEAST software was used for this analysis. After performing a number of test runs to understand the effect of every parameter on the run, the following settings were found to provide the best results in terms of a high log-likelihood value of the estimated parameters, fast convergence of the MCMC chain and low standard deviation in the distribution of parameters. The phylogenetic tree construction runs for both patients were performed with the settings specified in table 3.3. The priors specified for the BEAST run are summarized in table 3.4. The operator settings used to explore the sample space for the parameters are summarized in table

Table 3.3: Patient I.D. 123 and 181: Analysis Settings. Results shown in fig: 3.7, 3.9
Site Models | Parameter
1. Substitution Model | HKY
2. Base Frequencies | Estimated
3. Site Heterogeneity Model | Gamma+Invariant Sites
4. Partition in codon position | None
Clock Model |
1. Model | Strict Clock
2. Estimate Rate | Yes
Demographic model |
1. Tree Prior | Constant size
2. Starting Tree | Randomly Generated

Table 3.4: Patient I.D. 123 and 181: Priors for the BEAST run
Parameter | Prior | Bound | Description
Kappa | Lognormal[1,1.25] | [0,inf] | HKY transition transversion parameter
Frequencies | uniform[0,1] | [0,1] | base frequencies
alpha | uniform[0,10] | [0,1000] | gamma shape parameter
pinv | uniform[0,1] | [0,1] | proportion of invariant sites parameter
clock.rate | uniform[5.4e-5,1] | [0,inf] | substitution rate
rootheight | Using Tree Prior | [3.761,inf] | root height of the tree
const.popsize | 1/x | [0,inf] | coalescent population size parameter

Table 3.5: Choice of operator values for the rate estimation and phylogenetic tree construction, Patient I.D. 123 and 181
Operates on | Type | Tuning | Weight | Description
Kappa | scale | | | HKY transition-transversion parameter of partition
Frequencies | deltaexchange | | | frequencies
Alpha | scale | | | gamma shape parameter of partition
Clock.rate | scale | | | substitution rate of partition
Tree | subtreeslide | | | Performs the subtree-slide rearrangement of the tree
Tree | wideexchange | n/a | 3.0 | Performs global rearrangements of the tree
Tree | wilsonbalding | n/a | 3.0 | Performs the Wilson-Balding rearrangement of the tree
Tree | narrowexchange | n/a | 15.0 | Performs local rearrangements of the tree
rootheight | scale | | | root height of the tree of partition
Internal node heights | uniform | n/a | 30.0 | Draws new internal node heights uniformly
popsize | scale | | | coalescent population size parameter of partition
growthrate | randomwalk | | | exponential.growthrate
Substitution rate and heights | updown | | | Scales substitution rates inversely to node heights of the tree

Let us now briefly discuss how the current choice of parameters was arrived at. A series of runs was set up with parameter-rich substitution models such as the general time reversible model. These initial runs took long to converge. As a result, a simpler substitution model, the HKY model, was selected for the analysis. This significantly decreased the convergence time of the chain. All initial test runs were made with the uncorrelated lognormal clock, which draws the rate of each branch from an underlying lognormal distribution. The estimated standard deviation of the clock rate (i.e. the ucld.stdev parameter) was close to zero for most of these runs. A value close to zero for this parameter indicates clock-like behavior of the dataset [2]. Thereafter, a set of runs was made with different coalescent models: the expansion growth model, the exponential growth model and the constant size model. The Bayes Factor was used for selecting the best fitting model. For both patients, the constant population model could not be rejected. Table 3.6 shows the statistics for this parameter. Table 3.7 summarizes the statistics for patient I.D. The substitution rate in the HIV protease coding region for this patient was found to be slightly higher. The shape of the distribution was similar to the one of patient I.D. The phylogenetic tree topologies and parameters were also sampled over the MCMC chain. Tree and parameter values were logged once every 10,000 steps and the chain was run until the effective sample size, i.e. the number of independent draws from the posterior distribution, exceeded 250 [2]. In our case the effective sample sizes were well over 1000 for the relevant parameters. Figure 3.7 shows the ancestor-descendant relationship between sequences

Table 3.6: Patient I.D. 123: Clock rate statistics
mean | E-3
stderr of mean | 1.922E-5
median | E-3
geometric mean | E-3
95% HPD lower | E-4
95% HPD upper | E-3
auto-correlation time (ACT) |
effective sample size (ESS) |

Figure 3.5: Patient I.D. 123: Substitution rate distribution. The 95% confidence interval has been marked in blue. The red region shows the rate sampling outside the interval. (Axes: clock.rate against Frequency.)

Figure 3.6: Patient I.D. 181: Substitution rate distribution. The 95% confidence interval has been marked in blue. The red region shows the rate sampling outside the interval. (Axes: clock.rate against Frequency.)

Table 3.7: Patient I.D. 181: Clock rate statistics
mean | E-3
stderr of mean | E-5
median | E-3
geometric mean | E-3
95% HPD lower | E-4
95% HPD upper | E-2
auto-correlation time (ACT) |
effective sample size (ESS) |

collected at different times from patient I.D. Most of the sequences from the first time point share nodes with the sequences from the second and the third time points. However, the sequences from the final time point (i.e. the ones collected in 2007) appear as a clade segregated from the rest of the tree, showing only a faint relationship with low frequency haplotypes from the first time point. In a more rigorous analysis, it was found that haplotype number 25 from the first sampling point (not shown in this tree, as the frequency of the haplotype was 0.02%) was identical to the low frequency haplotype number 5 from the final time point. No other sequence from any of the other time points was found to be 100% identical to any of the haplotypes from the final time point. A series of identical haplotypes was found over time. Starting from one side of the tree, we see that haplotype 6 from 2003 branches out to haplotype 2 of 2005, which further branches to haplotype 3 from These three sequences were found to be identical over time and their frequencies fluctuated between 0.8% and 1%, after which the haplotype was no longer visible. There were 9 such haplotypes from the first time point that appeared again in the later stages. These are shown in figure 3.8. We see in this figure that the frequency of the haplotypes decreases from the first to the second time point but increases again in the next time point. The sequences from the final time point show little similarity with the ones from previous time points.
A pairwise sequence alignment was performed between the majority haplotype from the third time point, with a frequency of 65%, and the haplotype from the final time point with a frequency of 86%. These two sequences showed 97% similarity. Another tree, incorporating haplotypes with frequencies as low as 0.2%, was constructed to trace any relationship of the sequences from the final time point with low frequency variants from the second and the third time points. However, there were no variants from 2005 and 2006 sharing a node with the sequences sampled in From a total of eight haplotypes with frequency greater than 0.5% in the first time point, four were found to be present in the second time point. Note that the second sample was collected after almost a year of anti-retroviral therapy. This leads us to conclude that the viral haplotypes were latent during the therapy period but were quick to rebound when the therapy was stopped. So, even though a year passed in terms of absolute time, not much happened on the evolutionary time scale.

For patient I.D. 181, the number of haplotypes with frequency greater than 1% progressively increased over time. While there were only 3 haplotypes with frequency greater than 1% in the first time point, the number rose to 23 for the final time point. All the sequences from the first time point could be traced over the three time points. The haplotype with the highest frequency in the first time point remained the one with the highest frequency over the next two time points as well, but its frequency decreased over time from 87% to 38%. This can be clearly seen in figure

3.3 Founder Virus Analysis

The analysis was carried out to detect whether the infection was initiated by a single founder virus haplotype. This knowledge is often necessary in tracing the source of infection. It is also useful for explaining the pattern of genetic diversity observed in the patient. If the phylogenetic tree constructed using sequences from only the first time point shows a star-like phylogeny, then the infection is likely to have been initiated by a single virus. When we see distinct clades in the tree, the infection can be assumed to have been initiated by multiple founder viruses. Another reason for the phylogenetic tree not to show a star-like topology can be that the sequences are from a time when immune selection had already started shaping the viral evolution. Care must be taken to use samples that have been collected very close to the time of infection, so that the intra-sample diversity is not higher than 10%.
When the sample shows a star-like phylogeny with low intra-sample diversity, the time of infection can be estimated using the Poisson Fitter tool. Figures 3.11 and 3.12 show the phylogenetic trees constructed using the samples from the first time points for patient I.D. 123 and 181 respectively. These trees clearly do not show the star topology that would indicate a single founder virus infection. Since the exact dates of infection are unknown, it might be the case that the samples used for the analysis were from viruses that were already evolving under selective pressure exerted by the immune system. The sample from patient 181 that was used for the analysis was collected a few months after infection (this can be seen in the patient infection time line shown in figure 2.2), hence this result might be misleading. We see a single virus with a frequency of 86% in the first time point. It is likely that the infection was started by a single founder virus but that, due to the selective immune pressure, the founder virus showed a divergent evolutionary pattern. Poisson Fitter analysis showed that the Hamming distance distribution of the first time point sample did not conform to the distribution expected under neutral evolution from a single founder virus. Another explanation for the observation could be that the samples were not from a time point close to the initiation of infection.

Figure 3.7: Patient I.D. 123: Phylogenetic tree showing the relationship between sequences with frequency greater than 0.5%. Blue marks sequences from 2003, green marks sequences from 2005, while red and orange show sequences from 2006 and 2007 respectively

Figure 3.8: Patient I.D. 123: Haplotypes traced over time with their measured frequencies

Figure 3.9: Patient I.D. 181: Phylogenetic tree showing the relationship between sequences with frequency greater than 0.5%. Blue marks sequences sampled on , green marks sequences from , while orange shows sequences sampled on

Figure 3.10: Patient I.D. 181: Haplotypes traced over time with their measured frequencies

Figure 3.11: Patient I.D. 123: Phylogenetic tree with sequences from 2003 with frequency greater than 0.5%

Figure 3.12: Patient I.D. 181: Phylogenetic tree with sequences from 2005 with frequency greater than 0.5%

Figure 3.13: Patient I.D. 123: Demographic construction of the viral population with frequency greater than 0.5%

3.4 Demographic Reconstruction

The effective population size of the virus was plotted over the period of infection for patient I.D. 123 (figure 3.13) and patient 181 figure The time points where the sequences were collected have been marked in the plot. We see a slight increase in the viral population over the course of infection for patient I.D. The results of this analysis were robust to the use of different model settings. This indicated that the sequence data was informative and not too sensitive to slight model mis-specification. However, since the model could not be informed about the anti-retroviral therapy period, we do not know if the results would be sensitive to this information. Since we previously concluded from figure 3.8 that the virus was latent during the therapy, one would assume that the results of the demographic reconstruction should not be affected by the therapy phase.

For patient I.D. 181, we see no change in the effective population size of the virus over the first year of infection. The times of sample collection have been marked in the figure

As a proof of principle of the coalescent model used in our study, another analysis was performed on a sequence dataset collected over a period of 3 years from an influenza epidemic. The resulting demographic plot was in agreement with the variation observed during the epidemics. The data contained sequences of length 1700 base pairs, and samples were collected every few months, giving a high coverage over the period of three years. The results are not shown since these are freely available on the BEAST website.

Figure 3.14: Patient I.D. 181: Demographic construction of the viral population with frequency greater than 0.5%

3.5 Sliding MinPD

The evolutionary network was constructed for patient I.D. 123 using the Sliding MinPD software. Sequences from 2005, 2006 and 2007 were used in the analysis. The samples from the first time point were ignored, since the majority haplotypes were the same in the first and the second time points. A set of runs was set up with differing window sizes and sliding window sizes. The results of the runs were found to be sensitive to slight changes in the window length. Due to the lack of confidence in the results of this analysis, it was not carried out for patient I.D. 181. The results of the run are shown in figure 3.15 and The parameters for the runs, except the window and the sliding window size, are listed in table 3.8.

Table 3.8: Parameters used for constructing the evolutionary network using Sliding MinPD
Active Recombination Detection | Yes
Recombination Detection Option | Bootscan RIP
Crossover option | Many
PCC threshold | p0.4
bootstrap recomb. tiebreaker option | Yes
bootscan seed E-3 bootscan threshold TN93 substitution model
gamma shape - rate heterogeneity alpha | 0.5
show bootstrap values | No
markers for clustering | Yes
clustering distance threshold | T0.001
clustering option | by bases but post


The BLAST search on NCBI (    and GISAID Supplemental materials and methods The BLAST search on NCBI (http:// www.ncbi.nlm.nih.gov) and GISAID (http://www.platform.gisaid.org) showed that hemagglutinin (HA) gene of North American H5N1, H5N2 and

More information

Tutorial using BEAST v2.4.6 Prior-selection Veronika Bošková, Venelin Mitov and Louis du Plessis

Tutorial using BEAST v2.4.6 Prior-selection Veronika Bošková, Venelin Mitov and Louis du Plessis Tutorial using BEAST v2.4.6 Prior-selection Veronika Bošková, Venelin Mitov and Louis du Plessis Prior selection and clock calibration using Influenza A data. 1 Background In the Bayesian analysis of sequence

More information

Exploring HIV Evolution: An Opportunity for Research Sam Donovan and Anton E. Weisstein

Exploring HIV Evolution: An Opportunity for Research Sam Donovan and Anton E. Weisstein Microbes Count! 137 Video IV: Reading the Code of Life Human Immunodeficiency Virus (HIV), like other retroviruses, has a much higher mutation rate than is typically found in organisms that do not go through

More information

MedChem 401~ Retroviridae. Retroviridae

MedChem 401~ Retroviridae. Retroviridae MedChem 401~ Retroviridae Retroviruses plus-sense RNA genome (!8-10 kb) protein capsid lipid envelop envelope glycoproteins reverse transcriptase enzyme integrase enzyme protease enzyme Retroviridae The

More information

Viral Genetics. BIT 220 Chapter 16

Viral Genetics. BIT 220 Chapter 16 Viral Genetics BIT 220 Chapter 16 Details of the Virus Classified According to a. DNA or RNA b. Enveloped or Non-Enveloped c. Single-stranded or double-stranded Viruses contain only a few genes Reverse

More information

The Swarm: Causes and consequences of HIV quasispecies diversity

The Swarm: Causes and consequences of HIV quasispecies diversity The Swarm: Causes and consequences of HIV quasispecies diversity Julian Wolfson Dept. of Biostatistics - Biology Project August 14, 2008 Mutation, mutation, mutation Success of HIV largely due to its ability

More information

Rajesh Kannangai Phone: ; Fax: ; *Corresponding author

Rajesh Kannangai   Phone: ; Fax: ; *Corresponding author Amino acid sequence divergence of Tat protein (exon1) of subtype B and C HIV-1 strains: Does it have implications for vaccine development? Abraham Joseph Kandathil 1, Rajesh Kannangai 1, *, Oriapadickal

More information

Under the Radar Screen: How Bugs Trick Our Immune Defenses

Under the Radar Screen: How Bugs Trick Our Immune Defenses Under the Radar Screen: How Bugs Trick Our Immune Defenses Session 7: Cytokines Marie-Eve Paquet and Gijsbert Grotenbreg Whitehead Institute for Biomedical Research HHV-8 Discovered in the 1980 s at the

More information

Sequence Analysis of Human Immunodeficiency Virus Type 1

Sequence Analysis of Human Immunodeficiency Virus Type 1 Sequence Analysis of Human Immunodeficiency Virus Type 1 Stephanie Lucas 1,2 Mentor: Panayiotis V. Benos 1,3 With help from: David L. Corcoran 4 1 Bioengineering and Bioinformatics Summer Institute, Department

More information

Fayth K. Yoshimura, Ph.D. September 7, of 7 RETROVIRUSES. 2. HTLV-II causes hairy T-cell leukemia

Fayth K. Yoshimura, Ph.D. September 7, of 7 RETROVIRUSES. 2. HTLV-II causes hairy T-cell leukemia 1 of 7 I. Diseases Caused by Retroviruses RETROVIRUSES A. Human retroviruses that cause cancers 1. HTLV-I causes adult T-cell leukemia and tropical spastic paraparesis 2. HTLV-II causes hairy T-cell leukemia

More information

I. Bacteria II. Viruses including HIV. Domain Bacteria Characteristics. 5. Cell wall present in many species. 6. Reproduction by binary fission

I. Bacteria II. Viruses including HIV. Domain Bacteria Characteristics. 5. Cell wall present in many species. 6. Reproduction by binary fission Disease Diseases I. Bacteria II. Viruses including are disease-causing organisms Biol 105 Lecture 17 Chapter 13a Domain Bacteria Characteristics 1. Domain Bacteria are prokaryotic 2. Lack a membrane-bound

More information

Human Immunodeficiency Virus. Acquired Immune Deficiency Syndrome AIDS

Human Immunodeficiency Virus. Acquired Immune Deficiency Syndrome AIDS Human Immunodeficiency Virus Acquired Immune Deficiency Syndrome AIDS Sudden outbreak in USA of opportunistic infections and cancers in young men in 1981 Pneumocystis carinii pneumonia (PCP), Kaposi s

More information

YUMI YAMAGUCHI-KABATA AND TAKASHI GOJOBORI* Center for Information Biology, National Institute of Genetics, Mishima , Japan

YUMI YAMAGUCHI-KABATA AND TAKASHI GOJOBORI* Center for Information Biology, National Institute of Genetics, Mishima , Japan JOURNAL OF VIROLOGY, May 2000, p. 4335 4350 Vol. 74, No. 9 0022-538X/00/$04.00 0 Copyright 2000, American Society for Microbiology. All Rights Reserved. Reevaluation of Amino Acid Variability of the Human

More information

Search for the Mechanism of Genetic Variation in the pro Gene of Human Immunodeficiency Virus

Search for the Mechanism of Genetic Variation in the pro Gene of Human Immunodeficiency Virus JOURNAL OF VIROLOGY, Oct. 1999, p. 8167 8178 Vol. 73, No. 10 0022-538X/99/$04.00 0 Copyright 1999, American Society for Microbiology. All Rights Reserved. Search for the Mechanism of Genetic Variation

More information

HIV-1 acute infection: evidence for selection?

HIV-1 acute infection: evidence for selection? HIV-1 acute infection: evidence for selection? ROLLAND Morgane University of Washington Cohort & data S6 S5 T4 S4 T2 S2 T1 S1 S7 T3 DPS (days post symptoms) 3 (Fiebig I) 7 (Fiebig I) 13 (Fiebig V) 14 (Fiebig

More information

8/13/2009. Diseases. Disease. Pathogens. Domain Bacteria Characteristics. Bacteria Shapes. Domain Bacteria Characteristics

8/13/2009. Diseases. Disease. Pathogens. Domain Bacteria Characteristics. Bacteria Shapes. Domain Bacteria Characteristics Disease Diseases I. Bacteria II. Viruses including Biol 105 Lecture 17 Chapter 13a are disease-causing organisms Domain Bacteria Characteristics 1. Domain Bacteria are prokaryotic 2. Lack a membrane-bound

More information

7.013 Spring 2005 Problem Set 7

7.013 Spring 2005 Problem Set 7 MI Department of Biology 7.013: Introductory Biology - Spring 2005 Instructors: Professor Hazel Sive, Professor yler Jacks, Dr. Claudette Gardel 7.013 Spring 2005 Problem Set 7 FRIDAY May 6th, 2005 Question

More information

Section 9. Junaid Malek, M.D.

Section 9. Junaid Malek, M.D. Section 9 Junaid Malek, M.D. Mutation Objective: Understand how mutations can arise, and how beneficial ones can alter populations Mutation= a randomly produced, heritable change in the nucleotide sequence

More information

Virion Genome Genes and proteins Viruses and hosts Diseases Distinctive characteristics

Virion Genome Genes and proteins Viruses and hosts Diseases Distinctive characteristics Hepadnaviruses Virion Genome Genes and proteins Viruses and hosts Diseases Distinctive characteristics Hepatitis viruses A group of unrelated pathogens termed hepatitis viruses cause the vast majority

More information

Lecture Readings. Vesicular Trafficking, Secretory Pathway, HIV Assembly and Exit from Cell

Lecture Readings. Vesicular Trafficking, Secretory Pathway, HIV Assembly and Exit from Cell October 26, 2006 1 Vesicular Trafficking, Secretory Pathway, HIV Assembly and Exit from Cell 1. Secretory pathway a. Formation of coated vesicles b. SNAREs and vesicle targeting 2. Membrane fusion a. SNAREs

More information

19 Viruses BIOLOGY. Outline. Structural Features and Characteristics. The Good the Bad and the Ugly. Structural Features and Characteristics

19 Viruses BIOLOGY. Outline. Structural Features and Characteristics. The Good the Bad and the Ugly. Structural Features and Characteristics 9 Viruses CAMPBELL BIOLOGY TENTH EDITION Reece Urry Cain Wasserman Minorsky Jackson Outline I. Viruses A. Structure of viruses B. Common Characteristics of Viruses C. Viral replication D. HIV Lecture Presentation

More information

HIV/AIDS. Biology of HIV. Research Feature. Related Links. See Also

HIV/AIDS. Biology of HIV. Research Feature. Related Links. See Also 6/1/2011 Biology of HIV Biology of HIV HIV belongs to a class of viruses known as retroviruses. Retroviruses are viruses that contain RNA (ribonucleic acid) as their genetic material. After infecting a

More information

Patterns of hemagglutinin evolution and the epidemiology of influenza

Patterns of hemagglutinin evolution and the epidemiology of influenza 2 8 US Annual Mortality Rate All causes Infectious Disease Patterns of hemagglutinin evolution and the epidemiology of influenza DIMACS Working Group on Genetics and Evolution of Pathogens, 25 Nov 3 Deaths

More information

Mapping Evolutionary Pathways of HIV-1 Drug Resistance. Christopher Lee, UCLA Dept. of Chemistry & Biochemistry

Mapping Evolutionary Pathways of HIV-1 Drug Resistance. Christopher Lee, UCLA Dept. of Chemistry & Biochemistry Mapping Evolutionary Pathways of HIV-1 Drug Resistance Christopher Lee, UCLA Dept. of Chemistry & Biochemistry Stalemate: We React to them, They React to Us E.g. a virus attacks us, so we develop a drug,

More information

Models of HIV during antiretroviral treatment

Models of HIV during antiretroviral treatment Models of HIV during antiretroviral treatment Christina M.R. Kitchen 1, Satish Pillai 2, Daniel Kuritzkes 3, Jin Ling 3, Rebecca Hoh 2, Marc Suchard 1, Steven Deeks 2 1 UCLA, 2 UCSF, 3 Brigham & Womes

More information

, virus identified as the causative agent and ELISA test produced which showed the extent of the epidemic

, virus identified as the causative agent and ELISA test produced which showed the extent of the epidemic 1 Two attributes make AIDS unique among infectious diseases: it is uniformly fatal, and most of its devastating symptoms are not due to the causative agent Male to Male sex is the highest risk group in

More information

How HIV Causes Disease Prof. Bruce D. Walker

How HIV Causes Disease Prof. Bruce D. Walker How HIV Causes Disease Howard Hughes Medical Institute Massachusetts General Hospital Harvard Medical School 1 The global AIDS crisis 60 million infections 20 million deaths 2 3 The screen versions of

More information

Micropathology Ltd. University of Warwick Science Park, Venture Centre, Sir William Lyons Road, Coventry CV4 7EZ

Micropathology Ltd. University of Warwick Science Park, Venture Centre, Sir William Lyons Road, Coventry CV4 7EZ www.micropathology.com info@micropathology.com Micropathology Ltd Tel 24hrs: +44 (0) 24-76 323222 Fax / Ans: +44 (0) 24-76 - 323333 University of Warwick Science Park, Venture Centre, Sir William Lyons

More information

Chronic HIV-1 Infection Frequently Fails to Protect against Superinfection

Chronic HIV-1 Infection Frequently Fails to Protect against Superinfection Chronic HIV-1 Infection Frequently Fails to Protect against Superinfection Anne Piantadosi 1,2[, Bhavna Chohan 1,2[, Vrasha Chohan 3, R. Scott McClelland 3,4,5, Julie Overbaugh 1,2* 1 Division of Human

More information

Transmission of Single and Multiple Viral Variants in Primary HIV-1 Subtype C Infection

Transmission of Single and Multiple Viral Variants in Primary HIV-1 Subtype C Infection Transmission of Single and Multiple Viral Variants in Primary HIV-1 Subtype C Infection The Harvard community has made this article openly available. Please share how this access benefits you. Your story

More information

SMPD 287 Spring 2015 Bioinformatics in Medical Product Development. Final Examination

SMPD 287 Spring 2015 Bioinformatics in Medical Product Development. Final Examination Final Examination You have a choice between A, B, or C. Please email your solutions, as a pdf attachment, by May 13, 2015. In the subject of the email, please use the following format: firstname_lastname_x

More information

Chapter 13 Viruses, Viroids, and Prions. Biology 1009 Microbiology Johnson-Summer 2003

Chapter 13 Viruses, Viroids, and Prions. Biology 1009 Microbiology Johnson-Summer 2003 Chapter 13 Viruses, Viroids, and Prions Biology 1009 Microbiology Johnson-Summer 2003 Viruses Virology-study of viruses Characteristics: acellular obligate intracellular parasites no ribosomes or means

More information

7.012 Problem Set 6 Solutions

7.012 Problem Set 6 Solutions Name Section 7.012 Problem Set 6 Solutions Question 1 The viral family Orthomyxoviridae contains the influenza A, B and C viruses. These viruses have a (-)ss RNA genome surrounded by a capsid composed

More information

Hepadnaviruses: Variations on the Retrovirus Theme

Hepadnaviruses: Variations on the Retrovirus Theme WBV21 6/27/03 11:34 PM Page 377 Hepadnaviruses: Variations on the Retrovirus Theme 21 CHAPTER The virion and the viral genome The viral replication cycle The pathogenesis of hepatitis B virus A plant hepadnavirus

More information

HIV & AIDS: Overview

HIV & AIDS: Overview HIV & AIDS: Overview UNIVERSITY OF PAPUA NEW GUINEA SCHOOL OF MEDICINE AND HEALTH SCIENCES DIVISION OF BASIC MEDICAL SCIENCES DISCIPLINE OF BIOCHEMISTRY & MOLECULAR BIOLOGY PBL SEMINAR VJ TEMPLE 1 What

More information

Last time we talked about the few steps in viral replication cycle and the un-coating stage:

Last time we talked about the few steps in viral replication cycle and the un-coating stage: Zeina Al-Momani Last time we talked about the few steps in viral replication cycle and the un-coating stage: Un-coating: is a general term for the events which occur after penetration, we talked about

More information

Some living things are made of ONE cell, and are called. Other organisms are composed of many cells, and are called. (SEE PAGE 6)

Some living things are made of ONE cell, and are called. Other organisms are composed of many cells, and are called. (SEE PAGE 6) Section: 1.1 Question of the Day: Name: Review of Old Information: N/A New Information: We tend to only think of animals as living. However, there is a great diversity of organisms that we consider living

More information

Evolution of influenza

Evolution of influenza Evolution of influenza Today: 1. Global health impact of flu - why should we care? 2. - what are the components of the virus and how do they change? 3. Where does influenza come from? - are there animal

More information

De Novo Viral Quasispecies Assembly using Overlap Graphs

De Novo Viral Quasispecies Assembly using Overlap Graphs De Novo Viral Quasispecies Assembly using Overlap Graphs Alexander Schönhuth joint with Jasmijn Baaijens, Amal Zine El Aabidine, Eric Rivals Milano 18th of November 2016 Viral Quasispecies Assembly: HaploClique

More information

VIRUSES. 1. Describe the structure of a virus by completing the following chart.

VIRUSES. 1. Describe the structure of a virus by completing the following chart. AP BIOLOGY MOLECULAR GENETICS ACTIVITY #3 NAME DATE HOUR VIRUSES 1. Describe the structure of a virus by completing the following chart. Viral Part Description of Part 2. Some viruses have an envelope

More information

HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS

HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS HOST-PATHOGEN CO-EVOLUTION THROUGH HIV-1 WHOLE GENOME ANALYSIS Somda&a Sinha Indian Institute of Science, Education & Research Mohali, INDIA International Visiting Research Fellow, Peter Wall Institute

More information

Evolutionary interactions between haemagglutinin and neuraminidase in avian influenza

Evolutionary interactions between haemagglutinin and neuraminidase in avian influenza Ward et al. BMC Evolutionary Biology 2013, 13:222 RESEARCH ARTICLE Open Access Evolutionary interactions between haemagglutinin and neuraminidase in avian influenza Melissa J Ward 1*, Samantha J Lycett

More information

11/15/2011. Outline. Structural Features and Characteristics. The Good the Bad and the Ugly. Viral Genomes. Structural Features and Characteristics

11/15/2011. Outline. Structural Features and Characteristics. The Good the Bad and the Ugly. Viral Genomes. Structural Features and Characteristics Chapter 19 - Viruses Outline I. Viruses A. Structure of viruses B. Common Characteristics of Viruses C. Viral replication D. HIV II. Prions The Good the Bad and the Ugly Viruses fit into the bad category

More information

Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment Bas. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 18 th 2016 Protein alignments We have seen how to create a pairwise alignment of two sequences

More information

UvA-DARE (Digital Academic Repository)

UvA-DARE (Digital Academic Repository) UvA-DARE (Digital Academic Repository) Superinfection with drug-resistant HIV is rare and does not contribute substantially to therapy failure in a large European cohort Bartha, I.; Assel, M.; Sloot, P.M.A.;

More information

DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING. John Archer. Faculty of Life Sciences University of Manchester

DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING. John Archer. Faculty of Life Sciences University of Manchester DETECTION OF LOW FREQUENCY CXCR4-USING HIV-1 WITH ULTRA-DEEP PYROSEQUENCING John Archer Faculty of Life Sciences University of Manchester HIV Dynamics and Evolution, 2008, Santa Fe, New Mexico. Overview

More information

Principles of phylogenetic analysis

Principles of phylogenetic analysis Principles of phylogenetic analysis Arne Holst-Jensen, NVI, Norway. Fusarium course, Ås, Norway, June 22 nd 2008 Distance based methods Compare C OTUs and characters X A + D = Pairwise: A and B; X characters

More information

CDC site UNAIDS Aids Knowledge Base http://www.cdc.gov/hiv/dhap.htm http://hivinsite.ucsf.edu/insite.jsp?page=kb National Institute of Allergy and Infectious Diseases http://www.niaid.nih.gov/default.htm

More information

number Done by Corrected by Doctor Ashraf

number Done by Corrected by Doctor Ashraf number 4 Done by Nedaa Bani Ata Corrected by Rama Nada Doctor Ashraf Genome replication and gene expression Remember the steps of viral replication from the last lecture: Attachment, Adsorption, Penetration,

More information

Retroviruses. ---The name retrovirus comes from the enzyme, reverse transcriptase.

Retroviruses. ---The name retrovirus comes from the enzyme, reverse transcriptase. Retroviruses ---The name retrovirus comes from the enzyme, reverse transcriptase. ---Reverse transcriptase (RT) converts the RNA genome present in the virus particle into DNA. ---RT discovered in 1970.

More information

Hands-On Ten The BRCA1 Gene and Protein

Hands-On Ten The BRCA1 Gene and Protein Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such

More information

Grade Level: Grades 9-12 Estimated Time Allotment Part 1: One 50- minute class period Part 2: One 50- minute class period

Grade Level: Grades 9-12 Estimated Time Allotment Part 1: One 50- minute class period Part 2: One 50- minute class period The History of Vaccines Lesson Plan: Viruses and Evolution Overview and Purpose: The purpose of this lesson is to prepare students for exploring the biological basis of vaccines. Students will explore

More information

HIV Infection and Epidemiology: Can There Be a Cure? Dr. Nedwidek

HIV Infection and Epidemiology: Can There Be a Cure? Dr. Nedwidek HIV Infection and Epidemiology: Can There Be a Cure? Dr. Nedwidek The Viral Life Cycle A typical virus (DNA or RNA + protein) enters the host cell, makes more of itself, and exits. There are two major

More information

To test the possible source of the HBV infection outside the study family, we searched the Genbank

To test the possible source of the HBV infection outside the study family, we searched the Genbank Supplementary Discussion The source of hepatitis B virus infection To test the possible source of the HBV infection outside the study family, we searched the Genbank and HBV Database (http://hbvdb.ibcp.fr),

More information

Immunodeficiencies HIV/AIDS

Immunodeficiencies HIV/AIDS Immunodeficiencies HIV/AIDS Immunodeficiencies Due to impaired function of one or more components of the immune or inflammatory responses. Problem may be with: B cells T cells phagocytes or complement

More information

The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates

The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates The Genealogical Population Dynamics of HIV-1 in a Large Transmission Chain: Bridging within and among Host Evolutionary Rates Bram Vrancken 1 *, Andrew Rambaut 2,3, Marc A. Suchard 4,5,6, Alexei Drummond

More information

Did the surgeon give hepatitis C to his patient?

Did the surgeon give hepatitis C to his patient? ForensicEA Lite Tutorial Did the surgeon give hepatitis C to his patient? In a recent issue of the Journal of Medical Virology, R. Stephan Ross and colleagues (2002) report the story of a German surgeon

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Running Head: AN UNDERSTANDING OF HIV- 1, SYMPTOMS, AND TREATMENTS. An Understanding of HIV- 1, Symptoms, and Treatments.

Running Head: AN UNDERSTANDING OF HIV- 1, SYMPTOMS, AND TREATMENTS. An Understanding of HIV- 1, Symptoms, and Treatments. Running Head: AN UNDERSTANDING OF HIV- 1, SYMPTOMS, AND TREATMENTS An Understanding of HIV- 1, Symptoms, and Treatments Benjamin Mills Abstract HIV- 1 is a virus that has had major impacts worldwide. Numerous

More information

Building complexity Unit 04 Population Dynamics

Building complexity Unit 04 Population Dynamics Building complexity Unit 04 Population Dynamics HIV and humans From a single cell to a population Single Cells Population of viruses Population of humans Single Cells How matter flows from cells through

More information

Name: Due on Wensday, December 7th Bioinformatics Take Home Exam #9 Pick one most correct answer, unless stated otherwise!

Name: Due on Wensday, December 7th Bioinformatics Take Home Exam #9 Pick one most correct answer, unless stated otherwise! Name: Due on Wensday, December 7th Bioinformatics Take Home Exam #9 Pick one most correct answer, unless stated otherwise! 1. What process brought 2 divergent chlorophylls into the ancestor of the cyanobacteria,

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

Inference Methods for First Few Hundred Studies

Inference Methods for First Few Hundred Studies Inference Methods for First Few Hundred Studies James Nicholas Walker Thesis submitted for the degree of Master of Philosophy in Applied Mathematics and Statistics at The University of Adelaide (Faculty

More information

Hands-on Activity Viral DNA Integration. Educator Materials

Hands-on Activity Viral DNA Integration. Educator Materials OVERVIEW This activity is part of a series of activities and demonstrations focusing on various aspects of the human immunodeficiency virus (HIV) life cycle. HIV is a retrovirus. Retroviruses are distinguished

More information

Overview: Chapter 19 Viruses: A Borrowed Life

Overview: Chapter 19 Viruses: A Borrowed Life Overview: Chapter 19 Viruses: A Borrowed Life Viruses called bacteriophages can infect and set in motion a genetic takeover of bacteria, such as Escherichia coli Viruses lead a kind of borrowed life between

More information

Part Of A Virus That Contains The Instructions For Making New Viruses

Part Of A Virus That Contains The Instructions For Making New Viruses Part Of A Virus That Contains The Instructions For Making New Viruses A hidden virus. Becomes part of the host cell's generic material. A virus's contains the instructions for making new viruses. Genetic

More information

HS-LS4-4 Construct an explanation based on evidence for how natural selection leads to adaptation of populations.

HS-LS4-4 Construct an explanation based on evidence for how natural selection leads to adaptation of populations. Unit 2, Lesson 2: Teacher s Edition 1 Unit 2: Lesson 2 Influenza and HIV Lesson Questions: o What steps are involved in viral infection and replication? o Why are some kinds of influenza virus more deadly

More information

Natural Scene Statistics and Perception. W.S. Geisler

Natural Scene Statistics and Perception. W.S. Geisler Natural Scene Statistics and Perception W.S. Geisler Some Important Visual Tasks Identification of objects and materials Navigation through the environment Estimation of motion trajectories and speeds

More information

HIV and drug resistance Simon Collins UK-CAB 1 May 2009

HIV and drug resistance Simon Collins UK-CAB 1 May 2009 HIV and drug resistance Simon Collins UK-CAB 1 May 2009 slides: thanks to Prof Clive Loveday, Intl. Clinical Virology Centre www.icvc.org.uk Tip of the iceberg = HIV result, CD4, VL Introduction: resistance

More information

From Mosquitos to Humans: Genetic evolution of Zika Virus

From Mosquitos to Humans: Genetic evolution of Zika Virus Article: From Mosquitos to Humans: Genetic evolution of Zika Virus Renata Pellegrino, PhD Director, Sequencing lab Center for Applied Genomics The Children s Hospital of Philadelphia Journal Club Clinical

More information

7.012 Quiz 3 Answers

7.012 Quiz 3 Answers MIT Biology Department 7.012: Introductory Biology - Fall 2004 Instructors: Professor Eric Lander, Professor Robert A. Weinberg, Dr. Claudette Gardel Friday 11/12/04 7.012 Quiz 3 Answers A > 85 B 72-84

More information

227 28, 2010 MIDTERM EXAMINATION KEY

227 28, 2010 MIDTERM EXAMINATION KEY Epidemiology 227 April 28, 2010 MIDTERM EXAMINATION KEY Select the best answer for the multiple choice questions. There are 64 questions and 9 pages on the examination. Each question will count one point.

More information

Section 6. Junaid Malek, M.D.

Section 6. Junaid Malek, M.D. Section 6 Junaid Malek, M.D. The Golgi and gp160 gp160 transported from ER to the Golgi in coated vesicles These coated vesicles fuse to the cis portion of the Golgi and deposit their cargo in the cisternae

More information

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc. Variant Classification Author: Mike Thiesen, Golden Helix, Inc. Overview Sequencing pipelines are able to identify rare variants not found in catalogs such as dbsnp. As a result, variants in these datasets

More information

Fig. 1: Schematic diagram of basic structure of HIV

Fig. 1: Schematic diagram of basic structure of HIV UNIVERSITY OF PAPUA NEW GUINEA SCHOOL OF MEDICINE AND HEALTH SCIENCES DIVISION OF BASIC MEDICAL SCIENCES DISCIPLINE OF BIOCHEMISTRY & MOLECULAR BIOLOGY PBL SEMINAR HIV & AIDS: An Overview What is HIV?

More information

arxiv: v2 [q-bio.pe] 21 Jan 2008

arxiv: v2 [q-bio.pe] 21 Jan 2008 Viral population estimation using pyrosequencing Nicholas Eriksson 1,, Lior Pachter 2, Yumi Mitsuya 3, Soo-Yon Rhee 3, Chunlin Wang 3, Baback Gharizadeh 4, Mostafa Ronaghi 4, Robert W. Shafer 3, and Niko

More information

Lentiviruses: HIV-1 Pathogenesis

Lentiviruses: HIV-1 Pathogenesis Lentiviruses: HIV-1 Pathogenesis Human Immunodeficiency Virus, HIV, computer graphic by Russell Kightley Tsafi Pe ery, Ph.D. Departments of Medicine and Biochemistry & Molecular Biology NJMS, UMDNJ. e-mail:

More information