Science Supporting Online Material

Cynthia A. Derdeyn, Julie M. Decker, Frederic Bibollet-Ruche, John L. Mokili, Mark Muldoon, Scott A. Denham, Marintha L. Heil, Francis Kasolo, Rosemary Musonda, Beatrice H. Hahn, George M. Shaw, Bette T. Korber, Susan Allen, and Eric Hunter
Envelope-Constrained Neutralization-Sensitive HIV-1 After Heterosexual Transmission
Materials and Methods
Study subjects. The Lusaka cohort, established in 1994, provides voluntary HIV-1 testing and counseling, long-term monitoring, and health care to cohabiting heterosexual couples in the capital city of Zambia (1-3). Of the >12,000 couples whose HIV-1 status has been determined to date, approximately 21% were HIV-1 discordant, 26% were concordant HIV-1 positive, and 53% were concordant HIV-1 negative. Between February 1994 and October 2000, a total of 1,022 discordant couples (535 with HIV-1-infected men and 487 with HIV-1-infected women) were enrolled in a prospective study of the incidence and predictors of heterosexual transmission. These couples were monitored for seroconversion of the negative partner at three-month intervals, at which time the participants also received preventive counseling and condoms. The eight transmission pairs studied here were derived from this larger cohort. When seroconversion of an HIV-negative partner was detected, blood samples for preparation of plasma or genomic DNA were collected from both partners by venipuncture into acid citrate dextrose tubes. Informed consent and human subjects protocols were approved by the University of Alabama at Birmingham Institutional Review Board and the University of Zambia School of Medicine Research Ethics Committee.
Amplification and cloning of HIV-1 env genes.
For amplification of proviral sequences, genomic DNA was extracted from Ficoll-purified, uncultured peripheral blood mononuclear cells (PBMC) using the QIAamp Blood Kit (Qiagen, Valencia, CA) and used as the template for nested PCR amplification of full-length (gp160) or partial (gp120 V1-V5) env sequences. For analysis of plasma virus, RNA was purified from plasma using the QIAamp Viral RNA Mini Kit (Qiagen, Valencia, CA) and reverse transcribed into cDNA using SuperScript II according to the manufacturer's instructions (Invitrogen, Carlsbad, CA). Following reverse transcription, full-length (gp160) or partial (gp120 V1-V5) env sequences were amplified from the cDNA template by nested PCR. To ensure that the molecular env clones we obtained were representative of the virus populations present in each compartment, replicate PCR reactions were performed at varying endpoint concentrations of template DNA, on different days, and with multiple sets of nested PCR primers. The RT reactions contained approximately 1,200 to 35,000 copies of RNA, and reverse transcription was primed with oligo-dT or an HIV-specific primer (OFM19, see below). cDNA or DNA samples were subjected to nested PCR amplification using three different primer sets.
For full-length gp160 amplification from genomic DNA and plasma (Set 1): outer sense primer Vif1 (5′-GGGTTTATTACAGGGACAGCAGAG-3′), outer antisense primer OFM19 (5′-GCACTCAAGGCAAGCTTTATTGAGGCTTA-3′), inner sense primer EnvA (5′-GCCTTAGGCATCTCCTATGGCAGGAAGAA-3′), and inner antisense primer EnvN (5′-CTGTCAATCAGGGAAGTAGCCTTGTGT-3′).
For full-length gp160 amplification from plasma (Set 2): outer sense primer EnvA (5′-GCTTAGGCATCTCCTATGGCAGGAAGAA-3′), outer antisense primer EnvN (5′-CTGTCAATCAGGGAAGTAGCCTTGTGT-3′), inner sense primer EnvA2 (5′-GCATTTCCTATGGCAGGAAGGAAGC-3′), and inner antisense primer EnvM (5′-TAGCCCTTCCAGTCCCCCCTTTTCTTTT-3′).
For partial amplification (gp120 V1-V5) from genomic DNA and plasma (Set 3): outer sense primer EnvA (5′-GCTTAGGCATCTCCTATGGCAGGAAGAA-3′), outer antisense primer EnvN (5′-CTGTCAATCAGGGAAGTAGCCTTGTGT-3′), inner sense primer ED5 (5′-ATGGGATCAAAGCCTAAAGCC-3′), and inner antisense primer ED12 (5′-AGTGCTTCCTGCTGCTCCCAAG-3′).
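As a consistency check on the primer list above, the short Python sketch below computes the reverse complement and GC content of the Set 3 inner sense primer ED5. Its reverse complement should equal ED5Rev, the antisense primer used for colony-PCR orientation screening. The function names are illustrative only and are not tools used in this study.

```python
# Sketch: reverse complement and GC content of a DNA primer (A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


ED5 = "ATGGGATCAAAGCCTAAAGCC"      # Set 3 inner sense primer
ED5_REV = "GGCTTTAGGCTTTGATCCCAT"  # colony-PCR antisense primer

print(reverse_complement(ED5) == ED5_REV)   # True
print(len(ED5), round(gc_content(ED5), 2))  # 21 0.48
```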
Expand High Fidelity polymerase (Hoffmann-La Roche, Nutley, NJ) was used for all PCR amplifications, most of which used standard amplification conditions (1.5 mM MgCl2, 20 pmol of each primer, and 0.20 to 0.25 mM dNTPs). Generally, 0.1 to 1 µg of total DNA served as the template for the first round of PCR amplification, and 1 µl of the first-round product was used as the template for the second round.
PCR conditions for Set 1 were as follows. First round (50 µl reaction): 10 cycles of 95°C for 15 sec., 54°C for 30 sec., and 68°C for 4 min., followed by 25 cycles of 95°C for 15 sec., 54°C for 30 sec., and 68°C for 4 min. plus 5 sec. per cycle. Second round (100 µl reaction): 35 cycles of 95°C for 15 sec., 55°C for 30 sec., and 72°C for 2.5 min. Manual hot start was used for both rounds of amplification with Set 1 primers.
PCR conditions for Set 2 were as follows. First round (50 µl reaction): 1 cycle of 94°C for 2 min., followed by 35 cycles of 94°C for 30 sec., 51°C to 62°C for 30 sec., and 72°C for 2.5 min. plus 5 sec. per cycle, and 1 cycle of 72°C for 7 min. Second round (50 µl reaction): 1 cycle of 94°C for 2 min., followed by 35 cycles of 94°C for 30 sec., 50°C to 60°C for 30 sec., and 72°C for 2.5 min., and 1 cycle of 72°C for 7 min.
PCR conditions for Set 3 were as follows. First round (100 µl reaction): 1 cycle of 95°C for 2 min., 35 cycles of 95°C for 15 sec., 55°C for 30 sec., and 72°C for 2.5 min., and 1 cycle of 72°C for 7 min. Second round (100 µl reaction): the same cycling parameters as the first round.
PCR products were gel-purified using the QIAquick Gel Extraction Kit (Qiagen, Valencia, CA). Partial-length env genes were T/A cloned into pGEM-T or pGEM-TE (Promega, Madison, WI), and full-length env genes were T/A cloned into pCR3.1 or pcDNA3.1 TOPO/TA (Invitrogen, Carlsbad, CA) for CMV promoter-driven expression.
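The Set 1 first-round program uses an extension step that grows by 5 sec. per cycle. The Python sketch below tabulates the resulting extension times and total hold time, assuming the increment is cumulative and starts with the first cycle of the 25-cycle block (an interpretation, not stated explicitly in the text). Ramp times are ignored, and the class and function names are hypothetical.

```python
# Sketch: expand a PCR program whose extension step grows each cycle.
# ASSUMPTION: "68 C for 4 min. plus 5 sec. per cycle" means cycle k
# (k = 1, 2, ...) of the incremented block extends for 240 + 5*k seconds.
# Ramp times between temperatures are ignored.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CycleBlock:
    cycles: int
    steps: list[tuple[int, int]]  # (temperature in degrees C, seconds)
    increment_s: int = 0          # seconds added to the last step per cycle


def extension_times(blocks):
    """Duration (s) of the final, extension step in every cycle."""
    times = []
    for block in blocks:
        base = block.steps[-1][1]
        times.extend(base + block.increment_s * k
                     for k in range(1, block.cycles + 1))
    return times


def hold_time_s(blocks):
    """Total time at temperature (s), excluding ramping."""
    total = 0
    for block in blocks:
        per_cycle = sum(seconds for _, seconds in block.steps)
        total += sum(per_cycle + block.increment_s * k
                     for k in range(1, block.cycles + 1))
    return total


# Set 1, first round, as described in the text.
set1_round1 = [
    CycleBlock(10, [(95, 15), (54, 30), (68, 240)]),
    CycleBlock(25, [(95, 15), (54, 30), (68, 240)], increment_s=5),
]

ext = extension_times(set1_round1)
print(len(ext), ext[0], ext[10], ext[-1])  # 35 240 245 365
print(round(hold_time_s(set1_round1) / 60, 1), "min at temperature")
```

Under this reading, the final cycle extends for 365 sec. (about 6 min.), which suits long gp160 amplicons late in the reaction.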
Ligations were performed according to the manufacturer's instructions and transformed into Maximum Efficiency JM109 competent cells (Promega, Madison, WI). To identify biologically functional env clones, colony PCR was performed in a 96-well format to screen for transformants that contained inserts in the correct orientation with respect to the CMV promoter. Briefly, each colony was simultaneously inoculated onto an agar plate and into a 50 µl PCR reaction containing 1 unit of Taq polymerase (Eppendorf, Hamburg, Germany), 0.2 mM dNTPs, 3.0 mM Mg(OAc)2, and 10 pmol of each primer. The sense primer was T7 (5′-TAATACGACTCACTATAGGG-3′) and the antisense primer was ED5Rev (5′-GGCTTTAGGCTTTGATCCCAT-3′), producing a 500 to 600 bp PCR product. The cycling parameters were 1 cycle of 95°C for 5 min., 30 cycles of 95°C for 30 sec., 55°C for 30 sec., and 72°C for 30 sec., and 1 cycle of 72°C for 10 min. All correctly oriented env clones were prepared using the QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA).
These env clones were screened for biological function using the following protocol: 0.1 µg of DNA was co-transfected with 0.2 µg of an Env-deficient subtype B proviral plasmid, HXB2ΔEnv, into an 80% confluent monolayer of 5 × 10⁴ 293T cells (in 48-well tissue culture dishes) using Fugene-6 according to the manufacturer's instructions (Hoffmann-La Roche, Nutley, NJ). Seventy-two hours later, 100 µl of transfection supernatant was used to infect JC53-BL cells as described previously (4, 5). At 48 hours post-infection, β-gal staining was performed and each well was scored positive or negative for blue foci. Pseudotyped env clones that produced blue foci were re-transfected on a larger scale (described below), harvested at 72 hours to produce a working stock, and titered as described below. Between 10% and 60% of the correctly oriented clones from each individual were biologically functional.
DNA sequencing of env clones was carried out by Lone Star Labs, Inc. (Houston, TX) using the ABI Prism 377XL automated DNA sequencer and the Big Dye Terminator Ready Reaction Cycle Sequencing Kit (Applied Biosystems, Foster City, CA), or by the UAB CFAR DNA Sequence Analysis Core using an ABI 3100 Genetic Analyzer and dideoxy methodology (Applied Biosystems, Foster City, CA).
Phylogenetic tree analysis.
Neighbor joining tree.
Nucleotide sequences were initially aligned using HMMER (version 2.3.1, by Sean Eddy, http://hmmer.wustl.edu/), then codon-aligned using GeneCutter (by Brian Gaschen, http://www.hiv.lanl.gov/content/hiv-db/gene_cutter/cutter.html), and optimized by hand. After gap-stripping, 626 positions remained. Phylogenetic relationships were estimated using the F84 evolutionary model implemented in the PHYLIP Neighbor program (Phylogeny Inference Package, by Joe Felsenstein, http://evolution.genetics.washington.edu/phylip.html) with a transition/transversion ratio of 1.3. Two unrelated reference sequences (UCD83003, UCD90121) were used as an outgroup. The reliability of branching orders was assessed by bootstrap analysis with 100 replicates (6).
Maximum likelihood tree. A maximum likelihood tree was constructed for each transmission pair, including two unrelated outgroup sequences from the same clade (C or G, as appropriate). After gap-stripping, Modeltest was used to identify the optimal evolutionary model (7). For each tree, the selected model included base frequencies, a general time-reversible model of base substitution, a proportion of invariant sites, and among-site rate variation following a gamma distribution with four rate categories. A further comparison was made to a REV model with maximum likelihood-assigned rate variation at different sites, but the additional parameters were not justified by a likelihood ratio test for these data, so the Modeltest model was used. The model parameter values were estimated separately for each of the eight trees. PAUP was then used to construct a likelihood tree for each transmission pair. BranchLength.pl (B. Korber, www.santafe.edu/~btk/sciencepaper/bette.html) was used to calculate the branch length to the ancestral node for each donor and recipient sequence. Sequences are available under GenBank accession numbers AY423908-AY424198. A protein alignment of the V1-V4 region for each recipient consensus and its most closely related donor sequence is shown in Fig. S3.
Neutralization assay. Plasma samples were assayed for neutralizing antibody activity against virions pseudotyped with donor and recipient Envs using a modification of a previously described HIV-1 entry assay (4, 5, 8, 9). JC53BL-13 cells are derived from HeLa cells and have been genetically engineered to stably express high levels of CD4 and CCR5; they also express CXCR4.
The JC53BL-13 cells express firefly luciferase and Escherichia coli β-galactosidase under the transcriptional control of the HIV-1 LTR and can be used for single-round infection assays. Env-pseudotyped virus stocks were generated as follows: 2 µg of the Env expression plasmid was cotransfected with 4 µg of an Env-deficient HIV-1 subtype B proviral plasmid, pSG3ΔEnv, into an 80% confluent monolayer of 1 × 10⁶ 293T cells using the Fugene-6 reagent according to the manufacturer's instructions (Hoffmann-La Roche, Nutley, NJ). Supernatants were collected at 72 hours post-transfection, clarified by low-speed centrifugation, aliquoted into portions of 0.5 ml or less, and stored at −70°C. The titer of each pseudotyped stock was determined by infecting JC53BL-13 cells with 5-fold serial dilutions of virus as described previously. Infectious units per µl were determined for each pseudotyped Env by directly counting blue foci in the infected monolayers at 48 hours post-infection.
To test pseudotyped viruses for neutralization susceptibility, 4 × 10⁴ JC53BL-13 cells were plated in 24-well tissue culture plates (Falcon) and cultured overnight in Dulbecco's modified Eagle medium (DMEM) supplemented with 10% fetal calf serum (FCS). For each test, 1,000 infectious units of pseudotyped virus were combined, in a total volume of 125 µl, with five-fold dilutions of heat-inactivated test plasma beginning at 10% vol/vol in DMEM plus 1% FCS, and incubated for 1 hour at 37°C. Heat-inactivated normal human plasma (NHP) was added as necessary to maintain an overall plasma concentration of 10%. The virus-plasma mixture was then added to JC53BL-13 cells in an equal volume (125 µl) of DMEM plus 1% FCS containing 80 µg/ml DEAE-dextran, which brought the total concentration of plasma (NHP with or without test plasma) to 5%. The cells were then incubated for 2 hours at 37°C. Thereafter, 400 µl of DMEM plus 10% FCS and test plasma (at the corresponding dilutions) were added, again keeping the test plasma concentration constant, and the cells were incubated at 37°C for 2 days. In most experiments, the cells were not washed free of virus. After two days, cells were lysed and the luciferase activity of each well was measured using a luciferase assay reagent (Promega, Madison, WI) and a LUMIstar luminometer (BMG Lab Technologies, Offenburg, Germany). Background luminescence was determined in uninfected wells and subtracted from all experimental wells. Cell viability and toxicity were monitored in uninfected cultures treated with test and control plasma, by measuring basal levels of luciferase expression and by visual inspection.
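The plasma bookkeeping in this protocol is easy to misread, so the Python sketch below tracks the concentrations through the first two steps: five-fold dilutions of test plasma starting at 10% vol/vol, NHP top-up to a constant 10% total, and the halving caused by adding an equal volume of medium. The number of dilutions (six) is a hypothetical choice, and the later 400 µl addition, which keeps the concentrations constant, is not modeled.

```python
# Sketch of plasma concentrations in the neutralization set-up.
# ASSUMPTION: six five-fold dilutions (the number is not stated in the text).
# Test plasma starts at 10% vol/vol in the 125 ul virus-plasma mixture, NHP
# tops total plasma up to 10%, and adding an equal 125 ul volume of medium
# halves every concentration.
from fractions import Fraction

START_PCT = Fraction(10)  # test plasma in the least dilute well, % vol/vol
FOLD = 5                  # five-fold serial dilutions
TOTAL_PCT = Fraction(10)  # total plasma kept constant during pre-incubation
N_DILUTIONS = 6           # hypothetical


def preincubation_series(n=N_DILUTIONS):
    """(test plasma %, NHP %) in the 125 ul virus-plasma mixture."""
    rows = []
    for i in range(n):
        test = START_PCT / FOLD**i
        rows.append((test, TOTAL_PCT - test))
    return rows


def after_equal_volume(pct):
    """Concentration after mixing with an equal volume of medium."""
    return pct / 2


for test, nhp in preincubation_series():
    print(f"pre-incubation: test {float(test):8.4f}%  NHP {float(nhp):8.4f}%  | "
          f"on cells: test {float(after_equal_volume(test)):7.4f}%, "
          f"total {float(after_equal_volume(test + nhp)):.1f}%")

print("DEAE-dextran on cells:", after_equal_volume(80), "ug/ml")
```

Because NHP fills the gap at every dilution, the total plasma on the cells is 5% in every well, so differences in infectivity reflect the test plasma rather than a changing plasma background.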
Relative infectivity (% of control) was calculated by dividing the mean luciferase units at each dilution by the mean value in wells containing no test plasma. Controls included test plasma versus normal human plasma, with and without virus. Serial dilution dose-response curves for virus infectivity were fit using the GROWTH function of Excel. To determine 50% inhibitory concentrations (IC50) of donor or pooled subtype C plasma, a log quadratic model was constructed for each pseudotype to improve the model fit and preserve the normality assumption. The resulting dose-response curve was then used to calculate the IC50 for each Env-plasma combination. Results shown were replicated at least twice.
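The IC50 calculation can be sketched in Python as follows. The text does not give the exact form of the log quadratic model, so this sketch assumes percent infectivity is fitted as a quadratic in log10 plasma concentration by ordinary least squares, with the IC50 taken where the fitted curve crosses 50% inside the tested range. All function names and the example data are hypothetical.

```python
# Sketch: relative infectivity and an IC50 from a quadratic fitted in
# log10(plasma concentration). ASSUMPTION: the "log quadratic model" is taken
# to be  infectivity% = a + b*x + c*x**2  with x = log10(concentration).
import math
from statistics import mean


def relative_infectivity(wells, no_plasma_wells, background=0.0):
    """Percent of control after subtracting uninfected-well background."""
    return 100.0 * (mean(wells) - background) / (mean(no_plasma_wells) - background)


def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least squares for y = a + b*x + c*x**2 via the normal equations."""
    s = [sum(x**k for x in xs) for k in range(5)]
    t = [sum(y * x**k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = _det3(A)
    coeffs = []
    for j in range(3):  # Cramer's rule: replace column j with the RHS
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = t[i]
        coeffs.append(_det3(Aj) / d)
    return tuple(coeffs)


def ic50(xs, ys):
    """Concentration where the fitted curve crosses 50% infectivity, or None."""
    a, b, c = fit_quadratic(xs, ys)
    if abs(c) < 1e-12:
        roots = [(50.0 - a) / b]
    else:
        disc = b * b - 4 * c * (a - 50.0)
        if disc < 0:
            return None
        r = math.sqrt(disc)
        roots = [(-b + r) / (2 * c), (-b - r) / (2 * c)]
    inside = [x for x in roots if min(xs) <= x <= max(xs)]
    return 10 ** inside[0] if inside else None


# Hypothetical example: five-fold dilutions starting at 10% plasma, with
# responses generated from a known curve that crosses 50% at 1% plasma.
conc = [10 / 5**i for i in range(5)]
xs = [math.log10(v) for v in conc]
ys = [50 - 30 * x - 5 * x * x for x in xs]
print(ic50(xs, ys))  # close to 1.0 (% plasma)
```

Restricting the root to the tested range discards the second root of the quadratic, which typically lies outside the tested dilutions.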

Statistical data analysis. Comparisons of V1-V4 length, number of N-linked glycosylation sites, and neutralization sensitivity between donor and recipient were performed using the following non-parametric statistical test (M. Muldoon and B. Korber). Briefly, the length (the number of amino acids from the start of V1 to the end of V4), the number of N-linked glycosylation sites, or the neutralization IC50 was computed for each donor sequence; the median donor value was calculated; and the frequency of sequences with values above, equal to, or below the median was determined. Because the recipient sequences were highly conserved, this distribution was compared to a single value of the same parameter considered representative of the recipient (when multiple forms were present in the recipient, the dominant form was used). Given the distribution and frequency of donor values, the exact probability of finding as many sub-median values as were observed in the recipients was computed (a one-sided, sign-like test). A high proportion of donor values fell on the median, and there were generally fewer sub-median values than the 50% one would expect from a continuously varying distribution. Thus, the probability of transmitting a variant shorter than the median length was generally much less than 0.50. The tendency of these infrequently sampled donor forms to be favored for transmission became statistically significant when all eight transmission pairs were examined in aggregate, against the null hypothesis that the recipient sequences were equally likely to be drawn from any sequence in their donor pool. In addition, a one-sided Wilcoxon test, a non-parametric rank-order statistic, was performed on the data from the eight transmission pairs to determine whether the donor and recipient sequences differed significantly in these parameters (S-PLUS v6.2.1 2002, Insightful Corporation).
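One way to compute the aggregated exact probability is sketched below in Python. It assumes that, under the null hypothesis, each recipient value is a random draw from its donor pool, so pair i is sub-median with probability p_i (values tied with the median are not counted, which is why p_i can fall well below 0.5), and that pairs are independent. The number of sub-median recipients then follows a Poisson-binomial distribution. The data are hypothetical and the function names are illustrative.

```python
# Sketch of a one-sided, sign-like exact test aggregated over transmission
# pairs. ASSUMPTIONS: one recipient value per pair; under the null hypothesis
# the recipient is a random draw from its donor pool; pairs are independent.
from statistics import median


def sub_median_probability(donor_values):
    """Fraction of donor values strictly below the donor median."""
    m = median(donor_values)
    return sum(v < m for v in donor_values) / len(donor_values)


def poisson_binomial_tail(ps, k):
    """Exact P(X >= k) for X = sum of independent Bernoulli(p_i) trials."""
    dist = [1.0]  # dist[j] = probability of j successes among trials so far
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            nxt[j] += q * (1 - p)      # this pair not sub-median
            nxt[j + 1] += q * p        # this pair sub-median
        dist = nxt
    return sum(dist[k:])


def aggregated_sign_test(pairs):
    """pairs: list of (donor_values, recipient_value). Returns (k, p-value)."""
    ps, k = [], 0
    for donor_values, recipient in pairs:
        ps.append(sub_median_probability(donor_values))
        k += recipient < median(donor_values)
    return k, poisson_binomial_tail(ps, k)


# Hypothetical V1-V4 lengths (amino acids); not data from this study.
example = [
    ([150, 152, 152, 152, 155, 158], 149),
    ([160, 160, 163, 163, 163, 170], 158),
    ([141, 144, 144, 146, 149], 144),
]
print(aggregated_sign_test(example))
```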
The rank comparison of the single recipient form against the distribution of forms found in the donor was significant for only one pair (pair 135, p = 0.04). However, the eight independent transmission pairs were also analyzed collectively by computing an aggregated Z-score: the eight separate Z-scores were summed and the sum was divided by the square root of eight. In each case, the two statistical tests agreed. We used one-sided tests because we considered them most appropriate: our hypothesis was that lengthening of the variable loops during disease progression would facilitate immune escape, but at a fitness cost that might result in selection for shorter sequences at transmission.
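The aggregated Z-score described above corresponds to Stouffer's method. A minimal Python sketch using the standard library's statistics.NormalDist is shown below; it assumes each per-pair Z-score is derived from a one-sided p-value, and the p-values in the example are hypothetical.

```python
# Sketch of Stouffer's method: per-pair one-sided p-values are converted to
# Z-scores, summed, and divided by the square root of the number of pairs.
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_to_z(p):
    """One-sided p-value -> Z-score (larger Z = stronger evidence)."""
    return STD_NORMAL.inv_cdf(1 - p)


def aggregated_z(p_values):
    """Sum of per-pair Z-scores divided by sqrt(number of pairs)."""
    return sum(p_to_z(p) for p in p_values) / math.sqrt(len(p_values))


def aggregated_p(p_values):
    """One-sided p-value for the aggregated Z-score."""
    return 1 - STD_NORMAL.cdf(aggregated_z(p_values))


hypothetical = [0.04, 0.20, 0.15, 0.30, 0.10, 0.25, 0.35, 0.12]
print(round(aggregated_z(hypothetical), 3), round(aggregated_p(hypothetical), 4))
```

Dividing by the square root of the number of pairs keeps the aggregated score standard normal under the null hypothesis, so several individually non-significant pairs can combine into a significant result.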

Non-parametric tests were used because the distributions of values were clearly non-Gaussian for most of the parameters under study. A two-sided Wilcoxon test was performed to determine whether the median length and number of glycosylation sites of the donor and recipient sequences differed significantly from those of the subtype C sequences in the HIV database, and to compare branch lengths to the ancestral node between donor and recipient sequences. Linear regression analysis was used to evaluate the correlation between neutralization by antibodies in donor and pooled plasma (S-PLUS v6.2.1 2002, Insightful Corporation; R, Copyright 2002, The R Development Core Team).
Supporting Figures
Fig. S1. Analysis of intersubtype recombination. Based on the topology of the maximum likelihood tree (Fig. 2), the nucleic acid sequence most distant from the common ancestral node of the quasispecies obtained from each transmission pair was subjected to recombination analysis using the Recombinant Identification Program (RIP) from the HIV Sequence Database (http://www.hiv.lanl.gov/content/hiv-db/ripper). The selected sequence was compared with the consensus sequences of the reference subtypes (A, B, C, D, F1-F2, G, H, and O), represented by the corresponding colors. The fraction of matching positions between the selected sequence and each HIV-1 subtype is shown. Sequences from seven pairs best matched the subtype C consensus sequence; the sequence from pair 71 best matched subtype G. No evidence of intersubtype recombination was detected.

Fig. S1
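The window-by-window comparison behind Fig. S1 can be approximated with the simplified Python sketch below. It is not the LANL RIP implementation: window size, step, gap handling, and the toy sequences are assumptions, and a change in best-matching subtype along the alignment is used only as a crude recombination flag, without the significance testing a real analysis would apply.

```python
# Simplified sliding-window similarity in the spirit of Fig. S1 (not RIP).
# Sequences must already be aligned to equal length; '-' marks a gap, and
# gapped columns are ignored within each window.
def window_identity(query, reference, start, size):
    """Fraction of matching non-gap positions in query[start:start+size]."""
    pairs = [
        (q, r)
        for q, r in zip(query[start:start + size], reference[start:start + size])
        if q != "-" and r != "-"
    ]
    return sum(q == r for q, r in pairs) / len(pairs) if pairs else 0.0


def best_match_by_window(query, references, size=20, step=10):
    """For each window, return (start, best-matching subtype, identity)."""
    out = []
    for start in range(0, len(query) - size + 1, step):
        scores = {name: window_identity(query, ref, start, size)
                  for name, ref in references.items()}
        best = max(scores, key=scores.get)
        out.append((start, best, scores[best]))
    return out


def looks_recombinant(windows):
    """Flag a query whose best-matching subtype changes along the alignment."""
    return len({subtype for _, subtype, _ in windows}) > 1


# Toy consensus sequences (hypothetical) and a query identical to "C".
refs = {"B": "TGCA" * 10, "C": "ACGT" * 10}
query = "ACGT" * 10
windows = best_match_by_window(query, refs)
print(windows, looks_recombinant(windows))
```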

Fig. S2. Neutralization of pseudotyped virus by antibodies in pooled subtype C plasma. The distribution of neutralization sensitivity to pooled subtype C plasma among donor (green) and recipient (blue) Env pseudotypes is shown for five transmission pairs, with pooled-plasma IC50 values plotted on the horizontal axis. The fraction of Env pseudotypes with a given IC50 is indicated on the vertical axis. M→F = male-to-female transmission; F→M = female-to-male transmission.
Fig. S2

Fig. S3. Protein alignment of donor and recipient V1-V4 regions. Nucleotide sequences were initially aligned using HMMER (version 2.3.1, by Sean Eddy, http://hmmer.wustl.edu/), then codon-aligned using GeneCutter (by Brian Gaschen, http://www.hiv.lanl.gov/content/hiv-db/gene_cutter/cutter.html), and optimized by hand. Shown is a comparison of the entire V1-V4 region of a subtype C consensus sequence (top) with a recipient consensus (.CON) and the most closely related donor sequence for each pair. Clone numbers are indicated for the donor sequences. Potential N-linked glycosylation sites are shown in red text. Glycosylation sites that are not conserved within a donor-recipient pair are boxed. Dashes indicate residues identical to the subtype C consensus; dots indicate deleted residues; substitutions are indicated by the amino acid letter.
Fig. S3
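Counting potential N-linked glycosylation sites, as marked in Fig. S3 and compared statistically above, can be sketched with the common N-X-S/T sequon rule (X ≠ proline). The sketch assumes an ungapped protein sequence; the dash-masked display in Fig. S3 is not suitable input, since there dashes stand for conserved residues. Some definitions additionally exclude a proline immediately after the S/T; that refinement is omitted. Names and the toy sequence are hypothetical.

```python
# Sketch: count potential N-linked glycosylation sites (PNGS) with the sequon
# rule N-X-S/T, X != proline. Alignment gap characters ('-', '.') are removed
# first, so the input must be a real sequence, not a dash-masked display.
import re

SEQUON = re.compile(r"N(?=[^P][ST])")  # lookahead allows overlapping sites


def pngs_positions(protein):
    """1-based positions of the N in each potential glycosylation sequon."""
    seq = re.sub(r"[-.]", "", protein.upper())
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def region_length(protein):
    """Number of amino acids, ignoring alignment gap characters."""
    return len(re.sub(r"[-.]", "", protein))


toy = "NGTNPSNATNNST"  # hypothetical fragment, not a study sequence
print(pngs_positions(toy), region_length(toy))  # [1, 7, 10, 11] 13
```

The lookahead matters for runs such as "NNST", where two overlapping sequons both count.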

Table S1. Characteristics of the heterosexual transmission pairs. Samples collected from eight heterosexual transmission pairs were studied. ID, numerical pair identifier. Type, direction of transmission: male to female (MTF) or female to male (FTM). Source, origin of the env clones: PBMC (PB) or plasma (PL). Enroll (years), time between the donor's first seropositive test (i.e., enrollment into the study as a discordant couple) and sample collection. Seroconv (months), time between the recipient's last seronegative test and first documented seroconversion. Sample (months), time between the recipient's last seronegative visit and sample collection; seroconversion was documented within this interval or on the same day. Donor VL and Recipient VL, plasma viral load (RNA copies/ml) determined at the time of sample collection by the Roche Monitor assay.
ID    Type  Source   Enroll (yr)  Seroconv (mo)  Sample (mo)  Donor VL    Recipient VL
13    MTF   PB       4.0          2.8            2.8          5,452       2,740
106   MTF   PB       1.1          3.0            4.3          267,961     48,442
109   MTF   PB, PL   2.8          3.2            3.2          847,759     887,586
55    MTF   PB, PL   2.2          3.0            3.0          501,927     88,544
53    FTM   PB, PL   2.8          3.1            3.4          150,699     26,643
71    FTM   PB       2.0          3.1            3.1          14,588      106,266
83    FTM   PB, PL   1.8          3.1            3.3          80,523      1,120,643
135   FTM   PB, PL   0.3          3.2            3.6          65,784      202,999

Supporting References
1. S. Allen, K. E. N'Gandu, A. Tichacek, in Preventing HIV in Developing Countries: Biomedical and Behavioral Approaches (Platinum Press, New York, 1998).
2. S. L. McKenna et al., AIDS 11 (Suppl. 1), S103 (1997).
3. S. Allen et al., AIDS 17, 733 (2003).
4. C. A. Derdeyn et al., J. Virol. 74, 8358 (2000).
5. C. A. Derdeyn et al., J. Virol. 75, 8605 (2001).
6. J. Felsenstein, Genet. Res. 60, 209 (1992).
7. D. Posada, K. A. Crandall, Bioinformatics 14, 817 (1998).
8. X. Wei et al., Antimicrob. Agents Chemother. 46, 1896 (2002).
9. X. Wei et al., Nature 422, 307 (2003).