
Supplementary Discussion

The source of hepatitis B virus infection

To test for a possible source of HBV infection outside the study family, we searched GenBank and the HBV Database (http://hbvdb.ibcp.fr), and the representative sequences with the highest identity score (99%) were included in the phylogenetic analysis. The HBV sequences derived from the study family were phylogenetically closely related, with strong statistical support (bootstrap value > 99), indicating that the HBV was acquired from a single source. Although there may have been additional sources of HBV that are not represented in the publicly available databases, we consider this possibility low. If family members had acquired HBV from different sources, the actual HBV divergence between hosts would be much larger, and the rate of evolution estimated under the expectation of mother-to-infant transmission would therefore be faster. Given that the rates of HBV evolution estimated under mother-to-infant transmission agree well with the range reported in the literature, the possibility of more than one HBV source is low.

Routes of hepatitis B virus transmission

According to the colonization-adaptation trade-off model, every transmission requires the HBV quasispecies to switch from adaptor to colonizer. As shown in Fig. 1, the HBV samples acquired from different individuals had diverged from a common pool, indicating that all individuals were infected by a similar group of viral strains. Thus, compared with adaptors, which must continue evolving in response to interaction with the host immune system, the requirements for colonizers are more uniform. Because HBV must constantly shift between these stages of infection, more transmissions in a given period of time may reduce the time available for HBV to evolve. Therefore, over a given period of time, we expect more HBV transmissions to result in less divergence if the route of HBV transmission within the family is consistent with mother-to-infant transmission; otherwise, no such relationship is expected.

We then evaluated the relationship between genetic distance and the number of transmissions between individuals. In this study, most divergence times between individuals ranged from 90 to 130 years. Because the divergence times between D2 and GD2 and between D3 and GD3 were much shorter than those of the remaining pairs, these two pairs were excluded from this analysis. Supplementary Fig. 3 shows that the number of transmissions is negatively correlated with sequence divergence (τ = -0.29, p < 10^-2; Kendall's rank correlation).

To evaluate whether chance alone could generate this negative correlation, we conducted a simulation in which the number of transmissions was randomly redistributed among the pairwise comparisons, breaking its pairing with genetic distance. The simulation was repeated 10,000 times, the correlation coefficient was calculated for each replicate, and the resulting distribution was used to determine the p value of the observed τ. In this simulation, the probability of observing the τ value through random association was only 0.002. Therefore, the most likely route of transmission in this family is mother-to-infant transmission.

REFERENCES
1. Günther S, Li BC, Miska S, Kruger DH, Meisel H, Will H. 1995. A novel method for efficient amplification of whole hepatitis B virus genomes permits rapid functional analysis and reveals deletion mutants in immunosuppressed patients. J Virol 69:5437-5444.
2. Wang HY, Chien MH, Huang HP, Chang HC, Wu CC, Chen PJ, Chang MH, Chen DS. 2010. Distinct hepatitis B virus dynamics in the immunotolerant and early immunoclearance phases. J Virol 84:3454-3463.

Supplementary Table 1. Primers used in this study

Primer   Sequence                        Nucleotide position^A   Direction
PCR primers^B
P1       5'-TTTTTCACCTCTGCCTAATCA-3'     1821~1841               Sense
P2       5'-AAAAAGTTGCATGGTGCTGG-3'      1823~1806               Antisense
Sequencing primers^C
R2466    5'-GTAAAGTTTCCCACCTTATG-3'      -2447~-2426             Antisense
F2312    5'-CCTATCTTATCAACACTTC-3'       2330~2348               Sense
F2847    5'-CATGGGAGGTTGGTCTTC-3'        2864~2881               Sense
F226     5'-AATCCTCACAATACCACAGA-3'      245~264                 Sense
F761     5'-CCAAGTCTGTACAACATCT-3'       779~797                 Sense
F1264    5'-ATCCATACTGCGGAACTC-3'        1281~1298               Sense

A. Base positions are numbered according to the reference sequence NC_003977.
B. These primers are from Günther et al. (1995) (1).
C. These primers are from Wang et al. (2010) (2).

Supplementary Table 2. Assembly summary of NGS datasets

Sample ID A           B           C          D       E           F       G           H        J
D1-N1     18,734,418  12,415,053  498,452    14,479  11,815,355  12,120  10,899,647  346,184  3,180
D1-N2     19,744,796  13,090,671  584,253    70,420  12,423,910  63,630  9,913,566   318,167  3,147
D2-N1     25,232,036  17,264,625  818,554    72,819  16,324,626  50,383  13,285,825  417,377  3,215
D2-N2     28,321,560  19,393,376  1,272,836  94,547  17,868,725  69,147  13,640,679  430,265  3,202
GD2-N2    22,590,770  15,400,583  979,261    75,402  14,226,568  51,932  9,768,304   316,116  3,121
GD2-N3    22,444,862  14,724,292  715,088    41,762  13,916,990  25,370  10,203,917  330,319  3,120
D3-N1     22,182,802  14,666,817  839,898    49,375  13,703,815  15,652  9,999,471   323,805  3,119
D3-N2     22,750,886  14,928,094  906,127    66,711  13,851,121  47,907  12,872,985  406,812  3,196
GD3-N1    22,496,488  14,895,509  1,539,342  72,693  13,123,922  28,736  11,525,392  362,073  3,215
GD3-N2    19,747,532  13,581,519  675,328    40,485  12,785,764  28,935  9,232,662   292,962  3,183
S1-N      14,970,060  9,799,111   513,867    42,912  9,230,382   26,703  5,864,839   189,977  3,118
S2-N      18,715,324  12,506,474  978,725    81,129  11,389,896  54,857  7,814,961   250,894  3,146
Average   21,494,295  14,388,844  860,144    60,228  13,388,423  39,614  10,418,521  332,079  3,164

A, number of original sequences; B, number of high-quality sequences; C, number of high-quality unique sequences; D, number of high-quality unique sequences (count ≥ 5); E, number of sequences included in high-quality unique sequences (count ≥ 5); F, number of mapped high-quality unique sequences (count ≥ 5); G, number of sequences included in mapped high-quality unique sequences (count ≥ 5); H, average coverage; J, assembled length (bp).
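As a consistency check on Supplementary Table 2, the average coverage (H) can be approximated as the number of mapped sequences (G) multiplied by the read length and divided by the assembled length (J). The supplement does not report the read length; the sketch below assumes 101-bp reads, a hypothetical value back-calculated from the table (it reproduces H for the rows checked here), not one stated by the authors.

```python
from __future__ import annotations

# Assumption (not stated in the source): every mapped read is READ_LENGTH bp long.
# 101 bp is back-calculated from Supplementary Table 2, not reported by the authors.
READ_LENGTH = 101

# Rows copied from Supplementary Table 2:
# (G) sequences in mapped high-quality unique sequences,
# (J) assembled length in bp, (H) reported average coverage.
SAMPLES = {
    "D1-N1": (10_899_647, 3_180, 346_184),
    "D1-N2": (9_913_566, 3_147, 318_167),
    "S1-N": (5_864_839, 3_118, 189_977),
}


def average_coverage(mapped_reads: int, assembled_length: int,
                     read_length: int = READ_LENGTH) -> float:
    """Total mapped bases divided by the assembled genome length."""
    return mapped_reads * read_length / assembled_length


if __name__ == "__main__":
    for sample, (g, j, h) in SAMPLES.items():
        estimate = average_coverage(g, j)
        print(f"{sample}: estimated {estimate:,.0f} vs reported {h:,}")
```

If a different read length were used in the study, only the READ_LENGTH constant would need to change.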

Supplementary Table 3. Number of substitutions in different open reading frames (ORFs) of HBV between serum samples according to (A) the majority-rule consensus and (B) the strict consensus methods, based on cloned sequences

(A) Majority-rule consensus
Number of changes                           P      S      X      C      Total (expected^A)
0                                           2440   1151   414    586    3041
1                                           64     35     11     31     96 (180.2)
2                                           21     12     3      9      32 (5.6)
3                                           3      2      1      3      7 (0.1)
4                                           3      3      0      0      3 (0)
5                                           1      0      1      0      1 (0)
Number of sites with substitutions          92     52     16     43     139
Number of inferred substitutions^B          132    77     25     58     198
Proportion of multiple substitution excess  43.5%  48.1%  56.3%  34.9%  42.4%

(B) Strict consensus
Number of changes                           P      S      X      C      Total (expected^A)
0                                           2461   1160   417    601    3080
1                                           57     33     8      24     78 (122.89)
2                                           10     8      3      3      16 (2.51)
3                                           1      1      1      1      3 (0.03)
4                                           2      1      1      0      2 (0)
5                                           0      0      0      0      0 (0)
Number of sites with substitutions          70     43     13     28     99
Number of inferred substitutions^B          88     56     21     33     127
Proportion of multiple substitution excess  25.4%  30.2%  61.5%  17.9%  28.0%

The data were derived from cloned sequences. P: polymerase; S: surface; X: X protein; C: core.
A. Expected numbers under a Poisson distribution are given in parentheses; all differ significantly from the observed numbers (chi-square test, p < 10^-3).
B. The numbers of substitutions were inferred by parsimony according to the relatedness of the samples within the family.
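The summary rows of Supplementary Table 3 follow from the per-site distribution of changes: sites with substitutions are the sites with at least one change, inferred substitutions are the sum of changes over all sites, and the excess is (substitutions - sites) / sites. The sketch below reproduces the polymerase (P) column of Table 3(A). It also computes Poisson expectations under the assumption that λ = substitutions / N, with N equal to the total number of sites in the column; the supplement does not state which N the authors used, so this sketch is not expected to reproduce the parenthesized values exactly.

```python
from __future__ import annotations

import math

# Polymerase (P) column of Table 3(A): {number of changes at a site: number of sites}
P_MAJORITY = {0: 2440, 1: 64, 2: 21, 3: 3, 4: 3, 5: 1}


def summarize(dist: dict[int, int]) -> tuple[int, int, float]:
    """Return (sites with substitutions, inferred substitutions, excess proportion)."""
    sites = sum(n for k, n in dist.items() if k >= 1)
    substitutions = sum(k * n for k, n in dist.items())
    excess = (substitutions - sites) / sites
    return sites, substitutions, excess


def poisson_expected(dist: dict[int, int]) -> dict[int, float]:
    """Expected number of sites with k changes if changes hit sites at random.

    Assumption: N is the total number of sites in the column and
    lambda = substitutions / N; the authors' exact choice is not stated.
    """
    n_sites = sum(dist.values())
    lam = sum(k * n for k, n in dist.items()) / n_sites
    return {k: n_sites * math.exp(-lam) * lam**k / math.factorial(k) for k in dist}


if __name__ == "__main__":
    sites, subs, excess = summarize(P_MAJORITY)
    print(sites, subs, f"{excess:.1%}")  # 92 132 43.5%
    print({k: round(v, 1) for k, v in poisson_expected(P_MAJORITY).items()})
```

Comparing observed and expected counts in this way shows the excess of multiply substituted sites that the chi-square test in footnote A formalizes.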

Supplementary Table 4. Summary of nucleotide substitutions in the NGS datasets at a cutoff of 0.01 and the corresponding amino acid changes in different open reading frames within the study family

Position^A Host P Freq.^B S Freq. X Freq. C Freq.
5 D2 H305P 0.94, 0.01 T125P 0.97, 0.01 10 D3; GM F307L 0.80, 0.04 126 53 D1; D3 L321P 0.85, 0.09 F141L 0.89, 0.10 85 D3 E332K 0.88, 0.11 151 109 GD2 C340S 0.78, 0.05 159 126 GD2; D3 345 L165S 0.89, 0.08 132 D3; GM E347D 0.95, 0.05 K167T 0.95, 0.05 165 GM 358 I178T 0.96, 0.04 201 GD3 370 Q190P 0.99, <0.01 213 GD3 374 F194S 0.99, <0.01 216 GD2; D3 375 L195S 0.85, 0.13 243 D1 384 Q204R 0.99, <0.01 285 GD2 398 G218E 0.86, 0.12 400 D3 L437I 0.93, 0.06 256 453 S1 454 Y274W 0.97, 0.00 GM 454 Y274S 0.97, <0.01 482 GD3 N464T 0.94, 0.02 I284L 0.95, 0.04 508 D1 R473G 0.92, 0.07 292 512 GM T474N 0.98, 0.01 P294T 0.97, 0.01 520 S1 N477D 0.89, 0.10 296 529 D3; GD3 N480D 0.52, 0.41 299 530 D1, S1 N480S 0.52, 0.04 T300A 0.95, 0.03 540 GM 483 Q303R 0.97, 0.01 582 GM 497 T317M 0.99, 0.01 630 D3 513 A333V 0.99, 0.01 634 GM I515L 0.98, 0.00 334 636 D3 515 Y335F 0.90, 0.08 654 GM 521 S341L 0.99, <0.01 720 D3 543 T363I 0.99, 0.01 777 D3 562 I382T 0.97, 0.02 813 GM 574 F394C 0.98, <0.01 1029 GD2 646 1072 GM I661V 0.99, <0.01 1122 GD2 677 1126 GM K679Q 0.88, 0.05 1317 D1; GD2 742 1320 D3; GM K743N 0.92, 0.07 1368 D1; GM 759 1503 D1 804 V44I 0.46, 0.06 GD2 804 V44L 0.46, 0.04 1508 GM Y806F 0.94, 0.05 45 1512 GM 807 T47P 0.96, 0.01 1516 D1 T809S 0.98, 0.02 D48V 0.97, 0.01 1630 D1 H86L 0.93, <0.01 1679 D1 102 1909 D1 32 1913 GD2;GD3 P34A 0.83, <0.01 1915 GD2 P34T 0.83, 0.12

1937 D3 V42A 0.90, 0.05 1978 D1 55 2078 GD2 L89V 0.86, 0.12 2120 D3 S103G 0.89, 0.04 2150, 2151 D3 L113A 0.96, 0.02 2159 D2; GD2 G116S 0.13, 0.78 2189 GD2 L126I 0.11, 0.86 2235 D1 R141K 0.98, 0.10 2288 D1 P159S 0.91, 0.02 2304 GD2 P164Q 0.84, 0.12 2357 S1 E17D 0.91, 0.07 G182C 0.91, 0.05 2363 GD2 19 S184T 0.94, 0.05 2441 GD2 45 S210P 0.90, 0.09 2443 D3 L46P 0.87, 0.10 210 2525 D3 K73N 0.63, 0.12 2537 D3 77 2559 GD2 Q85K 0.92, 0.05 2567 D3 87 2721 D1 N139D 0.96, 0.02 2771 D3 155 2922 GD2 P206S 0.84, 0.16 25 2963 GD3 R219S 0.98, <0.01 E39A 0.84, 0.01 3165 GD2 S287T 0.99, <0.01 106 3172 GD2 I289N 0.79, 0.05 S109T 0.94, 0.06 3208 D2 A301E 0.94, 0.01 Q121K 0.96, 0.02

A nucleotide substitution was defined as a major-nucleotide frequency greater than 0.99 in the NGS data. P: polymerase; S: surface protein; X: X protein; C: core protein. Bold type marks changes from rare to common amino acid residues.
A. Nucleotide positions are based on the reference sequence NC_003977.
B. Amino acid frequencies at each position were calculated from the Hepatitis B Virus database (HBVdb, http://hbvdb.ibcp.fr/hbvdb/hbvdbindex). Only HBV genotype B sequences were included: in total, 927 polymerase, 955 large and 2690 small surface, 974 X, and 1105 core protein sequences. The < sign indicates an amino acid frequency smaller than 0.01.

Supplementary Table 5. List of sites under positive selection in Wang et al. (2010) (2)

ORF          Position^A  Patient  Frequency^B
Polymerase   K73N        A        0.63, 0.12
             L288R       G        0.81, 0.01
             N480D       C        0.52, 0.41
             N480D       F        0.52, 0.41
             N480S       G        0.52, 0.04
             C602G       G        0.17, 0.06
             M617L       E        0.71, 0.12
             K743N       F        0.92, 0.07
             R841K       F        0.83, 0.15
Surface      I84T        G        0.51, 0.01
             L108V       G        0.94, 0.01
             L182H       G        0.95, <0.01
             L195S       A        0.85, 0.13
             L195S       E        0.85, 0.13
             G218E       A        0.86, 0.12
             L223R       A        0.94, 0.03
             Q275H       C        0.98, 0.01
             T300A       G        0.95, 0.03
             Q303R       C        0.97, 0.01
             Q303H       F        0.97, 0.01
             C395Y       E        0.96, 0.01
             Y399S       E        0.99, <0.01
X            A85T        A        0.97, 0.01
             H86P        A        0.93, 0.02
             K95Q        F        0.96, 0.03
             N118T       A        0.35, 0.61
             T118N       C        0.61, 0.35
             N118T       G        0.35, 0.61
             M127I       F        0.07, 0.46
             K130M       A        0.79, 0.20
             V131I       A        0.80, 0.19
Core         I88V        E        0.93, 0.06
             G92V        E        0.97, 0.03
             P164Q       A        0.84, 0.12
             P159L       G        0.91, 0.02
             T176A       A        0.95, 0.04
             G182C       A        0.91, 0.05

Bold type marks sites that overlap with those listed in Supplementary Table 4. Underlining indicates changes from rare to common amino acid variants. Hashed cells are regions not included in the current NGS data.

A. Nucleotide positions are based on the reference sequence NC_003977.
B. Amino acid frequencies at each position were calculated from the Hepatitis B Virus database (HBVdb, http://hbvdb.ibcp.fr/hbvdb/hbvdbindex). Only HBV genotype B sequences were included: in total, 927 polymerase, 955 large and 2690 small surface, 974 X, and 1105 core protein sequences. The < sign indicates an amino acid frequency smaller than 0.01.


Supplementary Figure 1. Time of divergence (in years) versus genetic distance derived from next-generation sequencing data. Genetic distances were derived from (A) the whole genome and from (B) nonsynonymous and (C) synonymous sites of the non-overlapping regions. Diamonds indicate HBV divergence within hosts. Solid lines represent linear regressions using all data; the coefficients of determination (R²) were 0.19 (p < 10^-4; Pearson correlation), 0.14 (p < 10^-3), and 0.26 (p < 10^-5) for (A), (B), and (C), respectively. Dotted lines represent linear regressions excluding the diamonds (i.e., between-host comparisons only); their R² values were 0.00 (p = 0.76), 0.01 (p = 0.83), and 0.04 (p = 0.04) for (A), (B), and (C), respectively. The dashed line in (A) is a power regression (R² = 0.59; p < 10^-15). The regression equations in (A) were $D = 0.0055 + 4.42 \times 10^{-5}\,T$ for the linear regression and $D = 0.0013\,T^{0.431}$ for the power regression, where $D$ is the genetic distance and $T$ is the time of divergence in years.
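The two fitted models for Supplementary Figure 1(A) can be evaluated side by side, and a power law of the form D = a·T^b is often fitted by ordinary least squares on log-transformed data. This sketch assumes the linear model has an intercept of 0.0055 and a slope of 4.42 × 10^-5 per year; the log-log fitting function (fit_power_law, a hypothetical helper) is an illustrative choice, since the supplement does not state how the power regression was fitted.

```python
from __future__ import annotations

import math


def linear_model(t: float) -> float:
    """Linear regression for Fig. 1(A), assuming intercept 0.0055 and slope 4.42e-5/year."""
    return 0.0055 + 4.42e-5 * t


def power_model(t: float) -> float:
    """Power regression for Fig. 1(A): D = 0.0013 * T**0.431."""
    return 0.0013 * t**0.431


def fit_power_law(times: list[float], distances: list[float]) -> tuple[float, float]:
    """Fit D = a * T**b by ordinary least squares on (log T, log D); returns (a, b).

    Assumption: log-log OLS is one common approach and may differ from the
    authors' (unstated) fitting procedure. All inputs must be positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in distances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Compare the two models over the 90-130-year range discussed in the text.
    for t in (90, 110, 130):
        print(t, round(linear_model(t), 5), round(power_model(t), 5))
```

Fitting in log space turns the power law into a straight line (log D = log a + b·log T), which is why ordinary least squares applies; a nonlinear least-squares fit on the original scale would weight the residuals differently.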


Supplementary Figure 2. Number of transmissions versus HBV substitution rates derived from next-generation sequencing data. HBV substitution rates were derived from (A) the whole genome and from (B) nonsynonymous and (C) synonymous sites of the non-overlapping regions. For (A) and (B), the substitution rates within hosts (number of transmissions = 0) were significantly higher (p < 10^-3, Wilcoxon rank-sum test) than those between hosts (number of transmissions = 1, 2, 3, and 4). The substitution rate between hosts decreased as the number of transmissions increased in both (A) (dashed line; τ = -0.47, p < 10^-7, Kendall's rank correlation) and (B) (τ = -0.24, p < 10^-2). *, p < 0.05; **, p < 10^-2; ***, p < 10^-5 (Wilcoxon rank-sum test).
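The within-host versus between-host comparison in Supplementary Figure 2 uses a Wilcoxon rank-sum test. A minimal sketch of that statistic follows, using average ranks for ties and the large-sample normal approximation without tie or continuity correction; whether the authors used an exact or approximate version is not stated, and the input values in the usage block are placeholders, not data from this study.

```python
from __future__ import annotations

import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j become ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (W, two-sided p), where W is the rank sum of sample x.

    Normal approximation only; an exact test is preferable for small samples.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, p


if __name__ == "__main__":
    within = [0.9, 1.1, 1.3, 1.2, 1.0]      # placeholder within-host rates
    between = [0.4, 0.5, 0.3, 0.6, 0.45]    # placeholder between-host rates
    w, p = rank_sum_test(within, between)
    print(f"W = {w}, two-sided p ~ {p:.3f}")
```

Because the test uses only ranks, it does not assume that substitution rates are normally distributed.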

Supplementary Figure 3. Strong negative correlation between the number of transmissions and genetic distance between individuals. Kendall's rank correlation coefficient was τ = -0.29 (p < 10^-2). The comparisons between D2 and GD2 and between D3 and GD3 have much shorter divergence times than the remaining pairs and were not included in this analysis (see Supplementary Discussion for details).
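The Kendall's rank correlation and the randomization test described in the Supplementary Discussion can be sketched as follows: compute τ for the observed pairing, then repeatedly shuffle the number of transmissions across the pairwise comparisons and count how often a τ at least as negative arises by chance. Assumptions not stated in the source: the tie-adjusted tau-b variant, a one-sided lower-tail p value, and a fixed random seed; the inputs in the usage block are made-up placeholders, not the study's pairwise data.

```python
from __future__ import annotations

import math
import random


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b (tie-adjusted), computed over all O(n^2) pairs.

    Undefined (division by zero) if every x or every y value is tied.
    """
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


def permutation_p_value(x: list[float], y: list[float],
                        n_perm: int = 10_000, seed: int = 1) -> tuple[float, float]:
    """Return (observed tau, fraction of shuffles with tau <= observed).

    x (number of transmissions) is shuffled relative to y (genetic distance);
    the one-sided lower tail matches a hypothesized negative correlation.
    """
    rng = random.Random(seed)
    observed = kendall_tau_b(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kendall_tau_b(shuffled, y) <= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    transmissions = [1, 1, 2, 2, 3, 3, 4, 4]                            # placeholder
    distances = [0.012, 0.011, 0.010, 0.011, 0.009, 0.008, 0.008, 0.007]  # placeholder
    tau, p = permutation_p_value(transmissions, distances)
    print(f"tau = {tau:.2f}, one-sided permutation p = {p:.4f}")
```

Tau-b is used here because the number of transmissions takes only a few distinct values, so ties are common and the plain tau-a would be biased toward zero.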