SUPPLEMENTARY INFORMATION

doi: /nature16931

Contents

1 Nomenclature
2 Fertility of homozygous B6 H/H and heterozygous B6 B6/H mice
3 DSB hotspot maps
3.1 DSB hotspot caller
3.2 Calling procedure
3.3 Overlap between hotspots and map comparison
3.3.1 Pairwise overlap between DSB hotspot maps
3.3.2 DSB hotspot map correlations
3.3.3 DSB hotspots in the (B6 × PWD)F1 B6/PWD and (B6 × PWD)F1 H/PWD mice
4 Motif analyses
4.1 Overview
4.2 Motif refinement algorithm
4.3 Search of motifs in genomes
4.4 Identification of mutated motifs
4.4.1 PWD sequencing and ancestral genome to B6 and PWD
4.4.2 Mutated ancestral motifs in DSB hotspots
5 DSB hotspot assignment in hybrids
5.1 Chromosome-aware DSB calling
5.2 Attribution of PRDM9 control in hybrids
5.3 Evidence for erosion of PRDM9 binding motifs
5.4 Fraction of asymmetric DSB hotspots explained by binding motif disruption
5.5 Systematic shift towards the chromosome less bound in hybrid mice
6 Histone marks and genomic features in the humanized mouse
6.1 H3K4me3 marks
6.2 H3K36me3 marks
6.3 Strand effect for H3K4me3 and H3K36me3
6.4 Other histone marks
6.5 Exons and Transcription Start Sites (TSS)
H3K4me3 ChIP-seq
Identifying PRDM9 binding peaks and defining PRDM9-independent H3K4me3 regions
Estimating haplotype-specific H3K4me3 signals
Chromosome effects following Prdm9 humanization
Prdm9 humanization of the (PWD × B6)F1 PWD/B6 mouse
Chromosome effects in DMC1 signal comparisons between infertile and rescue
Chromosome effects in H3K4me3 signal comparisons between infertile and rescue
Chromosome effect predictors
Prdm9 humanization of the wild-type B6 mouse
Chromosome effects in DMC1 signal comparisons between B6 B6/B6 and B6 B6/H
Chromosome effect predictors
DSB hotspot in Hstx
PWD deletions at DSB hotspot locations
Symmetry metric
Further discussions on PRDM9 binding symmetry
Fertility levels and DSB symmetry across hybrid mice
Additivity of H3K4me3 binding
Examining how PRDM9 binding on the homologue impacts DMC1 measures
Addressing our results in the context of an individual cell
Within population dynamics
Supplementary Tables

1 Nomenclature

Wild-type C57BL/6J and PWD/PhJ mice are referred to as B6 and PWD throughout the text. In general, we refer to specific mice by writing their genomic background in uppercase and their Prdm9 alleles using superscripts. In homozygous mice, when the Prdm9 allele is the same as the genomic background, we sometimes omit superscripts to simplify notation (e.g. B6 and PWD wild-type mice carry the B6 and PWD Prdm9 alleles homozygously, respectively). F1 hybrids, the progeny of an outcross between two different inbred strains, are named according to their inbred parents, with the strain of the female written first. For example, the infertile cross is named (PWD × B6)F1 PWD/B6. Unless stated otherwise, we always refer to male mice throughout this work. The humanized model described in this manuscript has been attributed the allele name Prdm9 tm1(prdm9)wthg (MGI Accession ID MGI: ).

2 Fertility of homozygous B6 H/H and heterozygous B6 B6/H mice

Both male and female B6 H/H and B6 B6/H mice show normal fertility (Extended Data Fig. 2a) and Mendelian transmission of the humanized allele. Detailed cytogenetic analysis revealed no major abnormalities in DSB counts (DMC1 immunoreactivity, Extended Data Fig. 2b) or crossover counts (MLH1 foci, Extended Data Fig. 2c), and showed normal sex body formation (γH2AX immunostaining, Fig. 1b), in heterozygous and homozygous humanized male mice. No differences in quantitative measures of fertility and successful synapsis were found between genotypes (Extended Data Fig. 2d).

3 DSB hotspot maps

3.1 DSB hotspot caller

The formation of a DSB leaves a strand-specific signal, due to the 3′ to 5′ activity of the exonuclease. To take the resulting asymmetry of the coverage of the ChIP-seq reads between the two strands into account, we developed a specific DSB hotspot caller.
The assay, and the available coverage, do not permit the detection of DSBs in individual cells, but by piling up signal coming from multiple testis cells, it enables the detection of genomic regions at which DSBs occur frequently. The novel algorithm takes advantage of the shift in the mapping of single-stranded DNA (ssDNA) reads between the 5′ and the 3′ DNA strands to call hotspots. These ssDNA segments are a consequence of the resection of DNA ends that accompanies a DSB and are isolated by DMC1 ChIP amplification followed by high-throughput sequencing (Methods). For each hotspot, the caller estimates the centre of the hotspot and its heat, loosely defined as the number of reads mapping to this DSB hotspot and predicted to represent real signal (as opposed to background reads). The DSB heat is proportional to the fraction of cells possessing unrepaired breaks decorated by DMC1 at that position. The caller handles sample replicates and is able to call hotspots using several samples jointly.

To test whether the genomic position $p$ is the centre of a DSB hotspot, we compared the following two models:

No hotspot model: the DNA segment surrounding the position $p$ is divided into six bins, namely left, centre and right bins on each of the two strands, of lengths $l_e$, $l_c$ and $l_e$ respectively. The distribution of the number of reads $N_{i\pm}$, $i \in \{l, c, r\}$, in each bin is assumed to be Poisson, with a mean parameter proportional to the length of the bin:

$$N_{l\pm} \sim \mathrm{Po}(l_e \lambda), \quad N_{c\pm} \sim \mathrm{Po}(l_c \lambda), \quad N_{r\pm} \sim \mathrm{Po}(l_e \lambda),$$

where $\lambda$ is the factor of proportionality. The $N_{i\pm}$, $i \in \{l, c, r\}$, are assumed to be independent. Letting $n_{i,s}$, $i \in \{l, c, r\}$, $s \in \{+, -\}$, be the observed number of reads falling within bin $i$ on strand $s$, we estimate $\lambda$ by its maximum likelihood estimator

$$\hat{\lambda} = \frac{1}{2(2 l_e + l_c)} \sum_{i \in \{l, c, r\},\, s \in \{+, -\}} n_{i,s}.$$

Hotspot model: six bins are defined, as previously, around the position $p$. The {right, +} and {left, −} bins only carry background signal, while the four other bins exhibit both background and hotspot signals, so that the distributions of the $N_{i\pm}$, $i \in \{l, c, r\}$, become:

$$N_{l+} \sim \mathrm{Po}(l_e(\lambda + \lambda_b)), \quad N_{c+} \sim \mathrm{Po}(l_c(\lambda + \lambda_b)), \quad N_{r+} \sim \mathrm{Po}(l_e \lambda_b),$$
$$N_{l-} \sim \mathrm{Po}(l_e \lambda_b), \quad N_{c-} \sim \mathrm{Po}(l_c(\lambda + \lambda_b)), \quad N_{r-} \sim \mathrm{Po}(l_e(\lambda + \lambda_b)),$$

where $\lambda_b$ is the rate of background binding, and $\lambda$ is the rate at which reads representing true hotspot signal occur. These parameters are estimated by maximum likelihood as:

$$\hat{\lambda}_b = \frac{1}{2 l_e}\,(n_{r+} + n_{l-}), \qquad \hat{\lambda} = \max\left\{ \frac{1}{2(l_e + l_c)}\,(n_{l+} + n_{c+} + n_{c-} + n_{r-}) - \hat{\lambda}_b,\; 0 \right\}.$$

Because the no hotspot model is nested within the hotspot model, a simple log-likelihood ratio test can be applied to test for the presence of a DSB hotspot at the position $p$. Under the null hypothesis of the absence of a hotspot at $p$, the likelihood ratio statistic is assumed to follow a chi-squared distribution with 1 degree of freedom.
If a hotspot is called (see Supplementary Information, Subsection 3.2), we refer to the quantity $2(l_e + l_c)\hat{\lambda}$ as the heat of the hotspot.
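
The test at a single position can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout of the bin counts, and the clamping of the statistic at zero are all choices made here for illustration. The chi-squared tail probability with 1 degree of freedom is computed as `erfc(sqrt(x/2))`.

```python
import math

def poisson_loglik(count, mean):
    """Poisson log-likelihood without the log(count!) term, which cancels in the ratio."""
    if mean <= 0.0:
        return 0.0 if count == 0 else float("-inf")
    return count * math.log(mean) - mean

def hotspot_test(n, l_e, l_c):
    """n maps (bin, strand) -> read count, with bin in 'lcr' and strand in '+-'.

    Returns (likelihood-ratio statistic, p-value, heat). Hypothetical helper.
    """
    length = {"l": l_e, "c": l_c, "r": l_e}
    bins = [(i, s) for i in "lcr" for s in "+-"]

    # No hotspot model: a single rate shared by all six bins.
    lam0 = sum(n[b] for b in bins) / (2 * (2 * l_e + l_c))
    ll_null = sum(poisson_loglik(n[b], length[b[0]] * lam0) for b in bins)

    # Hotspot model: {right,+} and {left,-} carry background only.
    background_only = {("r", "+"), ("l", "-")}
    lam_b = (n[("r", "+")] + n[("l", "-")]) / (2 * l_e)
    signal = sum(n[b] for b in bins if b not in background_only)
    lam = max(signal / (2 * (l_e + l_c)) - lam_b, 0.0)
    ll_alt = sum(
        poisson_loglik(n[b], length[b[0]] * (lam_b if b in background_only else lam + lam_b))
        for b in bins
    )

    # Assumption: clamp at zero, since the plug-in estimates need not dominate the null fit.
    stat = max(2 * (ll_alt - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-squared survival function, 1 df
    heat = 2 * (l_e + l_c) * lam
    return stat, p_value, heat
```

With strand-asymmetric counts the heat is the number of reads in the four signal bins minus the background expected in them; with counts proportional to bin length the heat is zero and the p-value is 1.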

3.2 Calling procedure

Only fragments mapping to some genomic position with mapping quality of 20 or above are considered. Duplicate fragments, defined as fragments mapping to the same genomic coordinates, are removed. A conservative, three-step procedure is then applied:

1. We test for the presence of a DSB hotspot every 250bp along chromosomes using the above model, with $l_e$ = 750bp and $l_c$ = 500bp. Positions $p_i$ associated with a p-value below $10^{-4}$ are retained. Potential DSB hotspot segments are then computed by considering the segments $[p_i - 1000, p_i]$, and overlapping segments are merged, producing a final list $[a_j, b_j]$ of potential segments.
2. Segments overlapping with hotspots called in the ChIP-seq control using this procedure are discarded.
3. For each remaining segment $[a_j, b_j]$, we independently compute the likelihood ratio for each genomic position within the range $[a_j + 600, b_j - 600]$, with the modified parameters $l_e$ = 550bp and $l_c$ = 300bp to get a better resolution. The position $p$ for which the likelihood ratio is maximal is reported to be the centre of the DSB hotspot, and the heat of the hotspot subsequently used is the one obtained under this model.

Supplementary Table 1 gives details about the number of DSB hotspots called in each sample. As each meiotic cell presents between 200 and 400 DSBs [1], we estimate that at most 1% of DSB hotspots called (approximately 15,000 to 17,000) are active in any given meiosis.

Modified procedure for replicates

To combine the two B6 B6/H replicates, we modified the above procedure in the following way. Letting $\ell_1$ and $\ell_2$ (resp. $\ell'_1$, $\ell'_2$) be the log-likelihoods of the no hotspot (resp. hotspot) model for replicates 1 and 2, we defined the new likelihood ratio statistic to be

$$\Lambda = 2\,(\ell'_1 + \ell'_2 - \ell_1 - \ell_2)$$

and assumed that $\Lambda$ follows a chi-squared distribution with $d = 2$ degrees of freedom under the null hypothesis of the absence of a hotspot.
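
Two pieces of this procedure lend themselves to a short sketch: merging overlapping candidate segments (step 1) and the joint test for two replicates. The code below is illustrative only; the function names and input layouts are assumptions made here, and the per-replicate log-likelihoods are taken as given. For 2 degrees of freedom the chi-squared tail probability is `exp(-x/2)`.

```python
import math

def merge_overlapping(segments):
    """Merge overlapping [start, end] segments into a sorted list of disjoint segments."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def joint_replicate_test(ll_null, ll_hotspot):
    """ll_null and ll_hotspot hold one log-likelihood per replicate (two replicates).

    Returns the joint statistic and its p-value under a chi-squared law with 2 df.
    """
    stat = 2 * (sum(ll_hotspot) - sum(ll_null))
    p_value = math.exp(-max(stat, 0.0) / 2)  # chi-squared survival function, 2 df
    return stat, p_value
```

A position would then be retained when the joint p-value falls below the calling threshold.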

The inference procedure is otherwise identical to the three-step procedure described above. The same calling p-value threshold ($10^{-4}$) was used. The joint call set (using both replicates) for the B6 B6/H mouse was used in the main text.

3.3 Overlap between hotspots and map comparison

3.3.1 Pairwise overlap between DSB hotspot maps

Given a DSB hotspot map ("donor" map), we asked what proportion of the DSB hotspots in the donor map was present in a "recipient" sample. To do this, we set an overlapping p-value threshold $p_{overlap}$, and asked which hotspots from the donor map are found in the recipient sample. Specifically, for each hotspot of the donor map, occurring at position $p_j$ on chromosome $c$, we re-ran the detailed hotspot calling procedure (step 3 in Supplementary Information Subsection 3.2) in the recipient sample in the segment $[p_j - 600, p_j + 600]$ (with $a_j = b_j = p_j$, where $a_j$ and $b_j$ are defined in step 3 of Subsection 3.2). If any position in the segment $[p_j - 600, p_j + 600]$ is found to be a DSB hotspot at a p-value $p < p_{overlap}$, the hotspot from the donor is reported to overlap a hotspot in the recipient sample. Thus, two hotspots reported to overlap will have a distance of at most 600bp between their centres. Unless stated otherwise, we used $p_{overlap} = 10^{-3}$ throughout analyses in this study.

Supplementary Table 2 gives the pairwise overlapping fractions of DSB hotspots. Hotspots in the humanized heterozygous mouse are on average hotter if they are shared with the homozygous humanized mouse rather than with the wild-type mouse (Extended Data Fig. 3d). This analysis also revealed a non-linear effect in heat transmission between one of the homozygotes (B6 B6/B6) and the heterozygote, suggesting a saturation effect in B6 B6/B6.

3.3.2 DSB hotspot map correlations

To correlate DSB maps, we divided the genome into non-overlapping bins of fixed size (Extended Data Fig. 3b). The heat assigned to a bin is the sum of the heats of DSB hotspots whose centres fall within the bin. We used Pearson's correlation coefficient in the correlation analyses. We studied the correlation between the DSB hotspot maps obtained for homozygous humanized and wild-type mice at different scales (ranging from 1kb to 10Mb). At the 10kb scale, a very weak correlation between the human and mouse DSB hotspot maps was observed (r=0.016), reflecting the usage of a different set of hotspots in the two mice. At larger scales, however, the correlation increased considerably (Extended Data Fig. 3b).
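
The binned correlation described above can be sketched as follows. This is an illustration under assumptions: hotspots are given as hypothetical `(chromosome, centre, heat)` tuples, chromosome lengths are supplied by the caller, and bins containing no hotspot are assumed to contribute a heat of zero (the text does not state how empty bins are handled).

```python
import math
from collections import defaultdict

def bin_heats(hotspots, bin_size):
    """Sum hotspot heats per (chromosome, bin index); a hotspot belongs to the bin holding its centre."""
    binned = defaultdict(float)
    for chrom, centre, heat in hotspots:
        binned[(chrom, centre // bin_size)] += heat
    return binned

def pearson(xs, ys):
    """Pearson's correlation coefficient; NaN when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def map_correlation(map_a, map_b, bin_size, chrom_lengths):
    """Correlate two hotspot maps at a given scale (assumption: empty bins count as zero heat)."""
    a, b = bin_heats(map_a, bin_size), bin_heats(map_b, bin_size)
    keys = [(c, i) for c, length in chrom_lengths.items()
            for i in range(length // bin_size + 1)]
    return pearson([a.get(k, 0.0) for k in keys], [b.get(k, 0.0) for k in keys])
```

Calling `map_correlation` over a range of `bin_size` values would trace the correlation across scales.
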
The same scale-dependent pattern held when comparing the rates between mice with different mouse Prdm9 alleles, including comparison with the Prdm9 knockout mouse [2]. In contrast, different mouse strains with the same Prdm9 allele showed strongly correlated DSB maps at all genomic scales, similar to the strong correlation obtained between mouse replicates. These results confirm a major role for PRDM9 zinc finger array binding specificities alone in defining the fine-scale recombination landscape, but also indicate that the determinants of recombination rates at large scales are likely to include factors other than PRDM9.

3.3.3 DSB hotspots in the (B6 × PWD)F1 B6/PWD and (B6 × PWD)F1 H/PWD mice

Similarly to the comparison between the infertile and the humanized rescue mice, we found that a genome-wide effect of humanizing the mouse is to strongly reduce hotspot asymmetry in the reciprocal rescue (B6 × PWD)F1 H/PWD, compared to the (B6 × PWD)F1 B6/PWD mouse. DSB hotspots only active in the reciprocal humanized rescue (B6 × PWD)F1 H/PWD (and not in the reciprocal (B6 × PWD)F1 B6/PWD), so attributable to the human allele, occur mainly (57%) in hotspots showing approximate symmetry between chromosomes. Conversely, only 20% of DSBs at hotspots present exclusively in the reciprocal (B6 × PWD)F1 B6/PWD mouse, attributable to the B6 allele, occur in symmetric hotspots. See also Extended Data Fig. 6a-c. Section 5.1 describes our procedure to define hotspots for which a fraction of informative reads from the B6 chromosome can be calculated, and how this fraction is then calculated. In the above analysis, symmetric hotspots were defined as those for which the fraction of informative reads from the B6 chromosome could be computed, and for which this fraction fell in the range (inclusive). Asymmetric hotspots were conversely defined as those whose fraction fell outside this range.

4 Motif analyses

4.1 Overview

Search for motif enrichment at hotspot centres was performed using MEME-ChIP (MEME Suite version 4.9.1) and a motif refinement algorithm we developed (see Supplementary Information Subsection 4.2). DNA sequences of 500bp, centred at each DSB hotspot position, were considered, but only 500bp sequences displaying no more than 10 unknown ("N") or repeat-masked bases were retained for the motif discovery analysis (referred to as the discovery sequences). For each sample, the procedure is as follows. The motif with the most significant enrichment p-value reported by MEME-ChIP (with parameters -ccut 0 -nmeme 1200) was selected as the first raw motif, and passed on to the motif refinement algorithm. Next, based on the output of the motif refinement algorithm, sequences containing the refined motif were excluded from the discovery sequences pool. The remaining discovery sequences were then analysed by MEME-ChIP again, using the same parameters, and the motif that was the most similar to the previously selected motif was selected as the new raw motif, and passed on to the motif refinement algorithm. The process is iterated until no new credible motif arises from the MEME-ChIP step.
We then searched for these refined motifs in all 500bp sequences using our motif refinement algorithm (for this last step, the motifs were fixed). The algorithm is constrained to report no more than one motif per 500bp sequence. A final run of the MEME-ChIP discovery algorithm on the 500bp sequences for which no motif had been found yielded only repetitive sequences. We used FIMO (MEME Suite version 4.9.1) to locate the identified motifs in the mm10 mouse genome (see Subsection 4.3), as well as in the reconstructed ancestral genome for B6 and PWD mice (see Supplementary Information Subsection 4.4). All motifs with an associated FIMO-reported p-value below $10^{-4}$ were kept.

4.2 Motif refinement algorithm

We developed a new, Bayesian approach to characterise DNA motifs enriched in a set of sequences. The method is able to refine position weight matrices (PWMs) of initially supplied motifs, such as those suggested by other approaches (e.g. MEME-ChIP). In short, the algorithm samples fragments containing the motifs and iteratively updates the PWMs via a Gibbs sampler, until convergence; in practice, after a set number of iterations. The method utilises a triplet-based background sequence model. It is able to infer several motifs at the same time, and assumes that each sequence contains at most one motif. This allows separation of even closely related motifs, e.g. those found using different arrangements of PRDM9 zinc fingers. Moreover, the approach outputs probabilities of motif presence for each region, and infers a distribution of where motifs lie within each sequence, improving motif identification against background. The algorithm is presented in the case where all input sequences have the same length $l$.

1. Input:
- $n$ sequences, each of length $l$, some of which may contain instances of one of $k$ motifs;
- $k$ PWMs, denoted $\mathrm{PWM}^{(0)}_i$, $i = 1, 2, \ldots, k$;
- the distribution on the proportion of sequences containing each motif, $(\alpha_1, \ldots, \alpha_k)$, and no motif ($\alpha_0$), where as a prior $(\alpha_0, \alpha_1, \ldots, \alpha_k) \sim \mathrm{Dirichlet}(\frac{1}{2}, \frac{1}{2k}, \ldots, \frac{1}{2k})$ (so in particular $\sum_{i=0}^{k} \alpha_i = 1$). By default, initially, $\alpha_0 = \frac{1}{2}$, and $\alpha_i = \frac{1}{2k}$, for $i \geq 1$;
- prior distribution $(\beta_t)_{t=1,2,\ldots,10} \sim \mathrm{Dirichlet}(1, 1, \ldots, 1)$ on the region of any sequence in which the beginning of the motif will fall. Each sequence is divided into 10 bins of equal size, and $\beta_t$ is the probability that the $t$-th bin has a motif in it (given the sequence has a motif). By default, the vector is uniformly initialised: $\beta_t = \frac{1}{10}$, for $t = 1, 2, \ldots, 10$;
- prior on PWMs: uniform Dirichlet(1,1,1,1) prior on each base within a motif, and a geometric prior on the number of bases within the motif, such that the mean motif length is 20 bp.

2. Initialisation: background model
(a) From the input sequences, estimate the probability vector $q_{r_3 \mid r_1 r_2}$ of having the base $r_3 \in \{A, C, G, T\}$, given we observe the bases $r_1$ and $r_2$ in the triplet $(r_1, r_2, r_3)$, at any particular position in any sequence. The vector $q$ is simply estimated by its maximum likelihood estimator.
(b) Using the vector $q$, compute the probabilities $f_{ijps}$ of observing a particular fragment starting at position $p$ in the $j$-th sequence, on strand $s$, and having the same length as the $i$-th motif. So $f_{ijps}$ represents the probability of a particular fragment occurring under the background model, in which fragments are generated according to the conditional probabilities $q_{r_3 \mid r_1 r_2}$.

3. Iteration: $m$-th step
(a) Compute the likelihood $b_{ijps}$ of observing the sequence starting at position $p$ on the strand $s$ in sequence $j$, if motif $i$ begins at this position. This is simply done by assuming independence between positions under the motif model (PWM). At each (motif) position, the relative probability of having a particular base is given by $e^{\mathrm{PWM}^{(m)}_i}$ (by definition, PWMs are on a log scale, and scaled so these probabilities sum to 1).
(b) Compute the likelihood ratio $\tilde{b}_{ijps} = b_{ijps} / f_{ijps}$ of the sequence in (a), relative to the background model.
(c) Compute the posterior probability of a particular motif occurring at a certain position as $\pi_{ijps} = c_j\, \alpha_i\, \beta_{t_p}\, \tilde{b}_{ijps}$, where $t_p = \lceil 10p/l \rceil$ is the index of the bin containing position $p$, and the normalisation constant is $c_j = \{\alpha_0 + \sum_{i,p,s} \alpha_i \beta_{t_p} \tilde{b}_{ijps}\}^{-1}$. Note this relies on Bayes' formula, and on our assumption that at most one motif can arise on any one sequence.
(d) For the $j$-th sequence, assign no motif with probability $1 - \sum_{i,p,s} \pi_{ijps}$. Otherwise, assign the $i$-th motif at position $p$ on strand $s$ to this sequence, with probability $\pi_{ijps} / \sum_{i,p,s} \pi_{ijps}$.
(e) Update the $i$-th motif by computing the new matrix $\mathrm{PWM}^{(m+1)}_i$. This is done by estimating the PWM from all the fragments carrying the $i$-th motif that have been sampled in step (3d). Specifically, fragments sampled as containing the $i$-th motif, and extended by 25 base pairs on both ends, are used to create a count array of bases at each motif position among motif occurrences. Conditional on these occurrences of motif $i$, we sample a new motif length (using the geometric prior) and then sample a new PWM (given the length of the motif, we take this new PWM to be that maximising the posterior log-likelihood, rather than sampling from the full posterior distribution).
(f) Update remaining model parameters, including the fractions of sequences containing each motif, and the prior on where motifs fall within each region. These updates use conjugacy properties of the chosen priors.

Motifs obtained using this method for B6, PWD and humanized homozygous mice are reported in Extended Data Fig. 4a-c using logo plots to represent motifs.
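
Steps (3c) and (3d) for a single sequence can be sketched as below. This is a minimal illustration, not the authors' code: the function names and the dictionary keyed by `(motif, position, strand)` are assumptions made here, positions are taken to be 1-based, and the likelihood ratios from step (3b) are assumed to be precomputed.

```python
import math
import random

def motif_posteriors(alpha, beta, lik_ratio, seq_len):
    """Posterior probability of each (motif i, position p, strand s) for one sequence.

    alpha[0] is the weight of "no motif" and alpha[i] that of motif i (i >= 1);
    beta holds the bin probabilities; lik_ratio maps (i, p, s) to the
    motif-versus-background likelihood ratio, with p 1-based.
    """
    n_bins = len(beta)
    weights = {}
    for (i, p, s), ratio in lik_ratio.items():
        t = math.ceil(n_bins * p / seq_len)  # 1-based index of the bin containing p
        weights[(i, p, s)] = alpha[i] * beta[t - 1] * ratio
    c = 1.0 / (alpha[0] + sum(weights.values()))  # normalisation constant
    return {key: c * w for key, w in weights.items()}

def sample_assignment(posteriors, rng):
    """Draw one placement, or None ("no motif") with the remaining probability mass."""
    u = rng.random()
    cumulative = 0.0
    for key, prob in posteriors.items():
        cumulative += prob
        if u < cumulative:
            return key
    return None
```

Drawing each placement with its posterior probability, and returning `None` otherwise, is equivalent to first deciding whether the sequence carries a motif and then choosing the placement with the renormalised probabilities of step (3d).
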
In the logo plots, the information content (y-axis, bits) at a given position for the $i$-th base is computed as $p_i \{2 - \sum_j (-p_j \log_2 p_j)\}$, where $p_i$ is the relative frequency of base $i$, as given by the PWM. We find that binding motifs are enriched at the centre of DSB hotspots (Extended Data Fig. 4e-g).

4.3 Search of motifs in genomes

The FIMO module from MEME-ChIP (MEME Suite version 4.9.1) was used to scan genomes for matches to motifs obtained from the motif search within DSB sequences. For the B6 and humanized homozygous mice, the motif with the smallest enrichment p-value reported by MEME-ChIP was considered (Extended Data Fig. 4), hereafter called B6 and human motifs (obtained from B6 and humanized homozygous DSB sequences, respectively).

Sequence motifs enriched within the DSB hotspots in humanized B6 H/H and wild-type B6 B6/B6 mice were found to closely match the previously reported PRDM9 binding motifs for human and mouse, respectively [3, 4] (Extended Data Fig. 4). Overall, 85% and 78% of the DSB hotspots were inferred to contain a binding motif in the homozygous humanized and wild-type mice respectively. Motifs were most enriched at hotspot centres (Extended Data Fig. 4e-g), and hotter hotspots contained a strong PRDM9 binding motif more often than weaker hotspots (data not shown).

4.4 Identification of mutated motifs

4.4.1 PWD sequencing and ancestral genome to B6 and PWD

We sequenced the genome of a wild-type PWD male mouse at 50× coverage on Illumina HiSeq 2500 (Wafergen DNA library prep kit). We obtained paired-end reads of 150bp each, which we aligned to the reference genome mm10 using Stampy [5]. We subsequently used the variant caller Platypus [6] to call variants in the PWD mouse. Only variants reported with a PASS filter were kept in the following analysis. Multiallelic variants were ignored. Using Mus famulus and Mus caroli as outgroups, we reconstructed an ancestral reference genome for B6 and PWD. Specifically, whole genome resequencing data for these two mouse subspecies were obtained from [7, 8] and re-mapped to mm10 using Stampy. We then used Platypus to genotype PWD variants in both Mus famulus and Mus caroli. For a given PWD variant, several situations can arise:

1. A genotype has been called in both Mus famulus and Mus caroli. If both subspecies are homozygous for the PWD allele, or if one is homozygous and the other is heterozygous, the PWD variant is labelled as ancestral. Conversely, if both subspecies are homozygous and carry the B6 allele, or if one is homozygous for the B6 allele and the other is heterozygous, the PWD variant is labelled as derived.
2. A genotype has only been called in one of Mus famulus and Mus caroli.
In this case, the PWD variant is marked as ancestral if the subspecies for which a call has been reported is homozygous for the PWD allele. Conversely, if the subspecies is homozygous for the B6 allele, the PWD variant is called as derived.
3. Otherwise, the ancestral status of the variant is set to missing.

Using this ancestral classification of the PWD variants, we were able to recreate an ancestral reference genome for B6 and PWD using the following procedure. For each mm10 reference chromosome, start at the left end of the chromosome and walk towards its right end. At each genomic position in the reference, if a PWD variant is encountered, either (i) replace the B6 (mm10) allele by the PWD allele if the PWD variant is ancestral, (ii) leave the B6 (mm10) allele if the PWD variant is derived, or (iii) replace the B6 (mm10) allele by the PWD allele with probability 1/2 if the ancestral status of the PWD variant is missing. Hence, while walking along the mm10 reference sequence, we created an ancestral reference sequence. In the process, we kept a record of coordinate correspondences between the two reference genomes.

4.4.2 Mutated ancestral motifs in DSB hotspots

To obtain a set of ancestral motifs, the FIMO module from MEME-ChIP (MEME Suite version 4.9.1) was used to scan the ancestral genome for matches to motifs obtained from the motif search within DSB sequences in B6 and PWD homozygous mice. To derive the fraction of DSB hotspots for which a variant is found in an ancestral motif, we applied the following procedure. Looking at, say, the hotspots that are under control of the B6 Prdm9 allele, we first restricted the hotspot list to hotspots that have exactly one occurrence of the B6 motif, within 300bp of the hotspot centre, in the ancestral reference. For all these hotspots, we only considered the 1kb segments centred at the middle of the B6 motif. For these segments, we computed the fraction of segments that harboured at least one variant (SNPs and indels) on the B6 and PWD lineages, in a 30bp window sliding from the beginning to the end of the 1kb segment. As we work from the DSB perspective throughout, each segment is weighted, when computing the fraction, by the heat of the hotspot to which it corresponds. The fractions of segments that harboured at least one variant on the B6 and PWD lineages in PWD motifs are computed in the same way. The point estimates obtained along the segments are shown in Extended Data Fig. 6g.

Each binding motif called in the ancestral genome is assigned a score, which is defined as the logarithm of the probability that this motif was drawn from the motif's PWM. We also computed the corresponding scores, for each motif, in both the B6 and PWD lineages, using the following procedure. For each motif in the ancestral lineage, we created the corresponding motifs for B6 and PWD lineages by substituting any ancestral base affected by a point mutation in either B6 or PWD.
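
The scoring of an ancestral motif and of its lineage-specific version can be sketched as follows. This is only an illustration under assumptions: the PWM is represented here as a list of per-position base probabilities (all strictly positive), offsets are 0-based, and both function names are hypothetical.

```python
import math

def motif_log_score(sequence, pwm):
    """Log-probability of the sequence under the PWM, assuming independent positions.

    pwm is a list with one dict per motif position, mapping base -> probability
    (assumed strictly positive so that the logarithm is defined).
    """
    return sum(math.log(column[base]) for base, column in zip(sequence, pwm))

def apply_point_mutations(ancestral_motif, substitutions):
    """Build a lineage-specific motif; substitutions maps a 0-based offset to the derived base."""
    bases = list(ancestral_motif)
    for offset, base in substitutions.items():
        bases[offset] = base
    return "".join(bases)
```

A point mutation that replaces a well-supported base with a poorly supported one lowers the score of the lineage-specific motif relative to the ancestral one.
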
We then computed the motif scores for these modified motifs in the same way as previously. No score was assigned to motifs that were affected by variants that were not point mutations. Given a specific PRDM9 control (either B6 or PWD) and a specific lineage (either B6 or PWD), we computed the difference in motif score, which is simply the difference between the lineage-specific and the ancestral scores (Extended Data Fig. 6g). A negative difference indicates the motif was worsened by the change along its lineage, when compared to the initial (ancestral) motif.

5 DSB hotspot assignment in hybrids

5.1 Chromosome-aware DSB calling

Using the list of single nucleotide polymorphisms (SNPs) from the PWD variant calling (see Supplementary Information Subsection 4.4), each read pair from a hybrid DSB library is assigned to one of the categories B6, PWD, unclassified or uninformative using the following criteria:

1. if one or both reads of a pair overlap SNP positions from the list and the alleles carried by the read(s) are from the B6 genome, the pair is classified as B6;
2. if one or both reads of a pair overlap SNP positions from the list and the alleles carried by the read(s) are from the PWD genome, the pair is classified as PWD;
3. if one or both reads of a pair overlap SNP positions from the list and the alleles carried by the read(s) are from both B6 and PWD genomes, the pair is unclassified;
4. otherwise, the pair is uninformative.

Then, for each DSB hotspot in a sample, we extract all reads mapping within 1kb of the centre of the hotspot. Using these reads, we compute the fraction of reads coming from the B6 chromosome as the number of paired reads classified as B6 over the total number of paired reads classified either as B6 or PWD. The ratio is not computed if fewer than 10 pairs are classified as B6 or PWD in this genomic segment, or if more than 10% of the pairs marked B6, PWD or unclassified are actually unclassified. The fraction of reads coming from the B6 chromosome is also used to classify DSB hotspots as active on either (i) the B6 chromosome, if the ratio is greater than 80%, (ii) the PWD chromosome, if the ratio is less than 20%, or (iii) both chromosomes, if the ratio is between 20% and 80%.

5.2 Attribution of PRDM9 control in hybrids

Using both the DSB maps in homozygous samples (B6, PWD, B6 H/H and B6 -/-) and PRDM9 binding motif calls on the B6/PWD ancestral reference sequence, we were able to classify DSB hotspots as being under either B6, humanized or PWD Prdm9 allele control in the hybrids. In the case of the infertile (PWD × B6)F1 PWD/B6 hybrid, for each DSB hotspot:

1. if the DSB hotspot overlaps (at the p-value < $10^{-4}$ threshold) DSB hotspots (see Supplementary Information Subsection 3.3.1) from both B6 and PWD maps, or if the DSB hotspot overlaps with hotspots from B6 -/-, the PRDM9 control for this hotspot is undetermined;
2. if the DSB hotspot overlaps with hotspots from exactly one of the B6 or PWD maps, the DSB hotspot is set to be under PRDM9-B6 or PRDM9-PWD control, respectively;
3. if the DSB hotspot does not overlap with the homozygous mice DSB maps, and if in the 600bp segment centred at the centre of the hotspot (in the ancestral reference sequence), there are only binding motifs from one of the Prdm9 alleles (B6 or PWD), then this hotspot is set to be under the control of the corresponding PRDM9;
4. otherwise, the PRDM9 control of the hotspot is undetermined.

The humanized rescue (PWD × B6)F1 PWD/H hybrid is treated in a very similar manner, replacing all occurrences of the B6 allele of PRDM9 by its humanized version.
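
The rules of Subsections 5.1 and 5.2 can be summarised in a short sketch. It is an illustration under assumptions, not the authors' pipeline: read pairs are assumed to be already counted per category, the overlap flags and the set of alleles with a motif in the 600bp segment are assumed to be precomputed, and all function and parameter names are hypothetical.

```python
def b6_fraction(n_b6, n_pwd, n_unclassified, min_informative=10, max_unclassified=0.10):
    """Fraction of informative read pairs from the B6 chromosome, or None if not computed."""
    informative = n_b6 + n_pwd
    if informative < min_informative:
        return None
    if n_unclassified / (informative + n_unclassified) > max_unclassified:
        return None
    return n_b6 / informative

def active_chromosome(fraction):
    """Classify a hotspot as active on B6, PWD or both chromosomes."""
    if fraction is None:
        return None
    if fraction > 0.80:
        return "B6"
    if fraction < 0.20:
        return "PWD"
    return "both"

def prdm9_control(in_b6_map, in_pwd_map, in_knockout_map, motif_alleles):
    """Attribute PRDM9 control in the infertile hybrid.

    motif_alleles is the set of alleles ("B6", "PWD") with a binding motif in the
    600bp ancestral segment centred on the hotspot.
    """
    if in_knockout_map or (in_b6_map and in_pwd_map):
        return "undetermined"
    if in_b6_map:
        return "B6"
    if in_pwd_map:
        return "PWD"
    if len(motif_alleles) == 1:
        return next(iter(motif_alleles))
    return "undetermined"
```

For the humanized rescue, the same logic would apply with the humanized map and motif standing in for the B6 ones.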

5.3 Evidence for erosion of PRDM9 binding motifs

Attributing DSB hotspots to chromosomes and to specific Prdm9 alleles, along with mutation detection using an ancestral reference (see Subsection 4.4), allows the detailed study of mutational patterns within and around PRDM9 binding sites on the B6 and PWD lineages (Extended Data Fig. 6g). For the B6 motifs that are totally lost in the B6 lineage (no hotspot called in the wild-type B6, but called in the infertile mouse on the PWD chromosome), the maximum point enrichment of both SNPs and indels within the motif, compared to genomic background, is 7-fold (data not shown). These analyses provide strong evidence that motif erosion is occurring at PRDM9 binding targets, and indicate that indels, along with point mutations, also play an important role in motif erosion. Results suggest that the B6 motif is more eroded than the PWD motif (Extended Data Fig. 6g). This could be explained either by the B6 allele being older or more common in the past than the PWD allele, or by a larger ancestral effective population size for the B6 mouse. We also observe that mutations within and around the motifs (0-200bp) are subject to strong GC bias (data not shown) and that the hotspot heats in the ancestral lineage are associated with an increased GC bias around the binding motifs. This constitutes strong evidence for the GC-biased gene conversion phenomenon at DSB hotspots in mice [9]. In the infertile mice, although a high fraction of DSB hotspots under the control of the B6 allele of Prdm9 occur solely on the PWD chromosome, due to motif erosion on the B6 lineage (and vice versa for the PWD allele), we note, as expected, that a small fraction of hotspots harbour the opposite pattern: these hotspots occur preferentially on the B6 chromosome (Fig. 2b). This is explained by chance loss of the B6 motif on the PWD lineage (and vice versa for the PWD motif), and not by meiotic drive.
Conversely, as shown by the blue dotted line in Extended Data Fig. 6g, a small number of B6 motifs are gained on the PWD lineage. This again likely results from random mutations (not under any drive or selection) creating new binding motifs on the PWD lineage.

5.4 Fraction of asymmetric DSB hotspots explained by binding motif disruption

To assess the extent to which mutations within PRDM9 binding motifs explain the observed chromosomal asymmetry in DSB initiation, we compared the fraction of mutated motifs in symmetric versus asymmetric hotspots. Specifically, for DSB hotspots under B6 PRDM9 control in the (PWD×B6)F1 PWD/B6 mouse, we defined symmetric hotspots for this analysis as those with a fraction of reads coming from the B6 chromosome in the range (inclusive), and with a hotspot heat of at least 50 (to reduce uncertainty in the ratio estimate). Asymmetric hotspots were defined as those with a fraction less than or equal to 0.1 and with a hotspot heat of at least 50. For both symmetric and asymmetric hotspots, we further restricted ourselves to DSB hotspots containing a clear motif match: exactly one B6 binding motif in the ancestral sequence (see Subsection 4.4.2), within 250bp of the DSB hotspot centre.

In both classes of hotspots, we then computed the fraction of hotspots whose binding motif was mutated (SNP or indel) between the B6 and PWD lineages. Letting $f_s$ and $f_a$ denote these fractions in the symmetric and asymmetric cases respectively, we finally computed the fraction of hotspots whose asymmetry could be explained by a mutation in the binding motif, using the symmetric hotspots to correct for chance SNPs occurring within the motif (probability $f_s$), as

$$f_e := 1 - \frac{1 - f_a}{1 - f_s},$$

which is equivalent to $(f_a - f_s)/(1 - f_s)$.

For DSB hotspots under PWD PRDM9 control in the (PWD×B6)F1 PWD/B6 mouse, we followed the same approach, but defined the asymmetric hotspots as those with a fraction greater than or equal to 0.9.

Finally, for DSB hotspots under the control of humanized PRDM9 in the (PWD×B6)F1 PWD/H mouse, we followed the same approach, with two modifications. First, we used exact matches to the 13bp motif CCNCCNTNNCCNC to conservatively identify human binding motifs. We considered SNPs falling within the human refined motif (Extended Data Fig. 4b), which was naturally defined by extending the above 13bp motif seed. Second, we defined asymmetric hotspots as those with either a fraction less than or equal to 0.1 or greater than or equal to 0.9.

The fractions $f_a$, $f_s$ and $f_e$ in those three cases are reported in Supplementary Table 3. In all cases, we could explain a very high fraction (> 83%) of asymmetric hotspots by mutations in the binding motif (affecting differently the binding motifs on the B6 and the PWD chromosomes).

5.5 Systematic shift towards the chromosome less bound in hybrid mice

We detected a subtle systematic shift ($p < 10^{-15}$, binomial test) in DSB heat towards the chromosome where, overall, PRDM9 bound less often in the infertile mouse (Fig. 2a), moving the central peak of DSB activity away from 50% on each chromosome.
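A binomial test for a shift of this kind can be sketched as a sign test. The text does not say which counts entered the test, so the set-up below is an assumption made purely for illustration: each hotspot is scored by whether its heat fraction falls on the side of the less-bound chromosome or on the other side of 50%, and the two counts are compared with an even split. The function name `sign_test_p_value` and the counts in the usage line are hypothetical.

```python
from math import comb


def sign_test_p_value(n_towards: int, n_away: int) -> float:
    """Exact two-sided binomial test of H0: both directions equally likely (p = 0.5)."""
    n = n_towards + n_away
    k = min(n_towards, n_away)
    # Under p = 0.5 every outcome has weight 1 / 2**n, so the lower tail
    # P(X <= k) is a sum of binomial coefficients; integers keep it exact.
    lower_tail = sum(comb(n, i) for i in range(k + 1))
    # The null distribution is symmetric, so doubling the smaller tail gives
    # the two-sided p-value (capped at 1).
    return min(1.0, 2 * lower_tail / 2 ** n)


# Hypothetical counts, for illustration only (not taken from the data).
print(sign_test_p_value(n_towards=620, n_away=380))
```

With several thousand hotspots, even a modest imbalance between the two counts gives a very small p-value, which fits a shift that is described as subtle yet highly significant.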
This shift might reflect interference or compensation acting towards an equal number of DSBs occurring on each homologue, or alternatively differential persistence (e.g. through longer repair time) of DSBs marked by DMC1 on the respective chromosomes.

6 Histone marks and genomic features in the humanized mouse

PRDM9 binding motifs have varying probabilities of becoming the centre of a DSB hotspot, and some may never be used in this respect. To gain some information on the determinants of such probabilities, we compared the distributions of ChIP-seq coverage of several epigenetic marks around binding motifs in standard B6 testis cells, differentiating between motifs within DSB hotspots and those which lie outside. ChIP-seq peaks in 8-week mouse testis for histone modifications H3K4me1, H3K4me3, H3K27ac, H3K27me3 and H3K36me3 were obtained from the Mouse ENCODE Project [10]. When available, corresponding marks in heart, kidney and liver tissues were also used for comparison. We defined a DSB hotspot as overlapping a particular mark if the centre of the hotspot was within 10 bp of a peak for this mark.

6.1 H3K4me3 marks

For the non-specific tissues we considered (heart, kidney and liver), the H3K4me3 enrichment was 2.4 times higher at the mouse motifs outside DSB hotspots than at those within DSB hotspots identified in the wild-type mouse (Extended Data Fig. 5a). In testis, however, the high enrichment of H3K4me3 marks at the motifs within DSB hotspots reflects the trimethyltransferase activity of the PRDM9 PR/SET domain [11]. Interestingly, mean H3K4me3 enrichment at motifs outside DSB hotspots was not significantly different in testis compared to other tissues, which suggests a low trimethylation activity of H3K4 by PRDM9 at motifs that are not the centre of DSB hotspots.

In addition, human motifs outside DSB hotspots found in the humanized mouse are enriched in H3K4me3 marks, with the enrichment being stronger than that observed in mouse, possibly because of the GC-rich nature of the human motif (Extended Data Fig. 4b). As the ChIP-seq experiment is performed in standard B6 testis cells, this enrichment of H3K4me3 marks cannot be attributed to the action of PRDM9 at the human motifs within DSB hotspots. Therefore, the observed H3K4me3 status at human motifs within DSB hotspots must reflect the H3K4me3 status before PRDM9 binding, which is similar across tissues.

To quantify the enrichment of H3K4me3 marks at motifs outside DSB hotspots, we further looked at H3K4me3 marks in kidney. For each motif within a DSB hotspot, we selected a matched motif outside DSB hotspots (at least 2kb away from the nearest hotspot) in highly mappable regions.
Using odds ratios (OR) comparing the number of motifs enriched and not enriched in H3K4me3 within and outside DSB hotspots, we found that, for both human and mouse motifs, H3K4me3-enriched motifs are significantly over-represented outside DSB hotspots compared to within (human OR = 1.18, 95% CI (1.118, 1.236), p-value = ; mouse OR = 1.05, 95% CI (1.002, 1.111), p-value = 0.04). Similar results are obtained with H3K4me3 marks in heart tissue. This suggests that the H3K4me3 marks in other tissues at these motifs are a good proxy for the H3K4me3 patterns prior to the action of PRDM9. Hence, in both mouse and human cases, a motif is more likely to become a DSB hotspot if its local environment is depleted for H3K4me3. Since H3K4me3 marks are enriched at promoters, this suggests that the presence of H3K4me3 marks (prior to PRDM9 activity) is involved in directing PRDM9 away from promoter regions.

We also performed the same comparisons using our in-house generated H3K4me3 data, which gave almost indistinguishable results. For consistency across all comparisons (i.e. H3K36me3, H3K27me3, H3K4me1, H3K27ac) we present the published consortium data for this analysis.

Brick et al. [4] had earlier shown that in the mouse, the use of PRDM9 to target sites for DSBs has the effect of moving DSBs away from promoters to the specific DNA sequence motifs targeted by PRDM9, in contrast for example to yeast, and to the Prdm9 knockout mouse (B6 -/-). Here we show further that amongst instances of the binding motif in the humanized mouse (B6 H/H), DSBs occur preferentially at motifs which do not carry H3K4me3 marks, with a similar, but weaker, effect in the wild-type B6 mouse.
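The odds-ratio comparison used in this subsection can be illustrated with a 2×2 table of motif counts. The sketch below is not the authors' code: the interval method is not stated in the text, so a Wald interval on the log odds ratio is assumed here, and the function name `odds_ratio_wald` and the counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(marked_outside, unmarked_outside,
                    marked_within, unmarked_within, z=1.96):
    """Odds ratio for carrying the mark outside vs within hotspots, with a Wald CI.

    Returns (odds_ratio, ci_low, ci_high). A value above 1 means that marked
    motifs are relatively more frequent outside DSB hotspots than within them.
    """
    odds_ratio = (marked_outside * unmarked_within) / (unmarked_outside * marked_within)
    # Standard error of log(OR) under the usual large-sample approximation.
    se = sqrt(1 / marked_outside + 1 / unmarked_outside
              + 1 / marked_within + 1 / unmarked_within)
    ci_low = exp(log(odds_ratio) - z * se)
    ci_high = exp(log(odds_ratio) + z * se)
    return odds_ratio, ci_low, ci_high


# Hypothetical counts of matched motifs, for illustration only.
print(odds_ratio_wald(marked_outside=1200, unmarked_outside=8800,
                      marked_within=1050, unmarked_within=8950))
```

An interval that excludes 1 corresponds to a significant difference at roughly the 5% level, which is how the intervals quoted in the text can be read.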

6.2 H3K36me3 marks

Recent evidence suggests PRDM9 is also able to trimethylate H3K36 [12]. We indeed find an enrichment of H3K36me3 signal at motifs that fall within a mouse DSB hotspot in testis, likely reflecting that this newly discovered activity of PRDM9 is associated with PRDM9 binding. However, neither enrichment nor depletion is seen in other tissues, or at motifs that lie outside DSB hotspots (Extended Data Fig. 5b). Furthermore, no significant differences are seen around human motifs, regardless of whether they are within or outside DSB hotspots (Extended Data Fig. 5b). It thus appears that H3K36me3 status does not influence PRDM9 binding, which suggests that any importance of this mark in DSB formation lies downstream of PRDM9 binding.

6.3 Strand effect for H3K4me3 and H3K36me3

In order to better understand the biochemical mechanism underlying PRDM9 function, we asked whether its trimethylation activity shows any strand-specific effect. After centring each mouse DSB hotspot on the nearest binding motif, the mean distributions for both H3K4me3 and H3K36me3 marks show higher densities 5′ of the motif, and this effect is seen on both strands (Extended Data Fig. 5c). It is unclear whether this asymmetry reflects a preference of PRDM9 to trimethylate 5′ histones, or whether this preferential methylation pattern at DSB hotspots is selected downstream of PRDM9 binding.

6.4 Other histone marks

We also investigated other histone marks (Extended Data Fig. 5d-h). We found in particular that H3K4me1 displays a very similar, though weaker, enrichment pattern to H3K4me3 (with DSB sites), possibly capturing e.g. a transient (de)methylation state. Moreover, H3K27me3 was found to be strongly depleted around B6 allele controlled hotspots. The histone modification H3K27me3 is involved in regulation of transcription and may be depleted around PRDM9-induced H3K4me3 marks.
Two acetyl marks (H3K9ac, H3K27ac) also show differential enrichment depending on whether or not motifs are within DSB hotspots: motifs within DSB hotspots show a depletion of the histone mark, while motifs outside DSB hotspots show an enrichment.

6.5 Exons and Transcription Start Sites (TSS)

We investigated the relationship between mouse exons and our DSB hotspot maps in both wild-type B6 and humanized mice. Genomic coordinates of genes, exons and transcription start sites were retrieved from the known gene tables of the UCSC gene track (mouse assembly mm9/NCBI37). Both B6 and humanized hotspots were significantly enriched in mouse exons, which span 2.9% of the mouse genome, with 10.7% and 11.0% overlapping an exon, respectively. We then asked whether these results are due to differences in the numbers of motifs within and outside exons, or whether being in an exon is a significant predictor of whether or not a motif will become a hotspot. In the case of the B6 hotspots, the effect is fully explained by an enrichment of motifs in exons (OR = 0.99, 95% CI (0.93, 1.05)). This is not so for human hotspots, which remain significantly enriched within exons after controlling for the distribution of motifs (OR = 1.15, 95% CI (1.11, 1.21), p-value = ). Furthermore, hotter hotspots overlap exons slightly more often than weaker ones. Mouse exons thus appear to be in a slightly favourable conformation to allow DSBs to take place, irrespective of the Prdm9 allele present.

To investigate the overlap between DSB maps and TSS, we asked what proportion of the DSB hotspots from a particular map overlap (at $p_{\mathrm{overlap}} < 10^{-4}$) the 200 bp segments centred on each transcription start site, as defined by the UCSC gene table. In both wild-type B6 and humanized homozygotes, we found that 1.2% of the DSB hotspots overlapped a TSS segment thus defined, indicating that, as previously observed [4], most hotspots do not overlap TSS.

7 H3K4me3 ChIP-seq

7.1 Identifying PRDM9 binding peaks and defining PRDM9-independent H3K4me3 regions

Peak calling was performed using a maximum-likelihood-based peak calling algorithm that uses fragment coverage information from both sequencing replicates and the total chromatin control (specified in [13]). For each bin in the genome, the approach estimates a ChIP enrichment value relative to local background, and it also provides genome-wide estimates of the proportion of reads originating from signal vs. background, giving an estimate of the purity of each replicate. For de novo identification of enriched regions we merged adjacent 100bp non-overlapping bins with $p < 10^{-5}$, genome-wide. To enable filtering of H3K4me3 peaks corresponding to promoters and other PRDM9-independent sources, we identified H3K4me3-enriched regions shared among any pair of mice with different Prdm9 alleles.
Specifically, we took

$$\left[(\mathrm{PWD}\times\mathrm{B6})\mathrm{F1}^{\mathrm{PWD/H}} \cap \mathrm{B6}^{\mathrm{B6/B6}}\right] \cup \left[(\mathrm{PWD}\times\mathrm{B6})\mathrm{F1}^{\mathrm{PWD/B6}} \cap \mathrm{B6}^{\mathrm{H/H}}\right] \cup \left[(\mathrm{B6}\times\mathrm{PWD})\mathrm{F1}^{\mathrm{B6/PWD}} \cap \mathrm{B6}^{\mathrm{H/H}}\right]$$

to define a set of regions likely to be trimethylated independently of PRDM9.

For comparisons of H3K4me3 and DMC1 signals at DSB hotspots, we used the same approach to estimate H3K4me3 enrichment in a 1kb bin centred on the midpoint of a given DSB hotspot. For downstream analyses, we removed DSB hotspots overlapping any of the PRDM9-independent H3K4me3 sites defined above. When directly comparing H3K4me3 enrichment between different mice (as in Fig. 5b), we normalised H3K4me3 enrichment to the sum across all DSB hotspots being compared.

For de novo peak calling in H3K4me3-enriched regions (defined above), the base with the maximum read coverage within each region was chosen as the peak centre. Then, around each peak centre we computed coverage and performed likelihood ratio testing in a 1kb window (in keeping with the 1kb windows used for force calling around DMC1 midpoints).

For peak calling with published single-end H3K4me3 ChIP-seq data from a (B6×CAST)F1 B6/CAST mouse [14], we performed all steps identically, but we computed read coverage instead of fragment coverage across the genome.

7.2 Estimating haplotype-specific H3K4me3 signals

H3K4me3 ChIP-seq reads overlapping the 1kb region surrounding each DSB hotspot centre were compared with a list of biallelic SNPs distinguishing the PWD genome from the B6 genome (described in Supplementary Information Subsection 4.4.1). Reads matching one or more PWD alleles and no B6 alleles at these sites (with base quality ≥ 20) were assigned to the PWD haplotype, and vice versa. Reads not overlapping any SNPs and reads matching alleles from both haplotypes were excluded. We then subtracted the expected background coverage at each site using information from the input lane and from our peak calling algorithm. For example, to estimate B6 haplotype coverage after subtracting expected background, we computed

$$d^{\mathrm{B6}}_{\mathrm{rep1}} + d^{\mathrm{B6}}_{\mathrm{rep2}} - 0.5\,(\alpha_1 + \alpha_2)\left(d^{\mathrm{B6}}_{\mathrm{input}} + d^{\mathrm{PWD}}_{\mathrm{input}}\right),$$

where, for example, $d^{\mathrm{B6}}_{\mathrm{rep1}}$ represents the number of B6-assigned read pairs from ChIP replicate 1, and $\alpha_1$ and $\alpha_2$ are constants relating expected background coverage in each replicate to the input signal (these are estimated genome-wide by the peak calling algorithm). Any resulting background-subtracted coverage values below zero were set equal to zero. Background-subtracted B6 coverage was then divided by total background-subtracted B6 plus PWD coverage to estimate the proportion of H3K4me3 signal from the B6 chromosome; this proportion was recorded as undetermined if there were fewer than 10 haplotype-informative reads per hotspot. The proportion was then multiplied by the total H3K4me3 enrichment estimate at each hotspot to yield a haplotype-specific enrichment estimate.
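The background subtraction and haplotype split described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the PWD estimate mirrors the B6 formula, the 10-read threshold is applied to the haplotype-assigned ChIP reads from both replicates, and a hotspot with zero background-subtracted coverage on both haplotypes is treated as undetermined. The function name `b6_signal_fraction` and all counts are hypothetical.

```python
def b6_signal_fraction(rep1, rep2, inp, alpha1, alpha2, min_reads=10):
    """Estimate the proportion of H3K4me3 signal coming from the B6 chromosome.

    rep1, rep2, inp: (b6_reads, pwd_reads) for ChIP replicate 1, replicate 2
    and the input lane. Returns None when the proportion is undetermined.
    """
    informative = rep1[0] + rep1[1] + rep2[0] + rep2[1]
    if informative < min_reads:
        return None  # too few haplotype-informative reads (assumed ChIP reads only)
    # Expected background per haplotype: half of the assigned input reads,
    # scaled by the replicate-specific constants alpha1 and alpha2.
    expected_background = 0.5 * (alpha1 + alpha2) * (inp[0] + inp[1])
    # Values below zero are set to zero, as in the text.
    b6 = max(0.0, rep1[0] + rep2[0] - expected_background)
    pwd = max(0.0, rep1[1] + rep2[1] - expected_background)
    if b6 + pwd == 0:
        return None  # assumption: no signal left after subtraction
    return b6 / (b6 + pwd)


# Hypothetical hotspot, for illustration only.
fraction = b6_signal_fraction(rep1=(30, 10), rep2=(30, 10), inp=(4, 4),
                              alpha1=0.5, alpha2=0.5)
total_enrichment = 12.0  # hypothetical total H3K4me3 enrichment at this hotspot
print(fraction, fraction * total_enrichment)
```

Multiplying the fraction by the total enrichment, as in the last line, gives the haplotype-specific enrichment estimate described above.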
8 Chromosome effects following Prdm9 humanization

8.1 Prdm9 humanization of the (PWD×B6)F1 PWD/B6 mouse

8.1.1 Chromosome effects in DMC1 signal comparisons between infertile and rescue

To test for potential chromosome effects differentiating DMC1 or H3K4me3 signals between the infertile and rescue mice, we compared signal intensities at hotspots shared between those two mice. Specifically, we defined shared hotspots as those hotspots whose estimated centres (by DMC1 signal) in the infertile and in the rescue are no more than 500bp apart. We further required the hotspots to be under PWD PRDM9 control in both mice (as expected), and we restricted our analyses to the hotspots whose H3K4me3 signal is greater than the median signal, in each of the infertile and the rescue, amongst the shared hotspots (this is because of the relatively higher level of background noise in H3K4me3 ChIP-seq, and to enable us to compare identical hotspots as in Subsection 8.1.2). Finally, we only considered hotspots on the autosomes. After applying the different filters, the list of shared hotspots comprised 1,536 DSB hotspots.

To avoid biasing any potential chromosome effect estimates by differences in overall DMC1 heat distributions between mice (for example, a non-linear relationship between heat in the infertile and rescue mice, of which there is some visual evidence at higher heats), we used two approaches. In the first, we matched chromosomes for their DMC1 hotspot heat distribution in the (fertile) rescue mouse before comparing them to the infertile mouse. In the second, we used a generalized linear model which can explicitly model such a non-linear effect (see below). Both methods yielded almost identical estimates ($r^2 = 0.97$). In the main text, we report results using the generalized linear model, which is (in our view) a more powerful approach that directly compares hotspots, and accordingly gives somewhat smaller standard errors on estimates. All the results described in this work hold, essentially identically, with the first, non-model-based approach. Both methods estimate the ratio of DMC1 heat between the infertile and rescue mice.

First approach to estimate chromosome effects

To apply the first approach, we divided all DMC1 hotspots (across all autosomes) in the rescue mouse into 5 bins based on their heats, each bin containing the same number of hotspots. For each shared hotspot $j$, $w_j^{\mathrm{rescue}}$ hotspots on that chromosome fall into the same bin as the hotspot $j$, using DMC1 heats in the rescue mouse, and this gives a weight for this bin, which we applied to equalise heat across chromosomes in the rescue mouse.
We then simply computed the ratio of average DMC1 heat in the infertile mouse to heat in the rescue mouse ($\beta_i$, $1 \le i \le 19$), on each chromosome, in the following manner: letting $d_j^{\mathrm{infertile}}$ and $d_j^{\mathrm{rescue}}$ be the DMC1 heats at the $j$th shared hotspot, we formed

$$\hat{\beta}_i = \frac{\sum_{j} \frac{1}{w_j^{\mathrm{rescue}}}\, d_j^{\mathrm{infertile}}}{\sum_{j} \frac{1}{w_j^{\mathrm{rescue}}}\, d_j^{\mathrm{rescue}}},$$

where both sums run over all hotspots $j$ on the $i$th chromosome.

To ensure the observed chromosome effects were not due to biases introduced by specific filters, we varied the number of bins used to match DMC1 heat distributions (5 to 20), applied filters to cap extreme outlying values of DMC1, H3K4me3, or both, and also removed the lower bound (median) filters on H3K4me3. In those different cases, we found very similar chromosome effects: the median $r^2$ when comparing the $\hat{\beta}_i$s derived above (under the previously specified filters) with those obtained with the varying conditions was 0.98 (95% CI: (0.95, 0.99)). We finally defined the chromosome effects on the log scale, as $\log(\hat{\beta}_i)$.

Second approach to estimate chromosome effects

We also applied a second method to estimate chromosome effects. We used a generalised linear modelling approach to model our DMC1 heats (which are based on counts of reads), namely a quasi-Poisson model ("quasi" allows overdispersion relative to the Poisson case, e.g. due to rescaling). We employed the default canonical (log) link function for this analysis:

$$\log\left(E\left[d^{\mathrm{infertile}} \mid d^{\mathrm{rescue}}, c\right]\right) = \alpha + \gamma \log\left(d^{\mathrm{rescue}}\right) + \sum_{i=1}^{19} \beta_i^{P}\, 1_{\{c=i\}}.$$
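The first, non-model-based estimator can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: bin edges are taken from the heats of all rescue hotspots, while the weight $w_j^{\mathrm{rescue}}$ is counted among the shared hotspots on the same chromosome (the text leaves open which set is counted). The function names and the toy data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log


def quantile_edges(values, n_bins):
    """Interior cut points splitting `values` into n_bins roughly equal-sized bins."""
    s = sorted(values)
    return [s[(k * len(s)) // n_bins] for k in range(1, n_bins)]


def chromosome_ratios(shared, all_rescue_heats, n_bins=5):
    """Weighted infertile/rescue heat ratio per chromosome.

    shared: list of (chromosome, d_infertile, d_rescue), one per shared hotspot.
    """
    edges = quantile_edges(all_rescue_heats, n_bins)
    bins = [bisect_right(edges, d_res) for _, _, d_res in shared]
    # w_j: number of shared hotspots on the same chromosome and in the same bin as j
    # (assumption: counted among shared hotspots only).
    w = Counter((chrom, b) for (chrom, _, _), b in zip(shared, bins))
    num, den = defaultdict(float), defaultdict(float)
    for (chrom, d_inf, d_res), b in zip(shared, bins):
        num[chrom] += d_inf / w[(chrom, b)]
        den[chrom] += d_res / w[(chrom, b)]
    return {chrom: num[chrom] / den[chrom] for chrom in num}


# Toy data, for illustration only.
shared = [("chr1", 2.0, 1.0), ("chr1", 8.0, 4.0),
          ("chr2", 1.0, 2.0), ("chr2", 3.0, 6.0)]
ratios = chromosome_ratios(shared, all_rescue_heats=[1.0, 2.0, 4.0, 6.0], n_bins=2)
print({chrom: log(r) for chrom, r in ratios.items()})  # effects on the log scale
```

Dividing each hotspot by $w_j^{\mathrm{rescue}}$ means every occupied heat bin on a chromosome contributes its mean heat, so chromosomes are compared on a matched rescue heat distribution.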


More information

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma. Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

DNA-seq Bioinformatics Analysis: Copy Number Variation

DNA-seq Bioinformatics Analysis: Copy Number Variation DNA-seq Bioinformatics Analysis: Copy Number Variation Elodie Girard elodie.girard@curie.fr U900 institut Curie, INSERM, Mines ParisTech, PSL Research University Paris, France NGS Applications 5C HiC DNA-seq

More information

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz Software Manual Institute of Bioinformatics, Johannes Kepler University Linz cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University

More information
