Introduction to LOH and Allele Specific Copy Number User Forum Jonathan Gerstenhaber
Contents
1. Loss of heterozygosity
   Analysis procedure
   Types of baselines
2. Merging LOH with copy number
   Why?
   Tips
3. Allele Specific Copy Number
   Paired vs Unpaired?
   Detecting Imbalance
   Visualization
4. Q & A
Data Sets used
20 paired tumor/normal samples, kindly provided by Ian Campbell, Peter MacCallum Cancer Centre, run on Affymetrix Human SNP 6.0 arrays
20 HapMap samples run on Illumina 1M
Why study alleles?
Interest in copy number is based on the idea that the deletion or duplication of DNA can promote tumorigenesis:
  Deletion of tumor suppressor genes
  Duplication of oncogenes
Fundamentally, deletion or duplication is not the important issue; it is the loss or amplification of a function or a mutation.
If everyone were homozygous everywhere, CGH would be all we need.
In the heterozygous case, if one allele is deleted or amplified, it can alter the expression of the functional or mutant phenotype.
Focus on heterozygosity
In areas of heterozygosity we are not interested merely in CGH changes; instead we want to see how the alleles are changing.
There are two tools Partek puts at our disposal:
  LOH: Loss of Heterozygosity
  ASCN: Allele Specific Copy Number
Fundamentally, the difference is that LOH is a state: a region either is LOH or it isn't. ASCN is used to find magnitude and to help simplify images.
Data types and import
To analyze for LOH, Partek requires genotype calls. Partek does not have a genotyping algorithm.
  Affymetrix data: import from CHP files, or create them from within Partek
  Illumina data: import from a BeadStudio project using the Partek exporter
To analyze for ASCN, Partek also requires allele intensities.
  Affymetrix data: import CEL files into Partek
  Illumina data: import from a BeadStudio project using the Partek exporter
Loss of Heterozygosity
What is it? Looking across many contiguous markers, find genomic regions that contain stretches of SNP markers called heterozygous (AB) in the normal samples but homozygous (AA or BB) in the cancer samples.
How is it determined? Partek uses a hidden Markov model (HMM) to find these regions.
[Figure: genotype calls in blood (normal) and tumor, with markers labeled homozygous, heterozygous, or LOH]
Why/When is a baseline needed?
If you do not have paired data, the baseline file is used to determine the expected rate of heterozygosity for each SNP.
This use case is not accurately described as loss of heterozygosity; rather, it is the detection of runs of homozygosity (ROH), specifically unexpected ones.
But you have a baseline, right? Baselines drawn from HapMap2 are available for download.
If your samples are not a random mix of these populations, the baseline won't give a good estimate of expected frequency!
[Figure: heterozygosity (0% vs 50%) in a European population, a Japanese population, and a Japanese sample with cancer. The cancer sample shows ROH, but is it LOH? The region is a common haplotype block, so the call is a false positive.]
LOH
Paired is preferred when possible.
For baselines, use a normal population similar to your samples:
  Better expected genotype frequencies
  Avoids LOH calls due to common haplotype blocks within populations
LOH Set Up Dialog - Paired
HMM parameters are difficult to tweak. Partek suggests taking the default parameters.
A quick primer:
  Max probability: chance that, if a probe is LOH, the next one will be as well
  Genomic decay: how quickly the effect of one SNP's LOH status on neighboring SNPs decays
  Genotype error: chance that the genotype call is made incorrectly
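To make the three parameters above concrete, here is a minimal two-state Viterbi sketch. It is not Partek's implementation: the function name `viterbi_loh`, the exponential form of the decay, and the default values are all assumptions chosen for illustration. It takes only the SNPs that are heterozygous in the normal sample and labels each as LOH or retained.

```python
import math

def viterbi_loh(positions, tumor_hom, max_prob=0.99, decay_bp=1e6, geno_err=0.01):
    """Label each normal-heterozygous SNP as LOH (True) or retained (False).

    positions: base-pair positions, sorted ascending (hypothetical input).
    tumor_hom: True where the tumor call at that SNP is homozygous.
    Assumes 0 < geno_err < 1 and 0.5 < max_prob < 1 so all logs are finite.
    """
    if not positions:
        return []
    # Emission log-probabilities, indexed emit[state][tumor call is homozygous].
    emit = [
        {False: math.log(1 - geno_err), True: math.log(geno_err)},  # state 0: retained
        {False: math.log(geno_err), True: math.log(1 - geno_err)},  # state 1: LOH
    ]
    score = [math.log(0.5) + emit[s][tumor_hom[0]] for s in (0, 1)]
    back = []
    for i in range(1, len(positions)):
        gap = positions[i] - positions[i - 1]
        # Assumed decay: neighbors influence each other less as the gap grows.
        stay = 0.5 + (max_prob - 0.5) * math.exp(-gap / decay_bp)
        log_stay, log_switch = math.log(stay), math.log(1 - stay)
        new_score, pointers = [], []
        for s in (0, 1):
            from_same = score[s] + log_stay
            from_other = score[1 - s] + log_switch
            if from_same >= from_other:
                best, prev = from_same, s
            else:
                best, prev = from_other, 1 - s
            new_score.append(best + emit[s][tumor_hom[i]])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace the best path back from the final SNP.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [s == 1 for s in path]
```

With these assumed defaults, a run of four homozygous tumor calls is labeled LOH, while a single isolated homozygous call is absorbed as genotype error.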
LOH Set Up Dialog - Unpaired
Nearly the same, except:
  Input baseline
  Default frequency: if there is no baseline at all, or all the baseline samples were no-calls (NC), this is used as the default het frequency for the SNPs
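As a rough illustration of how a baseline could yield an expected het frequency per SNP, with the default frequency as a fallback, consider the sketch below. The function name, the input layout, and the default of 0.3 are assumptions for illustration, not Partek's actual behavior or default value.

```python
def expected_het_frequency(baseline_calls, default_freq=0.3):
    """Per-SNP heterozygosity rate from baseline genotype calls.

    baseline_calls maps a SNP id to a list of calls: "AA", "AB", "BB", or "NC".
    default_freq is a placeholder value, not a documented Partek default.
    """
    freqs = {}
    for snp, calls in baseline_calls.items():
        called = [c for c in calls if c != "NC"]  # no-calls carry no information
        if called:
            freqs[snp] = sum(c == "AB" for c in called) / len(called)
        else:
            # Every baseline sample was a no-call: fall back to the default.
            freqs[snp] = default_freq
    return freqs

# Hypothetical SNP ids and calls.
baseline = {
    "snp_1": ["AA", "AB", "AB", "BB"],
    "snp_2": ["NC", "NC"],
    "snp_3": ["AA", "NC", "AA"],
}
print(expected_het_frequency(baseline))
```

A SNP such as `snp_3`, with an expected het frequency of zero, is exactly where a run of homozygosity in a sample is unsurprising.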
LOH creates a segment table One row per sample per LOH region.
Paired LOH question
Q: I ran paired LOH, and yet in some of my LOH regions the het rate is quite large, even when I set genotype error to 0!
A: When looking for paired LOH, we look only at the SNPs that are heterozygous in the normal tissue. In long regions of normal homozygosity, LOH may be detected using just the few heterozygous SNPs. Some of the many homozygous SNPs may, due to genotyping error, be called heterozygous in the tumor sample, enough to noticeably affect the het rate in the region. If genotype error is nonzero, this actually happens more rarely.
Common LOH Sig-Regions
Number of samples and heterozygous rate are possible filters.
Generated automatically with LOH when the box is checked in setup.
LOH and Copy Number Overlap
Using the LOH workflow, we can find regions of LOH and of common LOH and, with a little elbow grease, even places where LOH differs between groups.
Using the copy number workflow, we can find regions of amplification and deletion.
Additionally, we can overlap the two to find regions that intersect.
[Figure: CN and LOH regions, with their overlap]
LOH and Copy Number Overlap
When overlapping, you must have the same sample IDs for the LOH analysis and the CN analysis.
The report is at the sample level; each region falls into one of 6 categories:
  Amplification, Deletion
  Amplification + LOH, Deletion + LOH
  Copy Neutral LOH
Why do I care about copy-neutral LOH? Because allelic imbalance can occur without any change in chromosomal abundance, and a shift in the allelic balance within a sample can lead to phenotypic alterations.
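The overlap step can be pictured as an interval intersection for one sample on one chromosome. The sketch below is an illustration only: the function name, the tuple layout, and the half-open coordinates are assumptions, and it uses just the category names listed above. Stretches that are copy neutral and not LOH are left unreported.

```python
def label_regions(cn_segments, loh_segments):
    """Split one chromosome of one sample into labeled overlap regions.

    cn_segments: (start, end, state), state is "amplification" or "deletion".
    loh_segments: (start, end).
    Intervals are half-open [start, end); anything outside a CN segment is
    treated as copy neutral (an assumption of this sketch).
    """
    labels = {
        ("amplification", False): "Amplification",
        ("deletion", False): "Deletion",
        ("amplification", True): "Amplification + LOH",
        ("deletion", True): "Deletion + LOH",
        ("neutral", True): "Copy Neutral LOH",
    }
    # Every segment boundary is a potential change of label.
    points = sorted({p for seg in cn_segments for p in seg[:2]} |
                    {p for seg in loh_segments for p in seg})
    out = []
    for start, end in zip(points, points[1:]):
        state = next((s for a, b, s in cn_segments if a <= start and end <= b),
                     "neutral")
        in_loh = any(a <= start and end <= b for a, b in loh_segments)
        label = labels.get((state, in_loh))
        if label is not None:
            out.append((start, end, label))
    return out

# Hypothetical coordinates.
cn = [(0, 100, "amplification"), (200, 300, "deletion")]
loh = [(50, 250)]
for region in label_regions(cn, loh):
    print(region)
```

Adjacent pieces with the same label are not merged here; a real report would likely join them.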
Copy Neutral LOH is prevalent in Cancer
67% of LOH events in 26 pancreatic cancer cell lines occurred in regions that were either copy neutral or copy gain. Calhoun et al. Can Res. 2006 66:7290
75% of LOH events in cervical cell lines did not show a copy number change. Kloth et al. BMC Genomics. 2007 8:53
56% of LOH events in glioblastomas were copy neutral. Kuga et al. Neuro-oncology. 2008 10:995
Why Allele Specific copy number (AsCN)?
Cancer studies with primary specimens
Pure normal population required
Does not require paired samples or large reference population
Can be run paired (recommended)
What is allele specific copy number?
Like LOH, it pays attention only to heterozygous SNPs.
At a heterozygous SNP, we expect balance between our two SNP calls. In fact, in diploid organisms, each of the two SNP calls (A and B) becomes an analogue of one allele!
If we find that one of our alleles has become larger, the other smaller, or both, then we have imbalance.
LOH is a case of severe imbalance, but in mixed tissue perfect LOH is hard to come by.
[Figure: A and B allele signals for normal, LOH, and in-between imbalance]
Mixing Tumor & Normal Cell Lines on 50K SNP
Allele Specific CN The workflow
Allele Specific CN Setup
No prebuilt references
Requires normals to be included in the project
Normal samples do not have to come from the same experiment. If necessary, download some normal samples from GEO and merge them in.
How does ASCN use the baseline?
ASCN will only make estimates in areas of heterozygosity.
  In paired data, areas of heterozygosity are defined by the normals.
  In unpaired data, areas of heterozygosity are defined from the genotypes.
  Yes, unpaired ASCN will have no estimates in areas of LOH as defined by the LOH workflow. They complement each other.
  Areas that are not heterozygous are given ?s.
ASCN must make an estimate of what intensity maps to 1 copy of an allele.
  In paired data, this is the intensity of the allele in the normal samples.
  In unpaired data, this is the average intensity of the allele in heterozygous samples.
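The "intensity that maps to 1 copy" idea can be sketched per SNP as a simple ratio. This is a simplified illustration, not Partek's algorithm: the function names and data layout are assumptions, and real intensities would need normalization before any such division. `None` stands in for the ? shown where no estimate is made.

```python
from statistics import mean

def ascn_paired(normal_call, normal_int, tumor_int):
    """Paired estimate at one SNP.

    normal_int and tumor_int are (A intensity, B intensity) tuples.
    Returns (A copies, B copies), or None if the normal is not heterozygous.
    """
    if normal_call != "AB":
        return None
    # In a heterozygous normal, each allele's intensity represents 1 copy.
    return (tumor_int[0] / normal_int[0], tumor_int[1] / normal_int[1])

def ascn_unpaired(sample_call, sample_int, reference):
    """Unpaired estimate at one SNP.

    reference: list of (call, (A intensity, B intensity)) for normal samples.
    One copy is taken as the mean intensity among heterozygous references.
    """
    het = [ints for call, ints in reference if call == "AB"]
    if sample_call != "AB" or not het:
        return None
    one_copy_a = mean(i[0] for i in het)
    one_copy_b = mean(i[1] for i in het)
    return (sample_int[0] / one_copy_a, sample_int[1] / one_copy_b)

# Hypothetical intensities: the A allele doubled, the B allele halved.
print(ascn_paired("AB", (1000.0, 800.0), (2000.0, 400.0)))
```

Returning both alleles per SNP matches the idea of tracking the two alleles separately rather than as a single total.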
Allele Specific Copy Number (AsCN) Two rows per sample
Detect Allelic Imbalances Proportion = (Max-Min) / (Max + Min)
Using Imbalance to Drive Copy # Discovery
Sort descending on the Proportion column in the imbalance spreadsheet.
If Max = 1.05 and Min = 0.95, then Proportion = 0.1 / 2 = 0.05
If Max = 1.9 and Min = 0.1, then Proportion = 1.8 / 2 = 0.90
Large Proportion values represent allelic imbalance.
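The Proportion calculation and the descending sort are easy to reproduce outside the spreadsheet. The sketch below uses hypothetical region names and the two worked examples from this slide. Treating a region with no signal on either allele as balanced is an assumption of the sketch.

```python
def imbalance_proportion(max_cn, min_cn):
    """Proportion = (Max - Min) / (Max + Min)."""
    total = max_cn + min_cn
    if total == 0:
        # No signal on either allele; treated as balanced here (an assumption).
        return 0.0
    return (max_cn - min_cn) / total

# (region name, Max allele copy number, Min allele copy number)
rows = [
    ("region_1", 1.05, 0.95),
    ("region_2", 1.9, 0.1),
]

# Sort descending so the strongest imbalance comes first.
ranked = sorted(rows, key=lambda r: imbalance_proportion(r[1], r[2]), reverse=True)
for name, max_cn, min_cn in ranked:
    print(name, round(imbalance_proportion(max_cn, min_cn), 2))
```

Because the difference is divided by the total, Proportion stays between 0 and 1 whatever the overall copy number.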
Copy Number vs Allele Specific Copy #
Common questions when viewing ASCN
Why is the image sparse?
  In paired data, SNPs that are homozygous in the normal are not used. It's possible the region is predominantly homozygous.
  In unpaired data, NC and homozygous SNPs in the sample are not tested. In areas of great aberration or LOH, fewer calls are made.
Why doesn't it always agree with the LOH?
  LOH uses the genotype calls. If one allele strongly dominates, the region will often be called homozygous.
Why doesn't it always agree with copy number or allele ratio?
  These are all separate algorithms. They are usually very close, and even when they don't match up there should still be a consistent story.
Interplay between Copy Number Measures Amplification on p arm. Four clusters in allele ratio. Separation in AsCN