Structural Variation and Medical Genomics
Andrew King
Department of Biomedical Informatics
July 8, 2014

You already know about small-scale genetic mutations:
  Single nucleotide polymorphisms (SNPs)
  Deletions, insertions, substitutions
  Copy number variations
  Small inversions
  Small translocations

Chromosomal Mutations (Aberration) Types

Duplication
  A fragment of a chromosome joins a homologous chromosome
  The additional region of genes may be incorporated within the chromosome or at one end of the chromosome, or become attached to another chromosome
  These changes, if not lethal, may cause profound changes in the phenotype

Deletion
  A segment of a chromosome separates and is lost
  The affected chromosome loses certain genes and becomes shorter than normal
  If a deletion affects the same gene loci on both homologous chromosomes, the effect is usually lethal

Inversion
  A segment of a chromosome separates and rejoins it in an inverted position
  Inversion changes the sequence of nitrogenous bases in the chromosome
  Inversion occurs when a region of a chromosome breaks off and rotates through 180° before rejoining the chromosome
  No genetic material is gained or lost as a result of inversion, but phenotypic changes may be seen
  This suggests that the order of gene loci on the chromosome is important, a phenomenon known as the position effect

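The deletion and inversion described above can be sketched on a toy sequence. This is a minimal illustration, assuming the chromosome is written as a single-strand string of A, C, G, and T; under that assumption, rotating a double-stranded segment through 180° corresponds to replacing it with its reverse complement. The function names and the example sequence are hypothetical.

```python
# Toy model: a chromosome as a string of bases on one strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def delete_segment(chromosome, start, end):
    """Remove chromosome[start:end] (0-based, end-exclusive)."""
    return chromosome[:start] + chromosome[end:]


def invert_segment(chromosome, start, end):
    """Rotate chromosome[start:end] through 180 degrees in place.

    On the written strand this is the reverse complement of the segment.
    """
    segment = chromosome[start:end]
    return chromosome[:start] + reverse_complement(segment) + chromosome[end:]


original = "AAACCGTTT"
deleted = delete_segment(original, 3, 6)   # "AAATTT": shorter than normal
inverted = invert_segment(original, 3, 6)  # "AAACGGTTT": same length, new order
print(deleted, inverted)
```

Note that the inverted chromosome keeps its length and base content on the double strand, while the deleted one is shorter; applying the same inversion twice restores the original sequence.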
Translocation
  A segment of a chromosome breaks off and joins a non-homologous chromosome
  Both affected chromosomes are modified
  The donor suffers a deletion and becomes shorter than normal
  The recipient gains an extra set of genes and becomes longer than normal
  Reciprocal Translocation

Special Case: Robertsonian Translocation
  Down's syndrome can occur in two ways:
    Trisomy 21, where a person has three copies of chromosome 21
    Or chromosome 21 can be duplicated and translocated onto one of the larger chromosomes
  Both result in an increase in gene expression of chromosome 21 products

Chromosome Abnormalities and Cancer
  Translocations and inversions create new arrangements of genes
    Some genes are more highly expressed
    Others are much less expressed
  Chromosome position affects gene expression
  Some of these changes lead to cancer. How?
    Overexpression of an oncogene
    Underexpression of a tumor suppressor

Detection of Structural Variations
  Large enough variations are visualized directly on chromosomes
    Chromosome painting, FISH, or SKY

Fluorescence in situ hybridization (FISH) uses fluorescent probes that bind to highly complementary sequences
Spectral karyotyping (SKY) visualizes chromosomes by "painting" each pair a different fluorescent color

Detection of Structural Variations
  Large enough variations are visualized directly on chromosomes
    Chromosome painting, SKY, or FISH
    An inversion in Drosophila was observed in the 1920s
  Smaller structural variants are more difficult to detect than SNPs
    Microarrays measure the test:reference fluorescence ratio

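The test:reference ratio measured by a microarray can be turned into a rough copy-number call. The sketch below assumes the ratio is reported on a log2 scale, a common convention that the slide does not state, and uses hypothetical thresholds of +0.3 and -0.3; real analyses choose thresholds from the noise in the data.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Log2 of the test:reference fluorescence ratio for one probe."""
    return math.log2(test_intensity / reference_intensity)


def call_copy_state(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a log2 ratio as gain, loss, or neutral.

    The thresholds are illustrative assumptions, not established cutoffs.
    """
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"


# Idealized intensities proportional to copy number (reference has 2 copies).
for test_copies in (1, 2, 3):
    ratio = log2_ratio(test_copies, 2)
    print(test_copies, round(ratio, 3), call_copy_state(ratio))
```

With noise-free intensities, a single-copy loss gives a log2 ratio of -1 and a single-copy gain gives about 0.585, which is why gains are harder to separate from noise than losses.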
Handling Sequence Data

Unfortunately, assembling a human genome de novo (i.e., with no prior information) at sufficient quality for structural variation studies remains difficult with limited read lengths.
In a resequencing approach, one finds differences between an individual genome and a closely related reference genome whose sequence is known by aligning reads from the individual genome to the reference genome.
Differences (variants) between the genomes correspond to differences between the aligned reads and the reference sequence.
  Even when resequencing, identifying SVs is not easy
  Split-read mapping is one resequencing strategy: a read that spans a breakpoint aligns to the reference in two separate pieces

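The split-read idea can be sketched with exact string matching: a read that crosses a deletion breakpoint does not match the reference as a whole, but its two halves match at positions separated by a gap. This is a toy illustration under strong assumptions (no sequencing errors, no repeats, deletions only); the function name, the `min_anchor` parameter, and the sequences are hypothetical, and real aligners tolerate mismatches.

```python
def find_split_alignment(read, reference, min_anchor=5):
    """Find a split of the read whose two pieces both match the reference.

    Returns (prefix_pos, suffix_pos, gap), where gap is the number of
    reference bases skipped between the two pieces, or None if no split
    with at least min_anchor bases on each side matches exactly.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        prefix_pos = reference.find(prefix)
        if prefix_pos == -1:
            continue
        # The suffix must align at or after the end of the prefix.
        suffix_pos = reference.find(suffix, prefix_pos + split)
        if suffix_pos != -1:
            gap = suffix_pos - (prefix_pos + split)
            return prefix_pos, suffix_pos, gap
    return None


left, deleted, right = "GATTACAGGC", "TTTTTTTT", "CCATGCGTAA"
reference = left + deleted + right
# A read from an individual genome that lacks the middle segment.
read = left[-6:] + right[:6]
print(find_split_alignment(read, reference))  # gap of 8 suggests a deletion
```

A gap of zero means the read aligns contiguously; a positive gap is the size of the candidate deletion in the individual genome.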
Paired-end mapping (PEM) is another resequencing strategy
In PEM, a paired-end sequencing protocol is used to obtain paired reads from opposite ends of a larger DNA fragment, or clone, from an individual genome. These paired reads are then aligned to a reference genome. Most paired reads result in concordant pairs, where the distance between the aligned reads is approximately equal to the fragment length. In contrast, discordant pairs have alignments with an abnormal distance or that lie on different chromosomes. These suggest the presence of an SV or a sequencing error. A mapped distance longer than the fragment length suggests a deletion in the individual genome, and a shorter one suggests an insertion. Other types of discordant pairs identify inversions, transpositions, or duplications that distinguish the individual genome from the reference genome.

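The concordant/discordant distinction can be sketched as a simple classifier. This is a minimal illustration, not a production SV caller: the `ReadPair` fields, the expected fragment length of 300 bp, and the tolerance of 50 bp are all assumptions chosen for the example, and real tools estimate the fragment length distribution from the data.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Alignment summary for one pair of reads (hypothetical fields)."""
    chrom_left: str
    pos_left: int            # leftmost mapped coordinate of the pair
    chrom_right: str
    pos_right: int           # rightmost mapped coordinate of the pair
    same_orientation: bool = False  # True if both reads map to the same strand


def classify_pair(pair, expected_length=300, tolerance=50):
    """Label a read pair as concordant or as one kind of discordant pair."""
    if pair.chrom_left != pair.chrom_right:
        return "discordant: different chromosomes"
    if pair.same_orientation:
        return "discordant: orientation (possible inversion)"
    distance = pair.pos_right - pair.pos_left
    if distance > expected_length + tolerance:
        return "discordant: too far apart (possible deletion)"
    if distance < expected_length - tolerance:
        return "discordant: too close (possible insertion)"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "chr1", 1310),
    ReadPair("chr1", 1000, "chr1", 5000),
    ReadPair("chr1", 1000, "chr1", 1100),
    ReadPair("chr1", 1000, "chr7", 1300),
    ReadPair("chr1", 1000, "chr1", 1300, same_orientation=True),
]
for pair in pairs:
    print(classify_pair(pair))
```

A single discordant pair may be a sequencing or alignment error, so in practice a variant is only proposed when several discordant pairs support the same event.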
Why are Structural Variations and Large Copy Number Variants Hard to Represent?
  Start and stop coordinates of a structural variant breakpoint are often uncertain

The Solution: Display the amount of uncertainty
  Start/stop coordinates known (bar over the region)
  Inner start/stop only (breakpoint is outside of bar)
  Outer start/stop only (breakpoint is inside of bar)
  Inner/outer start/stop coordinate maxima (breakpoint in shaded region)

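The four cases above can be captured in a small record type. The sketch below is one possible representation, assuming each coordinate is either known or absent; the class and field names are illustrative and are not taken from any particular database schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SVRegion:
    """Placement of a structural variant with optional uncertainty."""
    start: Optional[int] = None
    stop: Optional[int] = None
    inner_start: Optional[int] = None
    inner_stop: Optional[int] = None
    outer_start: Optional[int] = None
    outer_stop: Optional[int] = None

    def describe(self) -> str:
        """Say which of the four display cases this region falls into."""
        if self.start is not None and self.stop is not None:
            return "exact breakpoints"
        has_inner = self.inner_start is not None and self.inner_stop is not None
        has_outer = self.outer_start is not None and self.outer_stop is not None
        if has_inner and has_outer:
            return "breakpoints between outer and inner coordinates"
        if has_inner:
            return "breakpoints outside the inner coordinates"
        if has_outer:
            return "breakpoints inside the outer coordinates"
        return "unknown"

    def start_range(self) -> Tuple[Optional[int], Optional[int]]:
        """(lowest, highest) possible start breakpoint; None means unbounded."""
        if self.start is not None:
            return (self.start, self.start)
        return (self.outer_start, self.inner_start)


region = SVRegion(outer_start=100, inner_start=150, inner_stop=900, outer_stop=980)
print(region.describe(), region.start_range())
```

Keeping the inner and outer coordinates separate preserves how much is actually known, instead of forcing an uncertain breakpoint into a single number.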
In-Class Assignment
  Align sequences of the same color to form a single contiguous sequence
  Identify which type of structural variation occurred in each colored sequence
  Display your results using the structural variation notation just discussed

Challenges for Cancer Genomics Studies
  Most cancer genomes are aneuploid, meaning that the number of copies of regions of the genome is variable due to duplications and deletions of segments of the normal genome.
  Cancer tissues are a heterogeneous mixture of cells with possibly different numbers of mutations. This heterogeneity includes admixture between normal and cancer cells, as well as subpopulations of tumor cells. Some of these subpopulations might contain important driver mutations or drug resistance mutations.
  Because of the amount of DNA required for current sequencing technologies, most cancer genome sequencing studies do not sequence single tumor cells but rather sequence a mixture of cells.

Reference Readings
  Raphael, B. Chapter 6: Structural variation and medical genomics. PLoS Comput. Biol. 8, e1002821 (2012).
  "Database of Genomic Structural Variation." National Center for Biotechnology Information, U.S. National Library of Medicine, n.d. Web. http://www.ncbi.nlm.nih.gov/dbvar/content/overview/
  Mulenga, M. 5. Chromosome Mutations [PowerPoint slides]. Retrieved from http://www.slideshare.net/drmulenga/5 chromosommutations