Systematic Analysis for Identification of Genes Impacting Cancers


Arpita Singhal
Stanford University
Saint Francis High School

ABSTRACT

Vast amounts of molecular information from genomic characterizations now exist for many types of cancer. However, integrating these varied forms of biological data, which is necessary for a better understanding of the key processes underlying cancer, remains challenging. This project uses microarray-based comparative genomic hybridization (aCGH) data to study genomic alterations in tumor samples, with the statistical procedures carried out in R. To characterize these alterations, Hidden Markov Models are applied to datasets from cancer patients in order to find the hidden copy number states of each chromosome. The efficacy of the homogeneous and heterogeneous Hidden Markov Models is evaluated against the known truth of simulated data by comparing the true positive rates and false discovery rates for breakpoint detection. The project mainly determines the number and types of copy number variations present in the chromosomes of the tumor datasets, which were obtained from The Cancer Genome Atlas portal. Recurrent chromosomal aberrations at particular genome locations may indicate the presence of tumor suppressor genes or oncogenes. After the chromosomes with large copy number changes are recognized, the genes located in the regions with these copy number variations are identified. The association between chromosomal location and cancer phenotype provides a more reliable and informative cancer genome characterization that can lead to useful insights into cancer biology for further disease classification, prognosis, and personalized treatment.

INTRODUCTION

A central issue in cancer biology is the identification of specific chromosomal regions that are involved in cancer progression and other biological processes.
Unbalanced chromosomal abnormalities, which result in gains and losses of chromosomal segments, cause several human genetic disorders, including cancer. Driven by an accumulation of genetic and epigenetic changes, tumors exhibit altered levels of gene expression and a disruption of normal cell growth and survival. A variety of cancers exhibit gains in proto-oncogenes and losses in tumor suppressor genes; thus, the growth-limiting functions and self-repair processes of cancerous regions are often seriously impaired. The genomic alterations observed in tumors reflect underlying failures in the maintenance of genetic stability.

Copy Number Variations (CNVs)

Copy number variations collectively describe deletions, insertions, duplications, and other complex variants present in the human genome. Redon et al. (2006) defined a CNV as a DNA segment of one kilobase or larger that is present at a variable copy number in comparison to a reference genome. A CNV can be simple in structure, such as a duplication, or it may involve complex gains or losses of homologous sequences at multiple sites in the genome. Chromosomal copy numbers are defined to be 2 for normal cells, 1 or 0 for single and double deletions, and 3 or higher for single-copy gains or higher-level amplifications. Figure 1 shows the various forms of chromosome changes.

Cancer progression is often associated with copy number variations, which may lead to the over-expression of proto-oncogenes or the down-regulation of tumor suppressor genes in cancer genomes. Structural variations such as CNVs influence the expression of different phenotypic traits, impact various diseases, and affect the development of tumors. In cancer research, DNA copy number variations are used to search for novel genes involved in cancers through the analysis of genes located in specific regions.
Therefore, it is of considerable importance to identify the chromosomal regions with abnormal copy numbers as precisely as possible.

Array CGH

Microarray-based comparative genomic hybridization identifies regions of the genome with altered copy numbers. The technique characterizes the relationship between target sequences on an unknown test genome and a reference genome, and it was developed to identify CNVs within cancerous regions. As an indispensable tool for understanding disease mechanisms, aCGH detects and maps changes in the copy number of DNA sequences and can be used to analyze tumor genomes and chromosomal aberrations. The log-ratio values obtained from the aCGH data are used as the emissions in the Hidden Markov Model in order to find the hidden states representing the copy numbers of the chromosomes.

The technique uses a test DNA sample, such as tumor genomic DNA, and a reference DNA sample, such as normal genomic DNA, each labeled with a different fluorescent dye. The DNA samples are then combined with unlabeled Cot-1 DNA, a reagent used to block repetitive DNA sequences and prevent nonspecific hybridization. The two samples are hybridized together onto a microarray, and a microarray scanner measures the fluorescent signals and captures digital images. The fluorescence intensity signals from the labeled DNA hybridized to the target probes are then processed and normalized.

For each probe, the ratio of the test intensity to the reference intensity is expressed as a log ratio, which can be analyzed to detect genomic alterations and aberrations. In the ideal case, a log ratio of 0 indicates that no copy number change has occurred in that region of the genome, whereas a higher or lower log ratio implies a change in copy number. The log ratio changes only with the test signal, because the reference copy number stays constant at 2, representing the diploid state of the normal sample.
When the tumor sample has no copy of a particular region of the chromosome, the value log2(0/2) tends to negative infinity, indicating that the region has experienced a homozygous deletion. When the copy number is equal to 1, the value log2(1/2) = -1 is observed, indicating a heterozygous deletion. When the tumor copy number is equal to 3, the log ratio is log2(3/2) = 0.585, implying that a heterozygous duplication has taken place. Lastly, when the tumor copy number is equal to 4, the log ratio is log2(4/2) = 1, implying that a homozygous duplication has occurred. In general, a log ratio greater than 0 represents a higher number of target sequences in the test genome than in the reference genome; conversely, a log ratio less than 0 indicates a lower number of target sequences in the test genome.

The array CGH data is further analyzed with appropriate statistical methods. The complexity of eukaryotic genomes often causes the total signal of a microarray hybridization to be diluted, which makes raw aCGH data too noisy to determine the copy number of a region accurately. Thus, methods that can handle this noise must be implemented. The analysis of aCGH data can help determine the location of DNA copy number aberrations within the tumor genome for improved cancer diagnosis, drug development, and molecular therapy. A representation of microarray-based comparative genomic hybridization is shown in Figure 2. With more array CGH data sets emerging, more efficient algorithms that detect regions of gains and losses are necessary to provide an accurate estimate of error for the detection.
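The idealized log ratios described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python, assuming a noise-free, pure tumor sample measured against a diploid reference; the function name `log2_ratio` and the label table are illustrative and are not part of the original analysis.

```python
import math

REFERENCE_COPIES = 2  # the normal (diploid) reference sample


def log2_ratio(tumor_copies, reference_copies=REFERENCE_COPIES):
    """Idealized log2(tumor / reference) for a given tumor copy number."""
    if tumor_copies == 0:
        # log2(0 / 2) diverges to negative infinity (homozygous deletion)
        return float("-inf")
    return math.log2(tumor_copies / reference_copies)


# Labels follow the terminology used in the text.
LABELS = {
    0: "homozygous deletion",
    1: "heterozygous deletion",
    2: "no change",
    3: "heterozygous duplication",
    4: "homozygous duplication",
}

for copies, label in LABELS.items():
    print(f"{copies} copies: log2 ratio = {log2_ratio(copies):+.3f} ({label})")
```

Real aCGH measurements deviate from these ideal values because of noise and sample impurity, which is why a statistical model is needed to recover the copy number states.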
The research conducted for this study uses an algorithm to categorize the chromosomes based on the types of copy number aberrations in order to accurately identify genes relevant to tumor progression. The objectives of this project are to (1) analyze cancer genomic data in order to predict the hidden copy number states for each chromosomal region and (2) use the hidden copy number states of each region to accurately identify proto-oncogenes and tumor suppressor genes.

General approach used in this project

The approach used in this project can be divided into the following steps, which are discussed in detail later.

1. Upload data from Data Portal
2. Normalization of Data
3. Segmentation of data
4. Applying Hidden Markov model
5. Results
   a. Comparing the Efficacy of the Hidden Markov Models for True Positive Rate (TPR) and False Discovery Rate (FDR)
   b. Detection of Genes through Analysis of Gains and Losses

Previous Approaches for Array CGH Data Analysis

With more array CGH data sets emerging, more efficient algorithms that detect regions of gains or losses and provide an accurate estimate of error for the detection are necessary. Researchers have previously devised means for analyzing array CGH data sets. Wang et al. (2004) used the method of Clustering Along Chromosomes to detect signal regions by depicting the spatial structure within genomic alterations. Olshen et al. (2004) used circular binary segmentation to segment a chromosome into contiguous regions, assessing candidate change points with a permutation reference distribution. However, these methods do not take into account the various biological covariates, including the distance between clones, that impact the segmentation of array CGH data. The research conducted for this study uses an algorithm to categorize the chromosomes based on the types of copy number aberrations in order to accurately identify genes relevant to tumor progression.

The Cancer Genome Atlas (TCGA)

The array CGH data used for this project was obtained from the TCGA data portal. This platform allows access to data sets and provides various types of data, including clinical information, genomic characterization data, and high-level sequence analysis of the tumor genomes.

METHODS

Data

The data used for this project was obtained from the TCGA platform.
GBM Level 1 array CGH data, from the Agilent Human Genome CGH Microarray 244A platform processed at the Harvard Medical School Center, was downloaded from the TCGA data portal. Level 1 data represents the raw signals per probe for each participant's tumor sample. All data sets were processed in R using Bioconductor (Gentleman, 2004) and its packages limma (Smyth and Speed, 2003) and snapCGH (Smith, 2009).

Data Normalization during Pre-Processing

Raw array CGH data is affected by many experimental and biological factors that make it difficult to identify the true copy number for a genomic clone. Biological factors include the purity and ploidy of a sample. To correct for these factors, background correction and normalization were performed on each array. With normalization, the ploidy of the reference sample no longer played a role. The arrays were normalized using the normalizeWithinArrays() function of the limma package. This function normalized the expression log ratios for two-color spotted microarray experiments so that the log ratios averaged to zero within each array. The backgroundCorrect() function, also from the limma package, was used to correct the background of the microarray expression intensities by subtracting the average signal intensity of the area between spots.
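The effect of these two pre-processing steps can be illustrated with a small sketch. The Python code below is a simplified stand-in, not the limma implementation: it subtracts a local background from each spot and then shifts the log ratios so that their median is zero within the array. The function names, the positive floor used to keep the logarithm defined, the choice of median centering, and the intensity values are all assumptions made for illustration.

```python
import math
from statistics import median


def background_correct(foreground, background, floor=1.0):
    """Subtract the local background from each spot.

    The positive floor is an assumption of this sketch; it keeps the
    later logarithm defined when background exceeds foreground.
    """
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def centered_log_ratios(test, reference):
    """log2(test / reference) per probe, shifted so the array median is zero."""
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    shift = median(raw)
    return [value - shift for value in raw]


# Hypothetical intensities for five probes (illustrative values only).
test_fg = [1200.0, 1500.0, 800.0, 2600.0, 1400.0]
test_bg = [200.0, 300.0, 100.0, 200.0, 200.0]
ref_fg = [700.0, 900.0, 900.0, 800.0, 800.0]
ref_bg = [100.0, 100.0, 200.0, 200.0, 200.0]

test = background_correct(test_fg, test_bg)
ref = background_correct(ref_fg, ref_bg)
log_ratios = centered_log_ratios(test, ref)
print([round(value, 3) for value in log_ratios])
```

Centering removes array-wide shifts in dye intensity, so that a log ratio near zero can be read as "no copy number change" regardless of the overall brightness of either channel.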

Segmentation of Data

Each array CGH data set was processed using the processCGH() function from the snapCGH package. This function took the normalized MAList, which contained the log expression ratios and was created by the normalization and background correction, and then ordered and filtered the clones based on their mapping information. The datasets were then segmented: using segmentation models, specific segments were identified and the variance of the log ratio values within each segment was minimized.

Hidden Markov Models (HMMs)

Hidden Markov Models are a formal foundation for making probabilistic models of sequence labeling problems (Eddy, 2004). An HMM consists of a finite set of states, each with its own emission probability distribution and with specific transition probabilities to the other states. At each state, a residue is produced from the state's emission probability distribution. Then, the next state is chosen based on the state's transition probability distribution. The model thus generates two sets of information: the underlying state path, which is created while transitioning from state to state and is hidden, and the observed sequence, which consists of the residues emitted from each state in the state path.

Because HMMs can effectively uncover the relationship between the underlying states and the observed emissions, they are useful in analyzing array CGH data. The log ratios obtained from the array CGH data are the emissions, and the underlying states are the copy number values of each region on the chromosome, which correspond to the emissions based on specific probabilities. Two types of HMMs exist to identify the underlying states of the array CGH data that represent the copy number aberrations: the homogeneous model and the heterogeneous model, each of which has its own distinct advantages. The former option, the homogeneous HMM, estimates the number of hidden states via model selection and performs an analysis for each chromosome.
This model regards the underlying states as segments with a common mean that represent the copy number values of each region. It assumes that the transition probability matrix is the same at every position along the chromosome and thus does not consider the distance between clones. To fit the unsupervised homogeneous HMM for each dataset, methods in the Bioconductor package snapCGH were used; the function runHomHMM() was used to discover the hidden copy numbers for each chromosome from the GBM patient datasets.

The heterogeneous HMM, on the other hand, uses transition probabilities that depend on the distance between clones; the probability of remaining in the same hidden state is a decreasing function of the distance between one probe and the probe before it. When the distance between two clones is very large, the state of a clone is essentially unaffected by the state of the previous clone. The function runBioHMM() was used for the heterogeneous HMM. This project uses both the homogeneous and heterogeneous HMMs and identifies which one assesses the copy number variations more accurately, using simulated data and the corresponding true positive rates and false discovery rates.

RESULTS

First, the efficacy of the homogeneous HMM was compared to that of the heterogeneous HMM. A three-step algorithm was then used to identify the altered chromosomal regions in the cancer data. The three steps of the algorithm are the pre-processing and segmentation of the data, the identification of the hidden copy number states of the cancer data using the HMMs, and the quantification of the specific gains or losses to detect the genes in the regions of interest. The three-step algorithm was applied to array CGH GBM datasets from five different patients.
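The difference between the two HMMs described in the Methods can be made concrete with a small decoding sketch. The Python code below is not the snapCGH implementation behind runHomHMM() or runBioHMM(): it assumes fixed Gaussian emission means and a fixed standard deviation instead of estimating them, and the exponential form used to make the stay probability decay with distance is a hypothetical choice made only to illustrate the idea. All function names, parameter values, and data are illustrative.

```python
import math


def log_gauss(x, mean, sd):
    """Log density of a normal distribution, used as the emission model."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def stay_probability(distance, n_states, p_stay=0.95, scale=None):
    """Probability of keeping the same state between neighbouring clones.

    scale=None gives the homogeneous model (a constant p_stay). With a
    scale, the value decays from p_stay towards 1 / n_states as the
    distance grows, so a distant clone says little about the next one.
    """
    if scale is None:
        return p_stay
    uniform = 1.0 / n_states
    return uniform + (p_stay - uniform) * math.exp(-distance / scale)


def viterbi(log_ratios, distances, means, sd=0.2, p_stay=0.95, scale=None):
    """Most probable state path; distances[i] is the gap before probe i + 1."""
    k = len(means)
    score = [math.log(1.0 / k) + log_gauss(log_ratios[0], m, sd) for m in means]
    back = []
    for x, d in zip(log_ratios[1:], distances):
        stay = stay_probability(d, k, p_stay, scale)
        log_stay = math.log(stay)
        log_move = math.log((1.0 - stay) / (k - 1))
        new_score, pointers = [], []
        for j, m in enumerate(means):
            candidates = [score[i] + (log_stay if i == j else log_move) for i in range(k)]
            best = max(range(k), key=candidates.__getitem__)
            pointers.append(best)
            new_score.append(candidates[best] + log_gauss(x, m, sd))
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(range(k), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# States 0, 1, 2 stand for copy numbers 1, 2, 3 (means log2(1/2), 0, log2(3/2)).
means = [-1.0, 0.0, 0.585]
log_ratios = [0.02, -0.05, 0.61, 0.55, 0.60, -0.03, -0.98, -1.05]
distances = [50.0] * (len(log_ratios) - 1)  # hypothetical gaps in kilobases

states = viterbi(log_ratios, distances, means)  # homogeneous transitions
copy_numbers = [s + 1 for s in states]
print(copy_numbers)
```

Passing a value for `scale`, for example `viterbi(log_ratios, distances, means, scale=1000.0)`, switches the sketch to distance-dependent transitions, so that widely spaced clones are allowed to change state more freely than closely spaced ones.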
True Positive Rate and False Discovery Rate

The efficacy of the homogeneous and heterogeneous HMMs was evaluated against the known truth of the simulated data by comparing the true positive rates and false discovery rates for breakpoint detection, as seen in Figure 3. The data was simulated using the simulateData() function in the snapCGH package, which simulates aCGH data; it was used to create 10 arrays to account for variation in copy number data. The compareSegmentations() function was used to create a matrix consisting of the true positive rates and the false discovery rates for each HMM; this function evaluates the performance of a segmentation method against the known truth of the simulated data. The boxplot() function was used to generate a plot of the rates in order to compare the two HMMs. The true positive rates and the false discovery rates of the homogeneous and heterogeneous HMMs demonstrated that the heterogeneous HMM was more successful in identifying the copy number values accurately.

Normalization, Background Correction, and Segmentation

The first step, the data pre-processing, helped eliminate background errors within the data using normalization and background correction methods. These methods made the subsequent steps less prone to error. Segmentation was carried out using the snapCGH package in R, which first splits each dataset into segments based on the variation in copy number. The unsupervised HMM was then used to find the copy number states of each chromosome. After the segmentation, the smoothed log ratios for each patient's data were plotted, as shown in Figure 4. Each plot represents the dataset from a different GBM patient and shows the patient's log ratios plotted against genomic position in kilobases. The different colors represent the twenty-four distinct chromosomes in the human genome. These log ratios were used as the emissions in the HMM, which are necessary for determining the copy number states of each patient.

Use of the Hidden Markov Models

Both the homogeneous HMM and the heterogeneous HMM were used to identify the copy number states of each chromosome for every patient; however, only results from the heterogeneous HMM are shown because of its higher efficacy. Figure 5 displays the plots of the hidden states that were found for each chromosome of each patient.
The plot for Patient 1 shows gains of genetic material in autosomes 5 and 14 and in sex chromosome Y, which is shown as chromosome 24; losses of genetic material are observed in chromosomes 4 and 21. The plot of the states for Patient 2 demonstrates gains in chromosomes 2, 4, 5, 7, 8, 9, 12, 14, 20, 21, and 23; losses are seen in autosomes 1, 16, and 18 and in sex chromosome Y. The states of Patient 3 show few copy number changes: autosomes 10 and 15 and sex chromosome Y have an increased copy number, and chromosome 1 has a decreased copy number. The states of Patient 4, on the other hand, show greater copy number variance. Autosomes 3, 12, 14, 15, 16, 17, and 22 and sex chromosomes X and Y all demonstrate an increased copy number, and this patient has no losses of genetic material. Lastly, Patient 5 also has several copy number gains, in autosomes 2, 6, 7, 9, 12, 13, 14, 15, 20, and 22 and in sex chromosome Y.

While it is important to consider that each individual's genome contains several mutations and some copy number variations, whole-chromosome aberrations are quite often indications of disease. Identifying the copy number states of each chromosome in the genomes of cancer patients is useful for finding common chromosomes that may impact the progression of the Glioblastoma Multiforme tumor. If aberrations at a specific chromosomal position are observed in several tumors, the genes at that position can be identified as candidate oncogenes or tumor suppressor genes. In addition, the individual variance in the copy number of each chromosome for each patient allows for personalized treatment.

Identification of Specific Gains or Losses

The third and final step was conducted by comparing the log ratio plots of the five patient samples, as seen in Figure 4, identifying the common regions with similar gains or losses, and mapping those regions to specific genes.
While some datasets displayed a more drastic change in the log ratios than others, a majority of the datasets exhibited an elevated copy number at chromosomes 12 and Y and a decreased copy number at chromosome 1. The chromosome numbers were identified through the heterogeneous HMM analysis on single chromosomes. The variance in copy number among the different patients can be attributed to the diversity of genomic data from individual to individual. While each patient's genome may show common gains and losses, several external conditions influence the expression of regions of the genome, including the patient's age and medical history. The gains or losses of certain chromosomal regions were identified using the plots of the copy numbers, which rely on the log ratios.

DISCUSSION

This project aims to design an algorithm that can identify the copy number states for each chromosome, and the method yields informative copy number data for analysis. The project applies the methods to Glioblastoma Multiforme array CGH data to determine the copy number states for each chromosome. It also matches each copy number gain or loss to a region of interest that may be involved in the progression of the tumor. The results from this project can be used for improved and personalized treatment by identifying genes that are up-regulated or under-expressed.

Each data set, although obtained from a patient affected by the same disease, has somewhat different log ratios and copy numbers. The variance in copy number among patients can be attributed to factors, including environmental and hereditary ones, that impact the log ratios and, thus, the copy number variations. For further research, patient medical history, age, and other medical factors can be included in the study in order to more accurately study the chromosomal aberrations that are involved in GBM.

Some similar regions of interest were identified among the GBM patients. Most of the datasets contain duplications at chromosome 12. Using the GeneName data, the names of the genes located in the regions of elevated copy number were found. Chromosome 12 contains the genes PDE3A and ST8SIA1. PDE3A, or Phosphodiesterase 3A, plays a critical role in many cellular processes by regulating the amplitude and duration of intracellular cyclic nucleotide signals. ST8SIA1, or ST8 Alpha-N-Acetyl-Neuraminide Alpha-2,8-Sialyltransferase 1, is important for cell adhesion and the growth of malignant cells.
The dysregulation of these genes may contribute to the progression of cancer, as these genes are important in maintaining cell processes and seem to affect the growth of malignant cells. In chromosome 1, the genes AMY2A and KIFAP2 were under-expressed; this decrease in expression may have caused the cells to stop functioning normally and thus encouraged tumor growth. Additionally, heterozygous and homozygous duplications were seen near chromosome 19, which contains the genes MLL4 and PSENEN. MLL4, or Myeloid/lymphoid or mixed-lineage leukemia 4, is most commonly seen in leukemia; however, it is often amplified in tumor cell lines and may be involved in the formation of the GBM tumor. Also, some patients had an increased copy number at chromosome 7, which corresponds to amplification of the Epidermal Growth Factor Receptor (EGFR) gene, a gene that causes cells to grow and divide. EGFR is a highly prominent oncogene present in various types of cancer, including GBM and lung cancers. In addition to the genes identified across all samples, genes specific to certain patients can be used for more personalized treatment.

CONCLUSION

This project has successfully used array CGH data to discover various genes that may impact the formation and progression of the GBM tumor in patients. The copy number phenotype discovered for each cancer patient is associated with a known biological marker that may be linked to the progression of the cancer, either by its over-expression or by its under-expression. If the gene is over-expressed, it is most likely an oncogene that causes cells to grow and divide, as observed in cancers. If the gene is under-expressed, it may be a cause of the tumor development because it is probably an important cell cycle gene that suppresses the formation of tumors in cells. The resulting copy number phenotype, determined with the HMM used in this project, is associated with biological markers that may previously have been unassociated with the cancer phenotype.
This association will help provide a more reliable and informative genome characterization of cancer and support the development of more specialized disease classification, prognosis, and personalized treatment for the cancer patient. Since the algorithm has been used on Level 1 data, this project has demonstrated the analysis of raw data through normalization, segmentation, and implementation of the HMMs to identify cancer biomarkers for the development of a better and more personalized form of treatment for patients affected with GBM.

For further research, the algorithm used in this project can be applied to more GBM datasets to more reliably find the biological markers that may cause the formation of the brain tumor in cancer patients. Additionally, the algorithm can be applied to other cancer types for a similar analysis of cancer biomarkers. While incorporating this algorithm, other medical factors can be taken into account to eliminate interference in the study of the copy number variations. Further research can be conducted to standardize the data so that it incorporates factors such as the age and previous medical conditions of the patient.

ACKNOWLEDGEMENTS

I am grateful to Professor Susan Holmes from the Statistics Department at Stanford University for her valuable time, help, and guidance while I was conducting this project and taking the BioStatistics course; to Professor Trevor Martin for his help during the BioStatistics course; and to Julia Fukuyama for her advice on how to approach certain issues while using R. My Physics-Honors teacher, Mrs. Segal, also provided me with valuable advice while I was conducting my project. In addition, I am very grateful to Dr. Sean Davis, Staff Scientist at the Center for Cancer Research at the National Cancer Institute, for his valuable time and feedback. I am also thankful to my parents for their continuous support.

ANNOTATED BIBLIOGRAPHY

Eddy, Sean R. What Is a Hidden Markov Model. Nature.com. Nature Publishing Group, Web. 5 Oct

This research article discusses the definition of a Hidden Markov Model. The author defines a Hidden Markov Model as a formal foundation for making probabilistic models of sequences by considering transition probabilities. His definition captures the significance of this project, which uses Hidden Markov Models to find the underlying states from the given emissions.
Additionally, the author uses examples based on genetic sequences. Through these examples, he notes that the sequence, in terms of A, C, T, and G, represents the observed emissions, while the underlying state path is hidden and must be discovered through the use of a Hidden Markov Model, which contains transition probabilities. The author presents his research in a highly credible fashion, since he first defines the Hidden Markov Model and then provides examples supporting his definition. In addition, he makes use of several sources from credible authors; for example, he cites Rabiner, who wrote a tutorial on Hidden Markov Models. Dr. Sean R. Eddy works at the Howard Hughes Medical Institute and the Department of Genetics at Washington University School of Medicine. He has authored research papers that have used Hidden Markov Models. Thus, he is a credible source, as he has the knowledge necessary for defining and demonstrating what a Hidden Markov Model is.

Olshen, A.B., E. S. Venkatraman, Robert Lucito, and Michael Wigler. Circular binary segmentation for the analysis of array based DNA copy number data. Biostat (2004) 5 (4): , doi: /biostatistics/kxh008.

The research paper, Circular binary segmentation for the analysis of array based DNA copy number data, discusses another approach for analyzing array CGH data. The authors used array CGH data and the circular binary segmentation method to translate noisy intensity measurements into regions of equal copy number. They applied this method to breast cancer test data, as well as to simulated data with known copy number alterations, to test the efficacy of their new method. They have thus provided another method for analyzing array CGH data that detects regions of gains and losses based on the segments found.

The authors of this research paper present the research in an efficient and credible way, as they have demonstrated a new development while applying it to simulated data and test data. Their method is one approach for analyzing array CGH data to obtain the over-expressed and down-regulated regions. Dr. Venkatraman is from the Department of Epidemiology and Biostatistics at the Memorial Sloan-Kettering Cancer Center; his position gives him the credibility for conducting this research. The other two authors, Robert Lucito and Michael Wigler, also have significant experience in the cancer field, as they conduct cancer research at the Cold Spring Harbor Laboratory in New York.

Wang, P., Y. Kim, J. Pollack, B. Narasimhan, and R. Tibshirani. A Method for Calling Gains and Losses in Array CGH Data. Biostatistics 6.1 (2004): Web.

This research paper focuses on the development of a new method for detecting gains and losses in array CGH data. The authors use clustering to identify crucial regions. They have developed a new algorithm, Clustering along Chromosomes (CLAC), to detect specific regions. CLAC builds hierarchical clustering-style trees along each chromosome arm or chromosome and then selects the interesting clusters by controlling the false discovery rate. They applied the method to a lung cancer microarray CGH data set. Their clustering algorithm is iterative: it starts from clusters with one gene in each cluster, repeatedly merges two adjacent clusters, and continues until one big cluster is formed. The authors of this research paper all work in different departments at Stanford University and thus represent an interdisciplinary approach. The main author, Dr. Wang, works in the Statistics Department and thus is extremely knowledgeable in this field.
Their research provides valuable insight into another way of analyzing array CGH data and underscores the necessity of analyzing array CGH data to find the regions that have demonstrated gains or losses for better disease treatment in the future.

WORKS CITED

Albertson, D.G. and Daniel Pinkel. Genomic microarrays in Human Genetic Disease and cancer. Hum. Mol. Genet. (2003) 12 (suppl 2): R145-R152, August 5, 2003, doi: /hmg/ddg261

Eddy, Sean R. What Is a Hidden Markov Model. Nature.com. Nature Publishing Group, Web. 5 Oct

Gentleman, R.C., Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Gunther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y. H. Yang, and Jianhua Zhang. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5:R80,

Marioni, J.C., N.P. Thorne, S. Tavare, F. Radyanyi. BioHMM: A heterogeneous Hidden Markov Model for Segmenting array CGH data. Bioinformatics. 2006; 22:

Olshen, A.B., E. S. Venkatraman, Robert Lucito, and Michael Wigler. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostat (2004) 5 (4): , doi: /biostatistics/kxh008.

Rabiner, L.R. A Tutorial on Hidden Markov Model and Selected Applications in Speech Recognition. Proceedings of the IEEE, Volume 77, February 1989,

Redon, Richard, Shumpei Ishikawa, Karen R. Fitch, Lars Feuk, George H. Perry, T. Daniel Andrews, Heike Fiegler, Michael H. Shapero, Andrew R. Carson, Wenwei Chen, Eun Kyung Cho, Stephanie Dallaire, Jennifer L. Freeman, Juan R. González, Mònica Gratacòs, Jing Huang, Dimitrios Kalaitzopoulos, Daisuke Komura, Jeffrey R. Macdonald, Christian R. Marshall, Rui Mei, Lyndal Montgomery, Kunihiro Nishimura, Kohji Okamura, Fan Shen, Martin J. Somerville, Joelle Tchinda, Armand Valsesia, Cara Woodwark, Fengtang Yang, Junjun Zhang, Tatiana Zerjal, Jane Zhang, Lluis Armengol, Donald F. Conrad, Xavier Estivill, Chris Tyler-Smith, Nigel P. Carter, Hiroyuki Aburatani, Charles Lee, Keith W. Jones, Stephen W. Scherer, and Matthew E. Hurles. "Global Variation in Copy Number in the Human Genome." Nature (2006): Web.

Smith, M.L., John C. Marioni, Steven McKinney, Thomas Hardcastle and Natalie P. Thorne (2009). snapCGH: Segmentation, normalisation and processing of aCGH data. R package version

Smyth, G.K. Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages , Web.

Wang, P., Y. Kim, J. Pollack, B. Narasimhan, and R. Tibshirani. A Method for Calling Gains and Losses in Array CGH Data. Biostatistics 6.1 (2004): Web.

Zhang, N. DNA Copy Number Profiling in Normal and Tumor Genomes. Frontiers in Computational and Systems Biology. Vol. 15. London: Springer, Web.

FIGURES

Figure 1: Forms of chromosome changes.

Figure 2: Schematic representation of array CGH.

Figure 3: Boxplots comparing the efficacy of the Hidden Markov Models.

Figure 4: Log ratios for five patients, which were used as the emissions in the Hidden Markov Models.

Figure 5: The states identified with the heterogeneous Hidden Markov Model for the five patients; they range from 0 to 5 for the chromosomes, depending on the patient.

0.1% variance attributed to scattered single base-pair changes SNPs

0.1% variance attributed to scattered single base-pair changes SNPs April 2003, human genome project completed: 99.9% of genome identical in all humans 0.1% variance attributed to scattered single base-pair changes SNPs It has been long recognized that variation in the

More information

Understanding DNA Copy Number Data

Understanding DNA Copy Number Data Understanding DNA Copy Number Data Adam B. Olshen Department of Epidemiology and Biostatistics Helen Diller Family Comprehensive Cancer Center University of California, San Francisco http://cc.ucsf.edu/people/olshena_adam.php

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley

More information

Integrated Analysis of Copy Number and Gene Expression

Integrated Analysis of Copy Number and Gene Expression Integrated Analysis of Copy Number and Gene Expression Nexus Copy Number provides user-friendly interface and functionalities to integrate copy number analysis with gene expression results for the purpose

More information

Structural Variation and Medical Genomics

Structural Variation and Medical Genomics Structural Variation and Medical Genomics Andrew King Department of Biomedical Informatics July 8, 2014 You already know about small scale genetic mutations Single nucleotide polymorphism (SNPs) Deletions,

More information

Cancer outlier differential gene expression detection

Cancer outlier differential gene expression detection Biostatistics (2007), 8, 3, pp. 566 575 doi:10.1093/biostatistics/kxl029 Advance Access publication on October 4, 2006 Cancer outlier differential gene expression detection BAOLIN WU Division of Biostatistics,

More information

Science. Webinar Series. CNVs vs SNPs: 16 July, Participating Experts: Understanding Human Structural Variation in Disease

Science. Webinar Series. CNVs vs SNPs: 16 July, Participating Experts: Understanding Human Structural Variation in Disease Science Webinar Series CNVs vs SNPs: 16 July, 2008 Understanding Human Structural Variation in Disease Brought to you by the Science/AAAS Business Office Participating Experts: Charles Lee, Ph.D. Harvard

More information

BIO-132 Population Genetics of Human Copy Number Variations:

BIO-132 Population Genetics of Human Copy Number Variations: BIO-132 Population Genetics of Human Copy Number Variations: Models and Simulation of their Evolution Along and Across the Genomes September 16, 2007 Abstract Population genetic models play a significant

More information

Introduction to Discrimination in Microarray Data Analysis

Introduction to Discrimination in Microarray Data Analysis Introduction to Discrimination in Microarray Data Analysis Jane Fridlyand CBMB University of California, San Francisco Genentech Hall Auditorium, Mission Bay, UCSF October 23, 2004 1 Case Study: Van t

More information

Introduction to LOH and Allele Specific Copy Number User Forum

Introduction to LOH and Allele Specific Copy Number User Forum Introduction to LOH and Allele Specific Copy Number User Forum Jonathan Gerstenhaber Introduction to LOH and ASCN User Forum Contents 1. Loss of heterozygosity Analysis procedure Types of baselines 2.

More information

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015

Computer Science, Biology, and Biomedical Informatics (CoSBBI) Outline. Molecular Biology of Cancer AND. Goals/Expectations. David Boone 7/1/2015 Goals/Expectations Computer Science, Biology, and Biomedical (CoSBBI) We want to excite you about the world of computer science, biology, and biomedical informatics. Experience what it is like to be a

More information

Biostatistical modelling in genomics for clinical cancer studies

Biostatistical modelling in genomics for clinical cancer studies This work was supported by Entente Cordiale Cancer Research Bursaries Biostatistical modelling in genomics for clinical cancer studies Philippe Broët JE 2492 Faculté de Médecine Paris-Sud In collaboration

More information

Harvard University. A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh)

Harvard University. A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh) Harvard University Harvard University Biostatistics Working Paper Series Year 2005 Paper 30 A Pseudolikelihood Approach for Simultaneous Analysis of Array Comparative Genomic Hybridizations (acgh) David

More information

Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre

Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre Genome-wide copy-number calling (CNAs not CNVs!) Dr Geoff Macintyre Structural variation (SVs) Copy-number variations C Deletion A B C Balanced rearrangements A B A B C B A C Duplication Inversion Causes

More information

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit APPLICATION NOTE Ion PGM System Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit Key findings The Ion PGM System, in concert with the Ion ReproSeq PGS View Kit and Ion Reporter

More information

Comparison of discrimination methods for the classification of tumors using gene expression data

Comparison of discrimination methods for the classification of tumors using gene expression data Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley

More information

Comparison of segmentation methods in cancer samples

Comparison of segmentation methods in cancer samples fig/logolille2. Comparison of segmentation methods in cancer samples Morgane Pierre-Jean, Guillem Rigaill, Pierre Neuvial Laboratoire Statistique et Génome Université d Évry Val d Éssonne UMR CNRS 8071

More information

Section D: The Molecular Biology of Cancer

Section D: The Molecular Biology of Cancer CHAPTER 19 THE ORGANIZATION AND CONTROL OF EUKARYOTIC GENOMES Section D: The Molecular Biology of Cancer 1. Cancer results from genetic changes that affect the cell cycle 2. Oncogene proteins and faulty

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information

Genomic structural variation

Genomic structural variation Genomic structural variation Mario Cáceres The new genomic variation DNA sequence differs across individuals much more than researchers had suspected through structural changes A huge amount of structural

More information

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer

Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Genetic alterations of histone lysine methyltransferases and their significance in breast cancer Supplementary Materials and Methods Phylogenetic tree of the HMT superfamily The phylogeny outlined in the

More information

Cost effective, computer-aided analytical performance evaluation of chromosomal microarrays for clinical laboratories

Cost effective, computer-aided analytical performance evaluation of chromosomal microarrays for clinical laboratories University of Iowa Iowa Research Online Theses and Dissertations Summer 2012 Cost effective, computer-aided analytical performance evaluation of chromosomal microarrays for clinical laboratories Corey

More information

Human Cancer Genome Project. Bioinformatics/Genomics of Cancer:

Human Cancer Genome Project. Bioinformatics/Genomics of Cancer: Bioinformatics/Genomics of Cancer: Professor of Computer Science, Mathematics and Cell Biology Courant Institute, NYU School of Medicine, Tata Institute of Fundamental Research, and Mt. Sinai School of

More information

Module 3: Pathway and Drug Development

Module 3: Pathway and Drug Development Module 3: Pathway and Drug Development Table of Contents 1.1 Getting Started... 6 1.2 Identifying a Dasatinib sensitive cancer signature... 7 1.2.1 Identifying and validating a Dasatinib Signature... 7

More information

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics Mike West Duke University Papers, software, many links: www.isds.duke.edu/~mw ABS04 web site: Lecture slides, stats notes, papers,

More information

Whole-genome detection of disease-associated deletions or excess homozygosity in a case control study of rheumatoid arthritis

Whole-genome detection of disease-associated deletions or excess homozygosity in a case control study of rheumatoid arthritis HMG Advance Access published December 21, 2012 Human Molecular Genetics, 2012 1 13 doi:10.1093/hmg/dds512 Whole-genome detection of disease-associated deletions or excess homozygosity in a case control

More information

Association for Molecular Pathology Promoting Clinical Practice, Basic Research, and Education in Molecular Pathology

Association for Molecular Pathology Promoting Clinical Practice, Basic Research, and Education in Molecular Pathology Association for Molecular Pathology Promoting Clinical Practice, Basic Research, and Education in Molecular Pathology 9650 Rockville Pike, Bethesda, Maryland 20814 Tel: 301-634-7939 Fax: 301-634-7990 Email:

More information

CNV Detection and Interpretation in Genomic Data

CNV Detection and Interpretation in Genomic Data CNV Detection and Interpretation in Genomic Data Benjamin W. Darbro, M.D., Ph.D. Assistant Professor of Pediatrics Director of the Shivanand R. Patil Cytogenetics and Molecular Laboratory Overview What

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma.

Nature Genetics: doi: /ng Supplementary Figure 1. Mutational signatures in BCC compared to melanoma. Supplementary Figure 1 Mutational signatures in BCC compared to melanoma. (a) The effect of transcription-coupled repair as a function of gene expression in BCC. Tumor type specific gene expression levels

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 28

More information

Identification of regions with common copy-number variations using SNP array

Identification of regions with common copy-number variations using SNP array Identification of regions with common copy-number variations using SNP array Agus Salim Epidemiology and Public Health National University of Singapore Copy Number Variation (CNV) Copy number alteration

More information

Analysis of acgh data: statistical models and computational challenges

Analysis of acgh data: statistical models and computational challenges : statistical models and computational challenges Ramón Díaz-Uriarte 2007-02-13 Díaz-Uriarte, R. acgh analysis: models and computation 2007-02-13 1 / 38 Outline 1 Introduction Alternative approaches What

More information

Session 4 Rebecca Poulos

Session 4 Rebecca Poulos The Cancer Genome Atlas (TCGA) & International Cancer Genome Consortium (ICGC) Session 4 Rebecca Poulos Prince of Wales Clinical School Introductory bioinformatics for human genomics workshop, UNSW 20

More information

Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University

Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University Role of Chemical lexposure in Generating Spontaneous Copy Number Variants (CNVs) Jennifer Freeman Assistant Professor of Toxicology School of Health Sciences Purdue University CNV Discovery Reference Genetic

More information

SubLasso:a feature selection and classification R package with a. fixed feature subset

SubLasso:a feature selection and classification R package with a. fixed feature subset SubLasso:a feature selection and classification R package with a fixed feature subset Youxi Luo,3,*, Qinghan Meng,2,*, Ruiquan Ge,2, Guoqin Mai, Jikui Liu, Fengfeng Zhou,#. Shenzhen Institutes of Advanced

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017 RESEARCH ARTICLE Classification of Cancer Dataset in Data Mining Algorithms Using R Tool P.Dhivyapriya [1], Dr.S.Sivakumar [2] Research Scholar [1], Assistant professor [2] Department of Computer Science

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Contents. 1.5 GOPredict is robust to changes in study sets... 5

Contents. 1.5 GOPredict is robust to changes in study sets... 5 Supplementary documentation for Data integration to prioritize drugs using genomics and curated data Riku Louhimo, Marko Laakso, Denis Belitskin, Juha Klefström, Rainer Lehtonen and Sampsa Hautaniemi Faculty

More information

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach

Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach Risk-prediction modelling in cancer with multiple genomic data sets: a Bayesian variable selection approach Manuela Zucknick Division of Biostatistics, German Cancer Research Center Biometry Workshop,

More information

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Supplementary note: Comparison of deletion variants identified in this study and four earlier studies Here we compare the results of this study to potentially overlapping results from four earlier studies

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

Tutorial: acgh Data Analysis With Chipster

Tutorial: acgh Data Analysis With Chipster Tutorial: acgh Data Analysis With Chipster Ilari Scheinin (firstname.lastname@gmail.com) January 14, 2011 Abstract This tutorial covers analysis of array comparative genomic hybridization (acgh) data with

More information

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075

S1 Appendix: Figs A G and Table A. b Normal Generalized Fraction 0.075 Aiello & Alter (216) PLoS One vol. 11 no. 1 e164546 S1 Appendix A-1 S1 Appendix: Figs A G and Table A a Tumor Generalized Fraction b Normal Generalized Fraction.25.5.75.25.5.75 1 53 4 59 2 58 8 57 3 48

More information

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Pei Wang Department of Statistics Stanford University Stanford, CA 94305 wp57@stanford.edu Young Kim, Jonathan Pollack Department

More information

Supplementary Figure 1

Supplementary Figure 1 Supplementary Figure 1 Supplementary Fig. 1: Quality assessment of formalin-fixed paraffin-embedded (FFPE)-derived DNA and nuclei. (a) Multiplex PCR analysis of unrepaired and repaired bulk FFPE gdna from

More information

November 9, Johns Hopkins School of Medicine, Baltimore, MD,

November 9, Johns Hopkins School of Medicine, Baltimore, MD, Fast detection of de-novo copy number variants from case-parent SNP arrays identifies a deletion on chromosome 7p14.1 associated with non-syndromic isolated cleft lip/palate Samuel G. Younkin 1, Robert

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed.

of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed. Supplementary Note The potential association and implications of HBV integration at known and putative cancer genes of TERT, MLL4, CCNE1, SENP5, and ROCK1 on tumor development were discussed. Human telomerase

More information

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from

Nature Genetics: doi: /ng Supplementary Figure 1. SEER data for male and female cancer incidence from Supplementary Figure 1 SEER data for male and female cancer incidence from 1975 2013. (a,b) Incidence rates of oral cavity and pharynx cancer (a) and leukemia (b) are plotted, grouped by males (blue),

More information

Nature Biotechnology: doi: /nbt.1904

Nature Biotechnology: doi: /nbt.1904 Supplementary Information Comparison between assembly-based SV calls and array CGH results Genome-wide array assessment of copy number changes, such as array comparative genomic hybridization (acgh), is

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 10, Issue 1 2011 Article 52 Modeling Read Counts for CNV Detection in Exome Sequencing Data Michael I. Love, Max Planck Institute for Molecular

More information

Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK 2

Department of Pathology, University of Cambridge, Tennis Court Road, Cambridge CB2 1QP, UK 2 Advances in Bioinformatics Volume 22, Article ID 876976, 2 pages doi:.55/22/876976 Research Article A High-Throughput Computational Framework for Identifying Significant Copy Number Aberrations from Array

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

False Discovery Rates and Copy Number Variation. Bradley Efron and Nancy Zhang Stanford University

False Discovery Rates and Copy Number Variation. Bradley Efron and Nancy Zhang Stanford University False Discovery Rates and Copy Number Variation Bradley Efron and Nancy Zhang Stanford University Three Statistical Centuries 19th (Quetelet) Huge data sets, simple questions 20th (Fisher, Neyman, Hotelling,...

More information

Chapter 4 Cellular Oncogenes ~ 4.6 -

Chapter 4 Cellular Oncogenes ~ 4.6 - Chapter 4 Cellular Oncogenes - 4.2 ~ 4.6 - Many retroviruses carrying oncogenes have been found in chickens and mice However, attempts undertaken during the 1970s to isolate viruses from most types of

More information

CHROMOSOMAL MICROARRAY (CGH+SNP)

CHROMOSOMAL MICROARRAY (CGH+SNP) Chromosome imbalances are a significant cause of developmental delay, mental retardation, autism spectrum disorders, dysmorphic features and/or birth defects. The imbalance of genetic material may be due

More information

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis

The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis The 16th KJC Bioinformatics Symposium Integrative analysis identifies potential DNA methylation biomarkers for pan-cancer diagnosis and prognosis Tieliu Shi tlshi@bio.ecnu.edu.cn The Center for bioinformatics

More information

Agilent s Copy Number Variation (CNV) Portfolio

Agilent s Copy Number Variation (CNV) Portfolio Technical Overview Agilent s Copy Number Variation (CNV) Portfolio Abstract Copy Number Variation (CNV) is now recognized as a prevalent form of structural variation in the genome contributing to human

More information

DNA-seq Bioinformatics Analysis: Copy Number Variation

DNA-seq Bioinformatics Analysis: Copy Number Variation DNA-seq Bioinformatics Analysis: Copy Number Variation Elodie Girard elodie.girard@curie.fr U900 institut Curie, INSERM, Mines ParisTech, PSL Research University Paris, France NGS Applications 5C HiC DNA-seq

More information

Global variation in copy number in the human genome

Global variation in copy number in the human genome Vol 3 November doi:.38/nature39 Global variation in copy number in the human genome Richard Redon, Shumpei Ishikawa,3, Karen R. Fitch, Lars Feuk,, George H. Perry 7, T. Daniel Andrews, Heike Fiegler, Michael

More information

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA LEO TUNKLE *

HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA   LEO TUNKLE * CERNA SEARCH METHOD IDENTIFIED A MET-ACTIVATED SUBGROUP AMONG EGFR DNA AMPLIFIED LUNG ADENOCARCINOMA PATIENTS HALLA KABAT * Outreach Program, mircore, 2929 Plymouth Rd. Ann Arbor, MI 48105, USA Email:

More information

Results and Discussion of Receptor Tyrosine Kinase. Activation

Results and Discussion of Receptor Tyrosine Kinase. Activation Results and Discussion of Receptor Tyrosine Kinase Activation To demonstrate the contribution which RCytoscape s molecular maps can make to biological understanding via exploratory data analysis, we here

More information

Genomic complexity and arrays in CLL. Gian Matteo Rigolin, MD, PhD St. Anna University Hospital Ferrara, Italy

Genomic complexity and arrays in CLL. Gian Matteo Rigolin, MD, PhD St. Anna University Hospital Ferrara, Italy Genomic complexity and arrays in CLL Gian Matteo Rigolin, MD, PhD St. Anna University Hospital Ferrara, Italy Clinical relevance of genomic complexity (GC) in CLL GC has been identified as a critical negative

More information

Challenges of CGH array testing in children with developmental delay. Dr Sally Davies 17 th September 2014

Challenges of CGH array testing in children with developmental delay. Dr Sally Davies 17 th September 2014 Challenges of CGH array testing in children with developmental delay Dr Sally Davies 17 th September 2014 CGH array What is CGH array? Understanding the test Benefits Results to expect Consent issues Ethical

More information

Canadian College of Medical Geneticists (CCMG) Cytogenetics Examination. May 4, 2010

Canadian College of Medical Geneticists (CCMG) Cytogenetics Examination. May 4, 2010 Canadian College of Medical Geneticists (CCMG) Cytogenetics Examination May 4, 2010 Examination Length = 3 hours Total Marks = 100 (7 questions) Total Pages = 8 (including cover sheet and 2 pages of prints)

More information

A REVIEW OF BIOINFORMATICS APPLICATION IN BREAST CANCER RESEARCH

A REVIEW OF BIOINFORMATICS APPLICATION IN BREAST CANCER RESEARCH Journal of Advanced Bioinformatics Applications and Research. Vol 1, Issue 1, June 2010, pp 59-68 A REVIEW OF BIOINFORMATICS APPLICATION IN BREAST CANCER RESEARCH Vidya Vaidya, Shriram Dawkhar Department

More information

R2: web-based genomics analysis and visualization platform

R2: web-based genomics analysis and visualization platform R2: web-based genomics analysis and visualization platform Overview Jan Koster Department of Oncogenomics Academic Medical Center (AMC) UvA, the Netherlands jankoster@amc.uva.nl jankoster@amc.uva.nl 1

More information

LTA Analysis of HapMap Genotype Data

LTA Analysis of HapMap Genotype Data LTA Analysis of HapMap Genotype Data Introduction. This supplement to Global variation in copy number in the human genome, by Redon et al., describes the details of the LTA analysis used to screen HapMap

More information

Global variation in copy number in the human genome

Global variation in copy number in the human genome Global variation in copy number in the human genome Redon et. al. Nature 444:444-454 (2006) 12.03.2007 Tarmo Puurand Study 270 individuals (HapMap collection) Affymetrix 500K Whole Genome TilePath (WGTP)

More information

SALSA MLPA probemix P315-B1 EGFR

SALSA MLPA probemix P315-B1 EGFR SALSA MLPA probemix P315-B1 EGFR Lot B1-0215 and B1-0112. As compared to the previous A1 version (lot 0208), two mutation-specific probes for the EGFR mutations L858R and T709M as well as one additional

More information

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB

AVENIO ctdna Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits The complete NGS liquid biopsy solution EMPOWER YOUR LAB Analysis Kits Next-generation performance in liquid biopsies 2 Accelerating clinical research From liquid biopsy to next-generation

More information

Introduction. Cancer Biology. Tumor-suppressor genes. Proto-oncogenes. DNA stability genes. Mechanisms of carcinogenesis.

Introduction. Cancer Biology. Tumor-suppressor genes. Proto-oncogenes. DNA stability genes. Mechanisms of carcinogenesis. Cancer Biology Chapter 18 Eric J. Hall., Amato Giaccia, Radiobiology for the Radiologist Introduction Tissue homeostasis depends on the regulated cell division and self-elimination (programmed cell death)

More information

An Overview of Cytogenetics. Bridget Herschap, M.D. 9/23/2013

An Overview of Cytogenetics. Bridget Herschap, M.D. 9/23/2013 An Overview of Cytogenetics Bridget Herschap, M.D. 9/23/2013 Objectives } History and Introduction of Cytogenetics } Overview of Current Techniques } Common cytogenetic tests and their clinical application

More information

CNV detection. Introduction and detection in NGS data. G. Demidov 1,2. NGSchool2016. Centre for Genomic Regulation. CNV detection. G.

CNV detection. Introduction and detection in NGS data. G. Demidov 1,2. NGSchool2016. Centre for Genomic Regulation. CNV detection. G. Introduction and detection in NGS data 1,2 1 Genomic and Epigenomic Variation in Disease group, Centre for Genomic Regulation 2 Universitat Pompeu Fabra NGSchool2016 methods: methods Outline methods: methods

More information

Computational Analysis of Genome-Wide DNA Copy Number Changes

Computational Analysis of Genome-Wide DNA Copy Number Changes Computational Analysis of Genome-Wide DNA Copy Number Changes Lei Song Thesis submitted to the faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements

More information

PROCEEDINGS OF SPIE. Models of temporal enhanced ultrasound data for prostate cancer diagnosis: the impact of time-series order

PROCEEDINGS OF SPIE. Models of temporal enhanced ultrasound data for prostate cancer diagnosis: the impact of time-series order PROCEEDINGS OF SPIE SPIEDigitalLibrary.org/conference-proceedings-of-spie Models of temporal enhanced ultrasound data for prostate cancer diagnosis: the impact of time-series order Layan Nahlawi Caroline

More information

Vega: Variational Segmentation for Copy Number Detection

Vega: Variational Segmentation for Copy Number Detection Vega: Variational Segmentation for Copy Number Detection Sandro Morganella Luigi Cerulo Giuseppe Viglietto Michele Ceccarelli Contents 1 Overview 1 2 Installation 1 3 Vega.RData Description 2 4 Run Vega

More information

Multimarker Genetic Analysis Methods for High Throughput Array Data

Multimarker Genetic Analysis Methods for High Throughput Array Data Multimarker Genetic Analysis Methods for High Throughput Array Data by Iuliana Ionita A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department

More information

DETECTING HIGHLY DIFFERENTIATED COPY-NUMBER VARIANTS FROM POOLED POPULATION SEQUENCING

DETECTING HIGHLY DIFFERENTIATED COPY-NUMBER VARIANTS FROM POOLED POPULATION SEQUENCING DETECTING HIGHLY DIFFERENTIATED COPY-NUMBER VARIANTS FROM POOLED POPULATION SEQUENCING DANIEL R. SCHRIDER * Department of Biology and School of Informatics and Computing, Indiana University, 1001 E Third

More information
