Inferring Transcriptional Modules from Breast Cancer Profile Data
Lu CSC 209 Fall 2009
CSC 177 Data Warehousing and Data Mining, Spring 2010

Outline
- Breast Cancer and Targeted Therapy
- Microarray Profile Data
- Inferring Transcriptional Module Methods

Breast cancer
Breast cancer is a cancer that starts in the tissues of the breast. Over the course of a lifetime, 1 in 8 women will be diagnosed with breast cancer. What are the causes? Can we do anything about this?

Risk factors you cannot change include:
- Age and gender
- Family history of breast cancer
- Genes: defects in the BRCA1 and BRCA2 genes
- Menstrual cycle

Treatment Plan Selection
Treatment is based on many factors, including:
- the type and stage of the cancer
- whether the cancer is sensitive to certain hormones
- whether or not the cancer overproduces (over-expresses) a gene called HER2/neu

Cancer Treatments
- Chemotherapy to kill cancer cells
- Radiation therapy to destroy cancerous tissue
- Surgery to remove cancerous tissue
- Hormonal therapy to block certain hormones that fuel cancer growth
- Targeted therapy to interfere with cancer cell growth and function

Targeted therapy is a newer type of treatment, also called biologic therapy. It uses special anti-cancer drugs that identify certain changes in a cell that can lead to cancer. Combined with chemotherapy, this type of medicine can cut the risk of the cancer coming back by 50%.

Targeted Therapy for Breast Cancer Treatment
- Aimed at specific processes of cancer cell growth, division, and life cycle
- Existing targeted therapies for breast cancer include Avastin, Herceptin, Iressa, and Tykerb
- Each of these drugs has specific effects on cancer cells
- New drugs are still needed

Microarray Profile Data
An array, or slide, is a collection of features spatially arranged in a two-dimensional grid of columns and rows. Microarrays can be used to measure changes in gene expression levels, for example to detect problems in our body.
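
To make "measuring changes in expression levels" concrete, here is a minimal Python sketch. It assumes a two-color array in which each spot yields a reference intensity and a sample intensity, and it reports the base-2 log ratio per spot. The grid, gene names, intensities, and the 2-fold threshold are all hypothetical; real pipelines also apply background correction and normalization, which are omitted here.

```python
import math

# Hypothetical 2 x 2 array: each feature (spot) sits at a (row, column)
# position and holds (gene name, reference intensity, sample intensity).
array = {
    (0, 0): ("GENE_A", 1000.0, 4000.0),
    (0, 1): ("GENE_B", 2000.0, 500.0),
    (1, 0): ("GENE_C", 800.0, 800.0),
    (1, 1): ("GENE_D", 300.0, 1200.0),
}

def log2_ratios(grid):
    """Return {gene: log2(sample / reference)} for every spot in the grid.

    Positive values suggest higher expression in the sample than in the
    reference; negative values suggest lower expression.
    """
    return {gene: math.log2(sample / ref) for gene, ref, sample in grid.values()}

def changed_genes(ratios, threshold=1.0):
    """Genes whose absolute log2 ratio is at least `threshold` (1.0 = 2-fold)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

ratios = log2_ratios(array)
print(ratios)
print(changed_genes(ratios))  # ['GENE_A', 'GENE_B', 'GENE_D']
```

The log scale is used so that a 4-fold increase (+2) and a 4-fold decrease (-2) are symmetric around zero.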

Our body
- Our body consists of a number of organs.
- Each organ is composed of a number of tissues.
- Each tissue is composed of cells of the same type.

Cell
A cell performs two types of functions:
- It performs the chemical reactions necessary to maintain life.
- It passes the information for maintaining life to the next generation.

We also know that:
- Proteins perform chemical reactions.
- DNA stores and passes on information.
- RNA is the intermediate between DNA and proteins.

DNA
DNA stores the instructions the cell needs to perform its daily life functions. It consists of two strands that are interwoven to form a double helix. Each strand is a chain of small molecules called nucleotides.

Double-stranded DNA
Normally, DNA is double stranded within a cell. The two strands are antiparallel, and one strand is the reverse complement of the other. The two strands are interwoven to form a double helix. One reason for being double stranded is that it eases DNA replication: each strand can serve as the template for a new partner strand.
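
The statement that one strand is the reverse complement of the other can be made concrete in a few lines of Python. This is an illustrative sketch that assumes sequences contain only the letters A, C, G, and T; ambiguity codes such as N are not handled.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Return the antiparallel partner strand, read 5' to 3'.

    Complementing gives the paired bases; reversing accounts for the
    strands running in opposite directions.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]

def is_reverse_complement(a: str, b: str) -> bool:
    """True if strands a and b could pair to form double-stranded DNA."""
    return reverse_complement(a) == b.upper()

print(reverse_complement("ACCGAT"))  # ATCGGT
```

Applying the function twice returns the original strand, which mirrors how either strand can be recovered from its partner during replication.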

Some terms related to DNA: genome, chromosome, gene

Chromosome
Usually, DNA is tightly wound around histone proteins to form a chromosome. The total information stored in all chromosomes constitutes a genome. In most multicellular organisms, every cell contains the same complete genome, although there may be small differences due to mutation.
Example: the human genome has about 3 billion (3G) base pairs, organized in 23 pairs of chromosomes.

Gene
A gene is a sequence of DNA that encodes a protein or an RNA molecule. Early estimates put the number of genes in the human genome at 30,000 to 35,000; current estimates of protein-coding genes are closer to 20,000. For genes that encode proteins:
- In prokaryotic genomes, one gene corresponds to one protein.
- In eukaryotic genomes, one gene can correspond to more than one protein because of alternative splicing.

Central Dogma
The central dogma tells us how we get a protein from a gene. This process is called gene expression. The expression of a gene consists of two steps, transcription and translation, and may be followed by post-translational modification:
- Transcription: DNA -> mRNA
- Translation: mRNA -> protein
- Post-translational modification: protein -> modified protein
[Figure: DNA -> mRNA (with poly-A tail, AAAA) -> protein -> modified protein]
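
The two steps of gene expression can be sketched in Python as string transformations. This is a simplified illustration, not a biological model: the codon table below is a tiny hypothetical subset of the 64-codon standard genetic code (unknown codons raise `KeyError`), codons are read from the DNA coding strand rather than from mRNA, and translation first checks the coding-region rules (length a multiple of 3, start codon ATG, a single stop codon at the end).

```python
# Tiny illustrative subset of the standard genetic code (coding-strand DNA codons).
CODON_TABLE = {"ATG": "M", "TTT": "F", "GGC": "G", "AAA": "K", "TGG": "W"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def transcribe(coding_strand: str) -> str:
    """Transcription: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def is_orf(seq: str) -> bool:
    """Check the open reading frame rules: length a multiple of 3, starts with
    ATG, ends with a stop codon, and has no stop codon before the last one."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] == "ATG"
            and codons[-1] in STOP_CODONS
            and not any(c in STOP_CODONS for c in codons[:-1]))

def translate(orf: str) -> str:
    """Translation: map each codon to an amino acid, stopping before the stop codon."""
    seq = orf.upper()
    if not is_orf(seq):
        raise ValueError("not a valid open reading frame")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 3, 3))

gene = "ATGTTTGGCAAATGGTAA"
print(transcribe(gene))  # AUGUUUGGCAAAUGGUAA
print(translate(gene))   # MFGKW
```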

More on Gene Structure
A gene has 4 regions: the regulatory region, the 5' untranslated region, the coding region, and the 3' untranslated region.
- The coding region contains the codons for the protein. It is also called the open reading frame (ORF). Its length is a multiple of 3. It must begin with a start codon and end with a stop codon, and none of its other codons may be a stop codon.
- The mRNA transcript contains the 5' untranslated region + coding region + 3' untranslated region.
- The regulatory region contains the promoter, which regulates the transcription process.

Hybridization
Among thousands of DNA fragments, biologists routinely need to find a DNA fragment that contains a particular DNA subsequence. This can be done based on hybridization:
1. Suppose we need to find DNA fragments that contain ACCGAT.
2. Create probes that are the reverse complement of ACCGAT (that is, ATCGGT).
3. Mix the probes with the DNA fragments.
4. Due to the hybridization rule (A pairs with T, C pairs with G), DNA fragments that contain ACCGAT will hybridize with the probes.

DNA array
The idea of hybridization leads to DNA array technology. In the past, one experiment studied one gene, which made it hard to get the whole picture. DNA arrays allow researchers to experiment on a set of genes or even the whole genome at once.

DNA array idea (I)
An orderly arrangement of thousands of spots. Each spot contains many copies of the same DNA fragment.
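
The four hybridization steps above can be simulated in Python as a sketch. It idealizes hybridization as perfect base pairing over the whole probe; real hybridization tolerates some mismatches and depends on temperature and salt conditions, which are not modeled. The fragment sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def make_probe(target: str) -> str:
    """Step 2: the probe is the reverse complement of the target subsequence."""
    return reverse_complement(target.upper())

def hybridizes(fragment: str, probe: str) -> bool:
    """Steps 3-4 (idealized): a fragment binds the probe if some stretch of it
    pairs perfectly with the whole probe, i.e. the probe's reverse complement
    occurs in the fragment."""
    return reverse_complement(probe) in fragment.upper()

fragments = ["TTACCGATGG", "ACCGTTAGCA", "GGGACCGATC"]
probe = make_probe("ACCGAT")
print(probe)  # ATCGGT
print([f for f in fragments if hybridizes(f, probe)])  # ['TTACCGATGG', 'GGGACCGATC']
```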

DNA array idea (II)
When the array is exposed to the target solution, DNA fragments in the array and in the target solution will match based on the hybridization rule: A-T and C-G (held together by hydrogen bonds). This idea allows us to run thousands of hybridization experiments at the same time.
[Figure: DNA sample hybridizing to the array]

Applications of DNA arrays
- Sequencing by hybridization: a promising alternative to sequencing by gel electrophoresis that may be able to reconstruct longer DNA sequences in shorter time.
- Expression profile of a cell: DNA arrays allow us to monitor the activities within a cell. Each spot contains the complement of a particular gene, so through hybridization we can measure the concentration of different mRNAs within a cell.
- SNP detection: probes for different alleles are used to detect single-nucleotide variation.
- Many other applications!

Gene regulation
Every cell of an organism has essentially the same genome, yet different cells express different sets of genes to form different types of tissues. How does a cell know which genes are required and when they should be expressed? The process of controlling gene expression is called gene regulation.

Transcription Factors (TFs)
Gene regulation dictates when, where (in what tissue), and in what quantity a particular protein is produced. The most direct control mechanism is transcriptional regulation. RNA polymerase is responsible for transcription, with the assistance of a number of DNA-binding proteins called transcription factors (TFs).
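
SNP detection with allele-specific probes can be sketched the same way. This assumes a hypothetical probe design (flanking sequence on each side of the variant base, one probe per allele) and idealized perfect-match hybridization against the two copies of the locus in a diploid sample; all sequences are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def allele_probes(left: str, right: str, alleles: tuple[str, str]) -> dict[str, str]:
    """One probe per allele: the reverse complement of the flanking sequence
    with the variant base in the middle (hypothetical probe design)."""
    return {a: reverse_complement(left + a + right) for a in alleles}

def call_genotype(sample_copies: list[str], probes: dict[str, str]) -> list[str]:
    """Return the alleles whose probe hybridizes (perfect match) to any copy."""
    detected = []
    for allele, probe in probes.items():
        target = reverse_complement(probe)
        if any(target in copy for copy in sample_copies):
            detected.append(allele)
    return detected

probes = allele_probes("GGAT", "CCTA", ("A", "G"))
print(call_genotype(["TTGGATACCTAAA", "TTGGATGCCTAAA"], probes))  # ['A', 'G']
```

Two detected alleles indicate a heterozygous sample; a single detected allele indicates a homozygous one.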

Binding-site finding using ChIP-PET or ChIP-seq
What is a binding site?
[Figure: binding site of Fos on the genome]

Problem: objectives
Look for association rules between a group of TFs (a module) and a target gene's behavior (up- or down-regulation) from a single time-series profile data set.

Problem: outputs
The desired results are expressed as association rules, for example:
- (TF1, TF2, TF3) => target gene behavior
- (TF1, TF2, distance between 28 and 30) => target gene behavior

Enhancers - 1
Enhancer-binding proteins, one type of TF, bind to regions of DNA (enhancers) that can be thousands of base pairs away from the gene they control. Binding increases the rate of transcription of the gene. Enhancers can be located upstream, downstream, or even within the gene they control.
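
Association rules of the form (TF module) => target gene behavior are usually scored by support and confidence. The sketch below uses the standard definitions; the transactions (one per enhancer/target-gene pair, listing the TFs with a binding site plus an UP or DOWN label) are hypothetical data invented for illustration.

```python
# Hypothetical transactions: TFs bound in an enhancer plus the target gene's behavior.
transactions = [
    {"TF1", "TF2", "TF3", "UP"},
    {"TF1", "TF2", "TF3", "UP"},
    {"TF1", "TF2", "DOWN"},
    {"TF2", "TF3", "UP"},
    {"TF1", "TF2", "TF3", "DOWN"},
]

def support(itemset, data) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in data) / len(data)

def confidence(antecedent, consequent, data) -> float:
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    a = support(antecedent, data)
    return support(set(antecedent) | set(consequent), data) / a if a else 0.0

rule = ({"TF1", "TF2", "TF3"}, {"UP"})
print(support(rule[0] | rule[1], transactions))  # 0.4
print(confidence(*rule, transactions))
```

Here the module (TF1, TF2, TF3) occurs in 3 of 5 transactions and co-occurs with UP in 2 of them, so the rule has support 0.4 and confidence 2/3.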

Silencers
Silencers, bound by another type of TF, are control regions of DNA that, like enhancers, may be located thousands of base pairs away from the gene they control. However, when transcription factors bind to them, expression of the gene they control is repressed.

Problem: Input data - 1
Data file 1 describes the details of each candidate enhancer: the location of each TF binding site relative to the peak location of the enhancer. Each row is an enhancer, and each factor has 3 columns:
- Column 1: position
- Column 2: PWM (position weight matrix) score
- Column 3: p-value

Problem: Input data - 2
Data file 2 holds the enhancer expression time-series profile data for a particular breast cancer patient:
- Columns A-F: the enhancer candidate
- Columns G-S: the breast gene
- K: transcription start site (TSS)
- L: transcription end site
- M: distance between the TSS and the peak location
- N, O, P, Q, R, S: the microarray expression level at different times (3 hr, 6 hr, 9 hr, 12 hr, 24 hr, and 48 hr)
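
One possible pre-processing step, turning the two input files into itemsets for association-rule mining, is sketched below. Everything here is an assumption for illustration: the factor names, the 0.05 p-value cutoff, and the simple "last minus first" trend rule (which presumes log-scale expression values) are hypothetical, and the slides' own ranking and sorting procedure is not reproduced.

```python
# Hypothetical factor order for data file 1: each factor occupies three
# consecutive columns (position, PWM score, p-value).
FACTORS = ["TF1", "TF2", "TF3"]

def bound_factors(row, factors=FACTORS, p_cutoff=0.05):
    """Return the factors whose binding-site p-value is below `p_cutoff`."""
    bound = set()
    for i, tf in enumerate(factors):
        _position, _pwm_score, p_value = row[3 * i:3 * i + 3]
        if p_value < p_cutoff:
            bound.add(tf)
    return bound

def expression_trend(levels, min_change=1.0):
    """Label a time series (3, 6, 9, 12, 24, 48 hr) as UP, DOWN or FLAT by
    comparing the last and first measurements (a deliberately simple rule)."""
    change = levels[-1] - levels[0]
    if change >= min_change:
        return "UP"
    if change <= -min_change:
        return "DOWN"
    return "FLAT"

def to_transaction(enhancer_row, expression_levels):
    """Combine one row of each input file into a single itemset."""
    return bound_factors(enhancer_row) | {expression_trend(expression_levels)}

row = [-120, 8.1, 0.001, 45, 6.3, 0.2, 300, 9.4, 0.01]
levels = [1.0, 1.4, 2.0, 2.5, 3.1, 3.6]
print(sorted(to_transaction(row, levels)))  # ['TF1', 'TF3', 'UP']
```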

Inferring Transcriptional Module Methods
- Data pre-processing: an efficient algorithm for ranking and sorting the profile data into input data for the chosen data mining algorithm
- Data mining: an algorithm to identify association rules that can help our understanding of the regulatory network and drug design

Early work - 1
- GRAM 2003 [1]: The GRAM algorithm explicitly links genes to the factors that regulate them by incorporating DNA binding data, which provides biological insights into the regulatory network.
- GSEA 2005 [2]: Gene Set Enrichment Analysis (GSEA) yields insights into several cancer-related data sets, including leukemia and lung cancer.

Early work - 2
- ReMoDiscovery 2006 [3]: a two-step methodology to unravel active modules based on the concurrent analysis of three data sources (ChIP-chip, motif, and microarray data).
  - Seed discovery step: predicts seed modules.
  - Seed extension step: optimizes the gene content of each module and indicates whether or not the module is active in regulation.
  - Data integration is tackled using the Apriori algorithm.

Early work - 3
- EEM 2009 [4]: We still have little knowledge about the regulatory mechanisms underlying the transcriptome. EEM, the latest module discovery method, extends previously reported module discovery methods and has been applied to breast cancer expression data, identifying 10 principal expression modules based on their expression coherence.
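
ReMoDiscovery is described as using the Apriori algorithm for data integration. The sketch below shows only the textbook Apriori level-wise frequent-itemset search, not ReMoDiscovery's own procedure; it is unoptimized, and the TF and UP/DOWN item names in the usage example are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    data = [frozenset(t) for t in transactions]
    n = len(data)

    def support(itemset):
        return sum(itemset <= t for t in data) / n

    # Level 1: frequent single items.
    items = {item for t in data for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

data = [
    {"TF1", "TF2", "TF3", "UP"},
    {"TF1", "TF2", "TF3", "UP"},
    {"TF1", "TF2", "DOWN"},
    {"TF2", "TF3", "UP"},
    {"TF1", "TF2", "TF3", "DOWN"},
]
for itemset, s in sorted(apriori(data, 0.6).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what makes Apriori efficient: a candidate is never counted if any of its subsets has already failed the support threshold. Rules such as (TF2, TF3) => UP can then be read off the frequent itemsets and scored by confidence.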

Spring 2019 CSC 177 Data Warehousing and Data Mining
- Elective for both CSC graduate and undergraduate students
- A course term project can lead to an MS project on data warehousing or data mining

CSC 177 MS Project topics
- Data mining applications: inferring transcriptional modules from cancer profile data, or a data mining problem domain of your choice
- Data warehousing courseware development [5]
- Data mining concept animation library [6]

References
1. Z. Bar-Joseph, "Computational discovery of gene modules and regulatory networks," Nature Biotechnology, November 2003.
2. A. Subramanian, "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles," PNAS, October 2005.
3. K. Lemmens, "Inferring transcriptional modules from ChIP-chip, motif and microarray data," Genome Biology, 2006.
4. A. Niida, "Gene set-based module discovery in the breast cancer transcriptome," BMC Bioinformatics, February 2009.
5. http://gaia.ecs.csus.edu/~enroll/enrolldw/intro.php
6. N. R. Shah and M. Lu, "Towards a data mining concept animation library," Research, Reflections and Innovations in Integrating ICT in Education, Vol. 3, 2009.