Low Abundance Genes. DNA Microarrays. Log Transformation? Basic Idea. Mining for Low-abundance Transcripts in Microarray Data. Exploratory Methods

Size: px
Start display at page:

Download "Low Abundance Genes. DNA Microarrays. Log Transformation? Basic Idea. Mining for Low-abundance Transcripts in Microarray Data. Exploratory Methods"

Transcription

1 Mining for Low-abundance Transcripts in Microarray Data Yi Lin 1, Samuel T. Nadler 2, Alan D. Attie 2, Brian S. Yandell 1,3 1 Statistics, 2 Biochemistry, 3 Horticulture, University of Wisconsin-Madison September 2 Basic Idea DNA microarrays present tremendous opportunities to understand complex processes Transcription factors and receptors expressed at low levels, missed by most other methods Analyze low expression genes with normal scores Robustly adapt to changing variability across average expression levels Basis for clustering and other exploratory methods Data-driven p-value sensitive to low abundance & changes in variability with gene expression DNA Microarrays 1-1, items per chip cdna, oligonucleotides, proteins dozens of technologies, changing rapidly expose live tissue from organism mrna cdna hybridize with chip read expression signal as intensity (fluorescence) design considerations compare conditions across or within chips worry about image capture of signal Low Abundance Genes background adjustment remove local geography comparing within and between chips negative values after adjustment low abundance genes virtually absent in one condition could be important genes: transcription factors, receptors large measurement variability early technology (bleeding edge) prevalence across genes on a chip -2% per chip 1-5% across multiple conditions Log Transformation? tremendous scale range in mean intensities 1-1 fold common concentrations of chemicals (ph) fold changes have intuitive appeal looks pretty good in practice want transformed data to be roughly normal easy to test if no difference across conditions looking for genes that are outliers beyond edge of bell shaped curve provide formal or informal thresholds Exploratory Methods Clustering methods (Eisen 1998, Golub 1999) Self-organizing maps (Tamayo 1999) search for genes with similar changes across conditions do not determine significance of changes in expression require extensive pre-filtering to eliminate low intensity modest fold changes may detect patterns unrelated to fold change comparison of discrimination methods (Dudoit 2) 1

2 Confirmatory Methods ratio-based decisions for 2 conditions (Chen 1997) constant variance of ratio on log scale, use normality anova (Kerr 2, Dudoit 2) handles multiple conditions in anova model constant variance on log scale, use normality Bayesian inference (Newton 2, Tsodikov 2) Gamma-Gamma model variance proportional to squared intensity error model (Roberts 2, Hughes 2) variance proportional to squared intensity transform to log scale, use normality. acquire data Q, B 1. adjust for background A=Q B 2. rank order genes R=rank(A)/(n+1) 3. normal scores N=qnorm(R) 4. contrast conditions Y=N 1 N 2 Y = contrast 7. standardize Z= Y center spread 6. center & spread X = mean 5. mean intensity X=mean(N) Normal Scores Procedure adjusted expression A = Q B rank order R = rank(a) / (n+1) normal scores N = qnorm( R ) average intensity X = (N 1 +N 2 )/2 difference Y = N 1 N 2 variance Var(Y X) σ 2 (X) standardization S = [Y µ(x)]/σ(x) Motivate Normal Scores natural transformation to normality is log(a) background intensity B = bd +ω B measured with error ω attenuation d may depend on condition gene measurement Q = [αexp(g+h+ε)+b]d+ω Q gene signal g degree of hybridization h: intrinsic noise ε (variance may depend on g) attenuation α (depends thickness of sample, etc.) subtract background: A = Q B adjusted measurements A = dαexp(g+h+ε)+δ symmetric measurement error δ=ω B ω Q Motivate Normal Scores (cont.) adjusted measurements: A = Q B = dαg+δ log expression level: log(g) = g+h+ε gene signal g confounded with hybridization h unless hybridization h independent of condition G is observed if no measurement error δ no dye or array effect (no attenuation dα) no background intensity natural under this model to consider N=log(G) normal scores almost as good Robust Center & Spread genes sorted based on X partitioned into many (about 4) slices containing roughly the same number of genes slices summarized by median and MAD MAD = median absolute deviation robust to outliers (e.g. changing genes) MAD ~ same distribution across X up to scale MAD i = σ i Z i, Z i ~ Z, i = 1,,4 log(mad i ) = log(σ i ) + log( Z i ) median ~ same idea 2

3 Robust Center & Spread MAD ~ same distribution across X up to scale log(mad i ) = log(σ i ) + log( Z i ), I = 1,,4 regress log(mad i ) on X i with smoothing splines smoothing parameter tuned automatically generalized cross validation (Wahba 199) globally rescale anti-log of smooth curve Var(Y X) σ 2 (X) can force σ 2 (X) to be decreasing similar idea for median E(Y X) µ(x) Motivation for Spread log expression level: log(g) = g+h+ε hybridization h negligible or same across conditions intrinsic noise ε may depend on gene signal g compare two conditions 1 and 2 Y = N 1 N 2 log(g 1 ) log(g 2 ) = g 1 g 2 + ε 1 -ε 2 no differential expression: g 1 =g 2 =g Var(Y g) = σ 2 (g) g X suggests condition on X instead, but: Var(Y X) not exactly σ 2 (X) cannot be determined without further assumptions Simulation of Spread Recovery 1, genes: log expression Normal(4,2) 5% altered genes: add Normal(,2) no measurement error, attenuation estimate robust spread Robust Spread Estimation Estimate of Spread Q-Q Plot spread Sample Quantiles mean expression Bonferroni-corrected p-values standardized normal scores S = [Y µ(x)]/σ(x) ~ Normal(,1)? genes with differential expression more dispersed Zidak version of Bonferroni correction p = 1 (1 p 1 ) n 13, genes with an overall level p =.5 each gene should be tested at level 1.95*1-6 differential expression if S > 4.62 differential expression if Y µ(x) > 4.62σ(X) too conservative? weight by X? Dudoit (2) Simulation Study simulations with two conditions 1, genes g 1,g 2 ~ Normal(4,2) for nonchanging genes 5% with differential expression g c ~ Normal(4,2) + Normal(3-rank(X)/(n+1),1/2) up- or down-regulated with probability 1/2 intrinsic noise ε ~ Normal(,.5) attenuation α = 1 measurement error variance =, 1, 2, 5, 1, 2 3

4 Comparison of Methods 5 differential expression for two conditions Newton (2) J Comp Biol Gamma-Gamma-Bernouli model Bayesian odds of differential expression Chen (1997) constant ratio of expressions underlying log-normality 1 normal scores (Lin 2) some (unknown) transformation to normality robust, smooth estimate of spread & center Bonferroni-style p-values Capturing Changed Genes: Noise.2 Fold Change 1. Fold Change Capturing Changed Genes: No Noise No Noise: Average Intensity.5 E. coli with IPTG-b Comparison with E. coli Data Normal Scores IPTG-b Cy3/Cy5 ratio ~15 genes at low abundance including key genes Cy3/Cy5 ratio 1 e-3 1 e-1 1 e+3 Bayesian odds of differential expression IPTG-b known to affect only a few genes Raw IPTG-b 4,+ genes (whole genome) Newton (2) J Comp Biol Noise 1: Average Intensity 5. Number of Changed Genes Captured Success in Capture 1 e-2 1 e+ Mean Intensity 1 e Mean Intensity 2. 4

5 Diabetes & Obesity Study 13,+ gene fragments (11,+ genes) oligonuleotides, Affymetrix gene chips mean(pm) - mean(nm) adjusted expression levels six conditions in 2x3 factorial lean vs. obese B6, F1, BTBR mouse genotype adipose tissue influence whole-body fuel partitioning might be aberrant in obese and/or diabetic subjects Nadler (2) PNAS Low Abundance Obesity Genes low mean expression on at least 1 of 6 conditions negative adjusted values ignored by clustering routines transcription factors I-κB modulates transcription - inflammatory processes RXR nuclear hormone receptor - forms heterodimers with several nuclear hormone receptors regulation proteins protein kinase A glycogen synthase kinase-3 roughly 1 genes 9 new since Nadler (2) PNAS Table 1. Genes Differentially Expressed with Obesity as Identified by the Normal Scores Procedure Accesion # Description Obese/Lean Lean/Obese Average p-value aa68251 Mus musculus Dp1l1 mrna for polyposis locus protein 1-like 1 (TB2 protein-like ) x93999 M.musculus Gal beta-1,3-galnac-specific GalNAc alpha-2,6-sialyltransferase gene w45955 No significant homologies U16297 Mus musculus cytochrome B561 (mcyt) X72862* M.musculus gene for beta-3-adrenergic receptor X12521 Mouse mrna for transition protein 1 TP aa No significant homologies AA62269 Mus musculus protamine AA Mus musculus transition protein 1 (Tnp1) X63349 M.musculus tyrosinase-related protein AA11671 Mus musculus serum response factor (Srf) AA15229 Mus musculus hepsin (Hpn) aa26714 No significant homologies aa48365 Mus musculus mrna for SRp25 nuclear protein j5663 Mouse vas deferens androgen related protein (MVDP) w49331 No significant homologies J2935 Mouse camp-dependent protein kinase type II regulatory subunit aa195 No significant homologies m3181 Mouse 2,3 -cyclic-nucleotide 3 -phosphodiesterase AA33381 No significant homologies aa No significant homologies u2883 Mus musculus Balb/c mammary-derived growth inhibitor (MDGI) AA No significant homologies D14883 Mouse C33/R2/IA W36455 Mus musculus adipsin (Adn) m2751 Mus musculus protamine X8796 M.musculus brevican AA48974 No significant homologies ET63455 Mus musculus serum amyloid A-4 protein (Saa4) X6526 M.musculus mrna for GTP-binding protein AA Similar to Rat S6 kinase ET6236 CALCIUM-DEPENDENT PHOSPHOLIPASE A2 PRECURSOR w64688 No significant homologies AA3587 No significant homologies AA Mus musculus hippocalcin-like 1 (Hpcal1) w1922 No significant homologies w48392 similar X7518 Id4 helix-loop-helix protein AA Sequence withdrawn X99143 M.musculus hair keratin, mhb AA similar to Human ras-related protein rab W1926 Mus musculus mrna for ubiquitin like protein AA13862 Similar to Human hqp J4179 Mouse chromatin nonhistone high mobility group protein (HGM-I(Y)) X355 Mouse gene exon 2 for serum amyloid A (SAA) 3 protein U957 Mus musculus p21 (Waf1) AA2399 Mus musculus dutpase u7746 Mus musculus anaphylatoxin C3a receptor x66224 M.musculus retinoid X receptor-beta AA Mus musculus protein tyrosine phosphatase (Ptpn16) W85163 Mus musculus migration inhibitory factor Low Abundance Genes for Obesity Obese vs. Lean Microarray ANOVAs Kerr (2) gene by condition interaction N ijk = gene i + condition j + gene*condition ij + rep error ijk conditions organized in factorial design experimental units may be whole or part of array genes are random effects focus on outliers (BLUPs), not variance components gene*condition ij = differential expression allow variance to depend on gene i main effect replication to improve precision, catch gross errors Average Intensity for Obesity 5

6 Microarray Random Effects variance component for non-changing genes robust estimate of MS(G*C) using smoothed MAD rescale normal score response N by spread σ(x) look for differential expression or use clustering methods variance component for replication robust estimate of MSE using smoothed MAD look for outliers = gross errors Obesity Genotype Main Effects Obesity Dominance Obesity Additive Microarray QTLs condition may be genotype whole organism or pattern of genes genotype may be inferred rather than known exactly N ijk = gene i + QTL j + gene*qtl ij + individual ijk QTL genotype depends on flanking markers mixture model across possible QTL genotypes single vs. multiple QTL single QTL may influence numerous genes epistasis = inter-genic interaction modification of biochemical pathway(s) 6

Cancer outlier differential gene expression detection

Cancer outlier differential gene expression detection Biostatistics (2007), 8, 3, pp. 566 575 doi:10.1093/biostatistics/kxl029 Advance Access publication on October 4, 2006 Cancer outlier differential gene expression detection BAOLIN WU Division of Biostatistics,

More information

Experimental Design For Microarray Experiments. Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit

Experimental Design For Microarray Experiments. Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit Experimental Design For Microarray Experiments Robert Gentleman, Denise Scholtens Arden Miller, Sandrine Dudoit Copyright 2002 Complexity of Genomic data the functioning of cells is a complex and highly

More information

Data analysis and binary regression for predictive discrimination. using DNA microarray data. (Breast cancer) discrimination. Expression array data

Data analysis and binary regression for predictive discrimination. using DNA microarray data. (Breast cancer) discrimination. Expression array data West Mike of Statistics & Decision Sciences Institute Duke University wwwstatdukeedu IPAM Functional Genomics Workshop November Two group problems: Binary outcomes ffl eg, ER+ versus ER ffl eg, lymph node

More information

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality.

Nature Genetics: doi: /ng Supplementary Figure 1. Assessment of sample purity and quality. Supplementary Figure 1 Assessment of sample purity and quality. (a) Hematoxylin and eosin staining of formaldehyde-fixed, paraffin-embedded sections from a human testis biopsy collected concurrently with

More information

islet insulin production in diabetes model: spatial comparison of mouse strains

islet insulin production in diabetes model: spatial comparison of mouse strains islet insulin production in diabetes model: spatial comparison of mouse strains mentor: Brian S. Yandell, UW-Biostat Trainee www.stat.wisc.edu/~yandell scientist: Donnie Stapleton, Attie Biochem Lab dsstaple@biochem.wisc.edu

More information

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information)

Statistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information) Statistical Assessment of the Global Regulatory Role of Histone Acetylation in Saccharomyces cerevisiae (Support Information) Authors: Guo-Cheng Yuan, Ping Ma, Wenxuan Zhong and Jun S. Liu Linear Relationship

More information

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies

Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Statistical Analysis of Single Nucleotide Polymorphism Microarrays in Cancer Studies Stanford Biostatistics Workshop Pierre Neuvial with Henrik Bengtsson and Terry Speed Department of Statistics, UC Berkeley

More information

Gene Selection for Tumor Classification Using Microarray Gene Expression Data

Gene Selection for Tumor Classification Using Microarray Gene Expression Data Gene Selection for Tumor Classification Using Microarray Gene Expression Data K. Yendrapalli, R. Basnet, S. Mukkamala, A. H. Sung Department of Computer Science New Mexico Institute of Mining and Technology

More information

Supplement to SCnorm: robust normalization of single-cell RNA-seq data

Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplement to SCnorm: robust normalization of single-cell RNA-seq data Supplementary Note 1: SCnorm does not require spike-ins, since we find that the performance of spike-ins in scrna-seq is often compromised,

More information

MOST: detecting cancer differential gene expression

MOST: detecting cancer differential gene expression Biostatistics (2008), 9, 3, pp. 411 418 doi:10.1093/biostatistics/kxm042 Advance Access publication on November 29, 2007 MOST: detecting cancer differential gene expression HENG LIAN Division of Mathematical

More information

An Introduction to Bayesian Statistics

An Introduction to Bayesian Statistics An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA Fielding School of Public Health robweiss@ucla.edu Sept 2015 Robert Weiss (UCLA) An Introduction to Bayesian Statistics

More information

SUPPLEMENTARY FIGURES

SUPPLEMENTARY FIGURES SUPPLEMENTARY FIGURES Figure S1. Clinical significance of ZNF322A overexpression in Caucasian lung cancer patients. (A) Representative immunohistochemistry images of ZNF322A protein expression in tissue

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University

Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics. Mike West Duke University Aspects of Statistical Modelling & Data Analysis in Gene Expression Genomics Mike West Duke University Papers, software, many links: www.isds.duke.edu/~mw ABS04 web site: Lecture slides, stats notes, papers,

More information

Genetic Analysis of Anxiety Related Behaviors by Gene Chip and In situ Hybridization of the Hippocampus and Amygdala of C57BL/6J and AJ Mice Brains

Genetic Analysis of Anxiety Related Behaviors by Gene Chip and In situ Hybridization of the Hippocampus and Amygdala of C57BL/6J and AJ Mice Brains Genetic Analysis of Anxiety Related Behaviors by Gene Chip and In situ Hybridization of the Hippocampus and Amygdala of C57BL/6J and AJ Mice Brains INTRODUCTION To study the relationship between an animal's

More information

Sample Size Estimation for Microarray Experiments

Sample Size Estimation for Microarray Experiments Sample Size Estimation for Microarray Experiments Gregory R. Warnes Department of Biostatistics and Computational Biology Univeristy of Rochester Rochester, NY 14620 and Peng Liu Department of Biological

More information

Biol220 Cell Signalling Cyclic AMP the classical secondary messenger

Biol220 Cell Signalling Cyclic AMP the classical secondary messenger Biol220 Cell Signalling Cyclic AMP the classical secondary messenger The classical secondary messenger model of intracellular signalling A cell surface receptor binds the signal molecule (the primary

More information

Ecological Statistics

Ecological Statistics A Primer of Ecological Statistics Second Edition Nicholas J. Gotelli University of Vermont Aaron M. Ellison Harvard Forest Sinauer Associates, Inc. Publishers Sunderland, Massachusetts U.S.A. Brief Contents

More information

Comparison of discrimination methods for the classification of tumors using gene expression data

Comparison of discrimination methods for the classification of tumors using gene expression data Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley

More information

Complex Trait Genetics in Animal Models. Will Valdar Oxford University

Complex Trait Genetics in Animal Models. Will Valdar Oxford University Complex Trait Genetics in Animal Models Will Valdar Oxford University Mapping Genes for Quantitative Traits in Outbred Mice Will Valdar Oxford University What s so great about mice? Share ~99% of genes

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

Bayesian hierarchical modelling

Bayesian hierarchical modelling Bayesian hierarchical modelling Matthew Schofield Department of Mathematics and Statistics, University of Otago Bayesian hierarchical modelling Slide 1 What is a statistical model? A statistical model:

More information

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5

PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science. Homework 5 PSYCH-GA.2211/NEURL-GA.2201 Fall 2016 Mathematical Tools for Cognitive and Neural Science Homework 5 Due: 21 Dec 2016 (late homeworks penalized 10% per day) See the course web site for submission details.

More information

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review

Results & Statistics: Description and Correlation. I. Scales of Measurement A Review Results & Statistics: Description and Correlation The description and presentation of results involves a number of topics. These include scales of measurement, descriptive statistics used to summarize

More information

Application of Resampling Methods in Microarray Data Analysis

Application of Resampling Methods in Microarray Data Analysis Application of Resampling Methods in Microarray Data Analysis Tests for two independent samples Oliver Hartmann, Helmut Schäfer Institut für Medizinische Biometrie und Epidemiologie Philipps-Universität

More information

Investigating the robustness of the nonparametric Levene test with more than two groups

Investigating the robustness of the nonparametric Levene test with more than two groups Psicológica (2014), 35, 361-383. Investigating the robustness of the nonparametric Levene test with more than two groups David W. Nordstokke * and S. Mitchell Colp University of Calgary, Canada Testing

More information

Nature Getetics: doi: /ng.3471

Nature Getetics: doi: /ng.3471 Supplementary Figure 1 Summary of exome sequencing data. ( a ) Exome tumor normal sample sizes for bladder cancer (BLCA), breast cancer (BRCA), carcinoid (CARC), chronic lymphocytic leukemia (CLLX), colorectal

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

n Outline final paper, add to outline as research progresses n Update literature review periodically (check citeseer)

n Outline final paper, add to outline as research progresses n Update literature review periodically (check citeseer) Project Dilemmas How do I know when I m done? How do I know what I ve accomplished? clearly define focus/goal from beginning design a search method that handles plateaus improve some ML method s robustness

More information

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays

RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Supplementary Materials RASA: Robust Alternative Splicing Analysis for Human Transcriptome Arrays Junhee Seok 1*, Weihong Xu 2, Ronald W. Davis 2, Wenzhong Xiao 2,3* 1 School of Electrical Engineering,

More information

AP Statistics. Semester One Review Part 1 Chapters 1-5

AP Statistics. Semester One Review Part 1 Chapters 1-5 AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically

More information

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition List of Figures List of Tables Preface to the Second Edition Preface to the First Edition xv xxv xxix xxxi 1 What Is R? 1 1.1 Introduction to R................................ 1 1.2 Downloading and Installing

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists.

chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists. chapter 1 - fig. 1 The -omics subdisciplines. chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists. 201 figures chapter 1 chapter 2 - fig. 1 Schematic overview of the different steps

More information

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Supplementary Figure 1 7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans. Regions targeted by the Even and Odd ChIRP probes mapped to a secondary structure model 56 of the

More information

A Case Study: Two-sample categorical data

A Case Study: Two-sample categorical data A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice

More information

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu What you should know before you collect data BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Types and levels of study Descriptive statistics Inferential statistics How to choose a statistical test

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells.

Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. SUPPLEMENTAL FIGURE AND TABLE LEGENDS Supplemental Figure S1. Expression of Cirbp mrna in mouse tissues and NIH3T3 cells. A) Cirbp mrna expression levels in various mouse tissues collected around the clock

More information

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers

Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Modelling Spatially Correlated Survival Data for Individuals with Multiple Cancers Dipak K. Dey, Ulysses Diva and Sudipto Banerjee Department of Statistics University of Connecticut, Storrs. March 16,

More information

KEY CONCEPT QUESTIONS IN SIGNAL TRANSDUCTION

KEY CONCEPT QUESTIONS IN SIGNAL TRANSDUCTION Signal Transduction - Part 2 Key Concepts - Receptor tyrosine kinases control cell metabolism and proliferation Growth factor signaling through Ras Mutated cell signaling genes in cancer cells are called

More information

Biostatistics for Med Students. Lecture 1

Biostatistics for Med Students. Lecture 1 Biostatistics for Med Students Lecture 1 John J. Chen, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 14, 2018 Lecture note: http://biostat.jabsom.hawaii.edu/education/training.html

More information

2402 : Anatomy/Physiology

2402 : Anatomy/Physiology Dr. Chris Doumen Lecture 2 2402 : Anatomy/Physiology The Endocrine System G proteins and Adenylate Cyclase /camp TextBook Readings Pages 405 and 599 through 603. Make use of the figures in your textbook

More information

Receptor mediated Signal Transduction

Receptor mediated Signal Transduction Receptor mediated Signal Transduction G-protein-linked receptors adenylyl cyclase camp PKA Organization of receptor protein-tyrosine kinases From G.M. Cooper, The Cell. A molecular approach, 2004, third

More information

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach

Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach University of South Florida Scholar Commons Graduate Theses and Dissertations Graduate School November 2015 Analysis of Rheumatoid Arthritis Data using Logistic Regression and Penalized Approach Wei Chen

More information

Dr. Kelly Bradley Final Exam Summer {2 points} Name

Dr. Kelly Bradley Final Exam Summer {2 points} Name {2 points} Name You MUST work alone no tutors; no help from classmates. Email me or see me with questions. You will receive a score of 0 if this rule is violated. This exam is being scored out of 00 points.

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Figure 1. Heatmap of GO terms for differentially expressed genes. The terms were hierarchically clustered using the GO term enrichment beta. Darker red, higher positive

More information

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data Breast cancer Inferring Transcriptional Module from Breast Cancer Profile Data Breast Cancer and Targeted Therapy Microarray Profile Data Inferring Transcriptional Module Methods CSC 177 Data Warehousing

More information

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI / Index A Adjusted Heterogeneity without Overdispersion, 63 Agenda-driven bias, 40 Agenda-Driven Meta-Analyses, 306 307 Alternative Methods for diagnostic meta-analyses, 133 Antihypertensive effect of potassium,

More information

HS Exam 1 -- March 9, 2006

HS Exam 1 -- March 9, 2006 Please write your name on the back. Don t forget! Part A: Short answer, multiple choice, and true or false questions. No use of calculators, notes, lab workbooks, cell phones, neighbors, brain implants,

More information

Analysis of Unbalanced Microarray Data

Analysis of Unbalanced Microarray Data Journal of Data Science 1(2003), 103-121 Analysis of Unbalanced Microarray Data Mei-Ling Ting Lee 1,2,3,4,G.A.Whitmore 5, Rus Y. Yukhananov 1,2 1 Brigham and Women s Hospital, 2 Harvard Medical School,

More information

Chapter 1: Introduction to Statistics

Chapter 1: Introduction to Statistics Chapter 1: Introduction to Statistics Variables A variable is a characteristic or condition that can change or take on different values. Most research begins with a general question about the relationship

More information

The Association Design and a Continuous Phenotype

The Association Design and a Continuous Phenotype PSYC 5102: Association Design & Continuous Phenotypes (4/4/07) 1 The Association Design and a Continuous Phenotype The purpose of this note is to demonstrate how to perform a population-based association

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

General Principles of Endocrine Physiology

General Principles of Endocrine Physiology General Principles of Endocrine Physiology By Dr. Isabel S.S. Hwang Department of Physiology Faculty of Medicine University of Hong Kong The major human endocrine glands Endocrine glands and hormones

More information

Classification of cancer profiles. ABDBM Ron Shamir

Classification of cancer profiles. ABDBM Ron Shamir Classification of cancer profiles 1 Background: Cancer Classification Cancer classification is central to cancer treatment; Traditional cancer classification methods: location; morphology, cytogenesis;

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter

More information

Research Analysis MICHAEL BERNSTEIN CS 376

Research Analysis MICHAEL BERNSTEIN CS 376 Research Analysis MICHAEL BERNSTEIN CS 376 Last time What is a statistical test? Chi-square t-test Paired t-test 2 Today ANOVA Posthoc tests Two-way ANOVA Repeated measures ANOVA 3 Recall: hypothesis testing

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

IPA Advanced Training Course

IPA Advanced Training Course IPA Advanced Training Course October 2013 Academia sinica Gene (Kuan Wen Chen) IPA Certified Analyst Agenda I. Data Upload and How to Run a Core Analysis II. Functional Interpretation in IPA Hands-on Exercises

More information

Still important ideas

Still important ideas Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still

More information

Data analysis in microarray experiment

Data analysis in microarray experiment 16 1 004 Chinese Bulletin of Life Sciences Vol. 16, No. 1 Feb., 004 1004-0374 (004) 01-0041-08 100005 Q33 A Data analysis in microarray experiment YANG Chang, FANG Fu-De * (National Laboratory of Medical

More information

Signal Transduction Cascades

Signal Transduction Cascades Signal Transduction Cascades Contents of this page: Kinases & phosphatases Protein Kinase A (camp-dependent protein kinase) G-protein signal cascade Structure of G-proteins Small GTP-binding proteins,

More information

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer

Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Boosted PRIM with Application to Searching for Oncogenic Pathway of Lung Cancer Pei Wang Department of Statistics Stanford University Stanford, CA 94305 wp57@stanford.edu Young Kim, Jonathan Pollack Department

More information

RNA-seq. Design of experiments

RNA-seq. Design of experiments RNA-seq Design of experiments Experimental design Introduction An experiment is a process or study that results in the collection of data. Statistical experiments are conducted in situations in which researchers

More information

SUPPLEMENTARY INFORMATION. Rett Syndrome Mutation MeCP2 T158A Disrupts DNA Binding, Protein Stability and ERP Responses

SUPPLEMENTARY INFORMATION. Rett Syndrome Mutation MeCP2 T158A Disrupts DNA Binding, Protein Stability and ERP Responses SUPPLEMENTARY INFORMATION Rett Syndrome Mutation T158A Disrupts DNA Binding, Protein Stability and ERP Responses Darren Goffin, Megan Allen, Le Zhang, Maria Amorim, I-Ting Judy Wang, Arith-Ruth S. Reyes,

More information

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks

MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1. Lecture 27: Systems Biology and Bayesian Networks MBios 478: Systems Biology and Bayesian Networks, 27 [Dr. Wyrick] Slide #1 Lecture 27: Systems Biology and Bayesian Networks Systems Biology and Regulatory Networks o Definitions o Network motifs o Examples

More information

Supplementary Materials for

Supplementary Materials for www.sciencetranslationalmedicine.org/cgi/content/full/7/303/303ra139/dc1 Supplementary Materials for Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular

More information

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit

Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit APPLICATION NOTE Ion PGM System Detection of aneuploidy in a single cell using the Ion ReproSeq PGS View Kit Key findings The Ion PGM System, in concert with the Ion ReproSeq PGS View Kit and Ion Reporter

More information

V. Gathering and Exploring Data

V. Gathering and Exploring Data V. Gathering and Exploring Data With the language of probability in our vocabulary, we re now ready to talk about sampling and analyzing data. Data Analysis We can divide statistical methods into roughly

More information

Experimental Studies. Statistical techniques for Experimental Data. Experimental Designs can be grouped. Experimental Designs can be grouped

Experimental Studies. Statistical techniques for Experimental Data. Experimental Designs can be grouped. Experimental Designs can be grouped Experimental Studies Statistical techniques for Experimental Data Require appropriate manipulations and controls Many different designs Consider an overview of the designs Examples of some of the analyses

More information

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego

Biostatistics. Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego Biostatistics Donna Kritz-Silverstein, Ph.D. Professor Department of Family & Preventive Medicine University of California, San Diego (858) 534-1818 dsilverstein@ucsd.edu Introduction Overview of statistical

More information

Lecture 15. Signal Transduction Pathways - Introduction

Lecture 15. Signal Transduction Pathways - Introduction Lecture 15 Signal Transduction Pathways - Introduction So far.. Regulation of mrna synthesis Regulation of rrna synthesis Regulation of trna & 5S rrna synthesis Regulation of gene expression by signals

More information

Knowledge Discovery and Data Mining. Testing. Performance Measures. Notes. Lecture 15 - ROC, AUC & Lift. Tom Kelsey. Notes

Knowledge Discovery and Data Mining. Testing. Performance Measures. Notes. Lecture 15 - ROC, AUC & Lift. Tom Kelsey. Notes Knowledge Discovery and Data Mining Lecture 15 - ROC, AUC & Lift Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-17-AUC

More information

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s Using Bayesian Networks to Analyze Expression Data Xu Siwei, s0789023 Muhammad Ali Faisal, s0677834 Tejal Joshi, s0677858 Outline Introduction Bayesian Networks Equivalence Classes Applying to Expression

More information

Normal Distribution. Many variables are nearly normal, but none are exactly normal Not perfect, but still useful for a variety of problems.

Normal Distribution. Many variables are nearly normal, but none are exactly normal Not perfect, but still useful for a variety of problems. Review Probability: likelihood of an event Each possible outcome can be assigned a probability If we plotted the probabilities they would follow some type a distribution Modeling the distribution is important

More information

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1: Research Methods 1 Handouts, Graham Hole,COGS - version 10, September 000: Page 1: T-TESTS: When to use a t-test: The simplest experimental design is to have two conditions: an "experimental" condition

More information

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc.

Sawtooth Software. MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB RESEARCH PAPER SERIES. Bryan Orme, Sawtooth Software, Inc. Sawtooth Software RESEARCH PAPER SERIES MaxDiff Analysis: Simple Counting, Individual-Level Logit, and HB Bryan Orme, Sawtooth Software, Inc. Copyright 009, Sawtooth Software, Inc. 530 W. Fir St. Sequim,

More information

Bayesian Prediction Tree Models

Bayesian Prediction Tree Models Bayesian Prediction Tree Models Statistical Prediction Tree Modelling for Clinico-Genomics Clinical gene expression data - expression signatures, profiling Tree models for predictive sub-typing Combining

More information

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction

Abstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant

More information

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION

MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION MBA 605 Business Analytics Don Conant, PhD. GETTING TO THE STANDARD NORMAL DISTRIBUTION Variables In the social sciences data are the observed and/or measured characteristics of individuals and groups

More information

Quantitative genetics: traits controlled by alleles at many loci

Quantitative genetics: traits controlled by alleles at many loci Quantitative genetics: traits controlled by alleles at many loci Human phenotypic adaptations and diseases commonly involve the effects of many genes, each will small effect Quantitative genetics allows

More information

Problem 1) Match the terms to their definitions. Every term is used exactly once. (In the real midterm, there are fewer terms).

Problem 1) Match the terms to their definitions. Every term is used exactly once. (In the real midterm, there are fewer terms). Problem 1) Match the terms to their definitions. Every term is used exactly once. (In the real midterm, there are fewer terms). 1. Bayesian Information Criterion 2. Cross-Validation 3. Robust 4. Imputation

More information

Memorial Sloan-Kettering Cancer Center

Memorial Sloan-Kettering Cancer Center Memorial Sloan-Kettering Cancer Center Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series Year 2007 Paper 14 On Comparing the Clustering of Regression Models

More information

Hormones and Signal Transduction. Dr. Kevin Ahern

Hormones and Signal Transduction. Dr. Kevin Ahern Dr. Kevin Ahern Signaling Outline Signaling Outline Background Signaling Outline Background Membranes Signaling Outline Background Membranes Hormones & Receptors Signaling Outline Background Membranes

More information

Multi-Channel Film Dosimetry and Gamma Map Verification

Multi-Channel Film Dosimetry and Gamma Map Verification Multi-Channel Film Dosimetry and Gamma Map Verification Micke A International Specialty Products Ashland proprietary technology, patents pending Single Channel Film Dosimetry Calibration Curve X=R R ave

More information

Supplementary Figures

Supplementary Figures Supplementary Figures Supplementary Fig 1. Comparison of sub-samples on the first two principal components of genetic variation. TheBritishsampleisplottedwithredpoints.The sub-samples of the diverse sample

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Business Statistics Probability

Business Statistics Probability Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment

More information

Six Sigma Glossary Lean 6 Society

Six Sigma Glossary Lean 6 Society Six Sigma Glossary Lean 6 Society ABSCISSA ACCEPTANCE REGION ALPHA RISK ALTERNATIVE HYPOTHESIS ASSIGNABLE CAUSE ASSIGNABLE VARIATIONS The horizontal axis of a graph The region of values for which the null

More information

On the purpose of testing:

On the purpose of testing: Why Evaluation & Assessment is Important Feedback to students Feedback to teachers Information to parents Information for selection and certification Information for accountability Incentives to increase

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Membrane associated receptor transfers the information. Second messengers relay information

Membrane associated receptor transfers the information. Second messengers relay information Membrane associated receptor transfers the information Most signals are polar and large Few of the signals are nonpolar Receptors are intrinsic membrane proteins Extracellular and intracellular domains

More information

Theta sequences are essential for internally generated hippocampal firing fields.

Theta sequences are essential for internally generated hippocampal firing fields. Theta sequences are essential for internally generated hippocampal firing fields. Yingxue Wang, Sandro Romani, Brian Lustig, Anthony Leonardo, Eva Pastalkova Supplementary Materials Supplementary Modeling

More information

Supplementary Materials for

Supplementary Materials for www.sciencesignaling.org/cgi/content/full/8/364/ra18/dc1 Supplementary Materials for The tyrosine phosphatase (Pez) inhibits metastasis by altering protein trafficking Leila Belle, Naveid Ali, Ana Lonic,

More information

Small-area estimation of mental illness prevalence for schools

Small-area estimation of mental illness prevalence for schools Small-area estimation of mental illness prevalence for schools Fan Li 1 Alan Zaslavsky 2 1 Department of Statistical Science Duke University 2 Department of Health Care Policy Harvard Medical School March

More information

Chapter 4 DESIGN OF EXPERIMENTS

Chapter 4 DESIGN OF EXPERIMENTS Chapter 4 DESIGN OF EXPERIMENTS 4.1 STRATEGY OF EXPERIMENTATION Experimentation is an integral part of any human investigation, be it engineering, agriculture, medicine or industry. An experiment can be

More information