Supplement to SCnorm: robust normalization of single-cell RNA-seq data
Supplementary Note 1: SCnorm does not require spike-ins, since we find that the performance of spike-ins in scRNA-seq is often compromised, and many labs do not use them for normalization [2]. Specifically, spike-ins are not routinely representative of the full range of expression, show substantial bias, and are often spiked in at much higher concentrations than targeted (Supplementary Figures 14-15). However, if good spike-ins are available, performance of SCnorm may be improved in the post-normalization scaling step, which is required when multiple conditions are available. Since spike-ins are added in equal concentrations and are biologically inactive, between-condition scale factors can be computed over the spike-ins alone, as detailed in Methods. Since good spike-ins are expected to be equivalently expressed (not DE) between conditions, we expect this approach to be more accurate than using the full set of target genes in the rescaling step, especially when the overall proportion of DE genes is very high (e.g. over 50%).
Supplementary Note 2: While quantile regression proved to be more flexible and more robust to outliers than a generalized linear model based approach, we recognize that the non-linear log transformation introduces a bias in the count-depth relationship for small counts. Consequently, in addition to quantile regression, count-depth relationships are also assessed on untransformed data using generalized linear model regression with a negative binomial model, fitted with the glm.nb function in R.
Supplementary Note 3: Like other methods for normalization [3-7], SCnorm leaves zeros unchanged. Consequently, the goal of SCnorm is to remove the effect of sequencing depth (and perhaps gene-specific features) among the non-zero counts. To do so, the count-depth relationship must be estimated prior to adjustment using only non-zero count data (Supplementary Figure 12). MAST is commonly used to identify DE genes in scRNA-seq data. The user has the option to test for DE on the non-zero count data (continuous component), on the zeros (discrete component), or on both, which combines evidence from the continuous and discrete tests. A recent method, scDD [8], is similar in that tests for zeros and non-zeros are conducted separately. When two biological conditions are being compared, SCnorm rescales the normalized estimates so that the two conditions have similar means overall among the non-zero counts. Other normalization methods provide normalized estimates of expression that have similar means among all counts, which is problematic for fold-change calculations and DE testing (as shown in Figure 2). See Supplementary Figure 13 for further detail.
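The rescaling idea in Supplementary Note 3 can be illustrated with a small Python sketch. This is not the procedure given in Methods: it assumes a single hypothetical gene, takes the non-zero mean of the pooled data as the common target, and uses illustrative function names. Zeros are left unchanged because they are only ever multiplied by a scale factor.

```python
from statistics import mean

def nonzero_mean(values):
    """Mean of the strictly positive entries, or None if there are none."""
    nz = [v for v in values if v > 0]
    return mean(nz) if nz else None

def rescale_to_common_nonzero_mean(cond_a, cond_b):
    """Scale each condition so that both share the same non-zero mean.

    Assumption (not from Methods): the common target is the non-zero
    mean of the pooled data from both conditions.
    """
    target = nonzero_mean(cond_a + cond_b)
    scaled = []
    for cond in (cond_a, cond_b):
        m = nonzero_mean(cond)
        # A condition with no non-zero counts is returned unchanged.
        factor = target / m if m else 1.0
        scaled.append([v * factor for v in cond])
    return scaled

a_scaled, b_scaled = rescale_to_common_nonzero_mean([0, 0, 4, 8], [0, 9, 9, 18])
```

After the call, the two conditions have the same mean among non-zero values, while their means over all values (zeros included) still differ because the proportion of zeros differs.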
Supplementary Figure 1: Estimated count-depth relationships in bulk and single-cell datasets before and after normalization. Results are structurally identical to those shown in Figure 1, but using negative binomial generalized linear regression instead of median quantile regression to calculate gene-specific slopes.
Supplementary Figure 2: Fold-changes and ROC curves for SIM I. For each simulated dataset, genes are divided into four equally sized groups based on their median expression among non-zero un-normalized measurements. In each group, the gene-specific difference between estimated fold-change and true fold-change is calculated for SCnorm, SCnorm.SI, MR, TPM, and scran. Boxplots of these estimates are shown in panel (a) for 100 simulations of SIM I with K=1. Panel (b) shows ROC curves for detection of differentially expressed (DE) genes for 100 simulations of SIM I with K=1. The solid line is the average over the 100 iterations, and the dashed lines represent ROC curves for five randomly chosen iterations. Panels (c) and (d) are structurally identical, for SIM I with K=4.
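The grouping of genes by median non-zero expression used in these figures can be sketched as follows. This is a minimal illustration with hypothetical names: it assumes that genes with no non-zero counts are skipped and that ties are broken by gene name, neither of which is specified in the caption.

```python
from statistics import median

def expression_groups(counts_by_gene, n_groups=4):
    """Assign each gene to one of n_groups equally sized groups (1 = lowest).

    counts_by_gene maps a gene name to its un-normalized counts across cells.
    """
    medians = {}
    for gene, counts in counts_by_gene.items():
        nz = [c for c in counts if c > 0]
        if nz:  # assumption: genes with only zeros are left out
            medians[gene] = median(nz)
    # Rank genes by non-zero median; the gene name breaks ties deterministically.
    ranked = sorted(medians, key=lambda g: (medians[g], g))
    n = len(ranked)
    return {g: i * n_groups // n + 1 for i, g in enumerate(ranked)}

toy = {f"g{k}": [0, k, k] for k in range(1, 9)}
groups = expression_groups(toy)
```

Setting `n_groups=10` gives the ten-group split used for the count-depth density plots.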
Supplementary Figure 3: Fold-changes and ROC curves for SIM II. For each scenario in SIM II, panels (a)-(d) show the gene-specific difference calculated between estimated fold-change of non-zero counts and true fold-change for 100 simulations. Boxplots of the averages are shown for data normalized by SCnorm, TPM, and scran. MR cannot be evaluated in these simulations as each gene contains at least one zero and so no genes pass the MR filter. Motivation for considering non-zero counts to calculate fold-change is discussed in Supplementary Note 3. Panels (e)-(h) are structurally identical to (a)-(d), but with fold-changes calculated with zeros included. Panels (i)-(k) show ROC curves for detection of differentially expressed (DE) genes for 100 simulations of SIM II, scenario 2 (panel (i)), 3 (panel (j)), and 4 (panel (k)), for data normalized by SCnorm, TPM, and scran. The solid line is the average over the 100 iterations, and the dashed lines represent ROC curves for five randomly chosen iterations.
Supplementary Figure 4: Fold-changes and DE genes calculated from the H9 case study data. For each gene, the fold-change of non-zero counts between the H9-4M and H9-1M groups was computed for data following normalization via SCnorm, MR, TPM, scran, SCDE, and BASiCS. Boxplots of gene-specific fold-changes are shown in panel (a) for data normalized by each method. The number of genes identified as DE using MAST is shown in panel (b). Genes are divided into four equally sized expression groups based on their median among non-zero un-normalized expression measurements, and results are shown as a function of expression group. Motivation for considering non-zero counts to calculate fold-change is discussed in Supplementary Note 3.
Supplementary Figure 5: ROC curves for a comparison of S vs. G2/M in the H1-FUCCI data. For this evaluation, we subsampled cells from the S and G2/M H1-FUCCI case study data. For the subsampled cells, there are negligible differences in cellular detection rates (CDRs) between the two conditions and there is on average a 1.5-fold increase in sequencing depth (details in Methods). Without differences in CDR, we would expect an EE gene expressed at level x in S to be expressed at level 1.5*x in G2/M. Given this, we define a gold standard DE list to be those genes showing a fold change bigger than a threshold (or smaller than one over that threshold), after adjusting for the expected increase in expression due to increased sequencing depth. MAST was applied as detailed in Methods to identify DE genes; thresholds equal to 1.5, 2, 2.5, and 3 are shown here.
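The gold standard rule in this caption can be written as a short check. The sketch below is illustrative only: the function and argument names are hypothetical, and how the per-condition expression level of a gene is summarized is left to the caller, since the caption does not specify it.

```python
def is_gold_standard_de(level_s, level_g2m, threshold, depth_ratio=1.5):
    """Return True if the depth-adjusted fold change passes the threshold.

    level_s, level_g2m: summary expression of one gene in S and in G2/M.
    depth_ratio: expected G2/M over S increase due to sequencing depth.
    """
    # Remove the fold change that depth alone is expected to produce.
    adjusted_fc = (level_g2m / level_s) / depth_ratio
    return adjusted_fc > threshold or adjusted_fc < 1.0 / threshold

# An EE gene at level x in S and 1.5*x in G2/M has adjusted fold change 1.
ee_flag = is_gold_standard_de(10.0, 15.0, threshold=1.5)
up_flag = is_gold_standard_de(10.0, 45.0, threshold=2.0)
down_flag = is_gold_standard_de(10.0, 5.0, threshold=2.0)
```

With these toy values the first gene is not flagged, while the second and third are flagged as up-regulated and down-regulated in G2/M relative to the depth expectation.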
Supplementary Figure 6: Count-depth relationship of H1-1M data. For each gene, median quantile regression was used to estimate the count-depth relationship before normalization and after normalization via SCnorm, MR, TPM, scran, SCDE, and BASiCS. Shown are densities of slopes within each of ten equally sized gene groups, where a gene's group membership is determined by its median expression among non-zero un-normalized measurements.
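To make the gene-specific slope concrete, the following sketch fits a median (0.5 quantile) regression for one gene in plain Python. It assumes the count-depth relationship is the regression of log non-zero count on log sequencing depth, and it swaps in a brute-force search for the fit: a least-absolute-deviations line can always be chosen to pass through two data points, so trying every pair is exact but only practical for small examples. It is not the implementation used for the figures, and all names are hypothetical.

```python
from itertools import combinations
from math import log

def median_regression(x, y):
    """Exact least-absolute-deviations line; returns (intercept, slope).

    Searches all lines through pairs of points, which costs O(n^3).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

def count_depth_slope(counts, depths):
    """Slope of log count on log depth for one gene, using non-zero counts only."""
    pairs = [(log(d), log(c)) for c, d in zip(counts, depths) if c > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two non-zero counts")
    xs, ys = zip(*pairs)
    return median_regression(list(xs), list(ys))[1]

# Counts proportional to depth, plus one outlier (400) and one zero.
slope = count_depth_slope([10, 20, 400, 80, 160, 0], [1, 2, 4, 8, 16, 32])
```

In this toy gene the fitted slope is close to 1 despite the outlier, which reflects the robustness to outliers noted in Supplementary Note 2; the zero count is excluded before fitting.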
Supplementary Figure 7: Count-depth relationship of H1-4M data. Results are structurally identical to those shown in Supplementary Figure 6, but for the H1-4M data.
Supplementary Figure 8: Count-depth relationship for five publicly available datasets. Results are structurally identical to those shown in Supplementary Figure 6, but for five publicly available datasets (a) prior to normalization and (b) normalized by SCnorm.
Supplementary Figure 9: Estimated count-depth relationship of H1-1M data. Results are structurally identical to those shown in Supplementary Figure 6, but using negative binomial generalized linear regression instead of median quantile regression to calculate gene-specific slopes.
Supplementary Figure 10: Count-depth relationship of H1-4M data. Results are structurally identical to those shown in Supplementary Figure 7, but using negative binomial generalized linear regression instead of median quantile regression to calculate gene-specific slopes.
Supplementary Figure 11: Count-depth relationship for five publicly available datasets. Results are structurally identical to those shown in Supplementary Figure 8(b), but using negative binomial generalized linear regression instead of median quantile regression to calculate gene-specific slopes.
Supplementary Figure 12: Motivation for using quantile regression on non-zero data. Panel (a) shows true expression of one hypothetical gene in 100 cells. All variation shown is considered biological. Panel (b) shows ideal measured expression for the same gene as a function of sequencing depth, where ideal measured expression is free of technical artifacts. Panel (c) shows the same gene where some cells show zero counts due to technical artifacts. Panel (d) shows expression vs. depth and estimated regression fits for quantile regression (blue) and negative binomial generalized linear model regression (red) without (solid) and with (dashed) the zero counts included. SCnorm leaves zeros unchanged and corrects for the count-depth relationship among non-zeros, which is more accurately summarized by the regression fits on non-zero data.
Supplementary Figure 13: Effect of normalization on non-zero means. Panel (a) shows measured expression for a hypothetical EE gene that is sequenced in two conditions. The sequencing depths in the first condition (red) are smaller than those in the second condition (blue). Panel (b) shows the same gene where counts have been normalized using a global scale factor based approach. Note that global scale factor based approaches provide normalized estimates of expression that have similar means among all counts, which results in non-zero counts having different mean expression levels if there are differences in the proportion of zeros across conditions. Methods to identify DE genes in scRNA-seq data such as MAST and scDD test on non-zero and zero counts separately and, consequently, would identify this EE gene as DE. Panel (c) shows counts normalized by SCnorm, which provides normalized estimates of expression that have similar means among non-zero counts.
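The effect described for panel (b) can be reproduced with a few made-up numbers. The sketch below assumes a simple global scale factor equal to each cell's depth divided by the mean depth; the counts and depths are hypothetical and chosen only so that the low-depth condition has more zeros.

```python
from statistics import mean

def global_scale(counts, depths):
    """Divide each count by its cell's depth relative to the mean depth."""
    ref = mean(depths)
    return [c * ref / d for c, d in zip(counts, depths)]

def mean_nonzero(values):
    """Mean over strictly positive values only."""
    return mean(v for v in values if v > 0)

# One hypothetical EE gene: four cells per condition, second condition at 2x depth.
counts = [0, 0, 12, 12, 0, 16, 16, 16]
depths = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]

normalized = global_scale(counts, depths)
cond1, cond2 = normalized[:4], normalized[4:]

all_means = (mean(cond1), mean(cond2))
nonzero_means = (mean_nonzero(cond1), mean_nonzero(cond2))
```

Here the means over all counts agree across conditions after scaling, but the non-zero means do not, so a test restricted to non-zero counts would see a difference for this EE gene.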
Supplementary Figure 14: The proportion of spike-in expression counts to the total expression counts for each cell is shown for four publicly available datasets and the H1-1M and H1-4M datasets. Cells are ordered by sequencing depth.
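The quantity plotted here is straightforward to compute. The sketch below is a hypothetical illustration: it assumes sequencing depth is the total count per cell with spike-ins included, and that spike-ins can be recognized by a name prefix such as "ERCC-", neither of which is stated in the caption.

```python
def spike_in_proportions(cells, is_spike_in):
    """Return (cell id, depth, spike-in fraction) tuples ordered by depth.

    cells maps a cell id to a dict of feature name -> count.
    is_spike_in is a caller-supplied predicate on feature names.
    """
    rows = []
    for cell_id, counts in cells.items():
        total = sum(counts.values())  # assumption: depth includes spike-ins
        spike = sum(c for name, c in counts.items() if is_spike_in(name))
        rows.append((total, spike / total if total else 0.0, cell_id))
    rows.sort()
    return [(cell_id, depth, frac) for depth, frac, cell_id in rows]

toy_cells = {
    "cell1": {"GeneA": 90, "ERCC-0001": 10},
    "cell2": {"GeneA": 30, "ERCC-0001": 20},
}
result = spike_in_proportions(toy_cells, lambda name: name.startswith("ERCC-"))
```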
Supplementary Figure 15: Boxplots of log read counts in each cell are shown separately for endogenous genes (left panel) and spike-ins (right panel) for four publicly available datasets and the H1 case study data. Counts smaller than one are not shown. Cells are ordered by sequencing depth.
Supplementary Figure 16: Results from Figure 2, with NODES included.
Supplementary Figure 17: Results are structurally identical to Figure 3, but with NODES included. Misclassification rates for SCnorm, MR, TPM, scran, SCDE, and NODES averaged across the three cell cycle phases are 0.12, 0.21, 0.22, 0.22, 0.24, and 0.31, respectively. Note that these rates differ from those shown in Figure 3 because NODES removes genes and cells prior to normalization, and here we restrict to the genes and cells retained by NODES to facilitate comparing across methods.
Supplementary Figure 18: Results from Supplementary Figure 5, with NODES included.
Supplementary Figure 19: Summary statistics of simulated and case-study data. The empirical cumulative distribution functions of the gene-specific variances and gene-specific means are shown in black in panels (a) and (b), respectively, for one SIM I dataset with. Shown in red are the empirical cumulative distribution functions of the gene-specific variances and gene-specific means for the genes sampled from the H1-1M and H1-4M datasets and used to simulate the SIM I data. Panels (c) and (d) are structurally identical, for one SIM I dataset with. Variances and means are computed on log non-zero expression measurements.
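The summaries compared in this figure can be sketched in a few lines. The example assumes the natural log and the sample variance (n - 1 denominator), and it skips genes with fewer than two non-zero measurements; the caption specifies none of these, and the function names are hypothetical.

```python
from math import log
from statistics import mean, variance

def log_nonzero_summary(counts):
    """Return (mean, variance) of log non-zero counts, or None if too few."""
    logs = [log(c) for c in counts if c > 0]
    if len(logs) < 2:
        return None  # the sample variance needs at least two values
    return mean(logs), variance(logs)

def ecdf(values):
    """Return sorted values paired with cumulative proportions (i + 1) / n."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

steps = ecdf([3, 1, 2])
```

Applying `log_nonzero_summary` to every gene and passing the resulting means (or variances) to `ecdf` gives one curve of the kind shown in each panel.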
References
1. Risso, D., Schwartz, K., Sherlock, G. & Dudoit, S. GC-content normalization for RNA-Seq data. BMC Bioinformatics 12, 480 (2011).
2. Lin, Y. et al. Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics 17, 28 (2016).
3. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
4. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
5. Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
6. Kharchenko, P. V., Silberstein, L. & Scadden, D. T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, (2014).
7. Vallejos, C. A., Marioni, J. C. & Richardson, S. BASiCS: Bayesian Analysis of Single-Cell Sequencing Data. PLOS Comput. Biol. 11, e (2015).
8. Korthauer, K. D. et al. A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. Genome Biol. 17, 222 (2016).
Observational studies; descriptive statistics Patrick Breheny August 30 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 38 Observational studies Association versus causation
More informationNumerous hypothesis tests were performed in this study. To reduce the false positive due to
Two alternative data-splitting Numerous hypothesis tests were performed in this study. To reduce the false positive due to multiple testing, we are not only seeking the results with extremely small p values
More informationAP Statistics. Semester One Review Part 1 Chapters 1-5
AP Statistics Semester One Review Part 1 Chapters 1-5 AP Statistics Topics Describing Data Producing Data Probability Statistical Inference Describing Data Ch 1: Describing Data: Graphically and Numerically
More informationCitation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.
University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document
More informationStatistical Assessment of the Global Regulatory Role of Histone. Acetylation in Saccharomyces cerevisiae. (Support Information)
Statistical Assessment of the Global Regulatory Role of Histone Acetylation in Saccharomyces cerevisiae (Support Information) Authors: Guo-Cheng Yuan, Ping Ma, Wenxuan Zhong and Jun S. Liu Linear Relationship
More informationBusiness Statistics Probability
Business Statistics The following was provided by Dr. Suzanne Delaney, and is a comprehensive review of Business Statistics. The workshop instructor will provide relevant examples during the Skills Assessment
More informationHERMES Time and Workflow Primary Paper. Statistical Analysis Plan
HERMES Time and Workflow Primary Paper Statistical Analysis Plan I. Study Aims This is a post-hoc analysis of the pooled HERMES dataset, with the following specific aims: A) To characterize the time period
More informationBayesian integration in sensorimotor learning
Bayesian integration in sensorimotor learning Introduction Learning new motor skills Variability in sensors and task Tennis: Velocity of ball Not all are equally probable over time Increased uncertainty:
More informationRNA-seq. Differential analysis
RNA-seq Differential analysis Data transformations Count data transformations In order to test for differential expression, we operate on raw counts and use discrete distributions differential expression.
More informationOrdinal Data Modeling
Valen E. Johnson James H. Albert Ordinal Data Modeling With 73 illustrations I ". Springer Contents Preface v 1 Review of Classical and Bayesian Inference 1 1.1 Learning about a binomial proportion 1 1.1.1
More informationAbstract. Optimization strategy of Copy Number Variant calling using Multiplicom solutions APPLICATION NOTE. Introduction
Optimization strategy of Copy Number Variant calling using Multiplicom solutions Michael Vyverman, PhD; Laura Standaert, PhD and Wouter Bossuyt, PhD Abstract Copy number variations (CNVs) represent a significant
More informationSupplementary Online Content
Supplementary Online Content Basaria S, Harman SM, Travison TG, et al. The effects of testosterone administration for three years on subclinical atherosclerosis progression in older men with low or low
More informationEstimation of effect sizes in the presence of publication bias: a comparison of meta-analysis methods
Estimation of effect sizes in the presence of publication bias: a comparison of meta-analysis methods Hilde Augusteijn M.A.L.M. van Assen R. C. M. van Aert APS May 29, 2016 Today s presentation Estimation
More informationPsychology Research Process
Psychology Research Process Logical Processes Induction Observation/Association/Using Correlation Trying to assess, through observation of a large group/sample, what is associated with what? Examples:
More informationInsulin Secretion and Hepatic Extraction during Euglycemic Clamp Study: Modelling of Insulin and C-peptide data
Insulin Secretion and Hepatic Extraction during Euglycemic Clamp Study: Modelling of Insulin and C-peptide data Chantaratsamon Dansirikul Mats O Karlsson Division of Pharmacokinetics and Drug Therapy Department
More informationValidation of consistency of Mendelian sampling variance in national evaluation models
Validation of consistency of Mendelian sampling variance in national evaluation models Nordic Cattle Genetic A.-M. Tyrisevä 1, E.A. Mäntysaari 1, J. Jakobsen 2, G.P. Aamand 3, J. Dürr 2, W.F. Fikse 4 and
More informationT. R. Golub, D. K. Slonim & Others 1999
T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have
More informationA About Facebook 2. B Data linking and controls 2. C Sampling rates 5. D Activity categories 6. E Models included in Figure 4 9
Online Interaction, Social Support, and Health William R. Hobbs, Moira K. Burke, Nicholas A. Christakis, James H. Fowler A About Facebook 2 B Data linking and controls 2 C Sampling rates 5 D Activity categories
More informationEssentials in Bioassay Design and Relative Potency Determination
BioAssay SCIENCES A Division of Thomas A. Little Consulting Essentials in Bioassay Design and Relative Potency Determination Thomas A. Little Ph.D. 2/29/2016 President/CEO BioAssay Sciences 12401 N Wildflower
More informationNEW METHODS FOR SENSITIVITY TESTS OF EXPLOSIVE DEVICES
NEW METHODS FOR SENSITIVITY TESTS OF EXPLOSIVE DEVICES Amit Teller 1, David M. Steinberg 2, Lina Teper 1, Rotem Rozenblum 2, Liran Mendel 2, and Mordechai Jaeger 2 1 RAFAEL, POB 2250, Haifa, 3102102, Israel
More informationPitfalls in Linear Regression Analysis
Pitfalls in Linear Regression Analysis Due to the widespread availability of spreadsheet and statistical software for disposal, many of us do not really have a good understanding of how to use regression
More informationEvaluation of logistic regression models and effect of covariates for case control study in RNA-Seq analysis
Choi et al. BMC Bioinformatics (2017) 18:91 DOI 10.1186/s12859-017-1498-y METHODOLOGY ARTICLE Evaluation of logistic regression models and effect of covariates for case control study in RNA-Seq analysis
More informationBayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm
Journal of Social and Development Sciences Vol. 4, No. 4, pp. 93-97, Apr 203 (ISSN 222-52) Bayesian Logistic Regression Modelling via Markov Chain Monte Carlo Algorithm Henry De-Graft Acquah University
More informationMosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer
Supplementary Information Mosaic loss of chromosome Y in peripheral blood is associated with shorter survival and higher risk of cancer Lars A. Forsberg, Chiara Rasi, Niklas Malmqvist, Hanna Davies, Saichand
More informationMostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies
Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies Arun Advani and Tymon Sªoczy«ski 13 November 2013 Background When interested in small-sample properties of estimators,
More informationAccessing and Using ENCODE Data Dr. Peggy J. Farnham
1 William M Keck Professor of Biochemistry Keck School of Medicine University of Southern California How many human genes are encoded in our 3x10 9 bp? C. elegans (worm) 959 cells and 1x10 8 bp 20,000
More informationThe Importance of Coverage Uniformity Over On-Target Rate for Efficient Targeted NGS
WHITE PAPER The Importance of Coverage Uniformity Over On-Target Rate for Efficient Targeted NGS Yehudit Hasin-Brumshtein, Ph.D., Maria Celeste M. Ramirez, Ph.D., Leonardo Arbiza, Ph.D., Ramsey Zeitoun,
More informationSampling Weights, Model Misspecification and Informative Sampling: A Simulation Study
Sampling Weights, Model Misspecification and Informative Sampling: A Simulation Study Marianne (Marnie) Bertolet Department of Statistics Carnegie Mellon University Abstract Linear mixed-effects (LME)
More information