Quality assessment of TCGA Agilent gene expression data for ovary cancer
Nianxiang Zhang & Keith A. Baggerly
Dept of Bioinformatics and Computational Biology
MD Anderson Cancer Center
Oct 1, 2010
AgiExpQC.Rnw

Contents

1 Executive Summary
  1.1 Introduction
  1.2 Methods
      Data
      Statistical methods
  1.3 Results and Conclusions
2 Data Analysis
  2.1 Data consistency across versions
  2.2 Level 1 to Level 2 data
  2.3 Level 2 to Level 3 data
      Load level 2/3 data
      Sample Labeling consistency of Level 2 and Level 3 data
  2.4 Batch Effects in Level 2 and Level 3 data
3 Appendix
  3.1 File Location
  3.2 SessionInfo

List of Figures

1 Correlation of level 1 and level 2 data.
2 Probes in G4502A_07_2, G4502A_07_3 platform and level 2 data.
3 Consistency of level 2 and level 3 data by correlation. Level 2 data are summarized to gene level by taking the mean of probes that belong to the same gene. Then pairwise correlations between summarized level 2 and level 3 data are calculated. Red color represents correlation coefficient > 0.9.
4 Batch effects in Level 2 Agilent gene expression data. An average across all probes for each sample is calculated using level 2 data. The average gene expression level for samples is shown by batch.
5 Batch effects in Level 3 Agilent gene expression data. An average across all genes for each sample is calculated using level 3 data. The average gene expression level for samples is shown by batch.
6 The top probes with batch effects in level 2 data.
7 The top probes with batch effects in level 3 data.

1 Executive Summary

1.1 Introduction

We are interested in assessing the quality of TCGA data, including Agilent gene expression data. We would like to examine the consistency of the data across levels. We also want to assess batch effects in Level 2 and Level 3 data.

1.2 Methods

Data

We use the MD Anderson local copy of TCGA Agilent gene expression data at //gcgserv.mdanderson.org/tcga-PUBLIC/tcga/tumor/ov/cgcc/unc.edu for QC assessment of Level 1 data. We use consolidated level 2 and level 3 data from the TCGA data portal, located at mdadqsfs02/workspace/nzhangtcgadata/ovarian/expression-Genes.

Statistical methods

We use the limma package in R to assess Level 1 data. To summarize Level 2 data to gene level, we take the mean expression level of the probes located on the same gene. We calculate the Pearson correlation coefficient to assess the consistency of Level 2 and Level 3 data.

1.3 Results and Conclusions

By checking random samples, we found that the data are consistent across different versions from level 1 to level 3. We found that 67 probes in level 2 data do not exist in level 1 data. We do not know how level 2 data were obtained. They are highly correlated with the LogRatio data processed by the Feature Extraction (FE) Software, but they are not taken directly from FE. We did not identify any mislabeling problem. We do see batch effects in the data: the average expression levels across all genes differ for samples from different batches. The situation is similar for level 2 and level 3 data.

2 Data Analysis

2.1 Data consistency across versions

In order to perform QC assessment of the data, we need to figure out the data storage structure and retrieve the proper files.
We define the directories for the two platforms.

> datapath2 <- "//gcgserv.mdanderson.org/tcga-public/tcga/tcga-stage/anonsite/tcga/tumor/ov/cgcc/unc.edu
> datapath3 <- "//gcgserv.mdanderson.org/tcga-public/tcga/tcga-stage/anonsite/tcga/tumor/ov/cgcc/unc.edu
We write 2 functions to get the directory names and file names.

> getdir <- function(dp, ...) {
+     Dirs <- list.files(dp, ...)
+     Dirs <- Dirs[-grep("gz", Dirs)]
+     return(Dirs)
+ }
> extract.datafilename <- function(dpath, level = 1, ...) {
+     allfile <- list.files(dpath, ...)
+     data.file <- allfile[grep("us", allfile)]
+     data.file <- sort(data.file)
+     name23 <- grep("level", data.file)
+     level1 <- data.file[0 - name23]
+     level2 <- data.file[grep("level2", data.file)]
+     level3 <- data.file[grep("level3", data.file)]
+     switch(level, `1` = return(level1), `2` = return(level2),
+         `3` = return(level3))
+ }

We check Level 1 data first. We examine the samples from different versions to make sure the data file names are the same in each version.

> temp.ver <- getdir(datapath2, full.names = T)
> identical(extract.datafilename(temp.ver[1]), extract.datafilename(temp.ver[2]))
> temp1 <- extract.datafilename(temp.ver[1], full.names = T)
> temp2 <- extract.datafilename(temp.ver[2], full.names = T)

We choose 3 random samples from the different versions and load the level 1 data, which come from the Feature Extraction Software. They are identical.

> library(limma)
> temp.ind <- sample(1:length(temp1), 3)
> choosensamp <- extract.datafilename(temp.ver[1])[temp.ind]
> RG1 <- read.maimages(files = temp1[temp.ind], source = "agilent",
+     other.columns = list("ControlType", "LogRatio", "gProcessedSignal",
+         "rProcessedSignal"))
> RG2 <- read.maimages(files = temp2[temp.ind], source = "agilent",
+     other.columns = list("ControlType", "LogRatio", "gProcessedSignal",
+         "rProcessedSignal"))
> colnames(RG1) <- colnames(RG2) <- NULL
> identical(RG1$R, RG2$R)
> identical(RG1$other, RG2$other)

We also check whether the Level 2 and Level 3 data are consistent across different versions. The level 2 and 3 data that we checked for the two versions are identical.
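A self-contained sketch of this kind of version-to-version identity check, using two temporary files in place of the TCGA archives. The file contents, header rows, probe names, and helper names (`write_toy`, `read_body`) are made up for illustration and are not taken from the report.

```r
# Toy stand-ins for two archive versions of the same level 2 file
# (hypothetical contents, not TCGA data).
write_toy <- function(path, values) {
  lines <- c("Hybridization REF\tsample1",
             "Composite Element REF\tlog2 ratio",
             paste(c("A_23_P1", "A_23_P2", "A_23_P3"), values, sep = "\t"))
  writeLines(lines, path)
}
v1 <- tempfile(fileext = ".txt")
v2 <- tempfile(fileext = ".txt")
write_toy(v1, c(0.12, -1.5, 2.25))
write_toy(v2, c(0.12, -1.5, 2.25))

# Skip the two header rows, as the report does, then compare parsed tables.
read_body <- function(path) read.table(path, skip = 2, sep = "\t", fill = TRUE)
same_parsed <- identical(read_body(v1), read_body(v2))

# A checksum comparison is stricter: it also catches header differences.
same_bytes <- unname(tools::md5sum(v1) == tools::md5sum(v2))
```

Comparing checksums first and parsing only on a mismatch may be cheaper when the archives hold hundreds of files.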
> temp.ver <- getdir(datapath2, full.names = T)
> identical(extract.datafilename(temp.ver[1]), extract.datafilename(temp.ver[2]))
> for (level in 2:3) {
+     temp1 <- extract.datafilename(temp.ver[1], level = level,
+         full.names = T)
+     temp2 <- extract.datafilename(temp.ver[2], level = level,
+         full.names = T)
+     date1 <- date()
+     for (ii in 1:length(temp1)) {
+         templ1 <- read.table(file = temp1[ii], skip = 2, fill = T)
+         templ2 <- read.table(file = temp2[ii], skip = 2, fill = T)
+         colnames(templ1) <- colnames(templ2) <- NULL
+         if (!identical(templ1, templ2))
+             cat(paste(temp1[ii], "is different from \n", temp2[ii],
+                 "\n"))
+         else cat("ok \n")
+     }
+ }

2.2 Level 1 to Level 2 data

We do not know how level 2 data were obtained from Level 1 data. We average the processed LogRatio values in level 1 per probe and check how well these means correlate with the level 2 data. The correlation coefficient of the LogRatio mean and level 2 data is computed with cor() below (Figure 1).

> NC1 <- RG1$genes[RG1$other$ControlType[, 1] == 0, ]
> length(unique(NC1$ProbeName))
> table(table(NC1$ProbeName))
> temp.ver <- getdir(datapath2, full.names = T)
> temp.level1.file <- extract.datafilename(temp.ver[1], level = 1,
+     full.names = T)
> temp.rg <- read.maimages(files = temp.level1.file[1], source = "agilent",
+     other.columns = list("ControlType", "LogRatio", "gProcessedSignal",
+         "rProcessedSignal"))
> temp.level2.file <- extract.datafilename(temp.ver[1], level = 2,
+     full.names = T)
> chosensamp <- extract.datafilename(temp.ver[1])[1]
> temp <- substr(chosensamp, 1, 30)
> chosen.level2.file <- unlist(lapply(temp, function(x) temp.level2.file[grep(x,
+     temp.level2.file)]))
> temp <- read.table(chosen.level2.file[1], header = F, skip = 2,
+     fill = T)
> matchlevel1data <- temp.rg[match(temp[, 1], temp.rg$genes$ProbeName),
+     ]
> replevel1data <- temp.rg[duplicated(temp.rg$genes$ProbeName),
+     ]
> replevel1data <- temp.rg[match(temp[, 1], replevel1data$genes$ProbeName),
+     ]
> identical(matchlevel1data$other$LogRatio, temp[, 2])
> temp.lrmean <- tapply(temp.rg$other$LogRatio[, 1], INDEX = temp.rg$genes$ProbeName,
+     mean, na.rm = T)
> temp.lrmean.match <- temp.lrmean[match(as.vector(temp[, 1]),
+     names(table(temp.rg$genes$ProbeName)))]
Figure 1: Correlation of level 1 and level 2 data (x-axis: Mean LogRatio Level1; y-axis: Level 2).

> pdf("level1level2corr.pdf")
> plot(temp.lrmean.match, temp[, 2], xlab = "Mean LogRatio Level1",
+     ylab = "Level 2", pch = ".")
> invisible(dev.off())
> cor(temp.lrmean.match, temp[, 2], use = "pairwise.complete.obs")

We check level 1 data in another platform, AgilentG4502A.

> temp.ver <- getdir(datapath3, full.names = T)
> identical(extract.datafilename(temp.ver[2]), extract.datafilename(temp.ver[3]))
> temp3 <- extract.datafilename(temp.ver[2], full.names = T)
> temp4 <- extract.datafilename(temp.ver[3], full.names = T)

We choose a sample from the AgilentG4502A_07_3 platform and load the level 1 data, which come from the Feature Extraction Software. The level 1 data from the different versions are consistent.
> sampleid <- substr(extract.datafilename(temp.ver[2])[1], 1, 30)
> RG3 <- read.maimages(files = temp3[grep(sampleid, temp3)], source = "agilent",
+     other.columns = list("ControlType", "LogRatio", "gProcessedSignal",
+         "rProcessedSignal"))
> RG4 <- read.maimages(files = temp4[grep(sampleid, temp4)], source = "agilent",
+     other.columns = list("ControlType", "LogRatio", "gProcessedSignal",
+         "rProcessedSignal"))
> colnames(RG3) <- colnames(RG4) <- NULL
> identical(RG3$R, RG4$R)
> identical(RG3$other, RG4$other)

We get rid of the control probes and find out the number of probes.

> NC3 <- RG3$genes[RG3$other$ControlType[, 1] == 0, ]
> length(unique(NC3$ProbeName))

The results show that the level 1 data are consistent across versions for both platforms. Since the two platforms have different probe sets, we compare the probes in the level 1 data of the two platforms with the probes in the Level 2 data.

> identical(Level2data2[, 1], Level2data3[, 1])

The level 2 data for the different platforms have the same set of probes. However, 67 probes in the level 2 data are not in the level 1 data (Figure 2).

> allprobe <- unlist(unique(c(NC1$ProbeName, NC3$ProbeName, as.vector(Level2data2[,
+     1]))))
> temp.venn <- matrix(0, length(allprobe), 3)
> colnames(temp.venn) <- c("G4502A_07_2", "G4502A_07_3", "Level2_2")
> temp.venn[, 1] <- allprobe %in% NC1$ProbeName
> temp.venn[, 2] <- allprobe %in% NC3$ProbeName
> temp.venn[, 3] <- allprobe %in% Level2data2[, 1]
> pdf("probevenn.pdf")
> vennDiagram(temp.venn, circle.col = c("red", "blue", "green"),
+     lwd = 3)
> dev.off()

2.3 Level 2 to Level 3 data

Load level 2/3 data

Now we use the consolidated level 2 and level 3 data we just downloaded to do further analysis. We convert the Agilent level 2/3 data into matrix form.
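For the Level 3 files, which the loading code treats as long-format tables (sample, gene, value), the conversion to a gene-by-sample matrix can be sketched on toy data. The column names and values here are illustrative assumptions, not the TCGA file layout.

```r
# Toy long-format table: one row per (sample, gene) pair.
# Column names and values are illustrative, not taken from the TCGA files.
long <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  gene   = c("EHF", "FGL2", "TP53", "TP53", "EHF", "FGL2"),
  value  = c(1.0, 2.0, 3.0, 6.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# Index by name rather than by row order, so a file whose genes appear in a
# different order for each sample is still placed correctly.
genes   <- unique(long$gene)
samples <- unique(long$sample)
wide <- matrix(NA_real_, nrow = length(genes), ncol = length(samples),
               dimnames = list(genes, samples))
wide[cbind(long$gene, long$sample)] <- long$value
```

Filling a matrix column by column with `matrix()` gives the same result only when every sample lists its genes in the same order; name-based indexing removes that assumption at little cost.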
> datadir <- c("../../../Expression-Genes/UNC AgilentG4502A_07_2",
+     "../../../Expression-Genes/UNC AgilentG4502A_07_3")
> if (exists("Level2data")) rm(Level2data)
> Agifile <- paste(datadir, "/Level_2/", c("unc.edu AgilentG4502A_07_2 log2_lowess_normalized.txt",
+     "unc.edu AgilentG4502A_07_3 log2_lowess_normalized.txt"),
+     sep = "")
> temp.data <- NULL
> for (j in 1:length(Agifile)) {
+     s.name <- read.delim(file = Agifile[j], sep = "\t", header = F,
+         nrows = 1, stringsAsFactors = F, row.names = 1)
+     temp.raw <- read.delim(file = Agifile[j], sep = "\t", header = F,
+         skip = 2, stringsAsFactors = F, row.names = 1)
+     colnames(temp.raw) <- t(s.name)
+     if (!exists("Level2data"))
+         Level2data <- temp.raw
+     else {
+         stopifnot(identical(rownames(Level2data), rownames(temp.raw)))
+         Level2data <- cbind(Level2data, temp.raw)
+     }
+ }
> rm(Agifile)
> rm(list = ls(pattern = "temp"))
> for (i in 1:ncol(Level2data)) Level2data[, i] <- as.numeric(Level2data[,
+     i])
> save(Level2data, file = file.path("rdataobjects", "AgilentOVLevel2Data.Rda"))

Figure 2: Probes in G4502A_07_2, G4502A_07_3 platform and level 2 data.

> Agifile <- paste(datadir, "/Level_3/", c("unc.edu AgilentG4502A_07_2 gene_expression_analysis_1.txt",
+     "unc.edu AgilentG4502A_07_3 gene_expression_analysis_1.txt"),
+     sep = "")
> if (exists("Level3data")) rm(Level3data)
> for (j in 1:length(Agifile)) {
+     temp.raw <- read.delim(file = Agifile[j], sep = "\t", header = T,
+         stringsAsFactors = F)
+     temp <- matrix(as.numeric(temp.raw[, 3]), ncol = length(table(temp.raw[,
+         1])), nrow = length(table(temp.raw[, 2])), dimnames = list(unique(temp.raw[,
+         2]), unique(temp.raw[, 1])))
+     temp.raw[is.na(as.numeric(temp.raw[, 3])), ]
+     if (!exists("Level3data"))
+         Level3data <- temp
+     else {
+         stopifnot(identical(rownames(Level3data), rownames(temp)))
+         Level3data <- cbind(Level3data, temp)
+     }
+ }
> rm(Agifile)
> rm(list = ls(pattern = "temp"))
> save(Level3data, file = file.path("rdataobjects", "AgilentOVLevel3Data.Rda"))

We make sure the Level 2 and Level 3 data cover the same set of samples.

> all(colnames(Level2data) %in% colnames(Level3data))
> all(colnames(Level3data) %in% colnames(Level2data))

We reorder the columns of the Level 3 data so that level 2 and level 3 data have the same sample order.

> Level3data <- Level3data[, colnames(Level2data)]

We get the inclusion/exclusion sample list and retain only the included samples.
We also remove one cell line sample without batch assignment.
> source("~/project/weinsteintcga062509/tcgafunctions.r")
> level.si <- getsi(Level2data, batchpath = "../../Effect")
> require(gdata)
> inex <- read.xls(xls = "/workspace/nzhangtcgadata/ovarian/analysis/tcga_ovarianuseandexcludelist.xls",
+     sheet = 1)
> temp.sample.in <- as.vector(inex$sample.id[inex$include.exclude ==
+     "Include"])
> keep.ind <- !is.na(level.si$batch) & paste("TCGA", level.si$siteid,
+     level.si$patientid, sep = "-") %in% temp.sample.in
> final.si <- level.si[keep.ind, ]
> Level2data <- Level2data[, keep.ind]
> Level3data <- Level3data[, keep.ind]
> save(list = c("Level3data", "Level2data", "final.si"), file = file.path("rdataobjects",
+     "AgilentOVData.Rda"))

Sample Labeling consistency of Level 2 and Level 3 data

We do not have the annotation file for the customized array, so we use HGUG4112a instead, which covers some of the probes. The number of probes that are mapped is counted below. We keep only the level 2 probes that this annotation maps to genes present in the Level 3 data.

> require(hgug4112a.db)
> symbol <- unlist(mget(as.vector(rownames(Level2data)), envir = hgug4112aSYMBOL,
+     ifnotfound = NA))
> sum(!is.na(symbol))
> sum(rownames(Level3data) %in% symbol)
> mappedgene <- intersect(rownames(Level3data), symbol)
> Level2map <- Level2data[symbol %in% mappedgene, order(final.si$batch)]
> symbolmap <- symbol[symbol %in% mappedgene]
> Level3map <- Level3data[rownames(Level3data) %in% mappedgene,
+     order(final.si$batch)]
> save(list = c("Level3map", "Level2map", "final.si", "symbolmap"),
+     file = file.path("rdataobjects", "AgilentOVDataMapped.Rda"))

Now we take the mean of the probes for each gene to summarize the level 2 data.

> Level2Sum <- apply(as.matrix(Level2map), 2, function(x) tapply(x,
+     INDEX = symbolmap, mean, na.rm = T))
> Level2Sum <- Level2Sum[rownames(Level3map), ]

Now we calculate the correlation between the 2 data sets.
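The probe-to-gene averaging step can be sketched on a toy matrix. The probe names, gene labels, and values are made up. The `rowsum()` variant is an alternative added here for illustration, not something the report uses.

```r
# Toy probe-level matrix: 4 probes x 2 samples (made-up values).
probe_data <- matrix(c(1, 3, 10, NA,
                       2, 4, 20, 8),
                     nrow = 4,
                     dimnames = list(c("p1", "p2", "p3", "p4"), c("S1", "S2")))
# Hypothetical probe-to-gene map: p1 and p2 hit GENE_A, p3 and p4 hit GENE_B.
probe_gene <- c("GENE_A", "GENE_A", "GENE_B", "GENE_B")

# Per sample (column), average the probes that share a gene symbol.
gene_mean <- apply(probe_data, 2, function(x)
  tapply(x, INDEX = probe_gene, FUN = mean, na.rm = TRUE))

# Alternative: sum of non-missing values divided by count of non-missing
# values, computed for all samples at once with rowsum().
sums   <- rowsum(replace(probe_data, is.na(probe_data), 0), probe_gene)
counts <- rowsum((!is.na(probe_data)) + 0, probe_gene)
gene_mean_fast <- sums / counts
```

With `na.rm = TRUE`, a gene with one missing probe is summarized from the remaining probes, as for GENE_B in sample S1 above.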
We expect the summarized level 2 data to correlate highly with the level 3 data of the same sample. We set a threshold of 0.9 to display the pairwise correlations (Figure 3). No mislabeling is found, since all the high correlations appear on the diagonal.

> level23cor <- matrix(0, ncol(Level3map), ncol(Level3map))
> for (i in 1:ncol(Level3map)) {
+     for (j in 1:ncol(Level3map)) {
+         level23cor[i, j] <- cor(Level3map[, i], Level2Sum[, j],
+             use = "pairwise.complete.obs")
+     }
+ }
> pdf("corrlevel2and3.pdf")
> heatmap((level23cor > 0.9) + 0, Colv = NA, Rowv = NA, xlab = "Level2 Average",
+     ylab = "Level 3", col = c("grey", "red"))
> dev.off()

2.4 Batch Effects in Level 2 and Level 3 data

We assess the batch effects in level 2 and level 3 data. We calculate the mean across all probes/genes for level 2 and level 3 data. The plots of the cross-gene means are shown in Figures 4 and 5.

> genemeanlevel2 <- apply(as.matrix(Level2map), 2, mean, na.rm = T)
> genemeanlevel3 <- apply(as.matrix(Level3map), 2, mean, na.rm = T)
> pdf("level2.pdf")
> temp <- boxplot(genemeanlevel2 ~ final.si$batch, xlab = "",
+     main = "Agilent Level2 data", cex = 0.7)
> points(y = genemeanlevel2, x = jitter(rep(1:13, temp$n)), cex = 0.7)
> abline(v = 0: , col = "brown")
> dev.off()
> pdf("level3.pdf")
> temp <- boxplot(genemeanlevel3 ~ final.si$batch, xlab = "",
+     main = "Agilent Level3 data", cex = 0.7)
> points(y = genemeanlevel3, x = jitter(rep(1:13, temp$n)), cex = 0.7)
> abline(v = 0: , col = "brown")
> dev.off()

We pick some extreme genes to see how severe the batch effects can be. The top probes differentially expressed by batch are shown in Figures 6 and 7.

> res <- MultiLinearModel(Y ~ batch, clindata = final.si[, ], arraydata = Level2map)
> mad2 <- apply(Level2map, 1, mad, na.rm = T)
> top6 <- mad2[order(res@p.values, -mad2)][1:6]
> pdf("level2batcheffecttop.pdf", height = 8, pointsize = 9)
> par(mfrow = c(3, 2))
> for (i in 1:6) {
+     genedata <- t(Level2map[names(top6)[i], ])
+     temp <- boxplot(genedata ~ final.si$batch, xlab = "",
+         main = names(top6)[i], cex = 0.7)
+     points(y = genedata, x = jitter(rep(1:13, temp$n)), cex = 0.7)
+     abline(v = 0: , col = "brown")
+ }
> dev.off()
> res3 <- MultiLinearModel(Y ~ batch, clindata = final.si[, ],
+     arraydata = Level3map)
> mad3 <- apply(Level3map, 1, mad, na.rm = T)
> top6.3 <- mad3[order(res3@p.values, -mad3)][1:6]
> pdf("level3batcheffecttop.pdf", height = 8, pointsize = 9)
> par(mfrow = c(3, 2))
Figure 3: Consistency of level 2 and level 3 data by correlation (x-axis: Level2 Average; y-axis: Level 3). Level 2 data are summarized to gene level by taking the mean of probes that belong to the same gene. Then pairwise correlations between summarized level 2 and level 3 data are calculated. Red color represents correlation coefficient > 0.9.
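The mislabeling check behind Figure 3 can be sketched on simulated data. The matrices below are random stand-ins (the names `level3` and `level2sum` are illustrative), and the best-match step is an addition for illustration rather than part of the report's code.

```r
set.seed(1)
n_genes <- 200
n_samples <- 5
# Simulated gene-level data: "level3" is the reference and "level2sum" is a
# noisy copy of it, standing in for the probe-averaged level 2 data.
level3 <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
                 dimnames = list(NULL, paste0("S", 1:n_samples)))
level2sum <- level3 +
  matrix(rnorm(n_genes * n_samples, sd = 0.1), n_genes, n_samples)

# cor() on two matrices returns every column-by-column correlation at once,
# which replaces an explicit double loop over samples.
cross_cor <- cor(level3, level2sum, use = "pairwise.complete.obs")

# For each level 3 sample, find the best-matching level 2 sample; a label
# swap would show up as a best match off the diagonal.
best_match <- apply(cross_cor, 1, which.max)
mislabeled <- which(best_match != seq_len(n_samples))
```

Here `mislabeled` is empty because each simulated sample matches itself; with real data, any index it returned would point to a sample worth inspecting.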
Figure 4: Batch effects in Level 2 Agilent gene expression data (plot title: Agilent Level2 data). An average across all probes for each sample is calculated using level 2 data. The average gene expression level for samples is shown by batch.
Figure 5: Batch effects in Level 3 Agilent gene expression data (plot title: Agilent Level3 data). An average across all genes for each sample is calculated using level 3 data. The average gene expression level for samples is shown by batch.
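The per-sample averages summarized by batch in Figures 4 and 5 can be reproduced in miniature on simulated data. The batch labels, the size of the shift, and the one-way ANOVA on the sample means are assumptions made for this sketch; the report itself shows boxplots and uses MultiLinearModel for per-probe tests.

```r
set.seed(2)
# Simulated expression matrix: 100 genes x 12 samples in 3 batches, with a
# made-up shift added to every gene in batch "B".
batch <- factor(rep(c("A", "B", "C"), each = 4))
expr <- matrix(rnorm(100 * 12), nrow = 100)
expr[, batch == "B"] <- expr[, batch == "B"] + 1

# One number per sample: the mean over all genes.
sample_mean <- colMeans(expr, na.rm = TRUE)

# Summarize by batch, then test whether batch explains the sample means.
batch_mean <- tapply(sample_mean, batch, mean)
p_batch <- anova(lm(sample_mean ~ batch))[["Pr(>F)"]][1]
```

Calling `boxplot(sample_mean ~ batch)` on these values would draw the analogue of the figures above, with batch "B" sitting visibly higher than the others.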
> for (i in 1:6) {
+     genedata <- Level3map[names(top6.3)[i], ]
+     temp <- boxplot(genedata ~ factor(final.si$batch), xlab = "",
+         main = names(top6.3)[i], cex = 0.7)
+     points(y = genedata, x = jitter(rep(1:13, temp$n)), cex = 0.7)
+     abline(v = 0: , col = "brown")
+ }
> dev.off()

3 Appendix

3.1 File Location

> getwd()
[1] "/workspace/nzhangtcgadata/ovarian/analysis/baggerlyqc/agilentqc"

3.2 SessionInfo

> sessionInfo()
R version ( )
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=en_US     LC_MONETARY=C        LC_MESSAGES=en_US
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] ClassComparison_ Biobase_2.6.1    PreProcess_
[4] oompaBase_       limma_3.2.3
Figure 6: The top probes with batch effects in level 2 data. Panel titles are Agilent probe IDs (A_23_P...); the one fully legible ID is A_23_P62741.
Figure 7: The top probes with batch effects in level 3 data. Panel titles as extracted: EHF, F13A, IGSF, FGL, SLC6A, HIGD1B.
More informationData Exploration and Visualization
Data Exploration and Visualization Bu eğitim sunumları İstanbul Kalkınma Ajansı nın 2016 yılı Yenilikçi ve Yaratıcı İstanbul Mali Destek Programı kapsamında yürütülmekte olan TR10/16/YNY/0036 no lu İstanbul
More informationTCGA. The Cancer Genome Atlas
TCGA The Cancer Genome Atlas TCGA: History and Goal History: Started in 2005 by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) with $110 Million to catalogue
More informationPackage xseq. R topics documented: September 11, 2015
Package xseq September 11, 2015 Title Assessing Functional Impact on Gene Expression of Mutations in Cancer Version 0.2.1 Date 2015-08-25 Author Jiarui Ding, Sohrab Shah Maintainer Jiarui Ding
More informationPackage leukemiaseset
Package leukemiaseset August 14, 2018 Type Package Title Leukemia's microarray gene expression data (expressionset). Version 1.16.0 Date 2013-03-20 Author Sara Aibar, Celia Fontanillo and Javier De Las
More informationNERVE ACTION POTENTIAL SIMULATION version 2013 John Cornell
NERVE ACTION POTENTIAL SIMULATION version 2013 John Cornell http://www.jccornell.net In 1963 Alan Hodgkin and Andrew Huxley received the Nobel Prize in Physiology and Medicine for their work on the mechanism
More informationAssignment 5: Integrative epigenomics analysis
Assignment 5: Integrative epigenomics analysis Due date: Friday, 2/24 10am. Note: no late assignments will be accepted. Introduction CpG islands (CGIs) are important regulatory regions in the genome. What
More informationCHAPTER 1 COMMUNITY PHARMACY M.ASHOKKUMAR DEPT OF PHARMACY PRACTICE SRM COLLEGE OF PHARMACY SRM UNIVERSITY
CHAPTER 1 COMMUNITY PHARMACY M.ASHOKKUMAR DEPT OF PHARMACY PRACTICE SRM COLLEGE OF PHARMACY SRM UNIVERSITY COMMUNITY PHARMACY OPERATIONS Technician Duties Related to Dispensing Over-the-Counter Drugs and
More informationPackage cssam. February 19, 2015
Type Package Package cssam February 19, 2015 Title cssam - cell-specific Significance Analysis of Microarrays Version 1.2.4 Date 2011-10-08 Author Shai Shen-Orr, Rob Tibshirani, Narasimhan Balasubramanian,
More informationGridMAT-MD: A Grid-based Membrane Analysis Tool for use with Molecular Dynamics
GridMAT-MD: A Grid-based Membrane Analysis Tool for use with Molecular Dynamics William J. Allen, Justin A. Lemkul, and David R. Bevan Department of Biochemistry, Virginia Tech User s Guide Version 1.0.2
More informationMULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES
24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter
More informationMicro-RNA web tools. Introduction. UBio Training Courses. mirnas, target prediction, biology. Gonzalo
Micro-RNA web tools UBio Training Courses Gonzalo Gómez//ggomez@cnio.es Introduction mirnas, target prediction, biology Experimental data Network Filtering Pathway interpretation mirs-pathways network
More informationdataset1 <- read.delim("c:\dca_example_dataset1.txt", header = TRUE, sep = "\t") attach(dataset1)
Worked examples of decision curve analysis using R A note about R versions The R script files to implement decision curve analysis were developed using R version 2.3.1, and were tested last using R version
More informationPackage Actigraphy. R topics documented: January 15, Type Package Title Actigraphy Data Analysis Version 1.3.
Type Package Title Actigraphy Data Analysis Version 1.3.2 Date 2016-01-14 Package Actigraphy January 15, 2016 Author William Shannon, Tao Li, Hong Xian, Jia Wang, Elena Deych, Carlos Gonzalez Maintainer
More informationSupervised analysis of MS images using Cardinal
Supervised analsis of MS images using Cardinal Klie A. Bemis and April Harr November, 28 Contents Introduction.............................. 2 Analsis of a renal cell carcinoma (RCC) dataset.... 2 2. Pre-processing..........................
More informationExtracting progression models for TCGA MSI/MSS colorectal tumors from the COADREAD project with the TRONCO package
Extracting progression models for TCGA MSI/MSS colorectal tumors from the COADREAD project with the TRONCO package Giulio Caravagna, Luca De Sano, Daniele Ramazzotti, Alex Graudenzi, Giancarlo Mauri, Marco
More informationSupplementary Data. Correlation analysis. Importance of normalizing indices before applying SPCA
Supplementary Data Correlation analysis The correlation matrix R of the m = 25 GV indices calculated for each dataset is reported below (Tables S1 S3). R is an m m symmetric matrix, whose entries r ij
More informationAnalysis of gene expression in blood before diagnosis of ovarian cancer
Analysis of gene expression in blood before diagnosis of ovarian cancer Different statistical methods Note no. Authors SAMBA/10/16 Marit Holden and Lars Holden Date March 2016 Norsk Regnesentral Norsk
More informationPackage AIMS. June 29, 2018
Type Package Package AIMS June 29, 2018 Title AIMS : Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype Version 1.12.0 Date 2014-06-25 Description This package contains the AIMS implementation.
More informationMyWindFit Member Analytics Portal
Member Analytics Portal The member section of is a private and personalized cloud database that contains the detailed records of all your activity. It also contains an analysis package that enables you
More informationHands-On Ten The BRCA1 Gene and Protein
Hands-On Ten The BRCA1 Gene and Protein Objective: To review transcription, translation, reading frames, mutations, and reading files from GenBank, and to review some of the bioinformatics tools, such
More informationName: Date: Period: Human Traits Genetics Activity
Name: Date: Period: Human Traits Genetics Activity The following are considered by many to be single-gene traits, which mean that there are two alleles (versions of a gene) for a trait. It is important
More informationCerebral Cortex. Edmund T. Rolls. Principles of Operation. Presubiculum. Subiculum F S D. Neocortex. PHG & Perirhinal. CA1 Fornix CA3 S D
Cerebral Cortex Principles of Operation Edmund T. Rolls F S D Neocortex S D PHG & Perirhinal 2 3 5 pp Ento rhinal DG Subiculum Presubiculum mf CA3 CA1 Fornix Appendix 4 Simulation software for neuronal
More informationWeighted Gene Co-expression Network Analysis (WGCNA) R Tutorial, Part C Summary The data and biological implications are described in
Weighted Gene Co-expression Network Analysis (WGCNA) R Tutorial, Part C Breast Cancer Microarray Data. Steve Horvath, Paul Mischel Correspondence: shorvath@mednet.ucla.edu, http://www.ph.ucla.edu/biostat/people/horvath.htm
More informationData mining with Ensembl Biomart. Stéphanie Le Gras
Data mining with Ensembl Biomart Stéphanie Le Gras (slegras@igbmc.fr) Guidelines Genome data Genome browsers Getting access to genomic data: Ensembl/BioMart 2 Genome Sequencing Example: Human genome 2000:
More informationPackage ega. March 21, 2017
Title Error Grid Analysis Version 2.0.0 Package ega March 21, 2017 Maintainer Daniel Schmolze Functions for assigning Clarke or Parkes (Consensus) error grid zones to blood glucose values,
More informationNew Enhancements: GWAS Workflows with SVS
New Enhancements: GWAS Workflows with SVS August 9 th, 2017 Gabe Rudy VP Product & Engineering 20 most promising Biotech Technology Providers Top 10 Analytics Solution Providers Hype Cycle for Life sciences
More information[ APPLICATION NOTE ] High Sensitivity Intact Monoclonal Antibody (mab) HRMS Quantification APPLICATION BENEFITS INTRODUCTION WATERS SOLUTIONS KEYWORDS
Yun Wang Alelyunas, Henry Shion, Mark Wrona Waters Corporation, Milford, MA, USA APPLICATION BENEFITS mab LC-MS method which enables users to achieve highly sensitive bioanalysis of intact trastuzumab
More informationIMPaLA tutorial.
IMPaLA tutorial http://impala.molgen.mpg.de/ 1. Introduction IMPaLA is a web tool, developed for integrated pathway analysis of metabolomics data alongside gene expression or protein abundance data. It
More informationUnit 1 Outline Science Practices. Part 1 - The Scientific Method. Screencasts found at: sciencepeek.com. 1. List the steps of the scientific method.
Screencasts found at: sciencepeek.com Part 1 - The Scientific Method 1. List the steps of the scientific method. 2. What is an observation? Give an example. Quantitative or Qualitative Data? 35 grams?
More informationBenchmark Dose Modeling Cancer Models. Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S.
Benchmark Dose Modeling Cancer Models Allen Davis, MSPH Jeff Gift, Ph.D. Jay Zhao, Ph.D. National Center for Environmental Assessment, U.S. EPA Disclaimer The views expressed in this presentation are those
More informationGene Expression Analysis Web Forum. Jonathan Gerstenhaber Field Application Specialist
Gene Expression Analysis Web Forum Jonathan Gerstenhaber Field Application Specialist Our plan today: Import Preliminary Analysis Statistical Analysis Additional Analysis Downstream Analysis 2 Copyright
More informationLesson 3 Profex Graphical User Interface for BGMN and Fullprof
Lesson 3 Profex Graphical User Interface for BGMN and Fullprof Nicola Döbelin RMS Foundation, Bettlach, Switzerland March 01 02, 2016, Freiberg, Germany Background Information Developer: License: Founded
More informationNature Medicine: doi: /nm.3967
Supplementary Figure 1. Network clustering. (a) Clustering performance as a function of inflation factor. The grey curve shows the median weighted Silhouette widths for varying inflation factors (f [1.6,
More informationOne-Way Independent ANOVA
One-Way Independent ANOVA Analysis of Variance (ANOVA) is a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment.
More informationVIEW AS Fit Page! PRESS PgDn to advance slides!
VIEW AS Fit Page! PRESS PgDn to advance slides! UNDERSTAND REALIZE CHANGE WHY??? CHANGE THE PROCESSES OF YOUR BUSINESS CONNECTING the DOTS Customer Focus (W s) Customer Focused Metrics Customer Focused
More informationBioinformatics Laboratory Exercise
Bioinformatics Laboratory Exercise Biology is in the midst of the genomics revolution, the application of robotic technology to generate huge amounts of molecular biology data. Genomics has led to an explosion
More informationItem Response Theory for Polytomous Items Rachael Smyth
Item Response Theory for Polytomous Items Rachael Smyth Introduction This lab discusses the use of Item Response Theory (or IRT) for polytomous items. Item response theory focuses specifically on the items
More informationPackage AbsFilterGSEA
Type Package Package AbsFilterGSEA September 21, 2017 Title Improved False Positive Control of Gene-Permuting GSEA with Absolute Filtering Version 1.5.1 Author Sora Yoon Maintainer
More informationTwo-Way Independent ANOVA
Two-Way Independent ANOVA Analysis of Variance (ANOVA) a common and robust statistical test that you can use to compare the mean scores collected from different conditions or groups in an experiment. There
More informationFigure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022
96 APPENDIX B. Supporting Information for chapter 4 "changes in genome content generated via segregation of non-allelic homologs" Figure S1. Potential de novo CNV probes and sizes of apparently de novo
More informationHow to compute a semantic similarity threshold. Charles Bettembourg, Christian Diot, Olivier Dameron
How to compute a semantic similarity threshold Charles Bettembourg, Christian Diot, Olivier Dameron Abstract The analysis of gene annotations related to Gene Ontology plays an important role in the interpretation
More informationPackage wally. May 25, Type Package
Type Package Package wally May 25, 2017 Title The Wally Calibration Plot for Risk Prediction Models Version 1.0.9 Date 2017-04-28 Author Paul F Blanche , Thomas A. Gerds
More informationA Quick-Start Guide for rseqdiff
A Quick-Start Guide for rseqdiff Yang Shi (email: shyboy@umich.edu) and Hui Jiang (email: jianghui@umich.edu) 09/05/2013 Introduction rseqdiff is an R package that can detect differential gene and isoform
More informationPackage prognosticroc
Type Package Package prognosticroc February 20, 2015 Title Prognostic ROC curves for evaluating the predictive capacity of a binary test Version 0.7 Date 2013-11-27 Author Y. Foucher
More informationCSDplotter user guide Klas H. Pettersen
CSDplotter user guide Klas H. Pettersen [CSDplotter user guide] [0.1.1] [version: 23/05-2006] 1 Table of Contents Copyright...3 Feedback... 3 Overview... 3 Downloading and installation...3 Pre-processing
More informationJ. A. Mayfield et al. FIGURE S1. Methionine Salvage. Methylthioadenosine. Methionine. AdoMet. Folate Biosynthesis. Methylation SAH.
FIGURE S1 Methionine Salvage Methionine Methylthioadenosine AdoMet Folate Biosynthesis Methylation SAH Homocysteine Homocystine CBS Cystathionine Cysteine Glutathione Figure S1 Biochemical pathway of relevant
More information