Package MethPed. September 1, 2018

Type Package
Version 1.8.0
Date 2016-01-01
Title A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes
Depends R (>= 3.0.0), Biobase
Suggests BiocStyle, knitr, markdown, impute
Imports randomForest, grDevices, graphics, stats
VignetteBuilder knitr
biocViews DNAMethylation, Classification, Epigenetics
Encoding UTF-8
Description Classification of pediatric tumors into biologically defined subtypes is challenging and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).
License GPL-2
RoxygenNote 5.0.1
NeedsCompilation no
Author Mohammad Tanvir Ahamed [aut, trl], Anna Danielsson [aut], Szilárd Nemes [aut, trl], Helena Carén [aut, cre, cph]
Maintainer Helena Carén <helenacarenlab@gmail.com>
git_url https://git.bioconductor.org/packages/methped
git_branch RELEASE_3_7
git_last_commit d7639ea
git_last_commit_date 2018-04-30
Date/Publication 2018-08-31

R topics documented:

    checkNA
    MethPed
    MethPed_900probes
    MethPed_sample
    plot.MethPed
    probeMis
    summary.MethPed


checkNA                 Check missing values in dataset

Description

    Check missing values in dataset

Usage

    checkNA(data)

Arguments

    data        Object in class data.frame or matrix or ExpressionSet.

Examples

    #################### Loading and view sample data
    data(MethPed_sample)
    head(MethPed_sample,10)

    #################### Check missing value
    checkNA(MethPed_sample)


MethPed                 MethPed: a DNA methylation classifier tool for the identification of pediatric brain tumor subtypes.

Description

    The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).

Usage

    MethPed(data, TargetID="TargetID", prob = TRUE,...)

Arguments

    data        Data for classification. Data can be in class "ExpressionSet", "matrix" or "data.frame". See MethPed vignette for details.
    TargetID    Name of the "Probe" column in data.
    prob        If TRUE (default), the return value is the conditional probability that the sample belongs to each tumor group. If FALSE, only the value for the tumor group with the maximum conditional probability is returned for each sample, and all other (non-maximum) values are set to zero.
    ...         More parameters to add.

Details

    Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors.

    The MethPed classifier uses the Random Forest (RF) algorithm to classify unknown pediatric brain tumor samples into sub-types. The classification starts with the selection of the beta values needed for the classification.

    The computational process proceeds in two stages. The first stage reduces the probe pool, that is, it builds the training probe dataset for classification. We call this dataset the predictors. The second stage applies the RF algorithm to classify the probe data of interest based on the training probe dataset (predictors).

    For the construction of the training probe pool (predictors), methylation data generated by the Illumina Infinium HumanMethylation 450 BeadChip arrays were downloaded from the Gene Expression Omnibus (GEO). Four hundred seventy-two cases were available, representing several brain tumor diagnoses (DIPG, glioblastoma, ETMR, medulloblastoma, ependymoma, pilocytic astrocytoma) and their further subgroups.

    The data sets were merged and probes that did not appear in all data sets were filtered away. In addition, about 190,000 CpGs were removed due to SNPs, repeats and multiple mapping sites. The final data set contained 206,823 unique probes and nine tumor classes including the medulloblastoma subgroups. K-nearest neighbor imputation was used for missing probe data.

    After that, a large number of regression analyses were performed to select the 100 probes per tumor class that had the highest predictive power (AUC values). Based on the identified 900 methylation sites, the nine pediatric brain tumor types could be accurately classified using the multiclass classification algorithm MethPed.

Value

    The output of the algorithm is partitioned into 6 parts:

    $target_id          Probe names in the main data set
    $probes             Number of probes
    $sample             Number of samples
    $probes_missing     Names of the probes that were included in the original classifier but are missing from the data at hand.
    $oob_err.rate       If a large number of probes are missing this could potentially lead to a drop in the predictive accuracy of the model. Note that the out-of-bag error of the original MethPed classifier is 1.7 percent.

    $predictions        The last and most interesting output contains the predictions. We chose not to assign tumors to one group or another but to give the probabilities of belonging to each subtype. See vignette for details.

Author(s)

    Anna Danielsson, Mohammad Tanvir Ahamed, Szilárd Nemes and Helena Carén, University of Gothenburg, Sweden.

References

    [1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62 (2015).
    [2] Breiman, Leo. "Random Forests". Machine Learning 45 (1): 5-32 (2001). doi:10.1023/a:1010933404324
    [3] Troyanskaya, O., M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman. "Missing Estimation Methods for DNA Microarrays". Bioinformatics 17 (6): 520-25 (2001).

See Also

    See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.

Examples

    #################### Loading and view sample data
    data(MethPed_sample)
    head(MethPed_sample,10)

    #################### Check dimension of sample data
    dim(MethPed_sample) # Check number of probes and samples in data

    #################### Checking missing value in the data
    missingindex <- checkNA(MethPed_sample)

    #################### Apply MethPed to sample data (Probability for all tumor group)
    myclassification<-MethPed(MethPed_sample)
    myclassification<-MethPed(MethPed_sample,prob=TRUE)

    #################### Apply MethPed to sample data (Only maximum probability expected)
    myclassification_max<-MethPed(MethPed_sample,prob=FALSE)

    #################### Summary of results
    summary(myclassification)

    #################### Barplot of conditional prediction probability on different samples
    par(mai = c(1, 1, 1, 2), xpd=TRUE)
    mat<-t(myclassification$predictions)
    mycols <- c("green",rainbow(nrow(mat),start=0,end=1)[nrow(mat):1],"red")
    barplot(mat, col = mycols, beside=FALSE,axisnames=TRUE,
            ylim=c(0,1),xlab= "Sample",ylab="Probability")
    legend( ncol(mat)+0.5,1,
            legend = rownames(mat),fill = mycols,xpd=TRUE, cex = 0.6)

    ## Generic function to plot
    plot(myclassification) # myclassification should be an object in "MethPed" class


MethPed_900probes       List of 900 probes in the predictor data

Description

    List of 900 probes in the predictor (training data for the Random Forest model)

Usage

    data(MethPed_900probes)

Format

    Data frame of 900 probes.

Details

    For the construction of the training probe pool (predictors), methylation data generated by the Illumina Infinium HumanMethylation 450 BeadChip arrays were downloaded from the Gene Expression Omnibus (GEO). Four hundred seventy-two cases were available, representing several brain tumor diagnoses (DIPG, glioblastoma, ETMR, medulloblastoma, ependymoma, pilocytic astrocytoma) and their further subgroups.

    The data sets were merged and probes that did not appear in all data sets were filtered away. In addition, about 190,000 CpGs were removed due to SNPs, repeats and multiple mapping sites. The final data set contained 206,823 unique probes and nine tumor classes including the medulloblastoma subgroups. K-nearest neighbor imputation was used for missing probe data.

    After that, a large number of regression analyses were performed to select the 100 probes per tumor class that had the highest predictive power (AUC values). Based on the identified 900 methylation sites, the nine pediatric brain tumor types could be accurately classified using the multiclass classification algorithm MethPed.

Value

    450k methylation array probe name

References

    [1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62 (2015).

See Also

    See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.

Examples

    #################### Loading and view sample data
    data(MethPed_900probes)
    head(MethPed_900probes)


MethPed_sample          Sample dataset for the MethPed package.

Description

    Methylation beta-values generated with the Infinium HumanMethylation450 BeadChips (Illumina).

Format

    A data frame with 468821 probes and 2 tumor samples

Value

    DNA Methylation beta-values

References

    [1] Anna Danielsson, Szilárd Nemes, Magnus Tisell, Birgitta Lannering, Claes Nordborg, Magnus Sabel, and Helena Carén. "MethPed: A DNA Methylation Classifier Tool for the Identification of Pediatric Brain Tumor Subtypes". Clinical Epigenetics 7:62 (2015).

See Also

    See http://www.clinicalepigeneticsjournal.com/content/7/1/62 for more details.

Examples

    #################### Loading and view sample data
    data(MethPed_sample)
    head(MethPed_sample)

    #################### Check dimension of sample data
    dim(MethPed_sample) # Check number of probes and samples in data

    #################### Checking missing value in the data
    missingindex <- checkNA(MethPed_sample)
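
The missing-value check and the missing-probe report described in this manual can be pictured with base R alone. The sketch below is not the package's implementation. It assumes an input table with a TargetID column followed by one column of beta values per sample, and every probe ID, sample name and value in it is invented for illustration (classifier_probes, input, Tumor.A and Tumor.B are hypothetical names).

```r
# Toy stand-in for the classifier's probe list (hypothetical IDs).
classifier_probes <- c("cg0001", "cg0002", "cg0003", "cg0004")

# Toy input: a probe ID column plus one beta-value column per sample.
input <- data.frame(
  TargetID = c("cg0001", "cg0003", "cg9999"),
  Tumor.A  = c(0.12, 0.85, 0.40),
  Tumor.B  = c(0.30, NA,   0.55),
  stringsAsFactors = FALSE
)

# Classifier probes that do not occur in the input at all.
missing_probes <- setdiff(classifier_probes, input$TargetID)

# Number of missing beta values in each sample column.
na_per_sample <- colSums(is.na(input[, -1, drop = FALSE]))

print(missing_probes)
print(na_per_sample)
```

A long list of absent classifier probes is the situation the Value section warns about, since it can lower the predictive accuracy of the model.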

plot.MethPed            Plot conditional probability of samples that belong to different tumor subtypes.

Description

    Plot conditional probability of samples that belong to different tumor subtypes.

Usage

    ## S3 method for class 'MethPed'
    plot(x,...)

Arguments

    x           Object in "MethPed" class. Output of function MethPed.
    ...         More arguments from function barplot.

Value

    Object in "MethPed" class. Output of function MethPed.

Examples

    #################### Loading sample data
    data(MethPed_sample)

    #################### Applying MethPed to sample data
    res<-MethPed(MethPed_sample)

    #################### Plot conditional probability
    plot(res)


probeMis                Missing probe names from training data compared with input samples.

Description

    Missing probe names from training data compared with input samples.

Usage

    probeMis(x)

Arguments

    x           Object in "MethPed" class. Output of function MethPed.

Value

    Object in "MethPed" class. Output of function MethPed.

Examples

    #################### Loading sample data
    data(MethPed_sample)

    #################### Applying MethPed to sample data
    res<-MethPed(MethPed_sample)

    #################### The names of the probes that were included in the original
    # classifier but are missing from the data at hand
    probeMis(res)


summary.MethPed         Summary of conditional probability or binary classification of samples that belong to different tumor subtypes.

Description

    Summary of conditional probability or binary classification of samples that belong to different tumor subtypes.

Usage

    ## S3 method for class 'MethPed'
    summary(object,...)

Arguments

    object      Object in "MethPed" class. Output of function MethPed.
    ...         Additional arguments affecting the summary produced.

Value

    Object in "MethPed" class. Output of function MethPed.

Examples

    #################### Loading sample data
    data(MethPed_sample)

    #################### Applying MethPed to sample data
    res<-MethPed(MethPed_sample)

    #################### Summary function of MethPed output
    summary(res)
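
The difference between the two output modes selected by the prob argument of MethPed (full conditional probabilities versus maximum-only values) can be illustrated on a small probability matrix. This is a base R sketch under stated assumptions, not code from the package: the matrix probs, its sample names and its values are invented, and only three of the nine tumor classes are shown.

```r
# Toy conditional probabilities: rows are samples, columns are tumor classes.
probs <- matrix(
  c(0.70, 0.20, 0.10,
    0.05, 0.15, 0.80),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Tumor.A", "Tumor.B"), c("DIPG", "GBM", "MB_SHH"))
)

# Maximum-only view: keep each sample's largest probability, zero the rest.
# The row maxima recycle down the columns, so each cell is compared
# with the maximum of its own row.
max_only <- probs
max_only[probs < apply(probs, 1, max)] <- 0

# Class label with the highest probability for each sample.
predicted <- colnames(probs)[max.col(probs, ties.method = "first")]

print(max_only)
print(predicted)
```

Keeping the full matrix preserves information about uncertain samples, which matches the manual's stated preference for reporting probabilities rather than hard assignments.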

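
The Details sections describe selecting, for each tumor class, the 100 probes with the highest predictive power as measured by AUC. The sketch below shows one way such a ranking could look in base R. It is a simplification and not the authors' procedure: it scores each probe by the rank-based AUC of its raw beta values instead of fitting the regression models the manual mentions, and the helper names (auc, top_probes), the toy matrix beta and the class labels are all hypothetical.

```r
# Rank-based AUC: the Mann-Whitney U statistic scaled to [0, 1].
auc <- function(score, is_class) {
  r  <- rank(score)                 # average ranks handle ties
  n1 <- sum(is_class)
  n0 <- sum(!is_class)
  (sum(r[is_class]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy beta values: rows are probes, columns are samples (values invented).
beta <- rbind(
  probe1 = c(0.90, 0.85, 0.80, 0.20, 0.15, 0.10),
  probe2 = c(0.50, 0.40, 0.60, 0.45, 0.55, 0.50),
  probe3 = c(0.10, 0.20, 0.15, 0.70, 0.80, 0.90)
)
tumor_class <- c("A", "A", "A", "B", "B", "B")

# Rank probes for one class against all other samples, keep the best n_top.
top_probes <- function(beta, tumor_class, target, n_top) {
  is_target <- tumor_class == target
  a <- apply(beta, 1, auc, is_class = is_target)
  a <- pmax(a, 1 - a)               # a probe may be hyper- or hypomethylated
  names(sort(a, decreasing = TRUE))[seq_len(min(n_top, length(a)))]
}

selected <- top_probes(beta, tumor_class, target = "A", n_top = 2)
print(selected)
```

In this toy data probe1 and probe3 separate class A from class B perfectly and probe2 does not, so the first two are selected. Repeating the call for each of nine classes with n_top = 100 would give a pool of at most 900 probes, the size of the predictor set described above.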