Package IMAS. March 10, 2019

Similar documents
Package msgbsr. January 17, 2019

splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data

Supplemental Data. Integrating omics and alternative splicing i reveals insights i into grape response to high temperature

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Package MethPed. September 1, 2018

Package splicer. R topics documented: August 19, Version Date 2014/04/29

Package TFEA.ChIP. R topics documented: January 28, 2018

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

Variant Classification. Author: Mike Thiesen, Golden Helix, Inc.

cn.mops - Mixture of Poissons for CNV detection in NGS data Günter Klambauer Institute of Bioinformatics, Johannes Kepler University Linz

Package nopaco. R topics documented: April 13, Type Package Title Non-Parametric Concordance Coefficient Version 1.0.

Package leukemiaseset

Package AbsFilterGSEA

Package AIMS. June 29, 2018

Package cancer. July 10, 2018

SUPPLEMENTARY FIGURES: Supplementary Figure 1

Package HAP.ROR. R topics documented: February 19, 2015

Package rvtdt. February 20, rvtdt... 1 rvtdt.example... 3 rvtdts... 3 TrendWeights Index 6. population control weighted rare-varaints TDT

MethylMix An R package for identifying DNA methylation driven genes

Package BUScorrect. September 16, 2018

Computational Analysis of UHT Sequences Histone modifications, CAGE, RNA-Seq

Package CorMut. January 21, 2019

Package diggitdata. April 11, 2019

Package DeconRNASeq. November 19, 2017

Data mining with Ensembl Biomart. Stéphanie Le Gras

A Quick-Start Guide for rseqdiff

Package pathifier. August 17, 2018

Figure S2. Distribution of acgh probes on all ten chromosomes of the RIL M0022

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

ivlewbel: Uses heteroscedasticity to estimate mismeasured and endogenous regressor models

An Analysis of MDM4 Alternative Splicing and Effects Across Cancer Cell Lines

Supplemental Information For: The genetics of splicing in neuroblastoma

Accessing and Using ENCODE Data Dr. Peggy J. Farnham

Introduction. Introduction

Supplementary Figure 1: Attenuation of association signals after conditioning for the lead SNP. a) attenuation of association signal at the 9p22.

Package citccmst. February 19, 2015

Package NarrowPeaks. August 3, Version Date Type Package

SUPPLEMENTARY INFORMATION. Intron retention is a widespread mechanism of tumor suppressor inactivation.

Package inctools. January 3, 2018

The FunCluster Package

Statistical Tests for X Chromosome Association Study. with Simulations. Jian Wang July 10, 2012

Analyse de données de séquençage haut débit

User Guide. Association analysis. Input

Package flowtype. R topics documented: July 18, Type Package. Title Phenotyping Flow Cytometry Assays. Version

Inference of Isoforms from Short Sequence Reads

Quantification of DNA Methylation and Epimutations from Bisulfite Sequencing Data the BEAT package

A Practical Guide to Integrative Genomics by RNA-seq and ChIP-seq Analysis

Package ORIClust. February 19, 2015

Computational Identification and Prediction of Tissue-Specific Alternative Splicing in H. Sapiens. Eric Van Nostrand CS229 Final Project

Alternative splicing analysis for finding novel molecular signatures in renal carcinomas

Supplementary Table S1. List of PTPRK-RSPO3 gene fusions in TCGA's colon cancer cohort. Chr. # of Gene 2. Chr. # of Gene 1

Applying Metab. Raphael Aggio. October 30, This document describes how to use the function included in the R package Metab.

REGULATED SPLICING AND THE UNSOLVED MYSTERY OF SPLICEOSOME MUTATIONS IN CANCER

Bio 111 Study Guide Chapter 17 From Gene to Protein

Performing. linkage analysis using MERLIN

genomics for systems biology / ISB2020 RNA sequencing (RNA-seq)

MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data

Problem 3: Simulated Rheumatoid Arthritis Data

Package inote. June 8, 2017

Package noia. February 20, 2015

Computational aspects of ChIP-seq. John Marioni Research Group Leader European Bioinformatics Institute European Molecular Biology Laboratory

Package CorMut. August 2, 2013

Package ClusteredMutations

Package QualInt. February 19, 2015

Package seqcombo. R topics documented: March 10, 2019

Multi-omics data integration colon cancer using proteogenomics approach

Package xseq. R topics documented: September 11, 2015

Predicting clinical outcomes in neuroblastoma with genomic data integration

EXPression ANalyzer and DisplayER

Cancer Informatics Lecture

Ambient temperature regulated flowering time

Breast cancer. Risk factors you cannot change include: Treatment Plan Selection. Inferring Transcriptional Module from Breast Cancer Profile Data

Supplementary Figures

RNA SEQUENCING AND DATA ANALYSIS

Package CompareTests

V23 Regular vs. alternative splicing

GeneOverlap: An R package to test and visualize

Package metaseq. R topics documented: August 17, Type Package

Assignment 5: Integrative epigenomics analysis

TITAN: Subclonal copy number and LOH prediction from whole genome sequencing of tumours

Supplementary note: Comparison of deletion variants identified in this study and four earlier studies

DMD Genetics: complicated, complex and critical to understand

README file for GRASTv1.0.pl

7SK ChIRP-seq is specifically RNA dependent and conserved between mice and humans.

MODULE 3: TRANSCRIPTION PART II

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Nature Methods: doi: /nmeth.3115

Package pepdat. R topics documented: March 7, 2015

Package ICCbin. November 14, 2017

Package singlecase. April 19, 2009

DNA Sequence Bioinformatics Analysis with the Galaxy Platform

Metabolomic and Proteomics Solutions for Integrated Biology. Christine Miller Omics Market Manager ASMS 2015

Supplementary Figure S1. Gene expression analysis of epidermal marker genes and TP63.

Package sbpiper. April 20, 2018

Nature Medicine: doi: /nm.4439

Where Splicing Joins Chromatin And Transcription. 9/11/2012 Dario Balestra

Package wally. May 25, Type Package

Transcription:

Type Package Package IMAS March 10, 2019 Title Integrative analysis of Multi-omics data for Alternative Splicing Version 1.6.1 Author Seonggyun Han, Younghee Lee Maintainer Seonggyun Han <hangost@ssu.ac.kr> Integrative analysis of Multi-omics data for Alternative splicing. License GPL-2 Depends R (> 3.0.0),GenomicFeatures, ggplot2, IVAS Imports doparallel, lme4, BiocGenerics, GenomicRanges, IRanges, foreach, AnnotationDbi, S4Vectors, GenomeInfoDb, stats, ggfortify, grdevices, methods, Matrix, utils, graphics, gridextra, grid, lattice, Rsamtools, survival, BiocParallel, GenomicAlignments, parallel Suggests BiocStyle, RUnit biocviews ImmunoOncology, AlternativeSplicing, DifferentialExpression, DifferentialSplicing, GeneExpression, GeneRegulation, Regression, RNASeq, Sequencing, SNP, Software, Transcription git_url https://git.bioconductor.org/packages/imas git_branch RELEASE_3_8 git_last_commit a03cea5 git_last_commit_date 2019-01-04 Date/Publication 2019-03-09 R topics documented: IMAS-package....................................... 2 ASvisualization....................................... 2 Clinical.data......................................... 3 ClinicAnalysis........................................ 4 CompGroupAlt....................................... 5 ExonsCluster........................................ 7 GroupSam.......................................... 8 MEsQTLFinder....................................... 8 RatioFromReads...................................... 10 1

2 ASvisualization samplebamfiles....................................... 11 samplemedata........................................ 12 samplemelocus....................................... 13 samplesnp.......................................... 13 samplesnplocus....................................... 14 SplicingReads........................................ 14 Index 16 IMAS-package IMAS: Integrative analysis of Multi-omics data for Alternative Splicing IMAS offers two components. First, RatioFromReads estimates PSI values of a given alternatively spliced exon using both of paired-end and junction reads. See the examples at RatioFromReads. Second, CompGroupAlt, MEsQTLFinder, and ClinicAnalysis can be used for further analysis using estimated PSI values. We described more detailed information on usage at the package vignette. Author(s) Seonggyun Han, Younghee Lee ASvisualization Visualize the results of the ASdb object. This function makes a pdf file consisting of plots for results in the ASdb object. Arguments ASvisualization(ASdb,CalIndex=NULL,txTable=NULL,exon.range=NULL,snpdata=NULL, snplocus=null,methyldata=null,methyllocus=null,groupsam=null, ClinicalInfo=NULL,out.dir=NULL) ASdb CalIndex txtable exon.range snpdata snplocus methyldata A ASdb object. An index number in the ASdb object which will be tested in this function. A data frame of transcripts including transcript IDs, Ensembl gene names, Ensembl transcript names, transcript start sites, and transcript end sites. A list of GRanges objects including total exon ranges in each transcript resulted from the exonsby function in GenomicFeatures. A data frame of genotype data. A data frame consisting of locus information of SNP markers in the snpdata. A data frame consisting of methylation levels.

Clinical.data 3 methyllocus GroupSam ClinicalInfo out.dir A data frame consisting of methylation locus. A list object of a group of each sample. A data frame consisting of a path of bam file and identifier of each sample. An output directory This function makes pdf for plots. Author(s) Seonggyun Han, Younghee Lee data(samplegroups) data(samplemethyl) data(samplemethyllocus) data(samplesnp) data(samplesnplocus) data(sampleclinical) data(bamfilestest) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="") sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ASdb <- RatioFromReads(ASdb,samplebamfiles,"paired",50,40,3,CalIndex="ES3") ASdb <- sqtlsfinder(asdb,samplesnp,samplesnplocus,method="lm") ASdb <- CompGroupAlt(ASdb,GroupSam,CalIndex="ES3") ASdb <- MEsQTLFinder(ASdb,sampleMedata,sampleMelocus,CalIndex="ES3",GroupSam=GroupSam,out.dir=NULL) Sdb <- ClinicAnalysis(ASdb,Clinical.data,CalIndex="ES3",out.dir=NULL) exon.range <- exonsby(transdb,by="tx") sel.cn <- c("txchrom","txname","geneid","txstart","txend","txstrand") txtable <- select(transdb, keys=names(exon.range),columns=sel.cn,keytype="txid") ASvisualization(ASdb,CalIndex="ES3",txTable,exon.range,samplesnp,samplesnplocus, samplemedata,samplemelocus,groupsam,clinical.data,out.dir="./") Clinical.data A data frame for clinical data A data frame including survival status and time for each sample. This data is a simulated clinical data for 50 samples (half of whom are assigned as PR-positive and the other half PR-negative), which is used in analysis with IMAS. The detailed overview of the data is described in the vignette. data(sampleclinical)

4 ClinicAnalysis Format A data frame with survival information and times on the 50 samples A data frame with survival information and times on the 50 samples ClinicAnalysis Analysis for differential clinical outcomes across PSI values This function separate a set of samples into two groups (low and high PSI values) using K-means clustering and perform a statistical test to identify differential survival outcomes between the groups. Internally, this function calls the kmeans and survdiff functions in the stats and survival packages, respectively. ClinicAnalysis(ASdb, ClinicalInfo = NULL, CalIndex = NULL, display = FALSE, Ncor = 1, out.dir = NULL) Arguments ASdb ClinicalInfo CalIndex display Ncor out.dir An ASdb object containing "SplicingModel" and "Ratio" slots from the Splicingfinder and RatioFromFPKM functions, respectively. A data frame consisting of a path of bam file and identifier of each sample. An index number in the ASdb object which will be tested in this function. The option returns the survival Kaplan-Meier plot. (TRUE = it will return the list object with a ggplot object and table showing the result of this function, FALSE = it will return P-value.) The number of cores for multi-threads function. An output directory. ASdb with the slot (labeled by "Clinical") containing results from the ClinicAnalysis function. The "Clinical" slot contains a list object and each element of the list object returns the results assigned to three elements, which is of each alternative splicing type (i.e. Exon skipping, Alternative splice site, Intron retention). Three elements are as follows; ES A data frame for the result of Exon skipping, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), 1stEX (alternatively spliced target exon), 2ndEX (second alternatively spliced target exon which is the other one of the mutually exclusive spliced exons), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), Pvalue (P-value of Kaplan-Meier test for differential survival outcomes between low and high PSI groups), and Fdr.p (FDR values).

CompGroupAlt 5 ASS IR A data frame for the result of Alternative splice sites, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), ShortEX (shorter spliced target exon), LongEX (longer spliced target exon), NeighborEX (neighboring down or upstream exons), Types (splicing type), Pvalue (P-value of Kaplan-Meier test for differential survival outcomes between low and high PSI groups), and Fdr.p (FDR values). A data frame for the result of Intron retention, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), RetainEX (retained intron exon), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), Pvalue (P-values of Kaplan- Meier test for differential survival outcomes between low and high PSI groups), and Fdr.p (FDR values). Author(s) Seonggyun Han, Younghee Lee See Also kmeans, survdiff, survfit data(bamfilestest) data(sampleclinical) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="") sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ASdb <- RatioFromReads(ASdb,samplebamfiles,"paired",50,40,3,CalIndex="ES3") ASdb <- ClinicAnalysis(ASdb,Clinical.data,CalIndex="ES3",out.dir=NULL) ## End(Not run) CompGroupAlt Identify alternatively spliced exons with a differential PSIs between the groups This function performs a regression test to identify alternatively spliced exons that are differentially expressed between two groups. It will call the lm function to test a linear regression model. CompGroupAlt(ASdb, GroupSam = NULL, Ncor = 1, CalIndex = NULL, out.dir = NULL)

6 CompGroupAlt Arguments ASdb GroupSam Ncor CalIndex out.dir An ASdb object containing "SplicingModel" and "Ratio" slots from the Splicingfinder and RatioFromFPKM functions, respectively. A list object of a group of each sample. The number of cores for multi-threads function. An index number in the ASdb object which will be tested in this function. An output directory. ASdb with the slot (labeled by "GroupDiff") containing results from the CompGroupAlt function. The "GroupDiff" slot consists of a list object and each element of the list object returns the results assigned to three elements, which is of each alternative splicing type (i.e. Exon skipping, Alternative splice site, Intron retention). Three elements are as follows; ES ASS IR A data frame for the result of Exon skipping, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), 1stEX (alternatively spliced target exon), 2ndEX (second alternatively spliced target exon which is the other one of the mutually exclusive spliced exons), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), Diff.P (P-value of linear regression test for differential expression between groups), and Fdr.p (FDR values). A data frame for the result of Alternative splice sites, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome nam), ShortEX (shorter spliced target exon), LongEX (longer spliced target exon), NeighborEX (neighboring down or upstream exons), Types (splicing type), Diff.P (P-value of linear regression test for differential expression between groups), and Fdr.p (FDR values). A data frame for the result of Intron retention, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), RetainEX (retained intron exon), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), Diff.P (P-value of linear regression test for differential expression between groups), and Fdr.p (FDR values). Author(s) Seonggyun Han, Younghee Lee References Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole. See Also lm

ExonsCluster 7 data(bamfilestest) data(samplegroups) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="") sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ASdb <- RatioFromReads(ASdb,samplebamfiles,"paired",50,40,3,CalIndex="ES3") ASdb <- CompGroupAlt(ASdb,GroupSam,CalIndex="ES3") ## End(Not run) ExonsCluster Construct representative Exons This function constructs representative Exons. Arguments ExonsCluster(ASdb,GTFdb,Ncor=1,txTable=NULL) ASdb GTFdb Ncor txtable An ASdb object containing "SplicingModel" from the Splicingfinder funtion. A TxDb object in the GenomicFeatures package. The number of cores for multi-threads function. The matrix of transcripts including transcript IDs, Ensembl gene names, Ensembl transcript names, transcript start sites, and transcript end sites. ASdb containing representative exons. Author(s) Seonggyun Han, Younghee Lee sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ## End(Not run)

8 MEsQTLFinder GroupSam Group of each sample. Format A list object comprising sample names belonging to each group, PR-positive and PR-negative. This data is a simulated clinical data for 50 samples (half of whom are assigned as PR-positive and the other half PR-negative). The detailed overview of the data is described in the vignette. data(samplegroups) A list object including a group information on the 50 samples A list object including a group information on the 50 samples data(samplegroups) MEsQTLFinder Identify methylation loci that are significantly associated with alternatively spliced exons This function performs a regression test to identify significant association between methylation levels and PSI values using a linear regression model of lm function. Arguments MEsQTLFinder(ASdb, Total.Medata = NULL, Total.Melocus = NULL, GroupSam = NULL, Ncor = 1, CalIndex = NULL, out.dir = NULL) ASdb Total.Medata Total.Melocus GroupSam Ncor CalIndex out.dir An ASdb object including "SplicingModel" and "Ratio" slots from the Splicingfinder and RatioFromFPKM functions, respectively. A data frame consisting of methylation levels. A data frame consisting of methylation locus. A list object of a group of each sample. The number of cores for multi-threads. An index number in the ASdb object which will be tested in this function. An output directory.

MEsQTLFinder 9 ASdb with the slot (labeled by "Me.sQTLs") containing the results from the MEsQTLFinder function. The "Me.sQTLs" slot is consists of a list object and each element of the list object returns the results assigned to three elements, which is of each alternative splicing type (i.e. Exon skipping, Alternative splice site, Intron retention). Three elements are as follows; ES ASS IR A data frame for the result of Exon skipping, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), 1stEX (alternatively spliced target exon), 2ndEX (second alternatively spliced target exon which is the other one of the mutually exclusive spliced exons), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), pbymet (P-values of linear regression test for association between methylation levels and PSI values), fdrbymet (FDR values for the pbymet column), pbygroups (P-values of t-test for differential methylation levels between two groups, and fdrbygroups ( FDR values for the pbygroups column). A data frame for the result of Alternative splice sites, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome nam), ShortEX (shorter spliced target exon), LongEX (longer spliced target exon), NeighborEX (neighboring down or upstream exons), Types (splicing type), pbymet (P-values of linear regression test for association between methylation levels and PSI values), fdrbymet (FDR values for the pbymet column), pbygroups (P-values of t-test for differential methylation levels between groups, and fdrbygroups (adjust FDR values for the pbygroups column. A data frame for the result of Intron retention, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), RetainEX (retained intron exon), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), pbymet (P-values of linear regression test for association between methylation levels and PSI values), fdr- ByMet (adjust FDR values for the pbymet column), pbygroups (P-values of t-test for differential methylation levels between the groups, and fdrbygroups (adjust FDR values for the pbygroups column. Author(s) Seonggyun Han, Younghee Lee References Chambers, J. M. (1992) Linear models. Chapter 4 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole. See Also lm data(bamfilestest) data(samplemethyl) data(samplemethyllocus) data(samplegroups) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="")

10 RatioFromReads sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ASdb <- RatioFromReads(ASdb,samplebamfiles,"paired",50,40,3,CalIndex="ES3") ASdb <- MEsQTLFinder(ASdb,sampleMedata,sampleMelocus,CalIndex="ES3",GroupSam=GroupSam,out.dir=NULL) ## End(Not run) RatioFromReads Calculate expression ratio (PSI) from bamfiles This function extracts reads information from bamfile using Rsamtools and calculates expression ratio (denoted as Percent Splice-In, PSI) of each alternatively spliced exon (i.e., exon skipping, intro retention, and 5- and 3- prime splice sites). Arguments ASdb RatioFromReads(ASdb=NULL,Total.bamfiles=NULL,readsInfo=c("paired","single"), readlen=null,insersize=null,minr=3,calindex=null,ncor=1,out.dir=null) An ASdb object including "SplicingModel" slot from the Splicingfinder function. Total.bamfiles A data frame containing the path and name of a bamfile from RNA-seq readsinfo readlen insersize minr CalIndex Ncor out.dir Information of RNA-seq types (single- or paired-end reads) The read length The insert size between paired-end reads. A minimum number of testable reads mapping to a given exon. An index number in the ASdb object which will be tested in this function. The number of cores for multi-threads. An output directory. ASdb with the slot (labeled by "Ratio") containing results from the the RatioFromReads function. The "Ratio" slot contains a list object and each element of the list object returns the results assigned to three elements, which is of each alternative splicing type (i.e. Exon skipping, Alternative splice site, Intron retention). Three elements are as follows; ES A data frame for the result of Exon skipping, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), 1stEX (alternatively spliced target exon), 2ndEX (second alternatively spliced target exon which is the other one of the mutually exclusive spliced exons), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), and names of individuals.

samplebamfiles 11 ASS IR A data frame for the result of Alternative splice sites, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), ShortEX (shorter spliced target exon), LongEX (longer spliced target exon), NeighborEX (neighboring down or upstream exons), Types (splicing type), and names of individuals. A data frame for the result of Intron retention, consisting of the columns named as follows; Index (index number), EnsID (gene name), Nchr (chromosome name), RetainEX (retained intron exon), DownEX (downstream exon range), UpEX (upstream exon range), Types (splicing type), and names of individuals. Author(s) Seonggyun Han, Younghee Lee See Also SplicingReads data(bamfilestest) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="") sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) ASdb <- RatioFromReads(ASdb,samplebamfiles,"paired",50,40,3,CalIndex="ES3") ## End(Not run) samplebamfiles A data frame for example expression bam files. A path and identifier of bam files for 50 samples. For each bam file, mapped reads were randomly generated that came from the genomic region of chr11: 100,933,178-100,996,889. With each simulated bam file of 50 samples, PSI level is calculated for the exon that is located in chr11: 100,962,491-100,962,607. The simulated PSI values are in the range of 0.6 to 1.0. The range of 0.9 to 1.0 of PSI values are assigned to PR-positive group and 0.5 to 0.6 to PR-negative group. The detailed overview of the data is described in the vignette. data(bamfilestest) Format A data frame with paths and identifiers on the 50 samples

12 samplemedata A data frame with paths and identifiers on the 50 samples Source The data was provided from IMAS data(bamfilestest) samplemedata Methylation level data Methylation level of 5 loci (beta value), which are located in the PRGA gene for 50 samples. We generated a simulation data set of methylation level (beta value) for each locus, that significantly differs between two groups (PR-positive and PR-negative), while other methylation loci are not different. The detailed overview of the data is described in the vignette. data(samplemethyl) Format A data frame with levels of 5 methylation locus on the 50 samples A data frame with levels of 5 methylation locus on the 50 samples data(samplemethyl)

samplemelocus 13 samplemelocus Genomic locus of methylations Format Genomic location of 5 methylation loci located in the PRGA gene for 50 samples, which are matched with methylation level data provided in IMAS. The detailed overview of the data is described in the vignette. data(samplemethyllocus) A data frame with genomic locus of 5 methylations A data frame with genomic locus of 5 methylations data(samplemethyllocus) samplesnp Genotype data Format Source Genotype data of five SNPs located in the PRGA gene for 50 samples (half of whom are assigned as PR-positive and the other half PR-negative), which is used in analysis with IMAS. We generated a simulation data set of genotypes for each SNP. Among five SNPs, three are associated with PSI levels for 50 samples, while two SNPs are not. The detailed overview of the data is described in the vignette. data(samplesnp) A data frame with genotypes of 5 SNPs on the 50 samples A data frame with genotypes of 5 SNPs on the 50 samples The data was provided from IMAS

14 SplicingReads data(samplesnp) samplesnplocus Genomic locus of SNPs Format Genomic locus of five SNPs located in the PRGA gene for 50 samples, which are matched with SNP genotype data provided in IMAS. The detailed overview of the data is described in the vignette. data(samplesnplocus) A data frame with genomic locus of 5 SNPs A data frame with genomic locus of 5 SNPs data(samplesnplocus) SplicingReads Count a junction and paired-end reads This function counts the reads that are mapped to two separate exons, mapped to either splice site of two exons (called junction reads) or within each of two exons (paired end reads). Arguments SplicingReads(bamfile=NULL,test.exon=NULL,spli.jun=NULL,e.ran=NULL, SNPchr=NULL,readsinfo="paired",inse=40) bamfile test.exon spli.jun e.ran SNPchr readsinfo inse A path of mapped bamfile. A data frame containing an alternative target exon and their neighboring exons. A data frame containing spliced junction information. A range for parsing reads from a bamfile. A chromosome number Information of RNA-seq types (single- or paired- end reads). An insert size

SplicingReads 15 This function returns the list object providing counts the reads that are mapped to two separate exons, mapped to either splice site of two exons (called junction reads) or within each of two exons (paired end reads). Author(s) Seonggyun Han, Younghee Lee data(bamfilestest) ext.dir <- system.file("extdata", package="imas") samplebamfiles[,"path"] <- paste(ext.dir,"/samplebam/",samplebamfiles[,"path"],".bam",sep="") sampledb <- system.file("extdata", "sampledb", package="imas") transdb <- loaddb(sampledb) ## Not run: ASdb <- Splicingfinder(transdb,Ncor=1) ASdb <- ExonsCluster(ASdb,transdb) bamfiles <- rbind(samplebamfiles[,"path"]) Total.splicingInfo <- ASdb@SplicingModel$"ES" each.es.re <- rbind(es.fi.result[es.fi.result[,"index"] == "ES3",]) each.ranges <- rbind(unique(cbind(do.call(rbind,strsplit(each.es.re[,"downex"],"-"))[,1], do.call(rbind,strsplit(each.es.re[,"upex"],"-"))[,2]))) group.1.spl <- c(split.splice(each.es.re[,"do_des"],each.es.re[,"1st_des"]), split.splice(each.es.re[,"1st_des"],each.es.re[,"up_des"])) group.2.spl <- split.splice(each.es.re[,"do_des"],each.es.re[,"up_des"]) total.reads <- SplicingReads(bamfiles[1],each.ES.re[,c("DownEX","1stEX","UpEX")], c(group.1.spl,group.2.spl),each.ranges,each.es.re[,"nchr"],"paired") ## End(Not run)

Index Topic datasets Clinical.data, 3 GroupSam, 8 samplebamfiles, 11 samplemedata, 12 samplemelocus, 13 samplesnp, 13 samplesnplocus, 14 Topic package IMAS-package, 2 ASvisualization, 2 Clinical.data, 3 ClinicAnalysis, 2, 4, 4 CompGroupAlt, 2, 5, 6 exonsby, 2 ExonsCluster, 7 GroupSam, 8 IMAS (IMAS-package), 2 IMAS-package, 2 kmeans, 4, 5 lm, 5, 6, 8, 9 MEsQTLFinder, 2, 8, 9 RatioFromFPKM, 4, 6, 8 RatioFromReads, 2, 10, 10 samplebamfiles, 11 samplemedata, 12 samplemelocus, 13 samplesnp, 13 samplesnplocus, 14 Splicingfinder, 4, 6 8, 10 SplicingReads, 11, 14 survdiff, 4, 5 survfit, 5 16