Supervised analysis of MS images using Cardinal

Size: px
Start display at page:

Download "Supervised analysis of MS images using Cardinal"

Transcription

1 Supervised analsis of MS images using Cardinal Klie A. Bemis and April Harr November, 28 Contents Introduction Analsis of a renal cell carcinoma (RCC) dataset Pre-processing Normalization Resampling to unit resolution Subsetting the dataset Visualizing the dataset Visualization of molecular ion images Eplorator analsis using PCA Classification using PLS-DA Cross-validation with partial least squares Plotting the classified images Plotting and interpretting the coefficients of the m/z values Classification using O-PLS-DA Cross-validation with partial least squares Plotting the classified images Plotting and interpretting the coefficients of the m/z values Classification using spatial shrunken centroids Cross-validation with spatial shrunken centroids Plotting the classified images Plotting and interpreting the t-statistics of the m/z values Session info Introduction For eperiments in which analzed samples come from different classes or conditions, a common goal of supervised analsis is to predict the class of a new sample, given a labeled training set for which classes are alread known. This task is called classification. Unlike unsupervised analsis such as clustering, classification requires biological replicates for testing and validation, to avoid biased reporting of accurac. Cardinal implements crossvalidation for classification.

2 Supervised analsis of MS images using Cardinal In this vignette, an eample classification workflow in Cardinal is presented, together with plots of the results. 2 Analsis of a renal cell carcinoma (RCC) dataset This eample uses a renal cell carcinoma (RCC) dataset consisting of 8 matched pairs of human kidne tissue. Each tissue pair consists of a tissue sample and a ous tissue sample. The goal of the workflow is to develop classifiers for predicting whether a new tissue sample is or. > librar(cardinalworkflows) > data(rcc, rcc_analses) In this RCC dataset, we epect that tissue and ous tissue will have unique chemical profiles, which we can use to classif new tissue based on the mass spectra. In Figure we show the H&E stained tissue samples. In Figure i we plot the ion images for m/z 8.5, which we know from previous studies to be abundant in approimatel equal intensities in both ous and tissue []. > image(rcc, mz = 8.5, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) > summar(rcc) Class: MSImageSet Features: m/z = m/z = (2 total) Piels: =, = 3, sample = MH24_33... = 77, = 5, sample = UH992_ (6 total) : : Size in memor: Mb As can be seen in Figure, each matched pair of tissues belonging to the same subject are on the same slide. Note also the the tissue is on the left and the tissue is on the right on each slide. The image contains 6 piels with 2 spectral features measured at each location (m/z range from 5 to ). 2. Pre-processing For statistical analsis, some form of dimension reduction is necessar so that computation times are reasonable. However, the usual form of dimension reduction for mass spectra peak-picking is often unsuitable for classification. This is because classification requires testing and validation to avoid bias in the reported accurac. If we perform peak-picking on the whole dataset, then the accurac reported for the validation set will be biased, because the selected peaks are also coming from the validation set. 2

3 Supervised analsis of MS images using Cardinal 8 m/z = 8.5 UH99_ m/z = 8.5 UH992_ m/z = 8.5 UH995_ m/z = 8.5 UH982_ m/z = 8.5 UH96_ m/z = 8.5 UH7_ m/z = 8.5 UH55_2 2 (h) UH992_ (g) UH99_5 2 m/z = 8.5 MH24_33 (d) UH96_5 2 (f) UH995_8 (c) UH7_33 2 (e) UH982_3 2 (b) UH55_2 2 (a) MH24_ (i) Corresponding ion images for m/z 8.5 Figure : Optical images and ion images for the eight samples showing general morphol- og Therefore, we recommend resampling or binning as the preferred method of dimension reduction for classification workflows. We will use resampling. 2.. Normalization Before resampling or binning, ization is necessar to correct for piel-to-piel variation. We will use total ion current (TIC) standardization, which is a popular choice for mass spectrometr imaging datasets. > rcc.norm <- ize(rcc, method = "tic") 2..2 Resampling to unit resolution The ized data is then resampled to unit resolution. Binning would also be an appropriate alternative, and could be used b setting method="bin" in the reducedimension method. 3

4 Supervised analsis of MS images using Cardinal > rcc.resample <- reducedimension(rcc.norm, method = "resample") As discussed above, resampling or binning is preferred to peak-picking for classification. However, if peak-picking is preferred, this can be worked around b performing peak-picking separatel on the training set onl, and using the same peaks in the testing and validation sets. This can become a comple procedure if cross-validation is desired, and will not be covered in this vignette Subsetting the dataset Lastl, we will subset the dataset to drop piels that contain onl the slide background, so that the final dataset will onl consist of mass spectra from actual tissue. To subset the data, we will use the diagnosis variable stored in the object s pieldata. This variable is a factor with the disease condition for each piel, as annotated b a pathologist. > summar(rcc$diagnosis) NA's We drop the 9923 piels without annotation. > rcc.small <- rcc.resample[, rcc$diagnosis %in% c("", "")] > summar(rcc.small) Class: MSImageSet Features: m/z = 5... m/z = (85 total) Piels: = 7, = 5, sample = MH24_33... = 6, = : : Size in memor: 4.7 Mb 6, sample = UH992_ (677 total) Now the dataset contains onl the 677 mass spectra we need to train and test a classifier. 2.2 Visualizing the dataset In this section, we will walk through several visualization methods to eplore the dataset before training our classifiers Visualization of molecular ion images To begin visualizing the dataset, we will plot ion images for m/z values we alread know to be useful in distinguishing tissue versus. First, we plot the ion images for m/z 25.3, known to be more abundant in tissue (right) [], shown in Figure 2a. > image(rcc, mz = 25.3, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) 4

5 Supervised analsis of MS images using Cardinal Likewise, we plot the ion images for m/z 885.7, known to be more abundant in ous tissue (left) [], shown in Figure 2b. > image(rcc, mz = 885.7, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) m/z = MH24_ m/z = UH55_ m/z = UH7_ m/z = UH96_ m/z = UH982_ m/z = UH995_ m/z = UH99_ m/z = UH992_ (a) m/z 25.3 (more abundant in tissue) m/z = MH24_ m/z = UH55_ m/z = UH7_ m/z = UH96_ m/z = UH982_ m/z = UH995_ m/z = UH99_ m/z = UH992_ (b) m/z (more abundant in ous tissue) Figure 2: Ion images showing ions associated with and ous tissue From Figure 2a and Figure 2b, we note that there is still a great deal of variation in these images for ions that should be associated with a particular disease condition. For eample, m/z 25.3 which should be more abundant in tissue is also abundant in ous tissue for samples UH55_2 and UH995_8. This shows that multiple ions will be necessar for classification Eplorator analsis using PCA Although man use principal components analsis (PCA) combined with linear regression for classification, it is a method for unsupervised analsis most appropriatel used for eploring a dataset, prior to appling a method designed for classification. We will use PCA for visualization. Here we fit the first 5 principal components using the PCA method. 5

6 Supervised analsis of MS images using Cardinal > rcc.pca <- PCA(rcc.small, ) > summar(rcc.pca) PC PC2 PC3 PC4 PC5 Standard deviation The summar of the first 5 principal components show that PCA is not ver useful for this dataset, since the first 5 components cumulativel eplain approimatel onl 5% of the variation in the data. To further eplore the dataset with PCA, we plot images of the scores of the first principal component, shown in Figure 3. > image(rcc.pca, column = "PC", superpose = FALSE, col.regions = risk.colors(), + laout = c(4, 2)) PC MH24_ PC UH55_ PC UH7_ PC UH96_ PC UH982_ PC UH995_ PC UH99_ PC UH992_ Figure 3: PC scores for the first principal component Figure 3 show that the images based on the PC scores do not seem to show a strong pattern useful for classification of versus tissue, although tissue seems to have slightl higher PC scores. We also plot the PC loadings for the first 2 principal components, shown in Figure 5. > plot(rcc.pca, column = c("pc", "PC2", "PC3"), superpose = FALSE, laout = c(3, + )) loadings PC loadings PC2 loadings PC m z m z m z Figure 4: PC loadings for the first three principal components 6

7 Supervised analsis of MS images using Cardinal Another useful PCA plot in a classification setting is to plot the scores of different components against each other, plotting each class separatel, which we do below for disease condition. > pca. <- as.data.frame(rcc.pca[[]]$scores[rcc.small$diagnosis == "", + ]) > pca. <- as.data.frame(rcc.pca[[]]$scores[rcc.small$diagnosis == "", + ]) We show PC versus PC2 in Figure 5a. > plot(pc2 ~ PC, data = pca., col = "blue") > points(pc2 ~ PC, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) Now PC versus PC3 in Figure 5b. > plot(pc3 ~ PC, data = pca., col = "blue") > points(pc3 ~ PC, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) And PC2 versus PC3 in Figure 5c. > plot(pc3 ~ PC2, data = pca., col = "blue") > points(pc3 ~ PC2, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) PC PC2 (a) PC versus PC2 scores PC PC3 (b) PC versus PC3 scores PC2 PC3 (c) PC2 versus PC3 scores Figure 5: PCA plots showing PC scores according to disease condition The PC score plots shown in Figure 5 show that there is indeed separation between the disease conditions in the data. However, it is often difficult to interpret the comple relationship between PC loadings and how the PC scores relate to condition. Therefore, we will now move on to methods designed for classification. 2.3 Classification using PLS-DA Partial least squares discriminant analsis (PLS-DA) also known as projection to latent structures is a multivariate method that has been shown to be useful in the classification of MS images []. We will now demonstrate classification of the RCC dataset using PLS-DA. 7

8 Supervised analsis of MS images using Cardinal Note that although we show PLS-DA prediction on two conditions ( and ), it can also be used for classification on more than two conditions Cross-validation with partial least squares An important step in classification is testing and validation. If the accurac of a classifier is tested on the same dataset that was used to train the classifier, the reported accurac will be biased and too optimistic. Therefore, Cardinal implements the cvappl method, which performs cross-validation for an of the supplied classification methods, including PLS. See?cvAppl for further details on how to use cross-validation in Cardinal. B default, cvappl considers each unique sample (as given b the sample variable in an MSImageSet object s pieldata) as a fold for n-fold cross-validation. In most cases, these should correspond to biological replicates, which is our recommended workflow. This is the case for the RCC dataset, where each matched pair on a separate slide constitutes a unique sample. > summar(rcc.small$sample) MH24_33 UH55_2 UH7_33 UH96_5 UH982_3 UH995_8 UH99_5 UH992_ Generall, biological replicates should be used to partition the dataset rather than technical replicates or individual piels. The onl eception would be in the case of a sample size of one, in which case there are no biological replicates. However, a sample size of one is a worst case scenario, and biological replicates should alwas be preferred. We now perform cross-validation using PLS-DA as our classification method, using from to 5 PLS components. > rcc.cv.pls <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "PLS", ncomp = :5) We plot the cross-validated accurac to determine the best number of components for prediction, shown in Figure 6. > plot(summar(rcc.cv.pls)) As seen in Figure 6, PLS components produce the best prediction rate, with 96.8% crossvalidated accurac. > summar(rcc.cv.pls)$accurac[["ncomp = "]] Accurac Sensitivit Specificit FDR Figure 6 tells us that if we wanted to use PLS-DA for prediction on new data, we should train a classifier on this data using PLS components. Note that rcc.cv.pls is a CrossValidated object, which contains 8 objects in its resultdata slot one for each cross-validation fold each of which is a PLS object containing the results of prediction for that fold. 8

9 Supervised analsis of MS images using Cardinal Accurac Number of Components Figure 6: Accurac of PLS-DA classification for number of components used CrossValidated inherits from ResultSet. See?ResultSet for details Plotting the classified images Now we plot the images of the PLS-DA fitted values, to visualize the cross-validated prediction rate, shown in Figure 7. > image(rcc.cv.pls, model = list(ncomp = ), laout = c(4, 2)) ncomp = MH24_ ncomp = UH55_ ncomp = UH7_ ncomp = UH96_ ncomp = UH982_ ncomp = UH995_ ncomp = UH99_ ncomp = UH992_ Figure 7: PLS-DA fitted values indicating or tissue For prediction, PLS-DA creates indicator variables (with values or ) for each condition. The predicted condition is the one with the highest fitted value. (Since the fitted values can fall outside the range to, these are not interprettable as probabilities.) Figure 7 shows the tissues on the left are more predominantl red than blue, indicating that the are predicted to be, which corresponds with the true disease conditions. Since we used cross-validation, the prediction on each matched pair is based onl on the data from the other 7 matched pairs of tissue samples Plotting and interpretting the coefficients of the m/z values To interpret the relative importance of the mass features in the classification, we can look at the PLS coefficients used for prediction. To do this, we re-train a PLS classifier on the full dataset using the optical number of PLS components as indicated b the cross-validation. 9

10 Supervised analsis of MS images using Cardinal > rcc.pls <- PLS(rcc.small, = rcc.small$diagnosis, ncomp = ) Now we plot the PLS coefficients against the m/z values, shown in Figure 8. > plot(rcc.pls) coefficients ncomp = m z Figure 8: PLS coefficients for and We can also rank the most important for each condition, based on the PLS coefficients, b using the toplabels method. > toplabels(rcc.pls) mz ncomp column coefficients loadings weights e e e e e e- Among the top-ranked m/z values, we see m/z 25 is listed for tissue, which we know to be abundant in tissue. The top-ranked ion for is m/z 75, which we plot below in Figure 9. > image(rcc.small, mz = 75, laout = c(4, 2), ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian") m/z = 75 MH24_ m/z = 75 UH55_ m/z = 75 UH7_ m/z = 75 UH96_ m/z = 75 UH982_ m/z = 75 UH995_ m/z = 75 UH99_ m/z = 75 UH992_ Figure 9: m/ 75 (identified b PLS as associated with )

11 Supervised analsis of MS images using Cardinal We see from Figure 9 that m/z 75 is indeed more abundant in the tissue. Although the PLS coefficients are useful for ranking the relative important of m/z values, the are not indicative of statistical significance. 2.4 Classification using O-PLS-DA Orthogonal partial least squares discriminant analsis (O-PLS-DA) is another multivariate method that can be useful for classification of MS images. It is related to PLS, but it removes from the data a number of PLS components orthogonal to the relationship between the data and condition prior to fitting a -component PLS model. O-PLS-DA can often produce comparable accurac to PLS-DA, but with more stable and easil interprettable coefficients Cross-validation with partial least squares We now use cross-validation to fit O-PLS-DA models for the RCC dataset. > rcc.cv.opls <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "OPLS", ncomp = :5, + keep.xnew = FALSE) > plot(summar(rcc.cv.opls)) As seen in Figure, 2 O-PLS components produce the best prediction rate, with 95.7% cross-validated accurac. > summar(rcc.cv.pls)$accurac[["ncomp = 2"]] Accurac Sensitivit Specificit FDR Accurac Number of Components Figure : Accurac of O-PLS-DA classification for number of components used Plotting the classified images As with PLS-DA, we now plot the images for the O-PLS-DA fitted values, to visuall show the predictions, shown in Figure.

12 Supervised analsis of MS images using Cardinal > image(rcc.cv.opls, model = list(ncomp = 2), laout = c(4, 2)) ncomp = 2 MH24_ ncomp = 2 UH55_ ncomp = 2 UH7_ ncomp = 2 UH96_ ncomp = 2 UH982_ ncomp = 2 UH995_ ncomp = 2 UH99_ ncomp = 2 UH992_ Figure : O-PLS-DA fitted values indicating or tissue Figure shows good qualit prediction comparable with PLS-DA Plotting and interpretting the coefficients of the m/z values Now we consider the O-PLS coefficients b training a classifier on the full dataset with the optimal number of O-PLS components as shown b cross-validation. > rcc.opls <- OPLS(rcc.small, = rcc.small$diagnosis, ncomp = 2, keep.xnew = FALSE) And we plot the O-PLS coefficients, as shown in Figure 2. > plot(rcc.opls) ncomp = 2 coefficients m z Figure 2: O-PLS coefficients for and A comparison of the O-PLS coefficients in Figure 2 with the PLS coefficients from Figure 8 shows that the O-PLS coefficients appear more stable and should be easier to interpret. We get the top-ranked m/z values using toplabels > toplabels(rcc.opls) mz ncomp column coefficients loadings Oloadings weights Oweights

13 Supervised analsis of MS images using Cardinal The O-PLS coefficients rank m/z 886 highl for, which we know to be more abundant in the ous tissue. As with the PLS coefficients, the ion at m/z 25, which we know to be more abundant in tissue, is also highl ranked for. 2.5 Classification using spatial shrunken centroids This section demonstrates the spatial shrunken centroids classification method for statistical analsis we introduce in Cardinal in the spatialshrunkencentroids method. In this method, we adapt the nearest shrunken centroids classifier [2] with spatial smoothing. This method uses statistical regularization to shrink each condition s mean spectrum toward the global mean spectrum. This shrinkage allows automated feature selection of important masses. It then classifies piels b comparing their mass spectra to the shrunken mean spectra of each conditions. The spatial smoothing uses weights adapted from spatiall-aware clustering [3], including Gaussian weights, and adaptive weights that attempt to account for local structure. The parameters to be eplicitl provided in the spatialshrunkencentroids method are: r: The neighborhood smoothing radius s: The shrinkage parameter The s parameter is the shrinkage parameter that enforces sparsit. As s increases, fewer mass features (m/z values) will be used b the classifier, and onl the informative mass features will be retained. For a detailed eplanation of the shrinkage parameter s, see [2] and [4]. Clustering can also be performed if no response variable is given, b providing an additional parameter k for the initial number of clusters. See the clustering workflow for details Cross-validation with spatial shrunken centroids Now we perform cross-validation with spatial shrunken centroids classification and the method="gaussian" weights. > rcc.cv.sscg <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "spatialshrunkencentroids", + method = "gaussian", r = c(, 2, 3), s = c(, 4, 8, 2, 6, 2, 24, 28)) And we perform cross-validation with spatial shrunken centroids classification and the method="adaptive" weights. > rcc.cv.ssca <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "spatialshrunkencentroids", + method = "adaptive", r = c(, 2, 3), s = c(, 4, 8, 2, 6, 2, 24, 28)) Now we plot the cross-validated accurac for the classifier with Gaussian weights in Figure 3a and adaptive weights in Figure 3b. 3

14 Supervised analsis of MS images using Cardinal > plot(summar(rcc.cv.sscg)) > plot(summar(rcc.cv.ssca)) Accurac r =, k = 2 r = 2, k = 2 r = 3, k = 2 Accurac r =, k = 2 r = 2, k = 2 r = 3, k = Shrinkage parameter (s) (a) Accurac for Gaussian weights Shrinkage parameter (s) (b) Accurac for adaptive weights Figure 3: Plots of accurac for spatial shrunken centroids, showing highest accurac for s = 2 and r = 3 As shown in Figure 3, for both weight tpes and all smoothing radii r, the highest accurac occurs with a shrinkage parameter s = 2, ecept for the case with adaptive weights and r =, for which the highest accurac occurs at s = 6. For Gaussian weights with r = 3, s = 2, accurac was 88.8%. Note that in general, the accurac increases with larger smoothing neighborhood radii r. This is likel because rather than heterogenous samples with both and ous cells on the same tissue, each tissue is relativel homogenous with predominantl or ous cells. Therefore, greater spatial smoothing increases the accurac, and adaptive weights have no advantage over Gaussian weights. For classification on more heterogenous tissue, adaptive weights ma perform better Plotting the classified images Now we plot the classified images for Gaussian weights in Figure 4a and adaptive weights in Figure 4b. > image(rcc.cv.sscg, model = list(r = 3, s = 2), laout = c(4, 2)) > image(rcc.cv.ssca, model = list(r = 3, s = 2), laout = c(4, 2)) Unlike PLS-DA and O-PLS-DA, spatial shrunken centroids produce probabilities of versus, which we plot using higher opacit for higher probabilit. This makes for more interpretable predicted images Plotting and interpreting the t-statistics of the m/z values A major advantage of spatial shrunken centroids is that it provides t-statistics for each feature for each condition, and it uses statistical regularization to perform automatic feature selection. This allows for easier identification of important m/z values, and a more straightforward and interpretable method of ranking their relative importance. 4

15 Supervised analsis of MS images using Cardinal r = 3, k = 2, s = 2 MH24_ r = 3, k = 2, s = 2 UH55_ r = 3, k = 2, s = 2 UH7_ r = 3, k = 2, s = 2 UH96_ r = 3, k = 2, s = 2 UH982_ r = 3, k = 2, s = 2 UH995_ r = 3, k = 2, s = 2 UH99_ r = 3, k = 2, s = 2 UH992_ (a) Probabilities for Gaussian weights r = 3, k = 2, s = 2 MH24_ r = 3, k = 2, s = 2 UH55_ r = 3, k = 2, s = 2 UH7_ r = 3, k = 2, s = 2 UH96_ r = 3, k = 2, s = 2 UH982_ r = 3, k = 2, s = 2 UH995_ r = 3, k = 2, s = 2 UH99_ r = 3, k = 2, s = 2 UH992_ (b) Probabilities for adaptive weights Figure 4: Predicted probabilities of and, with higher opacit for a condition s color indicating higher probabilit To inspect the t-statistics of the m/z values, we now train classifiers on the full dataset using the parameters r = 3, s = 2. > rcc.sscg <- spatialshrunkencentroids(rcc.small, = rcc.small$diagnosis, r = 3, + s = 2, method = "gaussian") > rcc.ssca <- spatialshrunkencentroids(rcc.small, = rcc.small$diagnosis, r = 3, + s = 2, method = "adaptive") Now we plot the t-statistics, shown for Gaussian weights in Figure 5a and for adaptive weights in Figure 5b. > plot(rcc.sscg, mode = "tstatistics", model = list(r = 3, s = 2)) > plot(rcc.ssca, mode = "tstatistics", model = list(r = 3, s = 2)) As seen in Figure 5a and Figure 5b, onl a few m/z values have non-zero t-statistics. > summar(rcc.sscg) r k s method time Predicted # of Classes Mean # of Features per Class 5

16 Supervised analsis of MS images using Cardinal tstatistics 2 2 r = 3, k = 2, s = 2 tstatistics 2 2 r = 3, k = 2, s = m z m z (a) Shrunken t-statistics for Gaussian weights (b) Shrunken t-statistics for adaptive weights Figure 5: Predicted probabilities of and, with higher opacit for a condition s color indicating higher probabilit gaussian > summar(rcc.ssca) r k s method time Predicted # of Classes Mean # of Features per Class adaptive In fact, onl 4 of 85 mass features are used in the spatial shrunken centroids classifier. We identif the top-ranked mass features using the toplabels method. > toplabels(rcc.sscg) mz r k s classes centers tstatistics p.values adj.p.values > toplabels(rcc.ssca) mz r k s classes centers tstatistics p.values adj.p.values Note that the shrunken t-statistics are identical between the Gaussian and adaptive weights, since the are based on the same training data, and the spatial structure is taken into account onl for the predicted probabilities. Spatial shrunken centroids identified m/z 25, m/z 886, and m/z 8, which were also identified b O-PLS-DA, and are all known to be important (as discussed in Section 2.2.), as well as m/z 75, which was identified b PLS-DA. Of the three methods, spatial shrunken centroids most reliabl selected the m/z values known to be associated with and, in addition to selecting potentiall informative new m/z values. 6

17 Supervised analsis of MS images using Cardinal 3 Session info R version 3.5. Patched ( r74967), 86_64-pc-linu-gnu Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C Running under: Ubuntu LTS Matri products: default BLAS: /home/biocbuild/bbs-3.8-bioc/r/lib/librblas.so LAPACK: /home/biocbuild/bbs-3.8-bioc/r/lib/librlapack.so Base packages: base, datasets, grdevices, graphics, methods, parallel, stats, stats4, utils Other packages: BiocGenerics.28., BiocParallel.6., Cardinal 2.., CardinalWorkflows.4., EBImage 4.24., ProtGenerics.4., S4Vectors.2. Loaded via a namespace (and not attached): Biobase 2.42., BiocManager.3.3, BiocStle 2.., DBI.., MASS 7.3-5, Matri.2-4, R6 2.3., RCurl.95-4., Rcpp.2.9, abind.4-5, assertthat.2., backports..2, biglm.9-, bindr.., bindrcpp.2.2, bitops.-6, compiler 3.5., craon.3.4, digest.6.8, dplr.7.7, evaluate.2, fftwtools.9-8, glue.3., grid 3.5., htmltools.3.6, htmlwidgets.3, irlba 2.3.2, jpeg.-8, knitr.2, lattice.2-35, locfit.5-9., magrittr.5, matter.8., pillar.3., pkgconfig 2..2, png.-7, purrr.2.5, rlang.3.., rmarkdown., rprojroot.3-2, signal.7-6, sp.3-, tibble.4.2, tidselect.2.5, tiff.-5, tools 3.5., aml 2.2. References [] A. L. Dill, L. S. Eberlin, C. Zheng, A. B. Costa, D. R. Ifa, L. Cheng, T. A. Masterson, M. O. Koch, O. Vitek, and R. G. Cooks. Multivariate statistical differentiation of renal cell carcinomas based on lipidomic analsis b ambient ionization imaging mass spectrometr. Analtical and Bioanaltical Chemistr, 398:2969, 2. [2] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple tpes b shrunken centroids of gene epression. Proceedings of the National Academ of Sciences of the United States of America, 99(): , 22. [3] Theodore Aleandrov and Jan Hendrik Kobarg. Efficient spatial segmentation of large imaging mass spectrometr datasets with spatiall aware clustering. Bioinformatics (Oford, England), 27(3):i23 8, Jul 2. URL: gov/articlerender.fcgi?artid=37346&tool=pmcentrez&rendertpe=abstract, doi:.93/bioinformatics/btr246. [4] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Class Prediction b Nearest Shrunken with Applications to DNA Microarras. Statistical Science, 8():4 7, 23. 7

Cardinal: Analytic tools for mass spectrometry

Cardinal: Analytic tools for mass spectrometry Cardinal: Analtic tools for mass spectrometr imaging Klie A. Bemis and April Harr April 30, 2018 Contents 1 Introduction.............................. 2 2 Input/Output.............................. 4 2.1

More information

Cardinal: Analytic tools for mass spectrometry imaging

Cardinal: Analytic tools for mass spectrometry imaging Cardinal: Analtic tools for mass spectrometr imaging Kle D. Bemis and April Harr Ma 3, 2016 Contents 1 Introduction 2 2 Input/Output 3 2.1 Input.................................................... 3 2.1.1

More information

GeneOverlap: An R package to test and visualize

GeneOverlap: An R package to test and visualize GeneOverlap: An R package to test and visualize gene overlaps Li Shen Contact: li.shen@mssm.edu or shenli.sam@gmail.com Icahn School of Medicine at Mount Sinai New York, New York http://shenlab-sinai.github.io/shenlab-sinai/

More information

AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype

AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype AIMS: Absolute Assignment of Breast Cancer Intrinsic Molecular Subtype Eric R. Paquet (eric.r.paquet@gmail.com), Michael T. Hallett (michael.t.hallett@mcgill.ca) 1 1 Department of Biochemistry, Breast

More information

Introduction to antiprofiles

Introduction to antiprofiles Introduction to antiprofiles Héctor Corrada Bravo hcorrada@gmail.com Modified: March 13, 2013. Compiled: April 30, 2018 Introduction This package implements the gene expression anti-profiles method in

More information

MethylMix An R package for identifying DNA methylation driven genes

MethylMix An R package for identifying DNA methylation driven genes MethylMix An R package for identifying DNA methylation driven genes Olivier Gevaert May 3, 2016 Stanford Center for Biomedical Informatics Department of Medicine 1265 Welch Road Stanford CA, 94305-5479

More information

metaseq: Meta-analysis of RNA-seq count data

metaseq: Meta-analysis of RNA-seq count data metaseq: Meta-analysis of RNA-seq count data Koki Tsuyuzaki 1, and Itoshi Nikaido 2. October 30, 2017 1 Department of Medical and Life Science, Tokyo University of Science. 2 Bioinformatics Research Unit,

More information

Using Messina. Mark Pinese. October 13, Introduction The problem Example: Designing a colon cancer screening test...

Using Messina. Mark Pinese. October 13, Introduction The problem Example: Designing a colon cancer screening test... Using Messina Mark Pinese October 13, 2014 Contents 1 Introduction 1 2 Using Messina to construct optimal diagnostic classifiers 1 2.1 The problem.................................................. 1 2.2

More information

Applying Metab. Raphael Aggio. October 30, This document describes how to use the function included in the R package Metab.

Applying Metab. Raphael Aggio. October 30, This document describes how to use the function included in the R package Metab. Applying Metab Raphael Aggio October 30, 2018 Introduction This document describes how to use the function included in the R package Metab. 1 Requirements Metab requires 3 packages: xcms, svdialogs and

More information

splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data

splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data splicer: An R package for classification of alternative splicing and prediction of coding potential from RNA-seq data Kristoffer Knudsen, Johannes Waage 5 Dec 2013 1 Contents 1 Introduction 3 1.1 Alternative

More information

IAS 3.9 Bivariate Data

IAS 3.9 Bivariate Data Year 13 Mathematics IAS 3.9 Bivariate Data Robert Lakeland & Carl Nugent Contents Achievement Standard.................................................. 2 Bivariate Data..........................................................

More information

Diagnosis of multiple cancer types by shrunken centroids of gene expression

Diagnosis of multiple cancer types by shrunken centroids of gene expression Diagnosis of multiple cancer types by shrunken centroids of gene expression Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu PNAS 99:10:6567-6572, 14 May 2002 Nearest Centroid

More information

September 20, 2017 SIMULATED DATA

September 20, 2017 SIMULATED DATA Tables and Figures for manuscript for Virologic Suppression and Mortality in Older HIV subjects in a Multisite Latin American and the Caribbean Cohort manuscript September 20, 2017 : RESULTS ARE NOT TO

More information

Infer mirna-mrna interactions using paired expression data from a single sample

Infer mirna-mrna interactions using paired expression data from a single sample Infer mirna-mrna interactions using paired expression data from a single sample Yue Li yueli@cs.toronto.edu October 0, 0 Introduction MicroRNAs (mirnas) are small ( nucleotides) RNA molecules that base-pair

More information

TITAN: Subclonal copy number and LOH prediction from whole genome sequencing of tumours

TITAN: Subclonal copy number and LOH prediction from whole genome sequencing of tumours TITAN: Subclonal copy number and LOH prediction from whole genome sequencing of tumours Gavin Ha October 30, 2017 Contents 1 Introduction 1 2 Analysis Workflow Overview 2 3 Model training and parameter

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Using the DART package: Denoising Algorithm based on Relevance network Topology

Using the DART package: Denoising Algorithm based on Relevance network Topology Using the DART package: Denoising Algorithm based on Relevance network Topology Katherine Lawler, Yan Jiao, Andrew E Teschendorff, Charles Shijie Zheng October 30, 2018 Contents 1 Introduction 1 2 Load

More information

Application Note # MT-111 Concise Interpretation of MALDI Imaging Data by Probabilistic Latent Semantic Analysis (plsa)

Application Note # MT-111 Concise Interpretation of MALDI Imaging Data by Probabilistic Latent Semantic Analysis (plsa) Application Note # MT-111 Concise Interpretation of MALDI Imaging Data by Probabilistic Latent Semantic Analysis (plsa) MALDI imaging datasets can be very information-rich, containing hundreds of mass

More information

Classification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang

Classification. Methods Course: Gene Expression Data Analysis -Day Five. Rainer Spang Classification Methods Course: Gene Expression Data Analysis -Day Five Rainer Spang Ms. Smith DNA Chip of Ms. Smith Expression profile of Ms. Smith Ms. Smith 30.000 properties of Ms. Smith The expression

More information

Colon cancer subtypes from gene expression data

Colon cancer subtypes from gene expression data Colon cancer subtypes from gene expression data Nathan Cunningham Giuseppe Di Benedetto Sherman Ip Leon Law Module 6: Applied Statistics 26th February 2016 Aim Replicate findings of Felipe De Sousa et

More information

Digitizing the Proteomes From Big Tissue Biobanks

Digitizing the Proteomes From Big Tissue Biobanks Digitizing the Proteomes From Big Tissue Biobanks Analyzing 24 Proteomes Per Day by Microflow SWATH Acquisition and Spectronaut Pulsar Analysis Jan Muntel 1, Nick Morrice 2, Roland M. Bruderer 1, Lukas

More information

Utility of neurological smears for intrasurgical brain cancer diagnostics and tumour cell percentage by DESI-MS

Utility of neurological smears for intrasurgical brain cancer diagnostics and tumour cell percentage by DESI-MS Electronic Supplementary Material (ESI) for Analyst. This journal is The Royal Society of Chemistry 2017 Utility of neurological smears for intrasurgical brain cancer diagnostics and tumour cell percentage

More information

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering Kunal Sharma CS 4641 Machine Learning Abstract Clustering is a technique that is commonly used in unsupervised

More information

Comparing Two ROC Curves Independent Groups Design

Comparing Two ROC Curves Independent Groups Design Chapter 548 Comparing Two ROC Curves Independent Groups Design Introduction This procedure is used to compare two ROC curves generated from data from two independent groups. In addition to producing a

More information

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based Package MSstatsTMT February 26, 2019 Title Protein Significance Analysis in shotgun mass spectrometry-based proteomic experiments with tandem mass tag (TMT) labeling Version 1.1.2 Date 2019-02-25 Tools

More information

Package MethPed. September 1, 2018

Package MethPed. September 1, 2018 Type Package Version 1.8.0 Date 2016-01-01 Package MethPed September 1, 2018 Title A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes Depends R (>= 3.0.0), Biobase

More information

ROC Curves (Old Version)

ROC Curves (Old Version) Chapter 545 ROC Curves (Old Version) Introduction This procedure generates both binormal and empirical (nonparametric) ROC curves. It computes comparative measures such as the whole, and partial, area

More information

Automating Mass Spectrometry-Based Quantitative Glycomics using Tandem Mass Tag (TMT) Reagents with SimGlycan

Automating Mass Spectrometry-Based Quantitative Glycomics using Tandem Mass Tag (TMT) Reagents with SimGlycan PREMIER Biosoft Automating Mass Spectrometry-Based Quantitative Glycomics using Tandem Mass Tag (TMT) Reagents with SimGlycan Ne uaca2-3galb1-4glc NAcb1 6 Gal NAca -Thr 3 Ne uaca2-3galb1 Ningombam Sanjib

More information

Data Independent MALDI Imaging HDMS E for Visualization and Identification of Lipids Directly from a Single Tissue Section

Data Independent MALDI Imaging HDMS E for Visualization and Identification of Lipids Directly from a Single Tissue Section Data Independent MALDI Imaging HDMS E for Visualization and Identification of Lipids Directly from a Single Tissue Section Emmanuelle Claude, Mark Towers, and Kieran Neeson Waters Corporation, Manchester,

More information

The use of random projections for the analysis of mass spectrometry imaging data Palmer, Andrew; Bunch, Josephine; Styles, Iain

The use of random projections for the analysis of mass spectrometry imaging data Palmer, Andrew; Bunch, Josephine; Styles, Iain The use of random projections for the analysis of mass spectrometry imaging data Palmer, Andrew; Bunch, Josephine; Styles, Iain DOI: 10.1007/s13361-014-1024-7 Citation for published version (Harvard):

More information

A measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization

A measure of the impact of CV incompleteness on prediction error estimation with application to PCA and normalization Hornung et al. BMC Medical Research Methodology () :95 DOI 10.16/s4-0-0088-9 RESEARCH ARTICLE Open Access A measure of the impact of CV incompleteness on prediction error estimation with application to

More information

SUPPLEMENTARY APPENDIX

SUPPLEMENTARY APPENDIX SUPPLEMENTARY APPENDIX 1) Supplemental Figure 1. Histopathologic Characteristics of the Tumors in the Discovery Cohort 2) Supplemental Figure 2. Incorporation of Normal Epidermal Melanocytic Signature

More information

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11

Article from. Forecasting and Futurism. Month Year July 2015 Issue Number 11 Article from Forecasting and Futurism Month Year July 2015 Issue Number 11 Calibrating Risk Score Model with Partial Credibility By Shea Parkes and Brad Armstrong Risk adjustment models are commonly used

More information

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-2, Issue-6, November-2014 Classification and Statistical Analysis of Auditory FMRI Data Using

More information

Molecular pathology with desorption electrospray ionization (DESI) - where we are and where we're going

Molecular pathology with desorption electrospray ionization (DESI) - where we are and where we're going Molecular pathology with desorption electrospray ionization (DESI) - where we are and where we're going Dr. Emrys Jones Waters Users Meeting ASMS 2015 May 30th 2015 Waters Corporation 1 Definition: Seek

More information

CHARACTERIZATION AND DETECTION OF OLIVE OIL ADULTERATIONS USING CHEMOMETRICS

CHARACTERIZATION AND DETECTION OF OLIVE OIL ADULTERATIONS USING CHEMOMETRICS CHARACTERIZATION AND DETECTION OF OLIVE OIL ADULTERATIONS USING CHEMOMETRICS Paul Silcock and Diana Uria Waters Corporation, Manchester, UK AIM To demonstrate the use of UPLC -TOF to characterize and detect

More information

Automated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection

Automated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection Y.-H. Wen, A. Bainbridge-Smith, A. B. Morris, Automated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection, Proceedings of Image and Vision Computing New Zealand 2007, pp. 132

More information

Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion

Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion Yue Li yueli@cs.toronto.edu October 0, 2018 1 Introduction MicroRNAs (mirnas) are 22 nucleotide small noncoding

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Quality assessment of TCGA Agilent gene expression data for ovary cancer

Quality assessment of TCGA Agilent gene expression data for ovary cancer Quality assessment of TCGA Agilent gene expression data for ovary cancer Nianxiang Zhang & Keith A. Baggerly Dept of Bioinformatics and Computational Biology MD Anderson Cancer Center Oct 1, 2010 Contents

More information

Metabolomic Data Analysis with MetaboAnalyst

Metabolomic Data Analysis with MetaboAnalyst Metabolomic Data Analysis with MetaboAnalyst User ID: guest6501 April 16, 2009 1 Data Processing and Normalization 1.1 Reading and Processing the Raw Data MetaboAnalyst accepts a variety of data types

More information

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis

Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis Supplementary Information Methods Subjects The study was comprised of 84 chronic pain patients with either chronic back pain (CBP) or osteoarthritis (OA). All subjects provided informed consent to procedures

More information

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and

Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Discovery of Novel Human Gene Regulatory Modules from Gene Co-expression and Promoter Motif Analysis Shisong Ma 1,2*, Michael Snyder 3, and Savithramma P Dinesh-Kumar 2* 1 School of Life Sciences, University

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Study of cigarette sales in the United States Ge Cheng1, a,

Study of cigarette sales in the United States Ge Cheng1, a, 2nd International Conference on Economics, Management Engineering and Education Technology (ICEMEET 2016) 1Department Study of cigarette sales in the United States Ge Cheng1, a, of pure mathematics and

More information

How To Use SubpathwayGMir

How To Use SubpathwayGMir How To Use SubpathwayGMir Li Feng, Chunquan Li and Xia Li May 20, 2015 Contents 1 Overview 1 2 The experimentally verified mirna-target interactions 2 3 Reconstruct KEGG metabolic pathways 2 3.1 Embed

More information

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality

Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Week 9 Hour 3 Stepwise method Modern Model Selection Methods Quantile-Quantile plot and tests for normality Stat 302 Notes. Week 9, Hour 3, Page 1 / 39 Stepwise Now that we've introduced interactions,

More information

UvA-DARE (Digital Academic Repository)

UvA-DARE (Digital Academic Repository) UvA-DARE (Digital Academic Repository) A classification model for the Leiden proteomics competition Hoefsloot, H.C.J.; Berkenbos-Smit, S.; Smilde, A.K. Published in: Statistical Applications in Genetics

More information

Package IMAS. March 10, 2019

Package IMAS. March 10, 2019 Type Package Package IMAS March 10, 2019 Title Integrative analysis of Multi-omics data for Alternative Splicing Version 1.6.1 Author Seonggyun Han, Younghee Lee Maintainer Seonggyun Han

More information

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition List of Figures List of Tables Preface to the Second Edition Preface to the First Edition xv xxv xxix xxxi 1 What Is R? 1 1.1 Introduction to R................................ 1 1.2 Downloading and Installing

More information

Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data

Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data Feature selection methods for early predictive biomarker discovery using untargeted metabolomic data Dhouha Grissa, Mélanie Pétéra, Marion Brandolini, Amedeo Napoli, Blandine Comte and Estelle Pujos-Guillot

More information

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES

CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES CLASSIFICATION OF BREAST CANCER INTO BENIGN AND MALIGNANT USING SUPPORT VECTOR MACHINES K.S.NS. Gopala Krishna 1, B.L.S. Suraj 2, M. Trupthi 3 1,2 Student, 3 Assistant Professor, Department of Information

More information

Image Analysis. Edge Detection

Image Analysis. Edge Detection Image Analsis Image Analsis Edge Detection K. Buza Lars Schmidt-Thieme Inormation Sstems and Machine Learning Lab ISMLL Institute o Economics and Inormation Sstems & Institute o Computer Science Universit

More information

LC/MS/MS SOLUTIONS FOR LIPIDOMICS. Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS

LC/MS/MS SOLUTIONS FOR LIPIDOMICS. Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS LC/MS/MS SOLUTIONS FOR LIPIDOMICS Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS Lipids play a key role in many biological processes, such as the formation of cell membranes and signaling

More information

Knowledge Discovery and Data Mining I

Knowledge Discovery and Data Mining I Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Introduction What is an outlier?

More information

Package netclass. July 2, 2014

Package netclass. July 2, 2014 Package netclass Jul 2, 2014 Version 1.2.1 Date 2013-12-03 Title netclass: An R Package for Network-Based Biomarker Discover Author Yupeng Cun Maintainer netclass is an R package for network-based feature

More information

Meaning-based guidance of attention in scenes as revealed by meaning maps

Meaning-based guidance of attention in scenes as revealed by meaning maps SUPPLEMENTARY INFORMATION Letters DOI: 1.138/s41562-17-28- In the format provided by the authors and unedited. -based guidance of attention in scenes as revealed by meaning maps John M. Henderson 1,2 *

More information

A fatty liver study on Mus musculus

A fatty liver study on Mus musculus Sergio Picart-Armada 1 and Alexandre Perera-Lluna 1 1 B2SLab at Polytechnic University of Catalonia sergi.picart@upc.edu alexandre.perera@upc.edu Package September 17, 2018 FELLA 1.2.0 Contents 1 Introduction..............................

More information

Bivariate &/vs. Multivariate

Bivariate &/vs. Multivariate Bivariate &/vs. Multivariate Differences between correlations, simple regression weights & multivariate regression weights Patterns of bivariate & multivariate effects Prox variables Multiple regression

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion

Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion Mirsynergy: detect synergistic mirna regulatory modules by overlapping neighbourhood expansion Yue Li yueli@cs.toronto.edu October 30, 2017 1 Introduction MicroRNAs (mirnas) are 22 nucleotide small noncoding

More information

PCA Enhanced Kalman Filter for ECG Denoising

PCA Enhanced Kalman Filter for ECG Denoising IOSR Journal of Electronics & Communication Engineering (IOSR-JECE) ISSN(e) : 2278-1684 ISSN(p) : 2320-334X, PP 06-13 www.iosrjournals.org PCA Enhanced Kalman Filter for ECG Denoising Febina Ikbal 1, Prof.M.Mathurakani

More information

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2

How To Use MiRSEA. Junwei Han. July 1, Overview 1. 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2 How To Use MiRSEA Junwei Han July 1, 2015 Contents 1 Overview 1 2 Get the pathway-mirna correlation profile(pmset) and a weighting matrix 2 3 Discovering the dysregulated pathways(or prior gene sets) based

More information

CANONICAL CORRELATION ANALYSIS OF DATA ON HUMAN-AUTOMATION INTERACTION

CANONICAL CORRELATION ANALYSIS OF DATA ON HUMAN-AUTOMATION INTERACTION Proceedings of the 41 st Annual Meeting of the Human Factors and Ergonomics Society. Albuquerque, NM, Human Factors Society, 1997 CANONICAL CORRELATION ANALYSIS OF DATA ON HUMAN-AUTOMATION INTERACTION

More information

Identification of Sickle Cells using Digital Image Processing. Academic Year Annexure I

Identification of Sickle Cells using Digital Image Processing. Academic Year Annexure I Academic Year 2014-15 Annexure I 1. Project Title: Identification of Sickle Cells using Digital Image Processing TABLE OF CONTENTS 1.1 Abstract 1-1 1.2 Motivation 1-1 1.3 Objective 2-2 2.1 Block Diagram

More information

A Bag of Words Approach for Discriminating between Retinal Images Containing Exudates or Drusen

A Bag of Words Approach for Discriminating between Retinal Images Containing Exudates or Drusen A Bag of Words Approach for Discriminating between Retinal Images Containing Exudates or Drusen by M J J P Van Grinsven, Arunava Chakravarty, Jayanthi Sivaswamy, T Theelen, B Van Ginneken, C I Sanchez

More information

Target Analyses in Parallel Reaction Monitoring Mode (PRM)

Target Analyses in Parallel Reaction Monitoring Mode (PRM) Target Analses in Parallel Reaction Monitoring Mode (PRM) Skline Webinar Januar 13, 2015 Bruno Domon, PhD Head Luxembourg Clinical Proteomics Center Invited Professor Universit of Luxembourg INTRODUCTION

More information

Supplementary Materials

Supplementary Materials Supplementary Materials July 2, 2015 1 EEG-measures of consciousness Table 1 makes explicit the abbreviations of the EEG-measures. Their computation closely follows Sitt et al. (2014) (supplement). PE

More information

Unsupervised Identification of Isotope-Labeled Peptides

Unsupervised Identification of Isotope-Labeled Peptides Unsupervised Identification of Isotope-Labeled Peptides Joshua E Goldford 13 and Igor GL Libourel 124 1 Biotechnology institute, University of Minnesota, Saint Paul, MN 55108 2 Department of Plant Biology,

More information

Discovery Metabolomics - Quantitative Profiling of the Metabolome using TripleTOF Technology

Discovery Metabolomics - Quantitative Profiling of the Metabolome using TripleTOF Technology ANSWERS FOR SCIENCE. KNOWLEDGE FOR LIFE. Discovery Metabolomics - Quantitative Profiling of the Metabolome using TripleTOF Technology Baljit Ubhi Ph.D ASMS Baltimore, June 2014 What is Metabolomics? Also

More information

RNA-seq. Differential analysis

RNA-seq. Differential analysis RNA-seq Differential analysis Data transformations Count data transformations In order to test for differential expression, we operate on raw counts and use discrete distributions differential expression.

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

LC-MS. Pre-processing (xcms) W4M Core Team. 29/05/2017 v 1.0.0

LC-MS. Pre-processing (xcms) W4M Core Team. 29/05/2017 v 1.0.0 LC-MS Pre-processing (xcms) W4M Core Team 29/05/2017 v 1.0.0 Acquisition files upload and pre-processing with xcms: extraction, alignment and retention time drift correction. SECTION 1 2 LC-MS Data What

More information

Understanding Inheritance

Understanding Inheritance 656 iscience Grade 7, Davis Count Edition Lesson 2 Reading Guide Ke Concepts ESSENTIAL QUESTIONS What determines the expression of traits? How can inheritance be modeled? How do some patterns of inheritance

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Sample classification from protein mass spectrometry, by peak probability contrasts

Sample classification from protein mass spectrometry, by peak probability contrasts Sample classification from protein mass spectrometry, by peak probability contrasts Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Scott Soltys, Gongyi Shi, Albert Koong, Quynh-Thu Le, June

More information

Supporting Information

Supporting Information Supporting Information Mass Spectrometry Imaging Shows Cocaine and Methylphenidate have Opposite Effects on Major Lipids in Drosophila Brain Mai H. Philipsen *, Nhu T. N. Phan *, John S. Fletcher *, Per

More information

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process Research Methods in Forest Sciences: Learning Diary Yoko Lu 285122 9 December 2016 1. Research process It is important to pursue and apply knowledge and understand the world under both natural and social

More information

Metabolomic Data Analysis with MetaboAnalyst

Metabolomic Data Analysis with MetaboAnalyst Metabolomic Data Analysis with MetaboAnalyst User ID: guest4965371694680211620 April 14, 2009 1 Data Processing and Normalization 1.1 Reading and Processing the Raw Data MetaboAnalyst accepts a variety

More information

A Study on Edge Detection Techniques in Retinex Based Adaptive Filter

A Study on Edge Detection Techniques in Retinex Based Adaptive Filter A Stud on Edge Detection Techniques in Retine Based Adaptive Filter P. Swarnalatha and Dr. B. K. Tripath Abstract Processing the images to obtain the resultant images with challenging clarit and appealing

More information

Clustering Autism Cases on Social Functioning

Clustering Autism Cases on Social Functioning Clustering Autism Cases on Social Functioning Nelson Ray and Praveen Bommannavar 1 Introduction Autism is a highly heterogeneous disorder with wide variability in social functioning. Many diagnostic and

More information

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition

More information

Bias Adjustment: Local Control Analysis of Radon and Ozone

Bias Adjustment: Local Control Analysis of Radon and Ozone Bias Adjustment: Local Control Analysis of Radon and Ozone S. Stanley Young Robert Obenchain Goran Krstic NCSU 19Oct2016 Abstract Bias Adjustment: Local control analysis of Radon and ozone S. Stanley Young,

More information

Nearest Shrunken Centroid as Feature Selection of Microarray Data

Nearest Shrunken Centroid as Feature Selection of Microarray Data Nearest Shrunken Centroid as Feature Selection of Microarray Data Myungsook Klassen Computer Science Department, California Lutheran University 60 West Olsen Rd, Thousand Oaks, CA 91360 mklassen@clunet.edu

More information

Quantitative Trait Analysis in Sibling Pairs. Biostatistics 666

Quantitative Trait Analysis in Sibling Pairs. Biostatistics 666 Quantitative Trait Analsis in Sibling Pairs Biostatistics 666 Outline Likelihood function for bivariate data Incorporate genetic kinship coefficients Incorporate IBD probabilities The data Pairs of measurements

More information

NMF-Density: NMF-Based Breast Density Classifier

NMF-Density: NMF-Based Breast Density Classifier NMF-Density: NMF-Based Breast Density Classifier Lahouari Ghouti and Abdullah H. Owaidh King Fahd University of Petroleum and Minerals - Department of Information and Computer Science. KFUPM Box 1128.

More information

Section 6: Analysing Relationships Between Variables

Section 6: Analysing Relationships Between Variables 6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations

More information

Validating the Visual Saliency Model

Validating the Visual Saliency Model Validating the Visual Saliency Model Ali Alsam and Puneet Sharma Department of Informatics & e-learning (AITeL), Sør-Trøndelag University College (HiST), Trondheim, Norway er.puneetsharma@gmail.com Abstract.

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Performance of Median and Least Squares Regression for Slightly Skewed Data

Performance of Median and Least Squares Regression for Slightly Skewed Data World Academy of Science, Engineering and Technology 9 Performance of Median and Least Squares Regression for Slightly Skewed Data Carolina Bancayrin - Baguio Abstract This paper presents the concept of

More information

STATISTICS & PROBABILITY

STATISTICS & PROBABILITY STATISTICS & PROBABILITY LAWRENCE HIGH SCHOOL STATISTICS & PROBABILITY CURRICULUM MAP 2015-2016 Quarter 1 Unit 1 Collecting Data and Drawing Conclusions Unit 2 Summarizing Data Quarter 2 Unit 3 Randomness

More information

Comparison of volume estimation methods for pancreatic islet cells

Comparison of volume estimation methods for pancreatic islet cells Comparison of volume estimation methods for pancreatic islet cells Jiří Dvořák a,b, Jan Švihlíkb,c, David Habart d, and Jan Kybic b a Department of Probability and Mathematical Statistics, Faculty of Mathematics

More information

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4. Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Survey research (Lecture 1)

Survey research (Lecture 1) Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features.

Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features. Yeast Cells Classification Machine Learning Approach to Discriminate Saccharomyces cerevisiae Yeast Cells Using Sophisticated Image Features. Mohamed Tleis Supervisor: Fons J. Verbeek Leiden University

More information

BREAST CANCER EPIDEMIOLOGY MODEL:

BREAST CANCER EPIDEMIOLOGY MODEL: BREAST CANCER EPIDEMIOLOGY MODEL: Calibrating Simulations via Optimization Michael C. Ferris, Geng Deng, Dennis G. Fryback, Vipat Kuruchittham University of Wisconsin 1 University of Wisconsin Breast Cancer

More information

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments

Outline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments 1 2 Outline Finish sampling slides from Tuesday. Study design what do you do with the subjects/units once you select them? (OI Sections 1.4-1.5) Observational studies vs. experiments Descriptive statistics

More information

Nature Methods: doi: /nmeth.3115

Nature Methods: doi: /nmeth.3115 Supplementary Figure 1 Analysis of DNA methylation in a cancer cohort based on Infinium 450K data. RnBeads was used to rediscover a clinically distinct subgroup of glioblastoma patients characterized by

More information