Supervised analysis of MS images using Cardinal

Size: px

Start display at page:

Download "Supervised analysis of MS images using Cardinal"

Russell Gibson
5 years ago
Views:

1 Supervised analsis of MS images using Cardinal Klie A. Bemis and April Harr November, 28 Contents Introduction Analsis of a renal cell carcinoma (RCC) dataset Pre-processing Normalization Resampling to unit resolution Subsetting the dataset Visualizing the dataset Visualization of molecular ion images Eplorator analsis using PCA Classification using PLS-DA Cross-validation with partial least squares Plotting the classified images Plotting and interpretting the coefficients of the m/z values Classification using O-PLS-DA Cross-validation with partial least squares Plotting the classified images Plotting and interpretting the coefficients of the m/z values Classification using spatial shrunken centroids Cross-validation with spatial shrunken centroids Plotting the classified images Plotting and interpreting the t-statistics of the m/z values Session info Introduction For eperiments in which analzed samples come from different classes or conditions, a common goal of supervised analsis is to predict the class of a new sample, given a labeled training set for which classes are alread known. This task is called classification. Unlike unsupervised analsis such as clustering, classification requires biological replicates for testing and validation, to avoid biased reporting of accurac. Cardinal implements crossvalidation for classification.

2 Supervised analsis of MS images using Cardinal In this vignette, an eample classification workflow in Cardinal is presented, together with plots of the results. 2 Analsis of a renal cell carcinoma (RCC) dataset This eample uses a renal cell carcinoma (RCC) dataset consisting of 8 matched pairs of human kidne tissue. Each tissue pair consists of a tissue sample and a ous tissue sample. The goal of the workflow is to develop classifiers for predicting whether a new tissue sample is or. > librar(cardinalworkflows) > data(rcc, rcc_analses) In this RCC dataset, we epect that tissue and ous tissue will have unique chemical profiles, which we can use to classif new tissue based on the mass spectra. In Figure we show the H&E stained tissue samples. In Figure i we plot the ion images for m/z 8.5, which we know from previous studies to be abundant in approimatel equal intensities in both ous and tissue []. > image(rcc, mz = 8.5, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) > summar(rcc) Class: MSImageSet Features: m/z = m/z = (2 total) Piels: =, = 3, sample = MH24_33... = 77, = 5, sample = UH992_ (6 total) : : Size in memor: Mb As can be seen in Figure, each matched pair of tissues belonging to the same subject are on the same slide. Note also the the tissue is on the left and the tissue is on the right on each slide. The image contains 6 piels with 2 spectral features measured at each location (m/z range from 5 to ). 2. Pre-processing For statistical analsis, some form of dimension reduction is necessar so that computation times are reasonable. However, the usual form of dimension reduction for mass spectra peak-picking is often unsuitable for classification. This is because classification requires testing and validation to avoid bias in the reported accurac. If we perform peak-picking on the whole dataset, then the accurac reported for the validation set will be biased, because the selected peaks are also coming from the validation set. 2

Supervised analsis of MS images using Cardinal 8 m/z = 8.

5 UH995_8 2 4 6 6 6 2 4 4 4 6 2 2 2 2 2 6 2 4 m/z = 8.

5 UH7_33 2 2 m/z = 8.5 UH55_2 2 (h) UH992_ (g) UH99_5 2 m/z = 8.

2 (a) MH24_33 2 4 6 8 2 4 6 8 (i) Corresponding ion images for m/z 8.

general morphol- og Therefore, we recommend resampling or binning as the

choice for mass spectrometr imaging datasets. > rcc.

3 Supervised analsis of MS images using Cardinal 8 m/z = 8.5 UH99_ m/z = 8.5 UH992_ m/z = 8.5 UH995_ m/z = 8.5 UH982_ m/z = 8.5 UH96_ m/z = 8.5 UH7_ m/z = 8.5 UH55_2 2 (h) UH992_ (g) UH99_5 2 m/z = 8.5 MH24_33 (d) UH96_5 2 (f) UH995_8 (c) UH7_33 2 (e) UH982_3 2 (b) UH55_2 2 (a) MH24_ (i) Corresponding ion images for m/z 8.5 Figure : Optical images and ion images for the eight samples showing general morphol- og Therefore, we recommend resampling or binning as the preferred method of dimension reduction for classification workflows. We will use resampling. 2.. Normalization Before resampling or binning, ization is necessar to correct for piel-to-piel variation. We will use total ion current (TIC) standardization, which is a popular choice for mass spectrometr imaging datasets. > rcc.norm <- ize(rcc, method = "tic") 2..2 Resampling to unit resolution The ized data is then resampled to unit resolution. Binning would also be an appropriate alternative, and could be used b setting method="bin" in the reducedimension method. 3

4 Supervised analsis of MS images using Cardinal > rcc.resample <- reducedimension(rcc.norm, method = "resample") As discussed above, resampling or binning is preferred to peak-picking for classification. However, if peak-picking is preferred, this can be worked around b performing peak-picking separatel on the training set onl, and using the same peaks in the testing and validation sets. This can become a comple procedure if cross-validation is desired, and will not be covered in this vignette Subsetting the dataset Lastl, we will subset the dataset to drop piels that contain onl the slide background, so that the final dataset will onl consist of mass spectra from actual tissue. To subset the data, we will use the diagnosis variable stored in the object s pieldata. This variable is a factor with the disease condition for each piel, as annotated b a pathologist. > summar(rcc$diagnosis) NA's We drop the 9923 piels without annotation. > rcc.small <- rcc.resample[, rcc$diagnosis %in% c("", "")] > summar(rcc.small) Class: MSImageSet Features: m/z = 5... m/z = (85 total) Piels: = 7, = 5, sample = MH24_33... = 6, = : : Size in memor: 4.7 Mb 6, sample = UH992_ (677 total) Now the dataset contains onl the 677 mass spectra we need to train and test a classifier. 2.2 Visualizing the dataset In this section, we will walk through several visualization methods to eplore the dataset before training our classifiers Visualization of molecular ion images To begin visualizing the dataset, we will plot ion images for m/z values we alread know to be useful in distinguishing tissue versus. First, we plot the ion images for m/z 25.3, known to be more abundant in tissue (right) [], shown in Figure 2a. > image(rcc, mz = 25.3, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) 4

5 Supervised analsis of MS images using Cardinal Likewise, we plot the ion images for m/z 885.7, known to be more abundant in ous tissue (left) [], shown in Figure 2b. > image(rcc, mz = 885.7, ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian", laout = c(4, 2)) m/z = MH24_ m/z = UH55_ m/z = UH7_ m/z = UH96_ m/z = UH982_ m/z = UH995_ m/z = UH99_ m/z = UH992_ (a) m/z 25.3 (more abundant in tissue) m/z = MH24_ m/z = UH55_ m/z = UH7_ m/z = UH96_ m/z = UH982_ m/z = UH995_ m/z = UH99_ m/z = UH992_ (b) m/z (more abundant in ous tissue) Figure 2: Ion images showing ions associated with and ous tissue From Figure 2a and Figure 2b, we note that there is still a great deal of variation in these images for ions that should be associated with a particular disease condition. For eample, m/z 25.3 which should be more abundant in tissue is also abundant in ous tissue for samples UH55_2 and UH995_8. This shows that multiple ions will be necessar for classification Eplorator analsis using PCA Although man use principal components analsis (PCA) combined with linear regression for classification, it is a method for unsupervised analsis most appropriatel used for eploring a dataset, prior to appling a method designed for classification. We will use PCA for visualization. Here we fit the first 5 principal components using the PCA method. 5

6 Supervised analsis of MS images using Cardinal > rcc.pca <- PCA(rcc.small, ) > summar(rcc.pca) PC PC2 PC3 PC4 PC5 Standard deviation The summar of the first 5 principal components show that PCA is not ver useful for this dataset, since the first 5 components cumulativel eplain approimatel onl 5% of the variation in the data. To further eplore the dataset with PCA, we plot images of the scores of the first principal component, shown in Figure 3. > image(rcc.pca, column = "PC", superpose = FALSE, col.regions = risk.colors(), + laout = c(4, 2)) PC MH24_ PC UH55_ PC UH7_ PC UH96_ PC UH982_ PC UH995_ PC UH99_ PC UH992_ Figure 3: PC scores for the first principal component Figure 3 show that the images based on the PC scores do not seem to show a strong pattern useful for classification of versus tissue, although tissue seems to have slightl higher PC scores. We also plot the PC loadings for the first 2 principal components, shown in Figure 5. > plot(rcc.pca, column = c("pc", "PC2", "PC3"), superpose = FALSE, laout = c(3, + )) loadings PC loadings PC2 loadings PC m z m z m z Figure 4: PC loadings for the first three principal components 6

7 Supervised analsis of MS images using Cardinal Another useful PCA plot in a classification setting is to plot the scores of different components against each other, plotting each class separatel, which we do below for disease condition. > pca. <- as.data.frame(rcc.pca[[]]$scores[rcc.small$diagnosis == "", + ]) > pca. <- as.data.frame(rcc.pca[[]]$scores[rcc.small$diagnosis == "", + ]) We show PC versus PC2 in Figure 5a. > plot(pc2 ~ PC, data = pca., col = "blue") > points(pc2 ~ PC, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) Now PC versus PC3 in Figure 5b. > plot(pc3 ~ PC, data = pca., col = "blue") > points(pc3 ~ PC, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) And PC2 versus PC3 in Figure 5c. > plot(pc3 ~ PC2, data = pca., col = "blue") > points(pc3 ~ PC2, data = pca., col = "red") > legend("top", legend = c("", ""), col = c("blue", "red"), pch =, + bg = rgb(,,,.75)) PC PC2 (a) PC versus PC2 scores PC PC3 (b) PC versus PC3 scores PC2 PC3 (c) PC2 versus PC3 scores Figure 5: PCA plots showing PC scores according to disease condition The PC score plots shown in Figure 5 show that there is indeed separation between the disease conditions in the data. However, it is often difficult to interpret the comple relationship between PC loadings and how the PC scores relate to condition. Therefore, we will now move on to methods designed for classification. 2.3 Classification using PLS-DA Partial least squares discriminant analsis (PLS-DA) also known as projection to latent structures is a multivariate method that has been shown to be useful in the classification of MS images []. We will now demonstrate classification of the RCC dataset using PLS-DA. 7

8 Supervised analsis of MS images using Cardinal Note that although we show PLS-DA prediction on two conditions ( and ), it can also be used for classification on more than two conditions Cross-validation with partial least squares An important step in classification is testing and validation. If the accurac of a classifier is tested on the same dataset that was used to train the classifier, the reported accurac will be biased and too optimistic. Therefore, Cardinal implements the cvappl method, which performs cross-validation for an of the supplied classification methods, including PLS. See?cvAppl for further details on how to use cross-validation in Cardinal. B default, cvappl considers each unique sample (as given b the sample variable in an MSImageSet object s pieldata) as a fold for n-fold cross-validation. In most cases, these should correspond to biological replicates, which is our recommended workflow. This is the case for the RCC dataset, where each matched pair on a separate slide constitutes a unique sample. > summar(rcc.small$sample) MH24_33 UH55_2 UH7_33 UH96_5 UH982_3 UH995_8 UH99_5 UH992_ Generall, biological replicates should be used to partition the dataset rather than technical replicates or individual piels. The onl eception would be in the case of a sample size of one, in which case there are no biological replicates. However, a sample size of one is a worst case scenario, and biological replicates should alwas be preferred. We now perform cross-validation using PLS-DA as our classification method, using from to 5 PLS components. > rcc.cv.pls <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "PLS", ncomp = :5) We plot the cross-validated accurac to determine the best number of components for prediction, shown in Figure 6. > plot(summar(rcc.cv.pls)) As seen in Figure 6, PLS components produce the best prediction rate, with 96.8% crossvalidated accurac. > summar(rcc.cv.pls)$accurac[["ncomp = "]] Accurac Sensitivit Specificit FDR Figure 6 tells us that if we wanted to use PLS-DA for prediction on new data, we should train a classifier on this data using PLS components. Note that rcc.cv.pls is a CrossValidated object, which contains 8 objects in its resultdata slot one for each cross-validation fold each of which is a PLS object containing the results of prediction for that fold. 8

9 Supervised analsis of MS images using Cardinal Accurac Number of Components Figure 6: Accurac of PLS-DA classification for number of components used CrossValidated inherits from ResultSet. See?ResultSet for details Plotting the classified images Now we plot the images of the PLS-DA fitted values, to visualize the cross-validated prediction rate, shown in Figure 7. > image(rcc.cv.pls, model = list(ncomp = ), laout = c(4, 2)) ncomp = MH24_ ncomp = UH55_ ncomp = UH7_ ncomp = UH96_ ncomp = UH982_ ncomp = UH995_ ncomp = UH99_ ncomp = UH992_ Figure 7: PLS-DA fitted values indicating or tissue For prediction, PLS-DA creates indicator variables (with values or ) for each condition. The predicted condition is the one with the highest fitted value. (Since the fitted values can fall outside the range to, these are not interprettable as probabilities.) Figure 7 shows the tissues on the left are more predominantl red than blue, indicating that the are predicted to be, which corresponds with the true disease conditions. Since we used cross-validation, the prediction on each matched pair is based onl on the data from the other 7 matched pairs of tissue samples Plotting and interpretting the coefficients of the m/z values To interpret the relative importance of the mass features in the classification, we can look at the PLS coefficients used for prediction. To do this, we re-train a PLS classifier on the full dataset using the optical number of PLS components as indicated b the cross-validation. 9

10 Supervised analsis of MS images using Cardinal > rcc.pls <- PLS(rcc.small, = rcc.small$diagnosis, ncomp = ) Now we plot the PLS coefficients against the m/z values, shown in Figure 8. > plot(rcc.pls) coefficients ncomp = m z Figure 8: PLS coefficients for and We can also rank the most important for each condition, based on the PLS coefficients, b using the toplabels method. > toplabels(rcc.pls) mz ncomp column coefficients loadings weights e e e e e e- Among the top-ranked m/z values, we see m/z 25 is listed for tissue, which we know to be abundant in tissue. The top-ranked ion for is m/z 75, which we plot below in Figure 9. > image(rcc.small, mz = 75, laout = c(4, 2), ize.image = "linear", contrast.enhance = "histogram", + smooth.image = "gaussian") m/z = 75 MH24_ m/z = 75 UH55_ m/z = 75 UH7_ m/z = 75 UH96_ m/z = 75 UH982_ m/z = 75 UH995_ m/z = 75 UH99_ m/z = 75 UH992_ Figure 9: m/ 75 (identified b PLS as associated with )

11 Supervised analsis of MS images using Cardinal We see from Figure 9 that m/z 75 is indeed more abundant in the tissue. Although the PLS coefficients are useful for ranking the relative important of m/z values, the are not indicative of statistical significance. 2.4 Classification using O-PLS-DA Orthogonal partial least squares discriminant analsis (O-PLS-DA) is another multivariate method that can be useful for classification of MS images. It is related to PLS, but it removes from the data a number of PLS components orthogonal to the relationship between the data and condition prior to fitting a -component PLS model. O-PLS-DA can often produce comparable accurac to PLS-DA, but with more stable and easil interprettable coefficients Cross-validation with partial least squares We now use cross-validation to fit O-PLS-DA models for the RCC dataset. > rcc.cv.opls <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "OPLS", ncomp = :5, + keep.xnew = FALSE) > plot(summar(rcc.cv.opls)) As seen in Figure, 2 O-PLS components produce the best prediction rate, with 95.7% cross-validated accurac. > summar(rcc.cv.pls)$accurac[["ncomp = 2"]] Accurac Sensitivit Specificit FDR Accurac Number of Components Figure : Accurac of O-PLS-DA classification for number of components used Plotting the classified images As with PLS-DA, we now plot the images for the O-PLS-DA fitted values, to visuall show the predictions, shown in Figure.

12 Supervised analsis of MS images using Cardinal > image(rcc.cv.opls, model = list(ncomp = 2), laout = c(4, 2)) ncomp = 2 MH24_ ncomp = 2 UH55_ ncomp = 2 UH7_ ncomp = 2 UH96_ ncomp = 2 UH982_ ncomp = 2 UH995_ ncomp = 2 UH99_ ncomp = 2 UH992_ Figure : O-PLS-DA fitted values indicating or tissue Figure shows good qualit prediction comparable with PLS-DA Plotting and interpretting the coefficients of the m/z values Now we consider the O-PLS coefficients b training a classifier on the full dataset with the optimal number of O-PLS components as shown b cross-validation. > rcc.opls <- OPLS(rcc.small, = rcc.small$diagnosis, ncomp = 2, keep.xnew = FALSE) And we plot the O-PLS coefficients, as shown in Figure 2. > plot(rcc.opls) ncomp = 2 coefficients m z Figure 2: O-PLS coefficients for and A comparison of the O-PLS coefficients in Figure 2 with the PLS coefficients from Figure 8 shows that the O-PLS coefficients appear more stable and should be easier to interpret. We get the top-ranked m/z values using toplabels > toplabels(rcc.opls) mz ncomp column coefficients loadings Oloadings weights Oweights

13 Supervised analsis of MS images using Cardinal The O-PLS coefficients rank m/z 886 highl for, which we know to be more abundant in the ous tissue. As with the PLS coefficients, the ion at m/z 25, which we know to be more abundant in tissue, is also highl ranked for. 2.5 Classification using spatial shrunken centroids This section demonstrates the spatial shrunken centroids classification method for statistical analsis we introduce in Cardinal in the spatialshrunkencentroids method. In this method, we adapt the nearest shrunken centroids classifier [2] with spatial smoothing. This method uses statistical regularization to shrink each condition s mean spectrum toward the global mean spectrum. This shrinkage allows automated feature selection of important masses. It then classifies piels b comparing their mass spectra to the shrunken mean spectra of each conditions. The spatial smoothing uses weights adapted from spatiall-aware clustering [3], including Gaussian weights, and adaptive weights that attempt to account for local structure. The parameters to be eplicitl provided in the spatialshrunkencentroids method are: r: The neighborhood smoothing radius s: The shrinkage parameter The s parameter is the shrinkage parameter that enforces sparsit. As s increases, fewer mass features (m/z values) will be used b the classifier, and onl the informative mass features will be retained. For a detailed eplanation of the shrinkage parameter s, see [2] and [4]. Clustering can also be performed if no response variable is given, b providing an additional parameter k for the initial number of clusters. See the clustering workflow for details Cross-validation with spatial shrunken centroids Now we perform cross-validation with spatial shrunken centroids classification and the method="gaussian" weights. > rcc.cv.sscg <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "spatialshrunkencentroids", + method = "gaussian", r = c(, 2, 3), s = c(, 4, 8, 2, 6, 2, 24, 28)) And we perform cross-validation with spatial shrunken centroids classification and the method="adaptive" weights. > rcc.cv.ssca <- cvappl(rcc.small,. = rcc.small$diagnosis,.fun = "spatialshrunkencentroids", + method = "adaptive", r = c(, 2, 3), s = c(, 4, 8, 2, 6, 2, 24, 28)) Now we plot the cross-validated accurac for the classifier with Gaussian weights in Figure 3a and adaptive weights in Figure 3b. 3

14 Supervised analsis of MS images using Cardinal > plot(summar(rcc.cv.sscg)) > plot(summar(rcc.cv.ssca)) Accurac r =, k = 2 r = 2, k = 2 r = 3, k = 2 Accurac r =, k = 2 r = 2, k = 2 r = 3, k = Shrinkage parameter (s) (a) Accurac for Gaussian weights Shrinkage parameter (s) (b) Accurac for adaptive weights Figure 3: Plots of accurac for spatial shrunken centroids, showing highest accurac for s = 2 and r = 3 As shown in Figure 3, for both weight tpes and all smoothing radii r, the highest accurac occurs with a shrinkage parameter s = 2, ecept for the case with adaptive weights and r =, for which the highest accurac occurs at s = 6. For Gaussian weights with r = 3, s = 2, accurac was 88.8%. Note that in general, the accurac increases with larger smoothing neighborhood radii r. This is likel because rather than heterogenous samples with both and ous cells on the same tissue, each tissue is relativel homogenous with predominantl or ous cells. Therefore, greater spatial smoothing increases the accurac, and adaptive weights have no advantage over Gaussian weights. For classification on more heterogenous tissue, adaptive weights ma perform better Plotting the classified images Now we plot the classified images for Gaussian weights in Figure 4a and adaptive weights in Figure 4b. > image(rcc.cv.sscg, model = list(r = 3, s = 2), laout = c(4, 2)) > image(rcc.cv.ssca, model = list(r = 3, s = 2), laout = c(4, 2)) Unlike PLS-DA and O-PLS-DA, spatial shrunken centroids produce probabilities of versus, which we plot using higher opacit for higher probabilit. This makes for more interpretable predicted images Plotting and interpreting the t-statistics of the m/z values A major advantage of spatial shrunken centroids is that it provides t-statistics for each feature for each condition, and it uses statistical regularization to perform automatic feature selection. This allows for easier identification of important m/z values, and a more straightforward and interpretable method of ranking their relative importance. 4

15 Supervised analsis of MS images using Cardinal r = 3, k = 2, s = 2 MH24_ r = 3, k = 2, s = 2 UH55_ r = 3, k = 2, s = 2 UH7_ r = 3, k = 2, s = 2 UH96_ r = 3, k = 2, s = 2 UH982_ r = 3, k = 2, s = 2 UH995_ r = 3, k = 2, s = 2 UH99_ r = 3, k = 2, s = 2 UH992_ (a) Probabilities for Gaussian weights r = 3, k = 2, s = 2 MH24_ r = 3, k = 2, s = 2 UH55_ r = 3, k = 2, s = 2 UH7_ r = 3, k = 2, s = 2 UH96_ r = 3, k = 2, s = 2 UH982_ r = 3, k = 2, s = 2 UH995_ r = 3, k = 2, s = 2 UH99_ r = 3, k = 2, s = 2 UH992_ (b) Probabilities for adaptive weights Figure 4: Predicted probabilities of and, with higher opacit for a condition s color indicating higher probabilit To inspect the t-statistics of the m/z values, we now train classifiers on the full dataset using the parameters r = 3, s = 2. > rcc.sscg <- spatialshrunkencentroids(rcc.small, = rcc.small$diagnosis, r = 3, + s = 2, method = "gaussian") > rcc.ssca <- spatialshrunkencentroids(rcc.small, = rcc.small$diagnosis, r = 3, + s = 2, method = "adaptive") Now we plot the t-statistics, shown for Gaussian weights in Figure 5a and for adaptive weights in Figure 5b. > plot(rcc.sscg, mode = "tstatistics", model = list(r = 3, s = 2)) > plot(rcc.ssca, mode = "tstatistics", model = list(r = 3, s = 2)) As seen in Figure 5a and Figure 5b, onl a few m/z values have non-zero t-statistics. > summar(rcc.sscg) r k s method time Predicted # of Classes Mean # of Features per Class 5

16 Supervised analsis of MS images using Cardinal tstatistics 2 2 r = 3, k = 2, s = 2 tstatistics 2 2 r = 3, k = 2, s = m z m z (a) Shrunken t-statistics for Gaussian weights (b) Shrunken t-statistics for adaptive weights Figure 5: Predicted probabilities of and, with higher opacit for a condition s color indicating higher probabilit gaussian > summar(rcc.ssca) r k s method time Predicted # of Classes Mean # of Features per Class adaptive In fact, onl 4 of 85 mass features are used in the spatial shrunken centroids classifier. We identif the top-ranked mass features using the toplabels method. > toplabels(rcc.sscg) mz r k s classes centers tstatistics p.values adj.p.values > toplabels(rcc.ssca) mz r k s classes centers tstatistics p.values adj.p.values Note that the shrunken t-statistics are identical between the Gaussian and adaptive weights, since the are based on the same training data, and the spatial structure is taken into account onl for the predicted probabilities. Spatial shrunken centroids identified m/z 25, m/z 886, and m/z 8, which were also identified b O-PLS-DA, and are all known to be important (as discussed in Section 2.2.), as well as m/z 75, which was identified b PLS-DA. Of the three methods, spatial shrunken centroids most reliabl selected the m/z values known to be associated with and, in addition to selecting potentiall informative new m/z values. 6

17 Supervised analsis of MS images using Cardinal 3 Session info R version 3.5. Patched ( r74967), 86_64-pc-linu-gnu Locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=C, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8, LC_IDENTIFICATION=C Running under: Ubuntu LTS Matri products: default BLAS: /home/biocbuild/bbs-3.8-bioc/r/lib/librblas.so LAPACK: /home/biocbuild/bbs-3.8-bioc/r/lib/librlapack.so Base packages: base, datasets, grdevices, graphics, methods, parallel, stats, stats4, utils Other packages: BiocGenerics.28., BiocParallel.6., Cardinal 2.., CardinalWorkflows.4., EBImage 4.24., ProtGenerics.4., S4Vectors.2. Loaded via a namespace (and not attached): Biobase 2.42., BiocManager.3.3, BiocStle 2.., DBI.., MASS 7.3-5, Matri.2-4, R6 2.3., RCurl.95-4., Rcpp.2.9, abind.4-5, assertthat.2., backports..2, biglm.9-, bindr.., bindrcpp.2.2, bitops.-6, compiler 3.5., craon.3.4, digest.6.8, dplr.7.7, evaluate.2, fftwtools.9-8, glue.3., grid 3.5., htmltools.3.6, htmlwidgets.3, irlba 2.3.2, jpeg.-8, knitr.2, lattice.2-35, locfit.5-9., magrittr.5, matter.8., pillar.3., pkgconfig 2..2, png.-7, purrr.2.5, rlang.3.., rmarkdown., rprojroot.3-2, signal.7-6, sp.3-, tibble.4.2, tidselect.2.5, tiff.-5, tools 3.5., aml 2.2. References [] A. L. Dill, L. S. Eberlin, C. Zheng, A. B. Costa, D. R. Ifa, L. Cheng, T. A. Masterson, M. O. Koch, O. Vitek, and R. G. Cooks. Multivariate statistical differentiation of renal cell carcinomas based on lipidomic analsis b ambient ionization imaging mass spectrometr. Analtical and Bioanaltical Chemistr, 398:2969, 2. [2] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple tpes b shrunken centroids of gene epression. Proceedings of the National Academ of Sciences of the United States of America, 99(): , 22. [3] Theodore Aleandrov and Jan Hendrik Kobarg. Efficient spatial segmentation of large imaging mass spectrometr datasets with spatiall aware clustering. Bioinformatics (Oford, England), 27(3):i23 8, Jul 2. URL: gov/articlerender.fcgi?artid=37346&tool=pmcentrez&rendertpe=abstract, doi:.93/bioinformatics/btr246. [4] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Class Prediction b Nearest Shrunken with Applications to DNA Microarras. Statistical Science, 8():4 7, 23. 7

Cardinal: Analytic tools for mass spectrometry

Cardinal: Analytic tools for mass spectrometry Cardinal: Analtic tools for mass spectrometr imaging Klie A. Bemis and April Harr April 30, 2018 Contents 1 Introduction.............................. 2 2 Input/Output.............................. 4 2.1