Multilevel data analysis

Similar documents
chapter 1 - fig. 2 Mechanism of transcriptional control by ppar agonists.

CNV PCA Search Tutorial

Dividing up the data

Chapter 1. Introduction

The effect of CT dose reduction on performance of a diagnostic task

The effect of CT dose reduction on performance of a diagnostic task

Clinical application of 3.0 T proton MR spectroscopy in evaluation of pancreatic diseases

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

One-Way Independent ANOVA

Tuberculosis Tutorials

Things you need to know about the Normal Distribution. How to use your statistical calculator to calculate The mean The SD of a set of data points.

Assignment 5: Integrative epigenomics analysis

Investigation of the Confinement Odour Problem in Exported Lamb using NMRbased

Modeling Sentiment with Ridge Regression

Proteomics of body liquids as a source for potential methods for medical diagnostics Prof. Dr. Evgeny Nikolaev

Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance *

Unsupervised Identification of Isotope-Labeled Peptides

Package MSstatsTMT. February 26, Title Protein Significance Analysis in shotgun mass spectrometry-based

A13. MISCELLANEOUS SERVICE ARRANGEMENTS

README file for GRASTv1.0.pl

Acceptable Use Policy - Phone

CSDplotter user guide Klas H. Pettersen

Shear Wave Elastography in diagnostics of supraspinatus tendon.

Prognostic value of CT texture analysis in patients with nonsmall cell lung cancer: Comparison with FDG-PET

Whole brain CT perfusion maps with paradoxical low mean transit time to predict infarct core

AND9020/D. Adaptive Feedback Cancellation 3 from ON Semiconductor APPLICATION NOTE INTRODUCTION

Probability-Based Protein Identification for Post-Translational Modifications and Amino Acid Variants Using Peptide Mass Fingerprint Data

Early Learning vs Early Variability 1.5 r = p = Early Learning r = p = e 005. Early Learning 0.

Cerebral Cortex. Edmund T. Rolls. Principles of Operation. Presubiculum. Subiculum F S D. Neocortex. PHG & Perirhinal. CA1 Fornix CA3 S D

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

The effect of plant sterols and different low doses of omega-3 fatty acids from fish oil on lipoprotein subclasses

Two-Way Independent ANOVA

MRI evaluation of TMJ condylar angulations

OUTLIER SUBJECTS PROTOCOL (art_groupoutlier)

A mathematical model for short-term vs. long-term survival in patients with. glioma

Center Speller ( )

UvA-DARE (Digital Academic Repository)

Exo-spin Exosome Purification Kit For cell culture media/urine/saliva and other low-protein biological fluids

Automatic detection of prostate cancer using quantitative perfusion parameters in contrast-enhanced ultrasound.

NMR. Sample preparation. and Analysis

BODY SYSTEMS TRANSFORMATION CHALLENGE PROGRAM: HIIT ADVANCED 1

SubLasso:a feature selection and classification R package with a. fixed feature subset

BODY SYSTEMS TRANSFORMATION CHALLENGE PROGRAM: HIIT BEGINNER 1

Supporting Information for. revealed defense and detoxification mechanism of. cucumber plant under nano-cu stress

Exo-spin Exosome Purification Kit For cell culture media/urine/saliva and other low-protein biological fluids

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Using CART to Mine SELDI ProteinChip Data for Biomarkers and Disease Stratification

HOW IS PACE TO BE USED

MiSP Solubility Lab L3

H-NMR METABOLOMIC ANALYSIS OF THE EFFECTS OF PROBIOTICS ADMINISTRATION ON HUMAN METABOLIC PHENOTYPE (on going) Probiota 2016 Dr.

Package ordibreadth. R topics documented: December 4, Type Package Title Ordinated Diet Breadth Version 1.0 Date Author James Fordyce

Measuring cell density in prostate cancer imaging as an input for radiotherapy treatment planning

25. Two-way ANOVA. 25. Two-way ANOVA 371

The clinical trial information provided in this public disclosure synopsis is supplied for informational purposes only.

Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Urinary metabolic profiling in inflammatory bowel disease. Dr Horace Williams Clinical Research Fellow Imperial College London

Vega: Variational Segmentation for Copy Number Detection

BREAST CANCER EPIDEMIOLOGY MODEL:

MEA DISCUSSION PAPERS

One-step photochemical attachment of NHS-terminated monolayers onto. silicon surfaces and subsequent functionalization

Supporting Information. Interphase Enzyme Kinetics of an Integral Membrane Kinase by. Time-Resolved Solid-state NMR. contributed equally

AFFILIATION PROGRAM AGREEMENT

OpenSim Tutorial #2 Simulation and Analysis of a Tendon Transfer Surgery

Predicting Breast Cancer Survivability Rates

Name: Date: Solubility Lab - Worksheet #3 Level 1

Regression Discontinuity Design

Biomimetic Chirality Sensing with Pyridoxal-5 -phosphate

Reliability of Ordination Analyses

The use of random projections for the analysis of mass spectrometry imaging data Palmer, Andrew; Bunch, Josephine; Styles, Iain

DICOM Tractography Converter

J2.6 Imputation of missing data with nonlinear relationships

NERVE ACTION POTENTIAL SIMULATION version 2013 John Cornell

Quantification of liver steatosis in MRI: available techniques and use of transverse magnetization decay curve in patients with iron overload

RADIESSE Volumizing Filler E-Media Kit Terms of Use PERMISSION TO USE MERZ AESTHETICS COPYRIGHTED MATERIAL AND TRADEMARKS

DeconRNASeq: A Statistical Framework for Deconvolution of Heterogeneous Tissue Samples Based on mrna-seq data

Comparison of a UPLC Method across Multiple UHPLC Systems

Application Note # MT-111 Concise Interpretation of MALDI Imaging Data by Probabilistic Latent Semantic Analysis (plsa)

Supplementary Materials for

LAB ASSIGNMENT 4 INFERENCES FOR NUMERICAL DATA. Comparison of Cancer Survival*

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Figure S6. A-J) Annotated UVPD mass spectra for top ten peptides found among the peptides identified by Byonic but not SEQUEST + Percolator.

Supplementary Materials: An NMR Guided Screening Method for Selective Fragment Docking and Synthesis of a Warhead Inhibitor

Analysis of Uncomplexed and Copper-complexed Methanobactin with UV/Visible Spectrophotometry, Mass Spectrometry and NMR Spectrometry

Aesthetic Response to Color Combinations: Preference, Harmony, and Similarity. Supplementary Material. Karen B. Schloss and Stephen E.

The visualization of periprostatic nerve fibers using Diffusion Tensor Magnetic Resonance Imaging with tractography

Classic experiments in sensation and perception

COMMUNITY HOSPICE & PALLIATIVE CARE NOTICE OF PRIVACY PRACTICES

Reshaping of Human Fertility Database data from long to wide format in Excel

Reactive agents and perceptual ambiguity

Randomized Block Designs 1

ncounter Data Analysis Guidelines for Copy Number Variation (CNV) Molecules That Count NanoString Technologies, Inc.

A Quick-Start Guide for rseqdiff

LC/MS/MS SOLUTIONS FOR LIPIDOMICS. Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS

ECDC HIV Modelling Tool User Manual

EHS QUICKSTART GUIDE RTLAB / CPU SECTION EFPGASIM TOOLBOX.

Section 6: Analysing Relationships Between Variables

ECDC HIV Modelling Tool User Manual version 1.0.0

DEPARTMENT OF CHEMISTRY AND CHEMICAL ORGANIC CHEMISTRY II 202-BZG-05 03

Spreadsheet signal detection

Tutorial: RNA-Seq Analysis Part II: Non-Specific Matches and Expression Measures

Transcription:

Multilevel data analysis TUTORIAL 1 1 1 Multilevel PLS-DA 1 PLS-DA 1 1 Frequency Frequency 1 1 3 5 number of misclassifications (NMC) 1 3 5 number of misclassifications (NMC) Copyright Biosystems Data Analysis Group; Universiteit van Amsterdam This is a license to use and modify SOFTWARE & DATA produced by: THE BIOSYSTEMS DATA ANALYSIS GROUP OF THE UNIVERSITEIT VAN AMSTERDAM If you use, modify or redistribute the SOFTWARE & DATA and/or your source modifications, you agree: i) to use the SOFTWARE & DATA and/or your source modifications solely as part of your research and not in any commercial product; ii) that the SOFTWARE & DATA and/or your source modifications will not be distributed for profit; iii) all copyright notices and this license note with the SOFTWARE & DATA are retained any redistribution of the SOFTWARE & DATA, or any portion thereof; (iv) to indemnify, hold harmless, and defend the "Biosystems Data Analysis Group of the Universiteit van Amsterdam" from and against any claims or lawsuits that arise or result from the use of the SOFTWARE & DATA or your source modifications (v) to properly reference the SOFTWARE & DATA when used in your reseach in any publication that may result from that research Reserved Rights The "Biosystems Data Analysis Group of the Universiteit van Amsterdam" retains title and all ownership rights to the SOFTWARE & DATA Page 1 of

Installation Copy the Toolbox on your hard disk Unzip the Toolbox in a directory, eg c:\matlab\work\cmv\toolbox Start Matlab Add directory to search path by using the addpath command addpath c:\matlab\work\cmv\toolbox; Preparation Load data sets load NMR_testset load class_labels load chemical_shifts load designmatrix Simulated test dataset The crossover designed NMR_testset contains simulated 1 H NMR urine spectra of 3 individuals (15 males and 15 females) These spectra were acquired after a long-term intervention of placebo (PLACEBO) and an antioxidant supplement (TREATMENT) Each individual in the dataset is therefore represented by urine spectra The structure of the dataset is shown in Figure 1 The PLACEBO and TREATMENT spectra are grouped in two data blocks The order of the individuals is exactly the same in both data blocks The class labels that are associated with both interventions are -1 (PLACEBO) and +1 (TREATMENT) Variables (chemical shifts) class labels Individual 1 Individual 3 1 3 5 7 3 1 3 5 7 3 PLACEBO TREATMENT Individuals -1 +1 Figure 1 : Structure of dataset and the class labels Page of

Plot the entire dataset The urine profiles of the males are coloured red, whereas the urinary profiles of the females are coloured blue (Figure ) Notice that the resonances in the region δ ppm are elevated in the female-group as compared to the males In the region δ 75-79 ppm no obvious differences can be observed plotfigure1(chemical_shifts,nmr_testset); 79 7 77 7 chemical shift (ppm) 75 7 3 1 chemical shift (ppm) 19 1 chemical shift (ppm) Figure : 1 H NMR urinary spectra in the NMR testset (urinary profiles) The spectra of the males are coloured red, and the spectra of the females are coloured blue The resonance patterns in the region δ ppm and δ 75-79 ppm are highlighted In the NMR testset two types of simulated variations are present The first is related to gender differences (between-subject variation) The second is related to the antioxidant treatment (withinsubject variation) Plot the NMR peaks of subject 1 (female) and subject (male) to illustrate the gender-related differences between the subjects (Figure 3a) plotfigure(chemical_shifts,between_testset); Plot the NMR peaks of subject 1 after placebo and antioxidant consumption to illustrate the treatment-related differences within the subjects (Figure 3b) plotfigure3(chemical_shifts,within_testset); Page 3 of

a 1 1 between-subject variation - = female (subject 1) - = male (subject ) 1 b 1 1 within-subject variation 1 1 1 - = subject 1 (PLACEBO) - = subject 1 (INTERVENTION) 1 Figure 3: 1 H NMR signals that represent the simulated differences in the testset (a) between-subjects (δ ppm) and (b) within-subjects (δ 75-79 ppm) Page of

Variation splitting Split the between-subject variation and the within-subject variation in the NMR testset and perform PCA on both sets separately Matrix containing 's and 1's defining which samples belong to which individual output [model]=msca(nmr_testset,designmatrix); data Visualize the variation in the between-subject model by means of the PC1-PC scores (Figure a) and the PC1 loadings (Figure b) Notice that Figure b corresponds with Figure 3a plotfigure(modelbetweenscores(:,1),modelbetweenscores(:,)); plotfigure5(chemical_shifts,modelbetweenloadings(:,1)); Visualize the variation in the within-subject model by means of the PC1-PC scores (Figure c) and the PC1 loadings (Figure d) Notice that Figure d corresponds with Figure 3b plotfigure(modelwithinscores(:,1),modelwithinscores(:,)); plotfigure5(chemical_shifts,modelwithinloadings(:,1)); PC 3 1-1 Males a Females PC 3 1-1 c PLACEBO -3 Between-subjects - -15-1 -5 5 1 15 PC1-3 Within-subjects TREATMENT - -15-1 -5 5 1 15 PC1 7 5 b 5 d 3 1 Between-subjects 3 1 Within-subjects -1-1 1 1 Figure : PC1-PC score plot of (a) the between-subject model and (c) the within-subject model The PC1 loadings in the (b) between-subject model and (d) the within-subject model reveal the main sources of variation (gender, antioxidant treatment) Page 5 of

Multilevel PLS-DA and cross-model validation (CMV) Determine the treatment effect in NMR testset using multilevel PLS-DA and (ordinary) PLS-DA Output multilevel PLS-DA data Values x-axis Max number of PLS factors CV repeats [MLPLSDA,PLSDA] = CMV(NMR_testset,class_labels,chemical_shifts,5,,,,[1;;5;;5]); Output PLS-DA Class labels: PLACEBO = -1 TREATMENT = +1 n-fold 1CV n-fold CV Fractions of data used in variable selection Perform a permutation test (1 CMV repeats) for both the multilevel PLS-DA model as well as for the (ordinary) PLS-DA model [MLPLSDAp,PLSDAp] = CMV_perm(NMR_testset,class_labels,chemical_shifts,5,,,,1,[1;;5;;5]); CMV repeats MLPLSDA, PLSDA, MLPLSDAp, PLSDAp are structured output-variables and contain all y-class predictions, Q values, AUROC values (area under the ROC curve) and NMC values (number of misclassifications) obtained in the performed CV1 and the CV rounds The most discriminating variables between the PLACEBO and the TREATMENT group after multilevel PLS-DA are reflected in the Rank Product (RP) In Figure 5 the lowest RP s represents the NMR signals that are strongly associated with the treatment effect plotfigure7(chemical_shifts,mlplsdacvrp(1,:)); 1 1 RP 1/ 1 1 Figure 5: Rank product (RP 1/ ) obtained after multilevel PLS-DA The NMR signals with the lowest RP s contribute most to the observed treatment effect (within the subjects) Page of

Permutation testing Compare the prediction error (NMC) of the multilevel PLS-DA model against the permutations (Figure ) [frequency,nmc]=hist(mean(mlplsdapcvnmc),[:]); plotfigure(nmc,frequency,mean(mlplsdacvnmc)); 1 1 1 Frequency 1 3 5 number of misclassifications (NMC) Figure : Comparison of the classification error (estimated in number of misclassifications) of the multilevel PLS-DA model (red) against the permutations (blue) Since the prediction error (NMC=) is much smaller as compared to the permutations, the statistically significance of this multilevel PLS-DA model is confirmed (p<5) Similar to the previous example, also the Q values and the AUROC values can be used as the prediction error As shown in Figure 7, the classification error of the PLSDA model (NMC=5) is substantially larger [frequency,nmc]=hist(mean(plsdapcvnmc),[:]); plotfigure(nmc,frequency,mean (PLSDACVNMC)); 1 1 1 1 Frequency 1 3 5 number of misclassifications (NMC) Figure 7: Comparison of the classification error (estimated in number of misclassifications) of the ordinary PLS-DA model (red) against the permutations (blue) Page 7 of

Using own data When analyzing your own data set, the following considerations should be taken into account: 1- The new data should derive from a crossover designed experiment with a similar data structure as the NMR testset (Figure 1) - The designmatrix in the variation splitting procedure (msca) should be adjusted in such a way that the number of rows is equal to *I (I = number of individuals) and the numbers of columns is equal to I The structure of this new matrix should be similar to the designmatrix in the toolbox 3- The number of CMV repeats (permutations), the number of CV repeats, number of PLS components, the n-fold of the 1CV and CV, and the number of PLS can be changed according the requirements of the user Page of