LC-MS Pre-processing (xcms). W4M Core Team, 29/05/2017, v 1.0.0


Section 1: Uploading acquisition files and pre-processing with xcms (extraction, alignment and retention time drift correction).

LC-MS data: what the mass spectrometers provide, and what we want for data analysis.

Extraction with XCMS
XCMS is free, R-based software. It has many parameters to tune and no graphical interface, so you need to write an R script.
Documentation: https://bioconductor.org/packages/release/bioc/html/xcms.html
Forums: https://groups.google.com/forum/#!forum/xcms and http://metabolomics-forum.com

Extraction with XCMS: workflow
1. Extraction: ions are extracted in each sample independently.
2. Grouping (alignment): each ion is aligned across all samples.
3. Retention time correction (optional).
4. Fill peaks: missing values are replaced by re-integrating the raw signal (often close to the baseline) in the m/z and retention time region of the feature.
5. CAMERA annotation of adducts, neutral losses and isotopes, followed by statistics and visualisation (optional).

(Figure: xcms workflow, from extraction to CAMERA annotation of adducts, fragments and isotopes.)

Data

Raw data formats: mzXML, mzML, mzData and netCDF.
Minimal samplemetadata:

| samples | class |
|---|---|
| HU_neg_011_b2 | bio |
| HU_neg_014_b2 | bio |
| Blank04 | blank |
| Blank05 | blank |

Add the information needed for further steps.

samplemetadata completed with the information needed for further steps:

| samples | class | sampletype | subset | full | injectionorder | batch | age |
|---|---|---|---|---|---|---|---|
| HU_neg_011_b2 | bio | sample | 0 | 1 | 44 | ne1 | 19 |
| HU_neg_014_b2 | bio | sample | 0 | 1 | 57 | ne2 | 22 |
| Blank04 | blank | blank | 0 | 1 | 16 | ne2 | NA |
| Blank05 | blank | blank | 0 | 1 | 29 | ne1 | NA |
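A samplemetadata table like this one is a plain tab-separated file, so it is easy to check before running later steps. The minimal Python sketch below (standard library only) parses the example table and lists raw files that have no metadata row. It assumes, for illustration only, that sample names equal the raw file names without their extension; `read_samplemetadata`, `missing_metadata` and the file `Blank06.mzXML` are hypothetical.

```python
import csv
import io
from pathlib import PurePath

# Example content mirroring the samplemetadata table above (tab-separated).
SAMPLEMETADATA_TSV = (
    "samples\tclass\tsampletype\tsubset\tfull\tinjectionorder\tbatch\tage\n"
    "HU_neg_011_b2\tbio\tsample\t0\t1\t44\tne1\t19\n"
    "HU_neg_014_b2\tbio\tsample\t0\t1\t57\tne2\t22\n"
    "Blank04\tblank\tblank\t0\t1\t16\tne2\tNA\n"
    "Blank05\tblank\tblank\t0\t1\t29\tne1\tNA\n"
)


def read_samplemetadata(text):
    """Parse a tab-separated samplemetadata table into a dict keyed by sample name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["samples"]: row for row in reader}


def missing_metadata(raw_files, metadata):
    """Return raw files whose base name (extension stripped) has no metadata row.

    Assumption: sample names equal the raw file names without extension.
    """
    return [f for f in raw_files if PurePath(f).stem not in metadata]


meta = read_samplemetadata(SAMPLEMETADATA_TSV)
raw = ["HU_neg_011_b2.mzXML", "HU_neg_014_b2.mzXML", "Blank04.mzXML", "Blank06.mzXML"]
print(missing_metadata(raw, meta))  # ['Blank06.mzXML']
print(meta["Blank04"]["class"])  # blank
```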

Two strategies

The "old" system: the files are nested in folders, one folder per group, inside a zip file.
+ The folders define the group of each file for xcms.group.
+ Only one import and one step.
- xcmsSet is limited to 6 CPUs.
- The files are not integrated into the Galaxy history and cannot be visualized (visualization is planned for the future).

The "brand new" system: the files are uploaded individually and processed in parallel.
- The xcmsSet outputs have to be merged before running group.
- A samplemetadata file must be used to set the groups (but you need one for some further steps anyway).
+ One xcmsSet job is launched for each input file, so processing is highly parallelizable.
+ The files are fully integrated into Galaxy and could be visualized once this feature is available.
+ Better transparency.

Dataset collection
A dataset collection groups N datasets into a single wrapper (collection). Depending on the tool, the nested datasets are processed either in one step (a single tool run) or in parallel (one xcmsSet job per dataset).


Dataset collection and xcmsSet


Run xcmsSet!

mzXML raw file: an mzXML file opened in a text editor (figure: one scan).

mzXML raw file information. Real-life example: m/z 187 in file HU_neg_091.mzXML.

| scan # | RetTime (s) | basepeakmz | intensity | TIC | %TIC | delta vs previous scan (ppm) | delta vs previous scan (Da) |
|---|---|---|---|---|---|---|---|
| 724 | 374.013 | 187.006423950195 | 1.11E+07 | 2.66E+07 | 42% | | |
| 725 | 374.511 | 187.006896972656 | 3.26E+07 | 5.25E+07 | 62% | 2.5 | 0.000473 |
| 726 | 374.996 | 187.007186889648 | 5.14E+07 | 7.89E+07 | 65% | 1.6 | 0.000290 |
| 727 | 375.478 | 187.007324218750 | 6.19E+07 | 9.28E+07 | 67% | 0.7 | 0.000137 |
| 728 | 375.955 | 187.007125854492 | 7.13E+07 | 1.05E+08 | 68% | 1.1 | 0.000198 |
| 729 | 376.432 | 187.006988525391 | 7.34E+07 | 1.08E+08 | 68% | 0.7 | 0.000137 |
| 730 | 376.906 | 187.006942749023 | 7.62E+07 | 1.10E+08 | 69% | 0.2 | 0.000046 |
| 731 | 377.380 | 187.006942749023 | 6.98E+07 | 1.05E+08 | 67% | 0.0 | 0.000000 |
| 732 | 377.861 | 187.006942749023 | 5.94E+07 | 9.00E+07 | 66% | 0.0 | 0.000000 |
| 733 | 378.330 | 187.006713867188 | 5.79E+07 | 8.89E+07 | 65% | 1.2 | 0.000229 |
| 734 | 378.805 | 187.006942749023 | 5.06E+07 | 7.77E+07 | 65% | 1.2 | 0.000229 |
| 735 | 379.283 | 187.006484985352 | 4.33E+07 | 6.89E+07 | 63% | 2.4 | 0.000458 |
| 736 | 379.762 | 187.006622314453 | 3.87E+07 | 6.19E+07 | 62% | 0.7 | 0.000137 |
| 737 | 380.241 | 187.006576538086 | 3.14E+07 | 5.36E+07 | 58% | 0.2 | 0.000046 |
| 738 | 380.720 | 187.006347656250 | 2.49E+07 | 4.96E+07 | 50% | 1.2 | 0.000229 |
| 739 | 381.204 | 187.006439208984 | 1.98E+07 | 5.02E+07 | 39% | 0.5 | 0.000092 |
| 740 | 381.684 | 187.006515502930 | 1.25E+07 | 3.56E+07 | 35% | 0.4 | 0.000076 |
| 741 | 382.179 | 187.006393432617 | 1.11E+07 | 3.86E+07 | 29% | 0.7 | 0.000122 |

RT range: 8.166 s; m/z median: 187.006805419922.
(Figure: base peak m/z per scan, scans 723 to 742, annotated with the scan-to-scan m/z deviation in ppm.)

mzXML raw file information (continued): the same scans plotted as intensity versus retention time (figure: chromatogram of m/z 187 between 373 and 383 s; peak width = 8.2 s, matching the 8.166 s RT range).
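The derived columns and summary values of this example can be recomputed from the raw scans. The Python sketch below takes the scan number, retention time and base peak m/z from the table and computes the scan-to-scan deviation in ppm, the RT range (the 8.2 s peak width) and the median m/z. The ppm convention used here (absolute difference divided by the current m/z, times 10^6) is an assumption; it reproduces the table's ppm column.

```python
import statistics

# (scan, retention time in s, base peak m/z) for m/z 187 in HU_neg_091.mzXML,
# copied from the table above.
SCANS = [
    (724, 374.013, 187.006423950195), (725, 374.511, 187.006896972656),
    (726, 374.996, 187.007186889648), (727, 375.478, 187.007324218750),
    (728, 375.955, 187.007125854492), (729, 376.432, 187.006988525391),
    (730, 376.906, 187.006942749023), (731, 377.380, 187.006942749023),
    (732, 377.861, 187.006942749023), (733, 378.330, 187.006713867188),
    (734, 378.805, 187.006942749023), (735, 379.283, 187.006484985352),
    (736, 379.762, 187.006622314453), (737, 380.241, 187.006576538086),
    (738, 380.720, 187.006347656250), (739, 381.204, 187.006439208984),
    (740, 381.684, 187.006515502930), (741, 382.179, 187.006393432617),
]


def ppm(delta_mz, mz):
    """Convert an absolute m/z difference to parts per million of mz."""
    return abs(delta_mz) / mz * 1e6


# Scan-to-scan deviation: (scan, delta in ppm, delta in Da) vs the previous scan.
deltas = [
    (cur[0], round(ppm(cur[2] - prev[2], cur[2]), 1), round(abs(cur[2] - prev[2]), 6))
    for prev, cur in zip(SCANS, SCANS[1:])
]
rt_range = SCANS[-1][1] - SCANS[0][1]
mz_median = statistics.median(s[2] for s in SCANS)
max_dev = max(d[1] for d in deltas)

print(f"RT range (peak width): {rt_range:.3f} s")  # 8.166 s
print(f"median m/z: {mz_median:.12f}")  # 187.006805419922
print(f"max scan-to-scan deviation: {max_dev} ppm")  # 2.5 ppm
```

The largest deviation (2.5 ppm) is what the centWave ppm parameter has to tolerate for this ion.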

xcms extraction: centWave algorithm
The algorithm detects «mass traces», or «regions of interest» (ROI): regions where the m/z deviation between consecutive scans stays below a defined value. This maximum deviation is the «ppm» parameter, which has to be set according to the mass spectrometer accuracy. The intensities of each ROI are then used to detect the chromatographic peak with a continuous wavelet transform algorithm. The peakwidth parameter (min, max, in seconds) has to be set for this step, e.g. 20,50 for HPLC and 5,12 for UPLC.
Tautenhahn R., BMC Bioinformatics, 2008
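The ROI idea can be illustrated with a deliberately simplified Python sketch: centroids from consecutive scans are chained into a mass trace as long as their m/z stays within `ppm_tol` of the trace's mean m/z, and a trace ends as soon as a scan does not extend it. This is not the centWave implementation (which applies further rules, and whose wavelet step is not shown here); the function name, the mean-m/z criterion and the toy scans are assumptions.

```python
def find_rois(scans, ppm_tol):
    """Chain centroids of consecutive scans into mass traces (toy ROIs).

    scans: list of scans, each a list of (mz, intensity) centroids.
    A centroid extends an open trace when its m/z is within ppm_tol of the
    trace's mean m/z; a trace not extended in a scan is closed.
    Each returned trace is a list of (scan_index, mz, intensity).
    """
    open_traces, closed = [], []
    for scan_idx, centroids in enumerate(scans):
        still_open = []
        for mz, inten in centroids:
            match_idx = None
            for j, trace in enumerate(open_traces):
                mean_mz = sum(p[1] for p in trace) / len(trace)
                if abs(mz - mean_mz) / mean_mz * 1e6 <= ppm_tol:
                    match_idx = j
                    break
            if match_idx is not None:
                # Pop the trace so it takes at most one centroid per scan.
                trace = open_traces.pop(match_idx)
                trace.append((scan_idx, mz, inten))
                still_open.append(trace)
            else:
                still_open.append([(scan_idx, mz, inten)])
        closed.extend(open_traces)  # traces not extended in this scan end here
        open_traces = still_open
    closed.extend(open_traces)
    return closed


scans = [
    [(187.00642, 1.1e7), (200.1000, 5.0e5)],
    [(187.00690, 3.3e7), (200.1100, 4.0e5)],  # 200.11 is about 50 ppm from 200.10
    [(187.00719, 5.1e7)],
]
for roi in find_rois(scans, ppm_tol=5):
    print(len(roi), "scans, first m/z", roi[0][1])
```

With ppm_tol = 5 the m/z 187 centroids form one 3-scan trace while the m/z 200 centroids stay separate; raising ppm_tol to 60 merges them, which shows why ppm must follow the instrument accuracy.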

xcms extraction algorithms: matchedFilter is intended for centroid or profile low-resolution MS data; centWave is intended for centroid high-resolution MS data.

xcms extraction: centWave chromatographic peak detection (figure from Tautenhahn R., BMC Bioinformatics, 2008).


centWave basic parameters

centWave: noise and coeluted peaks

xcms centWave parameters. From the xcms forum, "How to choose peakwidth?":
"The main purpose of the peakwidth parameter is to roughly estimate the peak width range, this parameter is not a threshold. The wavelets used for peak detection are calculated from this parameter. If you use HPLC and your peaks are normally 20-60 s wide (base peak width), just go with that, i.e. peakwidth=c(20,60) centwave will still detect peaks that are 15s or 80 s wide! Important: Do not choose the minimum peak width too small, it will not increase sensitivity, but cause peaks to be split."
Example with a peak about 45 s wide: using peakwidth = c(20,60), the peak is split into three peaks, each detected as a separate peak about 10 s wide because they are separated by local minima (figure). Using peakwidth = c(20,120) keeps the peak intact (figure).

xcms extraction output
When a zip file is used, a samplemetadata.tsv file is created at this step. It must contain all the information needed for further analyses (batch correction and statistical analyses): download it, add this information, and upload it again.

Extraction parameters summary

| xcms step | xcms parameter | related to | description | examples |
|---|---|---|---|---|
| Extraction (xcmsSet) | ppm | m/z | Fluctuation of the m/z value (ppm) from scan to scan. Depends on the mass spectrometer resolution. | 5 |
| | peakwidth | retention time | Range of chromatographic peak width (seconds). | UPLC: 5,20; HPLC: 10,40 |
| | mzdiff | m/z and retention time | Minimum m/z difference for peaks with overlapping retention times (coeluting peaks). Must be negative to allow overlap. | -0.001 or 0.05 |
| | prefilter | intensity | A peak must be present in n scans with an intensity greater than k. | n=3, k=1000 |
| | snthresh | intensity | Signal/noise ratio threshold. | 3 |
| | noise | intensity | Each centroid must be greater than the "noise" value. | |
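The two intensity filters of this table can be written as small predicates. The Python sketch below follows the table's wording (at least n scans above intensity k; each centroid above noise); the function names and example values are hypothetical, and the exact comparison used by xcms (greater than versus greater than or equal) may differ.

```python
def passes_prefilter(trace_intensities, n=3, k=1000.0):
    """True if at least n scans of a mass trace have an intensity above k."""
    return sum(1 for i in trace_intensities if i > k) >= n


def drop_noise(centroids, noise):
    """Keep only (mz, intensity) centroids whose intensity exceeds noise."""
    return [(mz, i) for mz, i in centroids if i > noise]


trace = [800.0, 1200.0, 5000.0, 900.0, 1500.0]
print(passes_prefilter(trace))  # True: 3 scans above 1000
print(passes_prefilter(trace[:4]))  # False: only 2 scans above 1000
print(drop_noise([(187.0064, 500.0), (187.0069, 2000.0)], noise=1000.0))  # [(187.0069, 2000.0)]
```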

xcmsSet

«Grouping» step
Independent peak lists, one per sample:

| sample | mz | rt | int |
|---|---|---|---|
| pool1b1 | 196.0905 | 66.6 | 7810936 |
| pool1b1 | 158.1180 | 67.4 | 71736 |
| pool1b1 | 342.0308 | 67.6 | 202268 |
| pool1b1 | 267.0581 | 65.5 | 282039 |
| pool1b2 | 196.0910 | 66.7 | 11733921 |
| pool1b2 | 342.0310 | 69.0 | 74594 |
| pool1b2 | 267.0581 | 65.5 | 260877 |
| pool1b2 | 283.0318 | 65.2 | 424631 |
| pool1b3 | 196.0902 | 66.6 | 7933325 |
| pool1b3 | 158.1173 | 67.4 | 82969 |
| pool1b3 | 342.0308 | 21.3 | 2581 |
| pool1b3 | 283.0320 | 65.3 | 357448 |

Ions are first grouped by m/z, then by retention time. Resulting matrix:

| mz | rt | pool1b1 | pool1b2 | pool1b3 |
|---|---|---|---|---|
| 196.0905 | 66.6 | 7810936 | 11733921 | 7933325 |
| 158.1176 | 67.4 | 71736 | | 82969 |
| 342.0308 | 21.3 | | | 2581 |
| 342.0309 | 68.3 | 202268 | 74594 | |
| 267.0581 | 65.5 | 282039 | 260877 | |
| 283.0319 | 65.2 | | 424631 | 357448 |

xcms alignment (group)
First, the m/z domain is binned; the bin size is defined by the mzwid parameter. Then, within each m/z bin, all ions of all samples are considered across all retention times, and a kernel density estimator is used to detect retention time regions with a high density of ions. (Figure: m/z bins of width mzwid.)

A Gaussian model groups together peaks with similar retention times. How inclusive a group is depends on the standard deviation (bandwidth) of the Gaussian model, set by the xcms bw parameter; it can be interpreted as a retention time window. Vertical dashed lines indicate that a feature is valid and will be kept in the data matrix. To be valid, a group must contain peaks from at least a given fraction of the samples; this threshold is defined by the minfrac parameter. (Figure: mzwid; problem with bw = 30 s.)

(Figure: mzwid; the problem seen with bw = 30 s is solved with bw = 10 s.) Decreasing bw separates these two groups. The m/z and retention time of the resulting feature are the medians of the m/z and RT of all ions grouped together into that single feature.
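The three grouping ideas described here (m/z slices of width mzwid, a retention time kernel density with bandwidth bw, medians for the resulting feature) can be illustrated with a toy Python version applied to the peak lists of the «Grouping» example. It is a sketch under simplifying assumptions, not the xcms implementation: slices are cut where consecutive m/z values differ by more than mzwid (xcms uses its own binning), groups are split at local minima of the density, minfrac is not applied, and mzwid = 0.015 and step = 0.1 s are arbitrary choices.

```python
import math
import statistics

# Peak lists of the grouping example: (sample, mz, rt in s, intensity).
PEAKS = [
    ("pool1b1", 196.0905, 66.6, 7810936), ("pool1b1", 158.1180, 67.4, 71736),
    ("pool1b1", 342.0308, 67.6, 202268), ("pool1b1", 267.0581, 65.5, 282039),
    ("pool1b2", 196.0910, 66.7, 11733921), ("pool1b2", 342.0310, 69.0, 74594),
    ("pool1b2", 267.0581, 65.5, 260877), ("pool1b2", 283.0318, 65.2, 424631),
    ("pool1b3", 196.0902, 66.6, 7933325), ("pool1b3", 158.1173, 67.4, 82969),
    ("pool1b3", 342.0308, 21.3, 2581), ("pool1b3", 283.0320, 65.3, 357448),
]


def kde(x, centers, bw):
    """Unnormalised Gaussian kernel density at x (bandwidth bw)."""
    return sum(math.exp(-0.5 * ((x - c) / bw) ** 2) for c in centers)


def group_peaks(peaks, mzwid=0.015, bw=10.0, step=0.1):
    """Toy density-based grouping; returns one dict per feature."""
    peaks = sorted(peaks, key=lambda p: p[1])
    # 1. m/z slices: start a new slice when the m/z gap exceeds mzwid.
    slices, current = [], [peaks[0]]
    for p in peaks[1:]:
        if p[1] - current[-1][1] > mzwid:
            slices.append(current)
            current = []
        current.append(p)
    slices.append(current)

    features = []
    for sl in slices:
        rts = [p[2] for p in sl]
        # 2. RT density on a fine grid; local minima split the slice.
        lo = min(rts) - 3 * bw
        n = int((max(rts) - min(rts) + 6 * bw) / step) + 1
        grid = [lo + i * step for i in range(n)]
        dens = [kde(x, rts, bw) for x in grid]
        cuts = [grid[i] for i in range(1, n - 1)
                if dens[i] <= dens[i - 1] and dens[i] < dens[i + 1]]
        bounds = [-math.inf] + cuts + [math.inf]
        for left, right in zip(bounds, bounds[1:]):
            members = [p for p in sl if left <= p[2] < right]
            if members:
                # 3. Feature = median m/z and RT, one intensity per sample.
                features.append({
                    "mz": statistics.median(p[1] for p in members),
                    "rt": statistics.median(p[2] for p in members),
                    "intensity": {p[0]: p[3] for p in members},
                })
    return features


for bw in (30.0, 10.0):
    print(f"bw = {bw:g} s: {len(group_peaks(PEAKS, bw=bw))} features")
```

With these peaks, bw = 30 s merges the m/z 342.03 peaks at 21.3 s and around 68 s into one feature, while bw = 10 s separates them and gives the six rows of the resulting matrix, mirroring the bw = 30 s versus bw = 10 s comparison.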

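minfrac parameter for group
minfrac is the minimum fraction of samples, in at least one class, in which a peak must be detected for the group to be kept. Example: minfrac = 0.5 with 4 samples in each class, so a peak must be found in at least 2 samples of one class (figures: m/z versus RT).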
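The minfrac rule fits in a few lines. The Python sketch below assumes that each sample belongs to exactly one class and that the fraction is computed per class, as described above; `is_valid_group` and the sample names are hypothetical.

```python
from collections import Counter


def is_valid_group(samples_in_group, sample_class, minfrac=0.5):
    """Return True if, in at least one class, the fraction of that class's
    samples with a peak in the group is at least minfrac.

    samples_in_group: names of the samples that have a peak in the group.
    sample_class: mapping sample name -> class (e.g. the zip subfolder).
    """
    class_sizes = Counter(sample_class.values())
    detected = Counter(sample_class[s] for s in set(samples_in_group))
    return any(detected[c] / class_sizes[c] >= minfrac for c in detected)


# Two classes of 4 samples each: A1..A4 and B1..B4.
classes = {f"{c}{i}": c for c in "AB" for i in range(1, 5)}
print(is_valid_group(["A1", "A2"], classes))  # True: 2/4 in class A
print(is_valid_group(["A1", "B1"], classes))  # False: 1/4 in each class
```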


group interface

xcms group output: mzwid defines the m/z intervals; bw defines the width of the Gaussian curve.

xcms group output: two distinct m/z values merge into one group because mzwid and bw are too large.

xcms group output: the two distinct m/z values are separated by decreasing the bandwidth (bw) value.

Grouping parameters summary

| xcms step | xcms parameter | related to | definition | examples |
|---|---|---|---|---|
| Alignment (group) | mzwid | m/z | Size of the m/z slices (bins), i.e. the m/z range included in a group. Depends on mass spectrometer accuracy. | |
| | bw | retention time | Standard deviation of the Gaussian metapeak that groups peaks together. | |
| | minfrac | samples | To be valid, a group must be found in at least minfrac × the number of samples of one subfolder (class) of the data files. minfrac = 0.5 corresponds to 50%. | n = 10, minfrac = 0.5: found in at least 5 |
| | max | number of ions | Maximum number of groups detected in a single m/z slice. | 10 or 50 |

xcms workflow: retcor

xcms retcor output. Because retcor corrects retention times, it must be followed by a second group step.

xcms retcor output: changing the degree of smoothing (span = 0.8).

Parameters for retcor: missing = 1 and extra = 1, with 4 samples in each group (figures: m/z versus RT).

xcms retcor: obiwarp

Retention time correction parameters summary

| xcms step | xcms parameter | related to | description | examples |
|---|---|---|---|---|
| Retention time correction (retcor) | smooth method | retention time | Regression model used to model the time deviation among samples. | linear or loess |
| | span | retention time | Degree of smoothing of the loess model. | 0.2 to 1 |
| | extra | samples | Number of "extra" peaks allowed in the groups used as reference (well-behaved) peaks for modeling the time deviation, i.e. a group may contain more peaks than samples. | default = 1 |
| | missing | samples | Number of samples allowed to lack a reference peak. If blank samples are used, set missing to the number of blanks. | number of blank samples |
| | plottype | retention time deviation | Defines the plot showing the effect of the model on retention time correction. | |
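The peak-group idea behind these parameters can be sketched in Python with the linear smooth method: select well-behaved groups using missing and extra, take the median RT of each group as its reference, fit each sample's deviation against its RT, and subtract the fitted deviation. This is a simplified illustration under those assumptions, not the xcms retcor code; loess and obiwarp are not covered, and the data (a 2 s shift and a 2 % drift) are hypothetical.

```python
import statistics


def well_behaved(groups, n_samples, missing=1, extra=1):
    """Keep groups usable as reference peaks.

    groups: list of dicts, sample -> list of peak RTs in that group.
    A group is kept if at most `missing` samples lack a peak and it holds
    at most `extra` peaks beyond one per sample that has a peak.
    """
    kept = []
    for g in groups:
        present = [s for s, rts in g.items() if rts]
        n_peaks = sum(len(rts) for rts in g.values())
        if n_samples - len(present) <= missing and n_peaks - len(present) <= extra:
            kept.append(g)
    return kept


def linear_fit(xs, ys):
    """Least-squares fit y = a + b*x; needs at least two distinct xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def retcor_linear(groups, samples, missing=1, extra=1):
    """Return, per sample, a function mapping a raw RT to a corrected RT."""
    ref = well_behaved(groups, len(samples), missing, extra)
    models = {}
    for s in samples:
        xs, ys = [], []
        for g in ref:
            if g.get(s):
                # Reference RT of the group: median over samples of their RTs.
                target = statistics.median(statistics.median(v) for v in g.values() if v)
                rt = statistics.median(g[s])
                xs.append(rt)
                ys.append(rt - target)  # deviation of this sample
        a, b = linear_fit(xs, ys)
        models[s] = lambda t, a=a, b=b: t - (a + b * t)
    return models


# Hypothetical data: S2 is shifted by +2 s, S3 drifts by +2 %.
samples = ["S1", "S2", "S3"]
true_rts = [60.0, 120.0, 180.0, 240.0]
groups = [{"S1": [t], "S2": [t + 2.0], "S3": [t * 1.02]} for t in true_rts]
groups.append({"S1": [300.0, 310.0, 320.0], "S2": [302.0], "S3": [306.0]})  # 2 extra peaks
models = retcor_linear(groups, samples)
for t in true_rts:
    raw = [t, t + 2.0, t * 1.02]
    fixed = [models[s](r) for s, r in zip(samples, raw)]
    print(f"raw spread {max(raw) - min(raw):.2f} s, corrected {max(fixed) - min(fixed):.2f} s")
```

The group with two extra peaks is rejected by `extra = 1`, and because both simulated drifts are linear, the linear model aligns the remaining groups across samples; real drifts are usually non-linear, which is where loess and its span come in.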

Second grouping
Since retcor has corrected the retention time drift among samples, a new grouping is mandatory to take advantage of this correction. The bw parameter can then be set to a smaller value than in the first group step.

xcms fillPeaks
Filling method: «chrom» for LC-MS, «MSW» for direct injection.
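For LC-MS data, the «chrom» filling method goes back to the raw data of the samples in which a feature was not detected and integrates the signal in the m/z and retention time region of that feature. The toy Python function below illustrates that idea with the trapezoidal rule; the function name, the per-scan maximum and the trapezoidal integration are choices made for this sketch, not the xcms fillPeaks implementation, and «MSW» is not covered.

```python
def fill_peak(raw_points, mz_range, rt_range):
    """Integrate one sample's raw signal inside a feature's m/z and RT window.

    raw_points: list of (rt, mz, intensity) centroids for the sample.
    Per scan, the most intense centroid inside the m/z window is kept;
    the resulting trace is integrated with the trapezoidal rule.
    Returns 0.0 when fewer than two scans fall in the window.
    """
    per_scan = {}
    for rt, mz, inten in raw_points:
        if mz_range[0] <= mz <= mz_range[1] and rt_range[0] <= rt <= rt_range[1]:
            per_scan[rt] = max(per_scan.get(rt, 0.0), inten)
    rts = sorted(per_scan)
    area = 0.0
    for left, right in zip(rts, rts[1:]):
        area += (right - left) * (per_scan[left] + per_scan[right]) / 2
    return area


points = [
    (10.0, 187.0068, 100.0),
    (10.5, 187.0069, 300.0),
    (10.5, 250.1000, 9999.0),  # outside the m/z window
    (11.0, 187.0067, 100.0),
    (12.0, 187.0068, 5000.0),  # outside the RT window
]
print(fill_peak(points, (187.0060, 187.0076), (9.5, 11.5)))  # 200.0
```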

MS data processing Report creation and Annotations Yann GUITTON 29/05/2017 v 1.0.0