MSSimulator. Simulation of Mass Spectrometry Data. Chris Bielow, Stephan Aiche, Sandro Andreotti, Knut Reinert FU Berlin, Germany

Chris Bielow, Algorithmic Bioinformatics, Institute for Computer Science, FU Berlin, Germany

Motivation: digestion (trypsin); Cañas et al., 2006.

Motivation: experimental parameters (gradient length, matrix type, ESI voltage) × instruments (Ion Trap, Orbitrap, FT-ICR, TOF) × labeling and acquisition strategies (SILAC, MS^E, ICAT, MeCAT, iTRAQ) = a myriad of different setups. Options: create a database of diverse MS (or MS/MS) data with manual annotation, or use simulation. BUT: there is insufficient data for algorithm development.

Outline: capabilities of MSSimulator; realism of generated data; algorithm benchmarking.

The Big Picture: the inputs are a FASTA file (e.g., >sp|P02586|TNNC2... [# intensity=120 #] followed by the sequence MTDQQAEARSYLSEEMAAFDMFDADGGGDISVKELGTVMRM), models and parameters (e.g., SVM), and a contaminants file (e.g., "Methanol,CH3OH,1622.6,..."). MSSimulator runs digestion, separation, ionization, MS, and MS/MS, and writes raw data (mzML), feature data with position and charge (featureXML), and relation data with labeling pairs and charge pairs (consensusXML).
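The example FASTA header on this slide carries a per-protein abundance tag, [# intensity=120 #]. The Python below is a minimal sketch that splits such a file into entries and extracts that tag. It assumes the tag always has this bracketed form and returns None when the tag is missing. The function name parse_annotated_fasta and the second example entry are hypothetical and are not part of MSSimulator or OpenMS.

```python
import re
from typing import List, Optional, Tuple

# Abundance tag as in the slide example, e.g. "[# intensity=120 #]";
# whitespace around "#" and "=" is tolerated (assumed format).
INTENSITY_TAG = re.compile(r"\[\s*#\s*intensity\s*=\s*([0-9.eE+-]+)\s*#\s*\]")


def parse_annotated_fasta(text: str) -> List[Tuple[str, Optional[float], str]]:
    """Split FASTA text into (header, intensity or None, sequence) tuples."""
    entries = []
    # Each entry starts with ">" at the beginning of a line.
    for block in re.split(r"^>", text, flags=re.MULTILINE)[1:]:
        lines = block.splitlines()
        if not lines:
            continue
        header = lines[0].strip()
        sequence = "".join(line.strip() for line in lines[1:])
        tag = INTENSITY_TAG.search(header)
        intensity = float(tag.group(1)) if tag else None
        entries.append((header, intensity, sequence))
    return entries


# Header shortened from the slide; the second entry is hypothetical.
example = """>sp|P02586|TNNC2 [# intensity=120 #]
MTDQQAEARSYLSEEMAAFDMFDADGGGDISVKELGTVMRM
>hypothetical_protein
PEPTIDEK
"""
for header, intensity, sequence in parse_annotated_fasta(example):
    print(header, intensity, len(sequence))
```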

Digestion: naïve digestion with a given enzyme, allowing a set number of missed cleavages (not site-specific).
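A naïve digestion can be written in a few lines. This sketch uses the textbook trypsin rule: cleave C-terminal to K or R, except before P. Every site is equally likely to be missed, which matches the "not site-specific" remark above. MSSimulator's own enzyme definitions may differ, and naive_digest and cleavage_bounds are hypothetical names.

```python
from typing import List


def cleavage_bounds(sequence: str) -> List[int]:
    """Peptide boundaries under the textbook trypsin rule:
    cut after K or R unless the next residue is P."""
    cuts = [
        i + 1
        for i in range(len(sequence) - 1)
        if sequence[i] in "KR" and sequence[i + 1] != "P"
    ]
    return [0] + cuts + [len(sequence)]


def naive_digest(sequence: str, missed_cleavages: int = 1) -> List[str]:
    """All peptides spanning up to `missed_cleavages` uncut sites.
    Every site is treated alike (no site-specific model)."""
    bounds = cleavage_bounds(sequence)
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end < len(bounds):
                peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


print(naive_digest("MTDQQAEARSYLSEEMAAFDMFDADGGGDISVKELGTVMRM", missed_cleavages=1))
```

Replacing the uniform treatment of sites with a per-site probability is where a trained missed-cleavage model would plug in.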

Digestion: naïve digestion, or a trained model of missed cleavage sites (Siepen et al., 2007).

Separation: capillary electrophoresis, or HPLC with retention times predicted via SVR (Pfeifer et al., 2007). [Figure: migration time MT (min) vs. m/z (Th) for CE; retention time RT (min) vs. m/z (Th) for HPLC.]

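Separation: in the RT dimension, elution profiles follow an exponential-Gaussian hybrid (EGH; Lan & Jorgenson, 2001):
$$
f(t) = \begin{cases}
H \exp\left( \dfrac{-(t - t_R)^2}{2\sigma_g^2 + \tau (t - t_R)} \right), & 2\sigma_g^2 + \tau (t - t_R) > 0 \\
0, & 2\sigma_g^2 + \tau (t - t_R) \le 0
\end{cases}
$$
where H is the peak height, t_R the retention time of the peak apex, σ_g the standard deviation of the Gaussian component, and τ the time constant of the exponential component.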
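The EGH formula translates directly into code. This sketch evaluates f(t) for a single peak, using the same symbols as above (H, t_R, σ_g, τ). The function name egh and the sampling grid are illustrative only. With τ = 0 the function reduces to a Gaussian. With τ > 0 it produces tailing on the late side of the peak, which is what the hybrid is designed to capture.

```python
import math


def egh(t: float, height: float, t_r: float, sigma_g: float, tau: float) -> float:
    """Exponential-Gaussian hybrid (Lan & Jorgenson, 2001).

    height = H (peak height), t_r = apex retention time,
    sigma_g = Gaussian standard deviation, tau = exponential time constant.
    """
    denominator = 2.0 * sigma_g ** 2 + tau * (t - t_r)
    if denominator <= 0.0:
        return 0.0  # outside the support of the model
    return height * math.exp(-((t - t_r) ** 2) / denominator)


# Sample an elution profile on a 0.5 s grid (parameter values are illustrative).
rt_grid = [600.0 + 0.5 * i for i in range(121)]
profile = [egh(t, height=1.0, t_r=620.0, sigma_g=3.0, tau=1.5) for t in rt_grid]
```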

Detectability and Ionization: detectability is predicted with a thresholded SVC (Schulz-Trieglaff et al., 2008). Ionization: for MALDI, charges follow fixed probabilities, P(q = 1) = p_q1 and P(q = 2) = p_q2; for ESI, charges follow a binomial distribution B(n; p). [Figure: simulated vs. real data.]
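Charge sampling for both ionization modes can be sketched as follows. For ESI, this sketch assumes that the number of trials n is the count of basic residues (K, R, H) plus one for the N-terminus, and that p is a user-chosen ionization probability. A draw of 0 means the peptide is not observed. For MALDI, charges are drawn from fixed probabilities [p_q1, p_q2, ...]. The choice of n and the function names are assumptions made for illustration. They are not MSSimulator's documented parameters.

```python
import random
from typing import Sequence


def esi_charge(peptide: str, p: float, rng: random.Random) -> int:
    """Draw an ESI charge q ~ B(n; p).

    Assumption: n = number of basic residues (K, R, H) + 1 for the N-terminus.
    q == 0 means the peptide is not ionized in this draw.
    """
    n = sum(peptide.count(residue) for residue in "KRH") + 1
    return sum(1 for _ in range(n) if rng.random() < p)


def maldi_charge(charge_probs: Sequence[float], rng: random.Random) -> int:
    """Draw a MALDI charge from fixed probabilities [P(q=1), P(q=2), ...]."""
    charges = range(1, len(charge_probs) + 1)
    return rng.choices(charges, weights=charge_probs, k=1)[0]


rng = random.Random(42)
esi = [esi_charge("PEPTIDEKR", p=0.8, rng=rng) for _ in range(1000)]
maldi = [maldi_charge([0.9, 0.1], rng) for _ in range(1000)]
```

The ESI sampler draws n Bernoulli trials instead of calling a library binomial sampler, so it runs on any Python 3 version.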

MS Signal Components: isotope distribution. [Figure: isotope patterns (intensity vs. m/z) for peptides with MW 500, 1000, and 2500, with the 1n and 2n peaks marked.]
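A common way to obtain mass-dependent isotope patterns like those on this slide is the averagine approximation. It scales an average amino-acid composition to the peptide mass and convolves the natural isotope abundances of each element. The sketch below does this and truncates the result to a few peaks. It is a substitute technique chosen for illustration. The slide does not say whether MSSimulator computes isotope patterns this way.

```python
from typing import Dict, List

# Averagine (Senko et al., 1995): average elemental composition of a
# peptide "residue" with mass 111.1254 Da.
AVERAGINE: Dict[str, float] = {
    "C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417,
}
AVERAGINE_MASS = 111.1254

# Natural isotope abundances, indexed by extra neutrons (+0, +1, +2, ...).
ISOTOPES: Dict[str, List[float]] = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a: List[float], b: List[float], max_peaks: int) -> List[float]:
    """Discrete convolution of two isotope patterns, truncated to max_peaks."""
    out = [0.0] * min(len(a) + len(b) - 1, max_peaks)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_distribution(mass: float, max_peaks: int = 5) -> List[float]:
    """Relative intensities of the +0, +1, ... peaks for an averagine peptide."""
    pattern = [1.0]
    for element, per_residue in AVERAGINE.items():
        atoms = round(mass / AVERAGINE_MASS * per_residue)
        for _ in range(atoms):
            pattern = convolve(pattern, ISOTOPES[element], max_peaks)
    total = sum(pattern)
    return [value / total for value in pattern]


for mw in (500.0, 1000.0, 2500.0):
    print(mw, [round(v, 3) for v in isotope_distribution(mw)])
```

This reproduces the trend on the slide. At 500 Da the monoisotopic peak dominates, and by 2500 Da the +1 peak exceeds it.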

MS Signal Components: the isotope distribution is convolved with a Lorentzian or Gaussian peak shape. [Figure: Lorentzian vs. Gaussian peak shapes along m/z.]
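Convolving a stick isotope pattern with a peak shape amounts to placing one Gaussian or Lorentzian at each isotope position and summing the results. This sketch parameterizes both shapes by their full width at half maximum (FWHM) so that the two are directly comparable; that parameterization is an assumption. The stick positions and heights in the usage lines are hypothetical values for a doubly charged ion, with an isotope spacing of about 1.00336/2 Th.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gaussian(x: float, center: float, fwhm: float) -> float:
    """Unit-height Gaussian with the given full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def lorentzian(x: float, center: float, fwhm: float) -> float:
    """Unit-height Lorentzian with the given full width at half maximum."""
    gamma = fwhm / 2.0
    return 1.0 / (1.0 + ((x - center) / gamma) ** 2)


def profile_spectrum(
    sticks: Sequence[Tuple[float, float]],
    mz_grid: Sequence[float],
    fwhm: float,
    shape: Callable[[float, float, float], float] = gaussian,
) -> List[float]:
    """Convolve a stick pattern [(m/z, height), ...] with a peak shape."""
    return [sum(h * shape(mz, c, fwhm) for c, h in sticks) for mz in mz_grid]


# Hypothetical 2+ isotope pattern: spacing of about 1.00336 / 2 Th.
sticks = [(500.0, 1.0), (500.50168, 0.55), (501.00336, 0.2)]
mz_grid = [499.9 + 0.001 * i for i in range(1301)]
gauss_profile = profile_spectrum(sticks, mz_grid, fwhm=0.02)
lorentz_profile = profile_spectrum(sticks, mz_grid, fwhm=0.02, shape=lorentzian)
```

At equal FWHM the Lorentzian has much heavier tails than the Gaussian, which is the main visible difference between the two shapes.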

MS Signal Convolved: the raw signal is a convolution of the RT and m/z signals. [Figure: resulting raw signal along RT and m/z.]
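This is a minimal sketch of how the RT and m/z signals combine into a raw map. It assumes a separable model: the intensity at each (RT, m/z) grid point is the feature abundance times the elution profile at that RT times the m/z profile at that m/z. A Gaussian stands in for both profiles here, and the EGH shape from the separation step could replace it in the RT dimension. The function names and grid values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian(x: float, center: float, width: float) -> float:
    """Unit-height Gaussian with standard deviation `width`."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def raw_signal(
    rt_grid: Sequence[float],
    mz_grid: Sequence[float],
    rt_apex: float,
    rt_width: float,
    sticks: Sequence[Tuple[float, float]],
    mz_width: float,
    abundance: float,
) -> List[List[float]]:
    """Intensity map [rt][mz] = abundance * elution(rt) * mz_profile(mz)."""
    mz_profile = [
        sum(h * gaussian(mz, c, mz_width) for c, h in sticks) for mz in mz_grid
    ]
    return [
        [abundance * gaussian(rt, rt_apex, rt_width) * p for p in mz_profile]
        for rt in rt_grid
    ]


grid = raw_signal(
    rt_grid=[9.0, 10.0, 11.0],
    mz_grid=[500.0, 500.5, 501.0],
    rt_apex=10.0,
    rt_width=1.0,
    sticks=[(500.0, 1.0), (500.5, 0.5)],
    mz_width=0.05,
    abundance=100.0,
)
```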

Capabilities: MS/MS signal. [Figure: Q-TOF schematic, http://www.astbury.leeds.ac.uk/facil/mstut/mstutorial_files/qtof.jpg]

Capabilities: MS/MS signal.

Capabilities: MS/MS signal (model of Zhou et al., 2008). The peptide sequence is encoded as 35 features for every b- and y-ion. SVM classification predicts fragment presence or absence, and SVM regression predicts fragment peak intensity. Neutral losses and charge variants are then added, either with predefined intensities or with a probabilistic intensity model, to produce the MS/MS spectrum.
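In the model of Zhou et al., SVMs decide which b- and y-ions appear and how intense they are, but the candidate ion ladder itself is plain arithmetic. The sketch below computes singly protonated b- and y-ion m/z values from standard monoisotopic residue masses, assuming unmodified cysteine. It does not reproduce the SVM feature encoding, and by_ions is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Monoisotopic residue masses (Da); cysteine unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly protonated b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, running = [], 0.0
    for mass in masses[:-1]:  # b-ions: N-terminal prefixes
        running += mass
        b_ions.append(running + PROTON)
    y_ions, running = [], WATER
    for mass in reversed(masses[1:]):  # y-ions: C-terminal suffixes + water
        running += mass
        y_ions.append(running + PROTON)
    return b_ions, y_ions


b, y = by_ions("PEPTIDE")
```

A quick consistency check is that each complementary pair b_i + y_(n-i) equals the neutral peptide mass plus two protons.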

Realism of Data: assume you create chocolate that imitates real chocolate.

Realism of Data: there are two ways to convince yourself of the quality of your new chocolate bar. Bottom-up (IMITATE THE BUILDING BLOCKS): each basic simulation step is ideally based on an accepted physical model and evaluated against real data with an accepted measure. Top-down (EAT THE CHOCOLATE): the simulation output is fed into subsequent analysis steps (e.g., MS/MS identification) and deemed good if the results are similar to those obtained on real data.

Realism of Data: MSSimulator therefore combines both approaches. Bottom-up: MSSimulator uses published models for digestion (missed cleavages), RT/MT prediction, detectability, and MS/MS ion ladder prediction. Top-down: let's have a look.

Top-down: Q-TOF, simulated vs. real data.

Top-down: FTMS, simulated vs. real data.

Algorithm Benchmarking: SILAC-labeled data simulated with a Q-TOF preset.
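In SILAC data, the heavy partner of each peptide is offset from the light form by the total label mass divided by the charge. This sketch assumes the widely used 13C6 15N2-lysine (+8.014199 Da) and 13C6 15N4-arginine (+10.008269 Da) labels. The slide does not say which labels the Q-TOF benchmark used, and heavy_mz_offset is a hypothetical name.

```python
# Assumed SILAC labels: 13C6 15N2-Lys and 13C6 15N4-Arg (monoisotopic shifts, Da).
SILAC_SHIFT = {"K": 8.014199, "R": 10.008269}


def heavy_mz_offset(peptide: str, charge: int) -> float:
    """m/z distance between a light peptide and its fully labeled heavy partner."""
    if charge < 1:
        raise ValueError("charge must be positive")
    delta_mass = sum(SILAC_SHIFT.get(aa, 0.0) for aa in peptide)
    return delta_mass / charge


print(heavy_mz_offset("PEPTIDEK", 2))
```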

Use Case, SILAC Data: imagine you want to compare two tools that perform expression analysis, ASAPRatio and XPRESS (both part of the Trans-Proteomic Pipeline, TPP). ASAPRatio is newer and should therefore perform better, and you want to verify this.

Use Case, SILAC Data: lab vs. simulation workflow (mixing ratios 1:1, 1:2, 1:4, 1:10).
Lab: measure data (2 h); identification (1 h); run XPRESS/ASAPRatio (1 h); performance comparison using manual annotation (2 d).
Simulation: simulate data (10 min); conversion to pepXML; run XPRESS/ASAPRatio (1 h); performance comparison using ground truth (2 min).
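With ground truth available, the performance comparison reduces to checking reported ratios against the known mixing ratios. This is a minimal sketch that assumes ratios are reported as light/heavy and that a ground-truth label "1:x" means light:heavy = 1:x. The median absolute log2 error is one reasonable metric; the talk does not say whether it underlies the comparison shown. The input values are toy numbers.

```python
import math
from statistics import median
from typing import Dict, List


def ratio_errors(reported: Dict[str, List[float]]) -> Dict[str, float]:
    """Median |log2(reported) - log2(expected)| per ground-truth mixing ratio.

    Keys are ground-truth ratios such as "1:4" (assumed light:heavy);
    values are the light/heavy ratios a tool reported for those peptides.
    """
    errors = {}
    for label, ratios in reported.items():
        light, heavy = (float(part) for part in label.split(":"))
        expected = math.log2(light / heavy)
        errors[label] = median(abs(math.log2(r) - expected) for r in ratios)
    return errors


# Toy input: reported ratios per mixing ratio for one hypothetical tool.
print(ratio_errors({"1:2": [0.5, 0.5], "1:4": [0.25, 0.5]}))
```

Running the same function on the output of XPRESS and ASAPRatio gives directly comparable numbers per mixing ratio.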

Use Case, SILAC Data: these results indicate that the newer ASAPRatio performs better than XPRESS.

More Use Cases: experimental setup optimization. Increasing the HPLC gradient time will increase the number of identified peptides! Increasing resolution will improve feature finding, even though acquisition time will go up!

More Use Cases: experimental setup optimization. Parameters: protein count, dynamic range, resolution, gradient length, noise.

More Use Cases: feature-finding performance by in-silico spike-in.

More Use Cases: feature-finding performance by in-silico spike-in. Real signal: TP, FN (sensitivity). Bogus signal: FP, TN (specificity).

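The spike-in evaluation can be sketched as follows. Real spiked-in features that the feature finder reports count as TP and missed ones as FN. Bogus spike-ins that it reports count as FP and ignored ones as TN. This sketch assumes a detection matches a spike-in when it falls within fixed RT and m/z tolerances. The default tolerances, the function names, and the toy coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Feature = Tuple[float, float]  # (retention time in s, m/z)


def _matched(truth: Feature, detected: Sequence[Feature], rt_tol: float, mz_tol: float) -> bool:
    """True if any detected feature lies within the RT and m/z tolerances."""
    return any(
        abs(truth[0] - d[0]) <= rt_tol and abs(truth[1] - d[1]) <= mz_tol
        for d in detected
    )


def spike_in_performance(
    real: Sequence[Feature],
    bogus: Sequence[Feature],
    detected: Sequence[Feature],
    rt_tol: float = 5.0,
    mz_tol: float = 0.01,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a feature finder on spike-ins."""
    tp = sum(_matched(f, detected, rt_tol, mz_tol) for f in real)
    fn = len(real) - tp
    fp = sum(_matched(f, detected, rt_tol, mz_tol) for f in bogus)
    tn = len(bogus) - fp
    sensitivity = tp / (tp + fn) if real else float("nan")
    specificity = tn / (tn + fp) if bogus else float("nan")
    return sensitivity, specificity


real = [(100.0, 500.0), (200.0, 600.0)]
bogus = [(300.0, 700.0), (400.0, 800.0)]
detected = [(101.0, 500.005), (300.5, 700.0)]
print(spike_in_performance(real, bogus, detected))
```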

Wrapping up: MSSimulator is part of OpenMS (www.openms.de). Input: a FASTA file and additional configuration files. Output (ground truth): raw data (mzML); peptide positions (RT, m/z), charge states, labeling status, and contaminant positions (featureXML); labeling pairs/groups and charge groups (consensusXML).

Acknowledgements: Alexandra Zerck, Christian Huber, Silke Ruzek, the OpenMS team.

Literature
1. Siepen JA, Keevil E-J, Knight D, Hubbard SJ. Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. Journal of Proteome Research. 2007;6(1):399-408.
2. Schulz-Trieglaff O, Pfeifer N, Gröpl C, Kohlbacher O, Reinert K. LC-MSsim--a simulation software for liquid chromatography mass spectrometry data. BMC Bioinformatics. 2008;9:423.
3. Pfeifer N, Leinenbach A, Huber CG, Kohlbacher O. Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics. BMC Bioinformatics. 2007;8:468.
4. Lan K, Jorgenson JW. A hybrid of exponential and gaussian functions as a simple model of asymmetric chromatographic peaks. Journal of Chromatography A. 2001;915(1-2):1-13.
5. Zhou C, Bowler LD, Feng J. A machine learning approach to explore the spectra intensity pattern of peptides using tandem mass spectrometry data. BMC Bioinformatics. 2008;9:325.

Thank you for your attention!