DATA ANALYSIS IN MASS SPECTROMETRY BASED METABOLOMICS Pavel Aronov Stanford Mass Spectrometry Users Meeting September 26, 2011
Presented at the 2011 Stanford Mass Spectrometry t Users Meeting For personal use only. Please do not reuse or reproduce without t the author s permission. i 2
Types of Experiments in Metabolomics targeted non targeted quantitative semi quantitative Number of analyzed metabolites is limited by the number of available standards Absolute quantitation of metabolites (nm, mg/ml) Number of analyzed Number of analyzed metabolites is limited dby metabolites is limited dby the number of available lbl capacity of analytical l lb library spectra instrumentation Relative quantitation Relative quantitation of metabolites (fold) of metabolites (fold) 3
Targeted LC/MS metabolomics Absolute quantitation of metabolites using triple quadrupole instruments SUMS: Quattro Premier (Waters) and TSQ Vantage (Thermo) Chemical standards of metabolites are required, internal standards d (isotopically i labeled l metabolites, 13 C or 2 H) are desired Custom methods; amino acid method is available (~30 amino acids) SUMS contact: Karolina Krasinska 4
Untargeted t GC/MS based metabolomics Agilent 5795 single quadrupole GC/MS Metabolomics method based on Kind T. et al 2009, Anal Chem 81(24) Identification based on Agilent MSRI library (~700 metabolites, mostly primary metabolism) For general overview: SUMS 2010 users meeting workshop khppresentation 5
Untargeted t GC/MS based metabolomics Service began: Summer 2011 Sample requirements: dry or non-aqueous solvent Samples tested: tissue, bacterial cells, urine, plasma Metabolites identified: ~30-120 Metabolites detected: ~200-500 Data output: AMDIS IDs, non-targeted and pathway analysis is under development 6
Untargeted LC/MS based metabolomics Sample prep p Blood Plasma: protein precipitation it ti with 4 v of Acetonitrile, il evaporation, reconst in 5 % ACN LC Reversed Phase Chromatography Aqueous Normal Phase Chromatography MS Thermo Exactive: +ESI and ESI, R =50,000 000 m/z 70-800 (+APCI if necessary) Data Analysis MZmine 2.2 accurate mass database search (HMDB) Identification structure elucidation (Orbitrap Velos), data interpretation (Mass Frontier)
Data Processing Workflow m/z LC/MS Map Peak ID (m/z, t R ) Area 0001 3500 0002 6000 0003 9000 0004 700 0005 1200 Retention time Sample1 Sample2 SampleN Peak ID Area Area Area 0001 3500 2300 300 0002 6000 7800 Peak annotation 5600 0003 9000 5100 9700 and statistical analysis 0004 700 1300 1200 0005 1200 400 900
File Format Conversion Universal MS data formats: netcdf (*.cdf) All MS manufacturers provide tools for conversion into netcdf XMLs (mzxml, mzml) Less common in metabolomics
Data Processing Software Proprietary MarkerLynx (Waters), SIEVE (Thermo), MarkerView (Sciex), Mass Profiler Professional (Agilent) Open Source XCMS (http://metlin.scripps.edu/xcms/) edu/xcms/) and MZmine (http://mzmine.sourceforge.net/) net/)
MZmine: Hardware considerations Multi-processor systems Server CPU are expensive (Xeon, Opteron), Desktop processors are ok (Phenom, Core i7) 64-bit OS with Java and CPUs to address more than 4 GB memory At least 8 GB memory (high resolution profile data)
MZmine: main window
MZmine: supported file formats
MZmine: use of multiple CPU cores
MZmine: inspection of raw data
MZmine: raw data SIC
MZmine: peak picking
MZmine: preview of parameters
MZmine: estimating the noise level
MZmine: zooming into noise
MZmine: chromatogram construction
MZmine: peak picking results
MZmine: deconvolution
MZmine: deconvolution parameters
MZmine: deconvolution results
MZmine: deisotoping
MZmine: deisotoping gp parameters
MZmine: deisotoping results
MZmine: alignment
Data Processing Workflow m/z LC/MS Map Peak ID (m/z, t R ) Area 0001 3500 0002 6000 0003 9000 0004 700 0005 1200 Retention time Sample1 Sample2 SampleN Peak ID Area Area Area 0001 3500 2300 300 0002 6000 7800 Peak annotation 5600 0003 9000 5100 9700 and statistical analysis 0004 700 1300 1200 0005 1200 400 900
MZmine: alignment parameters
MZmine: alignment is computationally intense
MZmine: alignment results
MZmine: gap filling
MZmine: removal of duplicates
MZmine: normalization options
MZmine: identification options
MZmine: accurate mass online search
Mtbl Metabolomics Databases Human Metabolome Database: http://www.hmdb.ca/ h / BioCyc: http://biocyc.org/ Metlin: http://metlin.scripps.edu/ edu/ KEGG: http://www.genome.jp/kegg/ PubChem (not biology specific): http://pubchem.ncbi.nlm.nih.gov/ nlm nih
MS/MS databases Metlin http://metlin.scripps.edu/ tli ipp / MassBank (not only metabolites) http://www.massbank.jp/?lang jp/?lang=en
MZmine: identification results
Monoisotopic i vs Average Mass Fall 100 Two stable isotopes important in biochemistry Carbon-12 (100 %) and Carbon-13 (~1.1 %) Sulfur-32 (100 %) and Sulfur-34 (4.44 %) 0022 DS vitd051608sample001 (0.005) Is (1.00,1.00) 204.09 8.73e12 Tryptophan statistically can contain: HN NH no carbon-13 (M): 204.09 Da (100 %) 2 C 11 H 12 N 2 O 2 one carbon-13 (M+1): 205.09 09 Da (11.9 %) two carbons-13 (M+2): 206.09 09 Da (1.4 %) These are monoisotopic i masses % O OH Average mass = (204.09 *100 + 205.09*11.9 + 206.09*1.4)/113.2 = 204.22 (molecular weight, g/mol) g/mol) 205.09 206.10 0 203 204 205 206 mass
Elemental l composition i from accurate mass 1 H 1.0078 u 12 C 12.0000 u 14 N 14.0031 u 16 O 15.9949 u What is 28 u? N 2 (2x14u) u), CO(12u+16u)orCH 16 u) C 2 4 (2 x 12 u + 4 x 1 u)? What is 28.0313 u? [high accuracy] C 2H 4 (2 x 12.0000 u + 4 x 1.0078 u)
High Resolution 100 561.18 562.19 High Resolution: R = 561/0.06 06 ~ FWHM 9000 9,000 0.06 amu % 563.20 564.20 0 561.14 100 TOF: 7,000-50,000 Obit Orbitrap: 10 4-10 5 FT ICR: 10 5-10 6 0.8 amu Nominal Mass Resolution 562.10 FWHM (<1000) %R = 561/0.8 ~ 700 563.06 Accurate mass measurement is possible without high resolution. 0 m/z 560 561 562 563 564 High resolution improves precision 44 i
Mass of an electron becomes important at high accuracies Even Electron (EE) Ions Typically generated by chemical ionization techniques and electrospray C 11 H 12 N 2 O 2 C 11 H 13 N 2 O 2 205. 09715 Da (true mass) C 11 H 13 N 2 O 2 205. 09770 Da (2.6 ppm error) Orbitrap can achieve < 1 ppm accuracy
Identification based on accurate mass 212.00217 NL: C 8 H 6 O 4 NS 6.95E6 100-0.63646 ppm MeyerT_100422_sampl e0062#636 RT: 6.29 90 AV: 1 T: FTMS {1,1} - p ESI Full ms 80 [70.00-800.00] 00-800 00] s[70 70 ass io ma rati e m k r rate ea cur c pe acc pic g a top ing sot ch d is Matc and nce Relative Abunda 60 50 40 30 20 10 0 100 90 80 70 60 50 40 Theoretical M30 20 10 0 Acquired spectrum 212.00230 NL: C 8 H 6 O 4 NS 8.59E5 0.00000 ppm C 8 H 6 O 4 NS: C 8 H 6 O 4 N 1 S 1 pa Chrg -1 211 212 213 214 215 m/z Theoretical spectrum Error =-0.00013 Da/212.0023 Da * 1000,000 = 0.6 parts per million (ppm) H N OSO 3
MZmine: data export options
Manual Revision i of Mtbl Metabolomics Dt Data Many software tools can t remove some artifacts: t unusual adducts, fragments A majority of detected features may be isotopes, adducts, fragments and/or impurities Baran R et al 2010, Anal Chem
MeyerT_100422_sample0088 C18 pos R5 RT: 14.99-16.15 ve Abun ndance Relativ 100 50 0 100 50 [M + H] + ESI-MS Adducts 15.34 [M + NH + 4 ] 4/27/2010 7:23:42 PM 15.59 15.61 NL: 1.53E6 m/z= 398.95-399.26 MS MeyerT_100422_sa mple0088 15.83 15.86 15.97 16.01 16.06 15.59 NL: 1.37E6 m/z= 416.11-416.28 MS MeyerT_100422_sa 16.03 mple0088 15.09 15.12 15.18 15.90 16.12 0 15.61 NL: 100 643E5 6.43E5 m/z= [M + Na ] + 421.05-421.26 MS 50 MeyerT_100422_sa mple0088 0 15.78 15.84 15.97 16.08 15.0 15.1 15.2 15.3 15.4 15.5 15.6 15.7 15.8 15.9 16.0 16.1 Time (min) MeyerT_100422_sample0088 #1581-1600 RT: 15.51-15.69 AV: 20 NL: 7.32E5 T: FTMS {1,1} + p ESI Full ms [70.00-800.00] 100 399.1856 e undance Rela ative Ab 90 80 70 60 50 40 30 H] + + 416.2120 [M + + [M NH 4 ] 421.1673 20 400.1888 417.2153 10 420.3664 415.2011 422.1704 397.1851 401.1905 403.0866 407.1887 409.9785 413.2656 418.2171 423.1722 0 398 400 402 404 406 408 410 412 414 416 418 420 422 424 426 m/z [M + Na ] +
Unusual LC-ESI-MS Adducts [2M-H+Fe] +
MeyerT_100422_sample0088 C18 pos R5 RT: 669 6.69-806 8.06 undance tive Abu Rela 100 80 60 40 Fragments in LC-MS 4/27/2010 7:23:42 PM 7.31 729 7.29 7.33 7.35 7.39 Hyppuric acid 20 6.73 6.78 6.87 6.91 7.00 7.08 7.17 7.51 7.55 7.60 7.66 7.74 7.80 7.86 7.94 7.99 0 7.31 100 7.29 7.33 80 735 7.35 m/z 118.0651 60 739 7.39 40 7.41 20 6.73 6.78 6.85 6.92 6.98 7.06 7.12 7.17 7.51 7.55 7.60 7.66 7.76 7.80 7.93 8.00 0 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 Time (min) NL: 2.23E5 m/z= 118.05-118.09 MS MeyerT_10042 2_sample0088 NL: 2.38E5 m/z= 118.05-118.09 09 MS MeyerT_10042 2_sample0088 MeyerT_100422_sample0088 #740-782 RT: 7.16-7.52 AV: 43 NL: 3.04E6 T: FTMS {1,1} + p ESI Full ms [70.00-800.00] undance tive Abu Relat 180.0651 100 Hyppuric acid O 90 80 70 60 50 40 30 20 C 8 H 8 N indole? No, fragment of hyppuric acid 176.9715 Not confirmed by GC-MS either 181.0684 10 122.5471 118.0651 162.0547 125.9862 134.0599 141.9584 149.0231 154.9899 167.0125 173.0298 182.9847 0 195.0873 115 120 125 130 135 140 145 150 155 160 165 170 175 180 185 190 195 m/z N H O OH
Conclusions Open source MZmine software is capable to perform automated peak extraction, alignment and annotation from high resolution LC/MS metabolomics and proteomics data Manual curation of fthe results is requires to rule out fragments, some adducts and isotopes 52
Ak Acknowledgements ld - Allis Chien - Theresa McLaughlin - Karolina Krasinska - Chris Adams - Anna Okumu - Tim Meyer - Natalie Plummer - Tammy Sirich - Angela Marcobal - Justin Sonneburg - Mike Snyder 53