Metabolomic Profiling in Drug Discovery: Understanding the Factors that Influence a Metabolomics Study and Strategies to Reduce Biochemical and Chemical Noise
Mark Sanders 1; Serhiy Hnatyshyn 2; Don Robertson 2; Michael Reily 2; Thomas McClure 1; Michael Athanas 3; Jessica Wang 1; Pengxiang Yang 1; David Peake 1
1 Thermo Fisher Scientific Inc., San Jose, CA; 2 Bristol Myers Squibb, Princeton, NJ; 3 Vast Scientific, Boston, MA
Metabolomics in Drug Discovery
- Pattern recognition: good profile versus bad profile
- Identification and quantitation of endogenous markers
- Compound selection
  - Target effects: efficacy markers
  - Off-target effects: toxicity/liability markers
- Identification of markers provides mechanistic insights
  - Target validation
  - Mechanism of toxicity
- Early evaluation of potential clinical markers
Metabolomics in Drug Discovery
Targeted Analysis
- Metabolite target analysis: analysis restricted to metabolites of an enzyme system that are known to be affected by a certain perturbation
- Metabolite profiling: analysis focused on a class of compounds associated with a particular pathway (e.g., nucleoside triphosphates, lipids, steroids)
- Only find what you are looking for
Untargeted Analysis
- A comprehensive analysis of all metabolites
- A measure of the fingerprint of biochemical perturbations
- Useful when you don't know what to expect
- Hypothesis generation
Metabolomics Analysis
Goals
- Quantitative assessment of the biochemical makeup of the samples
- Differential analysis between sample groups
- Identify compounds responsible for changes
Challenges
- Complexity of a biological sample
  - Diversity of small molecule metabolites
  - Wide range of metabolite concentration
  - Multiple sources of variability
- Incomplete information: the majority of components seen by LC/MS are unknowns
- Structure elucidation of unknowns is expensive
- Need sophisticated data reduction tools and strategies to minimize noise
Sources of Noise in a Metabolomics Study
Instrumental
- Mass and retention time stability
- Robust and stable detector response
- Sufficient resolution to resolve isobaric interference
Chemical (Data Processing)
- Background from column/solvents
- Multiple signals per compound
- Setting the threshold
Biological
- Different response rates to a stimulus between individuals
- Stress status
- Feeding status
- Other health factors
Study Design
- Proper controls and randomized sampling/analysis to minimize systematic errors
- Sampling, sample preparation and storage
Statistical Analysis
- Limited sampling, overfitting data
The First Benchtop Quadrupole Orbitrap LC-MS/MS
Q Exactive LC-MS/MS: Benchtop Quadrupole Orbitrap
- Quadrupole mass filter
  - Quadrupole: hyperbolic rods
  - Isolation down to 0.4 amu
  - Precursor ion selection for SIM and MS/MS functionality
- HCD collision cell
  - Analogous to Thermo Scientific LTQ Orbitrap Velos Hybrid
- S-lens (stacked ring ion guide)
  - Analogous to Thermo Scientific LTQ Velos LC-MSn
  - Shorter inject times
- Parallel processing
  - Ions collected in C-trap while Orbitrap is scanning
  - Higher scan speed
- Advanced signal processing
  - Improved resolution
  - Faster acquisition speed
What do we Want in a Good HRAM Instrument?
- Accurate mass stability
  - Robust accuracy over extended periods: set it and forget it
  - Ability to do pos/neg switching within a run and maintain accuracy
- Speed
  - Compatibility with the most demanding UHPLC separations
- Resolution
  - Primary discriminator for the analytes of interest
  - Want as much as we can get without compromising sensitivity
- Sensitivity
  - As good as, if not better than, a triple quad
Q Exactive LC-MS/MS: Mass and Response Stability
D5-hippuric acid, external calibration, resolution = 82,000
[Figure: extracted ion chromatograms (XIC m/z 185.0969 ± 5 ppm, FWHM = 1.86 sec, retention time 4.27-4.28 min, CV = 2.4%) and mass spectra for injections at 7:51 pm, 11:14 pm, 3:32 am and 8:08 am (external calibration + 65.13 hrs). Measured m/z 185.0968-185.0969 and 186.1002-186.1003; all labeled mass errors are within ±1 ppm.]
Instrumentation
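The mass errors on this slide are expressed in parts per million. A minimal sketch of that calculation follows; the function names and the 5 ppm default (taken from the XIC window on the slide) are illustrative choices, not part of any vendor software.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million (negative = measured below theory)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True when the measured m/z falls inside a +/- tol_ppm window."""
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Rounded values from the slide. The slide's own ppm labels were computed
# from unrounded masses, so this result will not match them exactly.
error = ppm_error(185.0968, 185.0969)
```

At m/z 185, a 1 ppm error corresponds to roughly 0.0002 Da, which is why four decimal places are the minimum needed to discuss sub-ppm stability.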
Q Exactive LC-MS/MS: Speed
[Figure: the same chromatographic peak (relative abundance vs. time, 1.75-1.95 min) acquired at two resolution settings. Resolution setting 35,000: peak width (FWHM) ~1 sec, 21 scans/peak. Resolution setting 70,000: peak width (FWHM) ~1 sec, 11 scans/peak.]
Instrumentation
Q Exactive LC-MS/MS: Resolution Setting - 70,000
[Figure: mass spectrum of C17H27O3S (m/z 311.0-313.5) with peaks at m/z 311.1689, 312.1715 and 313.1641. Expanded views of the A+2 region: at the 70,000 setting, separate peaks are labeled 34S (m/z 313.1641), m/z 313.1669 and 13C2 (m/z 313.1741); a second panel shows the calculated profile at 35,000 resolution.]
Instrumentation
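As a rough guide to why the resolution setting matters here, the nominal resolving power needed to distinguish two ions is m divided by the mass difference. The sketch below applies that rule of thumb to the 34S and 13C2 labels from the slide; the function name is illustrative, and the result is only a lower bound, since cleanly separated peaks generally need more than m/Δm.

```python
def required_resolving_power(mz_a: float, mz_b: float) -> float:
    """Nominal resolving power (m / delta-m) needed to tell two m/z values apart."""
    delta = abs(mz_a - mz_b)
    if delta == 0:
        raise ValueError("the two m/z values are identical")
    return ((mz_a + mz_b) / 2) / delta


# 34S and 13C2 isotope peaks as labeled on the slide.
r = required_resolving_power(313.1641, 313.1741)  # roughly 31,000
```

A value near 31,000 is consistent with the slide's message that the pair is not usefully separated at the 35,000 setting but is resolved at 70,000.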
Q Exactive LC-MS/MS: High Sensitivity Quantitation
Testosterone, 10 pg/mL in serum
[Figure: calibration curve, area ratio vs. concentration (0-500 pg/mL), with a chromatogram of testosterone at 10 pg/mL in serum.]
Standard (pg/mL)   % Difference
10                 0.97
20                 7.45
50                 -5.78
100                -0.29
250                -5.35
500                2.99
Instrumentation
Mass Accuracy with Polarity Switching (External Calibration) 1 positive + 1 negative scan in 1 second Instrumentation
Apex Triggered Data Dependent MS/MS
[Figure: chromatographic peak of kynurenine (2.8-3.0 min) with its structure, and the MS/MS spectrum (m/z 60-200). Labeled fragment ions: m/z 74.0240, 94.0652, 118.0652, 120.0445, 136.0758, 146.0602, 150.0551, 174.0551 and 192.0656; labeled mass errors range from 0.4 to 1.0 ppm.]
Instrumentation
Anatomy of a UHPLC/Orbitrap Data Set
[Figure: charge-state distribution of detected ions: z=1 (73%), z=2 (12%), z=3 (10%), other (5%). Inset: a z=+2 isotope cluster at m/z 852.9720, 853.4727, 853.9745 and 854.4817 within a 1 Da m/z window.]
Adduct            % Assignments
[M+H]+            100
[M+Na]+           12.1
[M-H2O+H]+        8.3
[2M+H]+           4.7
[M+NH4]+          3.8
[2M+Na]+          3.1
[M+K]+            2.7
[M-2(H2O)+H]+     2.5
[M+CH3CN+H]+      2.1
- >1,000,000 data points
- ~100,000 extracted ion peaks
- Peak areas span ~7 orders of magnitude
- Much irrelevant data
- Much redundant data
- High quality data from the Orbitrap mass analyzer allows for more precise automated data processing
- Need to be able to reduce the data to chemical entities
Data Processing
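To illustrate why one compound produces several signals, the sketch below computes the expected m/z of the positive-ion adducts listed above from a neutral monoisotopic mass. The ion masses are standard monoisotopic values; the table layout, the function name and the choice of hippuric acid (C9H9NO3, neutral mass 179.058244, calculated here rather than taken from the deck) are assumptions for illustration.

```python
PROTON = 1.007276         # H+
SODIUM = 22.989218        # Na+
POTASSIUM = 38.963158     # K+
AMMONIUM = 18.033823      # NH4+
WATER = 18.010565         # H2O
ACETONITRILE = 41.026549  # CH3CN

# adduct name -> (number of molecules M, mass added to n*M), all singly charged
ADDUCTS = {
    "[M+H]+": (1, PROTON),
    "[M+Na]+": (1, SODIUM),
    "[M-H2O+H]+": (1, PROTON - WATER),
    "[2M+H]+": (2, PROTON),
    "[M+NH4]+": (1, AMMONIUM),
    "[2M+Na]+": (2, SODIUM),
    "[M+K]+": (1, POTASSIUM),
    "[M-2(H2O)+H]+": (1, PROTON - 2 * WATER),
    "[M+CH3CN+H]+": (1, PROTON + ACETONITRILE),
}


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """Expected m/z of a singly charged adduct of a neutral molecule."""
    n_molecules, shift = ADDUCTS[adduct]
    return n_molecules * neutral_mass + shift


HIPPURIC_ACID = 179.058244  # C9H9NO3, assumed example compound
expected = {name: round(adduct_mz(HIPPURIC_ACID, name), 4) for name in ADDUCTS}
```

Running the table in reverse (searching observed peaks for these mass relationships) is what lets redundant signals be collapsed into a single chemical entity.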
Background Subtraction
Sample - Solvent blank = Analyte signals
~98% of lower intensity signals are eliminated
Data Processing
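A minimal sketch of blank subtraction on a feature table follows. It assumes features have already been matched between the sample and the solvent blank and are keyed by (m/z, retention time); the `min_ratio` threshold and all example values are hypothetical, and the actual rule used by the software is not described in the deck.

```python
def subtract_background(sample: dict, blank: dict, min_ratio: float = 3.0) -> dict:
    """Keep features whose sample intensity exceeds min_ratio x the blank intensity.

    Returns the blank-corrected intensity for each feature that is kept.
    """
    kept = {}
    for feature, intensity in sample.items():
        background = blank.get(feature, 0.0)
        if intensity > min_ratio * background:
            kept[feature] = intensity - background
    return kept


# Hypothetical feature tables keyed by (m/z, retention time in min).
sample = {(180.0652, 3.37): 4.0e7, (149.0233, 1.20): 2.0e5, (279.1591, 9.80): 9.0e4}
blank = {(149.0233, 1.20): 1.9e5, (279.1591, 9.80): 1.0e4}
analyte_signals = subtract_background(sample, blank)
```

Using a ratio rather than a plain difference avoids keeping features that are only marginally above the background.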
Spectral Interpretation: Rat Urine
[Figure: chromatogram (2-5 min) and mass spectrum (m/z 100-600) of hippuric acid, retention time 3.37 min. Annotations: ESI+ 12 related ions, ESI- 24 related ions.]
Ion             m/z         Intensity
[M+H]+          180.0652    4e7
[2M+Fe-H]+      413.0427    3e5
[3M+Ca-H]+      576.1277    1e6
[3M+Fe-2H]+     591.0930    9e5
Other labeled ions: [M+Na]+ 202.05, [2M+H]+ 359.12
Fe isotope pattern detected (C27H25N3O9Fe):
Measured:     589.0975, 590.1034, 591.0929, 592.0975, 593.1014, 594.0981
Theoretical:  589.0981, 590.1013, 591.0935, 592.0965, 593.0987, 594.1010
Data Processing
Varying Response with Different Ion Species: Rat Urine
Hippuric acid (peak areas)
        [M+H]+        [2M+Fe-H]+   [3M+Ca]+    [3M+Fe-2H]+
        122,738,814   869,212      2,576,598   527,298
        119,451,097   824,794      2,471,863   499,852
        117,092,066   689,807      2,234,709   456,582
        115,057,559   623,152      2,167,836   432,552
        115,387,079   573,090      2,138,694   417,703
        117,957,232   560,476      2,157,101   409,896
Mean    117,947,308   690,089      2,291,134   457,314
SD      2,858,562     130,537      186,409     47,198
CV      2.4%          19%          8%          10%
Trp and Phe (peak areas, [M+H]+)
        Trp       Phe
        415,983   2,574,163
        420,085   2,614,732
        410,093   2,494,727
        427,342   2,479,608
        423,358   2,448,543
        416,844   2,439,600
Mean    418,951   2,508,562
SD      6,047     70,659
CV      1.4%      2.8%
Data Processing
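The summary rows on this slide (mean, standard deviation, CV) can be reproduced from the listed peak areas. The sketch below does so for two of the hippuric acid ion species and then picks the species with the lowest CV; choosing the quantitation ion this way is an illustrative assumption, not a rule stated in the deck.

```python
from statistics import mean, stdev


def cv_percent(areas):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return stdev(areas) / mean(areas) * 100


# Peak areas copied from the slide.
hippuric_acid = {
    "[M+H]+": [122_738_814, 119_451_097, 117_092_066,
               115_057_559, 115_387_079, 117_957_232],
    "[2M+Fe-H]+": [869_212, 824_794, 689_807, 623_152, 573_090, 560_476],
}

cvs = {ion: cv_percent(areas) for ion, areas in hippuric_acid.items()}
most_reproducible = min(cvs, key=cvs.get)  # ion species with the lowest CV
```

The protonated molecule gives a CV of about 2.4% while the iron-bridged dimer is near 19%, so summing or comparing across ion species without interpretation adds noise to the statistics.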
Importance of Spectral Interpretation
[Figure: extracted ion chromatograms (2.22-2.40 min) of m/z 593.2815 in dosed and control samples, the component ion m/z 297.1443, and a mass spectrum (m/z 100-1000). Labeled ions: [M+H]+ 297.1443, [2M+H]+ 593.2815 and [M+H]+ 220.1176; labeled intensities 5e7 and 5e5.]
Data Processing
Removing Noise from the Statistics
[Figure: statistical plots built from m/z peaks and from components, with groups labeled Female, Male, Fed and Fasted. Annotations: large intra-group variability; no group separation.]
Data Processing
SIEVE Analysis Platform
Statistically rigorous, automated, label-free LC/MS differential analysis platform
Input: State 1 raw file, State 2 raw file, further state raw files
Workflow: Align, Detect, Identify
Reports:
- Components
- Identification
- Relative quantitation
- Statistical analysis
- Trend information
Applied to: peptide, protein, small molecule data
Data Processing
SIEVE Workflow: Alignment
[Figures: unaligned base peaks; aligned base peaks with alignment scores; scan-to-scan correlation map of scan # in data file 1 vs. scan # in data file 2 (red = high, green = low).]
1. Full scan spectra are acquired with high mass accuracy (data file 1 is the reference for data file 2).
2. The spectra are binned.
3. A dot product correlation is calculated between each pair of spectra.
5. An overlapping tile is constructed from the next region, starting from the middle of the optimal path.
6. The full plane is tiled and a final alignment score is calculated. Overlapping measurements are averaged.
Data Processing
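The binning and dot-product steps of the alignment workflow can be sketched as follows. This is an illustration under stated assumptions, not SIEVE's implementation: the 1 Da bin width, the m/z range and the normalization of the dot product (cosine similarity) are hypothetical choices, and the tiling and optimal-path steps are not shown.

```python
import math


def bin_spectrum(peaks, mz_min=100.0, mz_max=1000.0, bin_width=1.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    n_bins = int(math.ceil((mz_max - mz_min) / bin_width))
    bins = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            index = min(int((mz - mz_min) / bin_width), n_bins - 1)
            bins[index] += intensity
    return bins


def dot_product_correlation(a, b):
    """Normalized dot product of two binned spectra (0 = unrelated, 1 = identical shape)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def correlation_matrix(file1_scans, file2_scans):
    """Scan-to-scan correlations: rows are file 1 scans, columns are file 2 scans."""
    binned1 = [bin_spectrum(scan) for scan in file1_scans]
    binned2 = [bin_spectrum(scan) for scan in file2_scans]
    return [[dot_product_correlation(a, b) for b in binned2] for a in binned1]


# Hypothetical scans: each scan is a list of (m/z, intensity) peaks.
scan_a = [(150.2, 100.0), (300.7, 50.0)]
scan_b = [(410.1, 80.0)]
scan_c = [(150.3, 90.0), (300.6, 60.0), (720.4, 5.0)]
matrix = correlation_matrix([scan_a, scan_b], [scan_c, scan_b])
```

The resulting matrix corresponds to the red/green correlation map: a retention time alignment is then a path through the high-correlation cells.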
Component Detection
Adducts, fragments and multimers
Ion        m/z         z    Intensity   Relative
[M+H]+     524.3703    1    4.2E+08     100%
[M+Na]+    546.3517    1    1.0E+08     24.6%
[M+K]+     562.3232    1    1.1E+06     0.3%
Labeled mass differences: 21.9816 and 37.9554
Isotopic peaks of [M+H]+
A+1        525.3730         1.2E+08     28.9%
A+2        526.3756         2.3E+07     5.5%
A+3        527.3784         3.0E+06     0.7%
A+4        528.3811         3.9E+05     0.1%
Isotopic peaks of [M+Na]+
A+1        547.3535         2.9E+07     27.8%
A+2        548.3577         5.6E+06     5.4%
A+3        549.3595         9.0E+05     0.9%
Constituents are represented by the base component
Data Processing
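The grouping shown on this slide can be sketched by checking each peak against the expected mass offsets from the base [M+H]+ ion. This is a simplified illustration, not the software's algorithm: only Na and K adducts and 13C-type isotope spacing are considered, all ions are assumed singly charged, and the 0.005 Da tolerance is a loose, hypothetical value chosen to accommodate the rounded labels on the slide.

```python
H_TO_NA = 21.981942    # m/z shift from [M+H]+ to [M+Na]+
H_TO_K = 37.955882     # m/z shift from [M+H]+ to [M+K]+
C13_SPACING = 1.003355 # 13C - 12C mass difference


def annotate_component(base_mz, peaks, tol=0.005, max_isotope=4):
    """Label peaks that belong to the component whose [M+H]+ ion is base_mz."""
    anchors = {
        "[M+H]+": base_mz,
        "[M+Na]+": base_mz + H_TO_NA,
        "[M+K]+": base_mz + H_TO_K,
    }
    labels = {}
    for mz in peaks:
        for name, anchor in anchors.items():
            for n in range(max_isotope + 1):
                if abs(mz - (anchor + n * C13_SPACING)) <= tol:
                    labels[mz] = name if n == 0 else f"{name} A+{n}"
    return labels


# m/z values from the slide, plus one hypothetical unrelated peak (530.1234).
peaks = [524.3703, 525.3730, 526.3756, 527.3784, 528.3811,
         546.3517, 547.3535, 548.3577, 549.3595, 562.3232, 530.1234]
labels = annotate_component(524.3703, peaks)
```

All ten related peaks collapse onto one base component, which is the data reduction the slide describes; the unrelated peak is left unlabeled.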
Frame / Feature
Frame: a well-defined rectangular region in the m/z versus retention time plane.
Example: L-Epicatechin, MW = 290.0790
Data Processing
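A frame as defined above is simple to model as a rectangle with a containment test. The sketch below is illustrative only: the class and function names, the 10 ppm m/z width, the retention time half-width and the example retention time of 5.0 min are all hypothetical; the [M+H]+ value is the slide's MW of 290.0790 plus a proton.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Rectangular region in the m/z versus retention time plane."""
    mz_min: float
    mz_max: float
    rt_min: float
    rt_max: float

    def contains(self, mz: float, rt: float) -> bool:
        return self.mz_min <= mz <= self.mz_max and self.rt_min <= rt <= self.rt_max

    def integrate(self, points) -> float:
        """Sum the intensity of all (m/z, rt, intensity) points inside the frame."""
        return sum(i for mz, rt, i in points if self.contains(mz, rt))


def frame_around(mz: float, rt: float, ppm: float = 10.0, rt_halfwidth: float = 0.25) -> Frame:
    """Build a frame centered on an m/z and retention time (widths are assumptions)."""
    half_mz = mz * ppm * 1e-6
    return Frame(mz - half_mz, mz + half_mz, rt - rt_halfwidth, rt + rt_halfwidth)


# [M+H]+ of L-Epicatechin: 290.0790 + 1.0073; retention time is hypothetical.
frame = frame_around(291.0863, 5.0)
points = [(291.0865, 5.1, 100.0), (291.0950, 5.1, 50.0), (291.0860, 6.0, 70.0)]
area = frame.integrate(points)
```

Integrating the same frame in every raw file gives one comparable value per sample, which is what the per-sample comparison on the following slide relies on.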
L-Epicatechin (MW = 290.0790)
#     Blend                 Location
1     zinfandel             Lake
10    petite sirah          Lake
13    zinfandel             Lake
36    cabernet sauvignon    Mendocino
37    petite sirah          Mendocino
2     cabernet franc        Napa
3     cabernet franc        Napa
20    petite verdot         Napa
21    cabernet franc        Sonoma
25    cabernet sauvignon    Sonoma
33    merlot                Sonoma
35    merlot                Sonoma
44    cabernet sauvignon    Sonoma
Data Processing
Accurate Mass Identification
The component MW is searched against the ChemSpider web service or a local database to produce a list of candidates.
MolWt       Name
290.079     L-Epicatechin
306.074     Epigallocatechin
314.01      D-glycoside of vanillin
380.1254    Vellokaempferol 3-5-dimethyl ether
382.1047    Velloquercetin 4'-methyl ether
426.0945    Epigallocatechin 3-O-(4-hydroxybenzoate)
436.1153    Epigallocatechin 3-O-cinnamate
450.0793    Quercetin 4'-galactoside
468.1051    Epigallocatechin 3-O-caffeate
472.1       Epigallocatechin 3-O-(3-O-methylgallate)
477.1266    Isorhamnetin 7-alpha-D-Glucosamine; Quercetin 3'-methyl ether 7-alpha-D-Glucosam...
478.0742    (name truncated)
Data Processing
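A local database lookup of this kind reduces to a mass-tolerance search. The sketch below uses a few entries from the table above; the function name and the 5 ppm tolerance are assumptions, and no web service is called.

```python
# (molecular weight, name) entries copied from the slide's local database table.
LOCAL_DB = [
    (290.079, "L-Epicatechin"),
    (306.074, "Epigallocatechin"),
    (436.1153, "Epigallocatechin 3-O-cinnamate"),
    (468.1051, "Epigallocatechin 3-O-caffeate"),
]


def find_candidates(component_mw, database, tol_ppm=5.0):
    """Return (name, database MW, ppm error) for entries within tol_ppm, best match first."""
    hits = []
    for mw, name in database:
        error_ppm = (component_mw - mw) / mw * 1e6
        if abs(error_ppm) <= tol_ppm:
            hits.append((name, mw, round(error_ppm, 2)))
    return sorted(hits, key=lambda hit: abs(hit[2]))


candidates = find_candidates(290.0790, LOCAL_DB)
```

The result is a list of candidates rather than an identification: compounds that share an elemental formula have the same accurate mass and cannot be distinguished by this search alone.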
Rat Fasting Study
Study designed to monitor the effect of fasting on metabolic profiles
Group   Animal IDs   Male   Fasting time (1)
1       1101-1105    5      Dark cycle control (no fast)
2       2101-2105    5      2 hr fast
3       3101-3105    5      4 hr fast
4       4101-4105    5      8 hr fast
5       5101-5105    5      12 hr fast
6       6101-6105    5      16 hr fast
(1) Rats fasted during a 6 p.m. to 6 a.m. dark cycle to capture peak feeding time
Samples: 50 µL serum, precipitated with cold MeOH
MS: Q Exactive LC-MS/MS @ 70K resolution, ESI+ and ESI-
UHPLC: Accela 1250 Pump
Column: Hypersil GOLD aq 2.1 x 150 mm, 1.9 µm @ 600 µL/min, 50 ºC
Buffers: A: 0.1% formic acid in H2O; B: 0.1% formic acid in 98:2 ACN:H2O
Biology
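The deck recommends randomized analysis order and pooled quality controls to limit systematic error. The sketch below builds one such run order for the six groups above; the QC interval, the seed and the function name are hypothetical, and this is not the injection sequence the authors actually used.

```python
import random


def build_run_order(groups, qc_every=10, seed=0):
    """Shuffle all samples together and bracket every block of qc_every samples with a pooled QC."""
    samples = [(group, animal) for group, animals in groups.items() for animal in animals]
    rng = random.Random(seed)  # fixed seed so the sequence can be regenerated
    rng.shuffle(samples)
    order = []
    for position, sample in enumerate(samples):
        if position % qc_every == 0:
            order.append(("QC", "pooled"))
        order.append(sample)
    order.append(("QC", "pooled"))  # closing QC after the last sample
    return order


# Group labels and animal IDs from the study design table.
GROUPS = {
    "DC": list(range(1101, 1106)),
    "2h": list(range(2101, 2106)),
    "4h": list(range(3101, 3106)),
    "8h": list(range(4101, 4106)),
    "12h": list(range(5101, 5106)),
    "16h": list(range(6101, 6106)),
}
run_order = build_run_order(GROUPS)
```

Shuffling across groups, rather than running each group as a block, keeps instrument drift from being confounded with fasting time.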
Pooled Quality Controls
[Figure: a pooled QC (the same sample each time) is built from the study samples (Sample 1, Sample 2, Sample 3, ... Sample 52) and injected at intervals among the treated and control samples; a component of interest is tracked in QC, treated and control injections.]
Peak areas in the pooled QC:
Inj. #   IS           citrulline   Tyr          Phe          Trp          273.1479_1.07
3        20,903,851   969,474      18,350,003   19,904,399   20,685,704   10,918,321
14       22,076,315   1,041,539    20,227,547   20,984,429   22,968,636   9,599,500
25       22,088,562   1,182,143    20,853,789   21,040,901   23,310,086   9,010,457
36       22,052,324   1,205,426    20,390,553   21,477,887   23,583,964   8,213,523
47       22,042,181   1,153,795    21,417,740   22,061,286   23,215,235   6,456,432
53       22,778,779   1,244,100    21,862,115   21,822,323   23,765,745   3,499,156
CV       3%           9%           6%           4%           5%           33%
*Sangster, et al., Analyst, 2006, 131, 1075-1078
Biology
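Because the pooled QC is the same sample every time, any spread in its peak areas reflects analytical rather than biological variation. The sketch below flags components whose QC variability exceeds a cutoff, using three columns from the table above; the 20% cutoff and the function names are hypothetical.

```python
from statistics import mean, stdev

# Pooled QC peak areas copied from the slide (injections 3, 14, 25, 36, 47, 53).
POOLED_QC = {
    "IS": [20_903_851, 22_076_315, 22_088_562, 22_052_324, 22_042_181, 22_778_779],
    "citrulline": [969_474, 1_041_539, 1_182_143, 1_205_426, 1_153_795, 1_244_100],
    "273.1479_1.07": [10_918_321, 9_599_500, 9_010_457, 8_213_523, 6_456_432, 3_499_156],
}


def qc_cv(areas):
    """Coefficient of variation (%) of a component across pooled QC injections."""
    return stdev(areas) / mean(areas) * 100


def split_by_qc_cv(qc_areas, max_cv=20.0):
    """Separate components into stable and unstable sets based on their QC CV."""
    stable, unstable = {}, {}
    for name, areas in qc_areas.items():
        cv = qc_cv(areas)
        target = stable if cv <= max_cv else unstable
        target[name] = round(cv, 1)
    return stable, unstable


stable, unstable = split_by_qc_cv(POOLED_QC)
```

The component 273.1479_1.07 declines steadily across the run and ends up in the unstable set, so a difference seen for it between study groups should not be trusted without further checks.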
Finding the Differences
[Figure: PCA of rat plasma, negative mode, with groups labeled Control - Fed, 4 hr, 12 hr, 16 hr and Pooled Controls.]
Biology
Examples of Metabolite Changes on Fasting
[Figure: four plots of peak area for Methionine, Proline, Arachidonic Acid and Linoleoyl-lyso-PC (18:2) across the QC, Blank, DC, 2h, 4h, 8h and 12h groups.]
Biology
Overall Method Robustness
Uric Acid: Positive and Negative Data
[Figure: peak area of uric acid in negative ion and positive ion modes across the QC, Blank, DC, 2h, 4h, 8h and 12h groups. Time between analyses: 30 hrs.]
Biology
Study Findings
- Fasting has a profound impact on metabolomic profiles
- Most metabolic changes are modest in extent
- Fasting status may exacerbate or obscure drug-induced metabolic effects
- Fasting data help contextualize drug-induced changes in many metabolites
- As part of the study design, fasting is neither right nor wrong, but it is a significant variable in model design
Biology
Summary
Metabolomics is very challenging. It is fraught with numerous sources of noise, and the cost of going down the wrong path is high.
Instrumentation
- Needs to be precise and robust: good quality in, good quality out
- Q Exactive LC-MS/MS provides an ideal platform
  - Excellent mass accuracy with external calibration
  - Ultra high resolution without loss of sensitivity
  - High performance quantitation
  - Discovery and validation on the same platform
Chemical Noise (Data Processing)
- The right software and the right controls can make all the difference
- Intelligent data reduction tools can significantly reduce noise
Biological Noise
- Needs to be understood through systematic studies
- Metabolomic prescreening can identify biological outliers
- Ensure homogeneity within the study
The Power of SIEVE software for Differential Analyses
Comparison of Palm Oil Samples
[Figure: chromatograms (relative abundance vs. time, 0-20 min) of adulterated and control palm oil samples; full-scale intensities 2.68E10 and 2.69E10.]
Comparison of Palm Oil Samples
[Figure: expanded view (0-7 min, 0-50% relative abundance) of the adulterated and control palm oil chromatograms.]
SIEVE software for Differential Analysis
An easy-to-use wizard walks you through the process and parameters of a differential analysis and unknown identification
Thanks!
Serhiy Hnatyshyn, Michael Reily, Don Robertson, Jessica Wang, Pengxiang Yang, Michael Athanas, Thomas McClure, David Peake, Kate Comstock, Yingying Huang, Patrick Bennett, Markus Kellmann, Catharina Crone, Thomas Moehring, Alexander Makarov, Eugen Democ, Frank Czemper, Sebastian Kanngiesser, Andreas Wieghaus