Workflows in SIEVE and LipidSearch

Workflows in SIEVE and LipidSearch I Orbi 5-2014 1 The world leader in serving science

SIEVE v2.1: Statistical Iterative Exploratory Visualization Environment

SIEVE Overview
SIEVE is label-free differential analysis software.
It aids in discovering molecular changes between states and provides semi-quantitative measurement of differentially expressed proteins, metabolites, and other compounds that correlate with a disease state, drug response, or other perturbation.
Proteomics:
  Biomarker discovery in plasma, tissue, cell culture, urine, saliva
  Disulfide bond validation in purified protein
  Quantification of samples that contain their own precursor label, similar to SILAC
Small Molecules:
  Biomarker discovery in plasma, tissue, cell culture, urine, saliva
  Monitoring drug degradation due to environmental stresses
  Metabolomic and lipidomic profiling
  Ingredient screening in Food and Safety
  Water purification monitoring
  Improving agriculture (tomatoes, whiskey, wine, corn)
  Direct infusion of olive oil using DART, looking for impurities
  Screening purified therapeutics for environmental and/or product modifications
Image: scanning electron microscope image of a cancerous (left) and normal cell, showing differences in cell brush. Image courtesy Igor Sokolov.

Getting Started
Minimal system requirements:
  Microsoft Windows: Windows 7 32/64 Professional SP1 or Windows XP 32 Professional SP1
  Microsoft .NET 4.0 (Extended)
  2 GHz dual core processor
  8 GB RAM or higher
  500 GB hard drive
Recommended system requirements:
  3.3 GHz processor
  32 GB RAM
  1 TB RAID performance
Installation requires administrator rights.
Free upgrade from v2.0 to v2.1 via the Thermo Omics Portal: https://portal.thermo-brims.com/
It is not necessary to uninstall v2.0.

SIEVE Analysis Platform
Statistically rigorous, automated label-free LC/MS differential analysis platform.
Input: raw files for each state (State 1, State 2, and so on)
Workflow: Align, Detect, Identify
Reports: Components, Relative Quantification, Statistical Analysis, Trend information, Identification
Applied to: any experiment that compares one group to another

Creating a SIEVE Experiment Initiate the wizard 6

Define Processing Method Select the domain: proteomics or small molecules 7

Define Processing Method and Experiment Type Select the detection algorithm: which one? 8

Two Signal Detection Algorithms: Which One to Use?
Classic Recursive Base Peak Framing
  Application: all sample types (proteomics, small molecules)
  Instrumentation: high resolution or low resolution instrument
  Experiment type: trend analysis, control/treated, ROC analysis, single class analysis
  Advanced processing: Perfect Pairs, targeted detection, direct infusion
  Limitations: NO background subtraction; NO data reduction by isotope and adduct grouping
Component Extraction
  Application: small molecules (charge state 2)
  Instrumentation: high resolution instrument
  Experiment type: trend analysis
  Advanced processing: background subtraction; data reduction by isotope and adduct grouping
  Limitations: requires charge state 2; requires high resolution data

SIEVE Workflow: Component Extraction
Automatically interprets spectra and reduces signal peaks into components.
Example adducts: [M+H]+, [M+Na]+, [M+K]+
A total of 9 different ions are observed for the same molecule as isotopes and adducts.
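The idea of collapsing several adduct ions into one component can be illustrated with a small sketch. This is not SIEVE's algorithm; it only shows how the expected m/z values of the three adducts named on the slide follow from one neutral mass. The function names, the 5 ppm tolerance, and the glucose example are assumptions for illustration; the mass shifts are standard monoisotopic values for singly charged positive ions.

```python
# Monoisotopic mass shifts (Da) for singly charged positive adducts.
# Standard physical values, not taken from SIEVE itself.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}


def adduct_mz(neutral_mass):
    """Return the expected m/z of each adduct for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFT.items()}


def group_adducts(neutral_mass, observed_mz, tol_ppm=5.0):
    """Map adduct names to observed m/z values that match within tol_ppm.

    tol_ppm is a hypothetical tolerance chosen for this sketch.
    """
    matches = {}
    for name, expected in adduct_mz(neutral_mass).items():
        for obs in observed_mz:
            if abs(obs - expected) / expected * 1e6 <= tol_ppm:
                matches[name] = obs
    return matches


# Example: glucose (neutral monoisotopic mass 180.06339 Da).
observed = [181.0707, 203.0526, 150.0]
grouped = group_adducts(180.06339, observed)
```

Here two of the three observed peaks are assigned to the same molecule, which is the kind of data reduction the slide describes.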

Experiment Name Identify experiment name 11

Select Raw Files Drag and drop the raw files from the file explorer 12

Sort Raw Files Click on the file name to sort the files 13

Multiple Solvent Blanks If more than one solvent blank is present then blank files are averaged 14

Component Extraction: Background Subtraction
Sample - Solvent blank = Analyte signals
Distinguishes analyte signals from noise.
Background subtraction is performed automatically when solvent blanks are acquired.
Irrelevant solvent peaks are removed from the data, which eliminates a significant amount of low-level noise.
~98% of lower intensity signals are eliminated.
This is a significant step in data reduction and a critical part of the new component detection algorithm.
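A minimal sketch of "Sample - Solvent blank = Analyte signals", assuming each file has already been reduced to a dictionary of signal key to intensity. The averaging of multiple blanks follows the earlier slide; the signal-to-background threshold of 3 and all function names are assumptions for illustration, not SIEVE internals.

```python
def average_blanks(blanks):
    """Average the intensity of each signal across one or more solvent blanks."""
    keys = set().union(*blanks) if blanks else set()
    return {k: sum(b.get(k, 0.0) for b in blanks) / len(blanks) for k in keys}


def subtract_background(sample, blanks, sn_threshold=3.0):
    """Keep signals whose intensity is at least sn_threshold times the blank.

    sn_threshold is a hypothetical value; signals absent from the blank
    are always kept. Kept intensities have the blank level subtracted.
    """
    background = average_blanks(blanks)
    kept = {}
    for key, intensity in sample.items():
        bg = background.get(key, 0.0)
        if bg == 0.0 or intensity / bg >= sn_threshold:
            kept[key] = intensity - bg
    return kept


sample = {"a": 1000.0, "b": 50.0, "c": 200.0}
blanks = [{"a": 10.0, "b": 40.0}, {"a": 30.0, "b": 60.0}]
analyte = subtract_background(sample, blanks)
```

Signal "b" is as intense in the blanks as in the sample, so it is dropped as solvent background.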

Scan Raw Files for Data Quality Raw files show no errors 16

Define Analysis Groups
Identify group names by separating groups with a space.
The ratio group is the control group (Fed/Fast).
The word "blank" will enable background subtraction.
Select the alignment reference file.

Define Search Parameters
Define retention time, mass range, and m/z width.
Define the frame width for a framing experiment; frame width is automatic for component extraction.
m/z width: 10 ppm is +/- 5 ppm.
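Because the m/z width is a total width rather than a half width, it can help to work the arithmetic once. The sketch below assumes the window is centred on the target m/z; the function name is hypothetical.

```python
def mz_window(mz, width_ppm=10.0):
    """Return (low, high) for a window of total width width_ppm centred on mz."""
    half_width = mz * (width_ppm / 2.0) / 1e6
    return mz - half_width, mz + half_width


# A 10 ppm width at m/z 500 spans +/- 5 ppm, i.e. +/- 0.0025 m/z units.
low, high = mz_window(500.0, 10.0)
```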

Select Scan Filter
SIEVE automatically selects the full MS scan type.
Data with both positive and negative filters need to be processed separately.
If a lock mass was used, modify the filter string to remove the lock mass text.
Example: FTMS + p ESI Full MS Lock Mass

Define Main Component Extraction Parameters
Review the raw data in Qual Browser first.
Each data set is different and may require different settings.
The intensity threshold is initially set from the mean intensity of the reference file.

Define Identification Parameters Three search types available: ChemSpider, Database Lookup or Defer 21

Identification Parameters
ChemSpider:
  Free chemical structure database
  Over 470 data sources, e.g. KEGG, Human Metabolome Database, etc.
  More than one data source can be used in the identification search when separated by a comma
DB Lookup:
  Post peak detection lookup (different from seed file)
  DB library files in csv format
  Requirement: the first column must contain the neutral exact mass
  All other columns are optional
Defer:
  This setting can be deferred in the wizard
  Identification can be enabled later within the parameters table
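The DB Lookup requirement (neutral exact mass in the first column, everything else optional) can be sketched as follows. This is an illustration only, not SIEVE's implementation: the header row, the compound names, the 5 ppm tolerance, and the function names are assumptions, and rows whose first column is not numeric are simply skipped.

```python
import csv
import io


def load_library(csv_text):
    """Parse library rows as (neutral_exact_mass, [optional columns...])."""
    entries = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        try:
            mass = float(row[0])
        except ValueError:
            continue  # skip a header row or a malformed line
        entries.append((mass, row[1:]))
    return entries


def lookup(entries, neutral_mass, tol_ppm=5.0):
    """Return library entries within tol_ppm of the component's neutral mass."""
    return [e for e in entries if abs(e[0] - neutral_mass) / e[0] * 1e6 <= tol_ppm]


# Hypothetical two-entry library; only the first column is required.
library_text = "Mass,Name\n180.06339,Glucose\n146.05791,Adipic acid\n"
library = load_library(library_text)
hits = lookup(library, 180.0634)
```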

Complete Wizard Setup
Save the file as .sdb

Review SIEVE Parameters Before Processing
Reference File: check that the reference file is displayed. If not, enable it through the raw file collection.
Scan Filter: exclude the lock mass from the text string.
Update: if modifications are made in the parameters table, click UPDATE, then run the processing task.
Align: always run Align. Even when bypassing the align step, SIEVE is reading in the files.
Multiple instances of the software are allowed.

Component Extraction - Use of Integration Parameters
Peak Detection: ICIS, Genesis, PPD (parameter-less)
Peak Integration: ICIS, Genesis, PPD (peak areas generated) or None (integration reflects entire window)
Why use? Time
Figure: peak integration with ICIS compared with None

Set Integration Parameters Optimize parameters for chromatographic peaks 26

Unaligned Small Molecule Data 27

Aligned Small Molecule Data Zoom in by placing a box over area with cursor Zoom out by removing scroll bars 28

Frame Report View 29

Data Review Options XIC Trend Intensities Peaks 30

Frame Report Right click on any column title to access field chooser 31

Results Review Options Gel View CVs Displayed by group PCA 32

Frames Table Filter
Use the filter table to reduce the number of components.
Filter on column headings.
The filter follows Boolean logic (and, or, not).
Example 1: CV_E <20 and CV_H <20
Example 2: Ratio_E < 0.5 or Ratio_E >1.5
Example 3: Pvalue_E <0.05
Example 4: Pick >0
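The way these filter expressions combine can be shown with a short sketch that applies Examples 1 to 3 together. The frame records and their values are made up for illustration; the column names come from the slide's examples, and the code does not reproduce SIEVE's own filter syntax.

```python
# Hypothetical frame records; column names follow the slide's examples.
frames = [
    {"ID": 1, "CV_E": 12.0, "CV_H": 8.0, "Ratio_E": 2.1, "Pvalue_E": 0.01},
    {"ID": 2, "CV_E": 35.0, "CV_H": 9.0, "Ratio_E": 0.3, "Pvalue_E": 0.02},
    {"ID": 3, "CV_E": 10.0, "CV_H": 15.0, "Ratio_E": 1.1, "Pvalue_E": 0.60},
    {"ID": 4, "CV_E": 5.0, "CV_H": 6.0, "Ratio_E": 0.4, "Pvalue_E": 0.04},
]


def keep(frame):
    """Combine the three example filters with Boolean 'and'."""
    reproducible = frame["CV_E"] < 20 and frame["CV_H"] < 20        # Example 1
    changed = frame["Ratio_E"] < 0.5 or frame["Ratio_E"] > 1.5      # Example 2
    significant = frame["Pvalue_E"] < 0.05                          # Example 3
    return reproducible and changed and significant


selected_ids = [f["ID"] for f in frames if keep(f)]
```

Frame 2 fails the CV filter and frame 3 fails the ratio and p-value filters, so only frames 1 and 4 remain.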

Additional Tips
Each dataset may be different.
Visually confirm alignment (may need to bypass alignment).
Multiple iterations of peak detection may be necessary to optimize the peak detection parameters.
Start with a higher threshold and no peak integration for faster review.
Supplemental information is provided.
Questions? Refer to the Thermo Omics Software Portal: http://portal.thermo-brims.com/

LipidSearch v4.0.20 Quick Fix release

LipidSearch Features
Automated identification of lipids from biological samples: identification, relative quantitation, alignment.
Comprehensive database of >1.5 million lipid ions and predicted fragment ions.
Identification algorithms for product ion, precursor ion, and neutral loss scans.
Identification is ranked by mass tolerance, then matched to predicted fragments and predicted retention time.
Suitable for multiple approaches to lipid analysis:
  LC or nano-infusion (Shotgun)
  Untargeted and targeted profiling
Compatible with data from various MS systems: Thermo Q Exactive, hybrid Orbitrap, and TSQ instruments.

Getting Started
Recommended system requirements:
  64-bit operating system, Microsoft Windows 7/8
  Quad- or multi-core CPU, 3 GHz or higher
  16 GB RAM or higher
  500 GB hard drive or larger (SSD optional)
Required programs:
  Thermo Scientific MSFileReader 64-bit (need to uninstall if currently installed)
  Java runtime environment (JRE 1.6+)
  Microsoft Visual C++ 2010 runtime
  Microsoft Internet Explorer or Google Chrome (web-based graphical interface)
Installation requires administrator rights.

Getting Started
Tomcat Server: adjustable maximum memory allocated to the server
  Set during installation, or edit after installation in C:\lipidserach\lipidserach4.0\LipidSearchLauncher\LipidSearchLauncher.ini
Documentation: user manual, installation instructions, tutorial files (C drive)

Launcher
Initiate the software via the desktop icon.
Tomcat server: stop and start the server here.
Click Open to launch LipidSearch.
Minimize the Tomcat server to the taskbar; re-open the server by clicking on the icon.
http://localhost:8090/lipidsearch040/

License Must request license key to register software Send information to ThermoMSLicensing.com Register key to activate software 40

Configuration
Modify the configuration to improve performance.
Increase the buffer size to 70-80%.
If using a > 3 GHz processor, increase the number of processes for peak detection, identification, and quantification to 4.

LipidSearch Workflow Step 1 Step 2 42

Batch Creation for Identification and Quantitation Select raw files to be processed 43

Identification Parameters - LC-MS/MS
General: Triple Quadrupole
Q Exactive: QE or Fusion (HCD)
Orbitrap: Fusion (CID, MS2/MS3)
Recalc Isotope: ON for general search, OFF for low abundant ions
M-Score is based on the number of matches with product ion peaks in the spectrum.

Quantitation Parameters 45

Filter Criteria for Displaying Raw File Results
1) Toprank: displays lipids with the top score among identified spectra
2) Main node: main isomer peak; displays the largest isomer based on intensity, m-score, and t-score
3) FA priority: shows the most likely fatty acid chain combination if lipid isomers have the same score
4) ID Quality:
   A: lipid class & FA were completely identified
   B: lipid class & some FA were identified
   C: lipid class or FA were identified
   D: identification by other fragment ions (H2O loss)

Select Lipid Class 47

Select Adducts 48

Submit Batch Successful submission Unsuccessful submission 49

Data Processing: Search Job List Window
Export: extracts the summary data in the results list
Download: extracts the entire results file

Data Processing: Search Job List Window
Identification parameters window
Identification results window
Number of lipid groups
Number of lipid ions

Identification Results Summary
The parameters applied in the filter can be modified and then resubmitted with the change filter function.
This operation can be performed in the job list window or in the identification results window.

Data Processing: Search Job List Window
P = Peak picking
I = Identification
Q = Quantitation

Data Processing: Search Job List Window
Job status indicators: In queue, Active, Canceled, Successful completion, Ended in failure

Data Processing Search Job List Window Number of lipid groups (sum composition) Number of lipid ions (isomers) 55

Review Identification Results 56

Review Identification Results Data sorted by LipidGroup, CalcMz, TopPos 57

Review Identification Results
t-score: the difference between the theoretical LC-MS retention time (RP), calculated from the lipid computational formula, and the actual retention time [a lower value increases reliability]
m-score: based on the number of matches with product ion peaks in the MS2 spectrum [a higher value increases reliability]
Occupancy rate: the ratio of MS2 spectrum peaks assigned to the lipid among all peaks [a higher value increases reliability]
Grade: identification quality filter assigned A-D based on lipid class or fatty acid identification

Review Identification Results Mass spectrum Chromatographic peak 59

Spectrum Details Screen
Data ID
Other spectra where this lipid is identified
Precursor ion
Color key: Black = unassigned ions, Red = MS2 matched ions, Green = MS3 matched ions

Chromatogram Chart
Processing: de-noising, smoothing, separating partially overlapped peaks
Area score: 0.96
Yellow = integrated area

Alignment Parameters Set alignment parameters Max peak area is the default Select Mean to obtain group average peak areas 62

Retention Time Tolerance
LC experiment types: the retention time tolerance is the threshold between the peak tops of peaks deemed to be the same lipid during alignment.
|r.t.1 - r.t.2| > R.T. tolerance
If the above is true, the peaks will not be aligned; instead they appear as two separate records in the results list.
Example results: LPC (18:1)
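The tolerance rule can be sketched as a simple grouping of peak-top retention times. This is an assumed, simplified reading of the rule (each peak is compared with the first peak of the current record), not LipidSearch's actual alignment code; the function name and the example values are hypothetical.

```python
def align_peaks(peak_rts, rt_tolerance):
    """Group peak-top retention times (minutes) into aligned records.

    A peak joins the current record when its distance to that record's
    first peak is within rt_tolerance; otherwise it starts a new record.
    """
    records = []
    for rt in sorted(peak_rts):
        if records and abs(rt - records[-1][0]) <= rt_tolerance:
            records[-1].append(rt)
        else:
            records.append([rt])
    return records


# Peak tops for one lipid in three raw files (made-up values).
records = align_peaks([10.02, 12.50, 10.10], rt_tolerance=0.25)
```

The peaks at 10.02 and 10.10 min are aligned into one record, while the peak at 12.50 min exceeds the tolerance and is reported as a separate record.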

Select Raw Files for Alignment Click select to open the job list tab 64

Select Raw Files for Alignment Check box to include raw files to be aligned Click add 65

Group Allocation Selected files will appear in alignment setup Define control and sample groups In this example, wild type is control and knock out is sample group 1 66

Submit Alignment Successful submission 67

Alignment Jobs List
M = Merge (Alignment)

Alignment Results Alignment layout is formatted similarly to the identification results Group information can be expanded and collapsed 69

Normalization Option Normalization by either internal standard or by individual lipid class 70

Lipid Specific Alignment Results 71

Acknowledgements Thermo Fisher Scientific Jennifer Sutton David Peake Ralf Tautenhahn Josef Ruzicka 72

Supplemental Material for SIEVE

Define Experiment Type
Experiment Types:
Two Sample Differential Analysis: A simple comparison between two states, such as healthy and diseased. A ratio and p-value are calculated.
Control Compare Trend: This experiment is used for time course analysis or trend type experiments. One of the groups is defined as the control group and the others are compared to it. For each trend point, a ratio and p-value are calculated against the control group.
Differential Case Study with ROC Analysis: This experiment type is used to measure a candidate marker's capability of distinguishing between two classes. A large subject group ( 10) is recommended. Technical replicates are also recommended.
Non-differential Single Class Analysis: Allows a quick assessment of the data to determine reproducibility and overall quality by using the CV processor. This analysis can also be used with SIEVE's Perfect Pairs tool to find precursor pairs in a single raw file. This algorithm tags pairs of frames that are consistent with a designated mass difference. Applications include PTMs, ion and ion + adduct combinations, SILAC, and other precursor labeling methods.

Parameter Settings: Global
Maximum number of threads for processing. Lowering this value on 32-bit computers can bypass memory issues.
Change to force calculation if you want to see the PCA plot for large experiments.
Rawfile collection is where you can add or remove raw files and change the alignment reference file, groups, and colors.
After completing the wizard, check that the reference file parameter is populated. If a file is not assigned, click the rawfiles (collection) tab and check the reference file in the box that appears, then select OK; the reference file parameter should then display the selected reference file.
Retention start and stop can be used to eliminate unwanted data.
Scanfilter is the full scan type that is used for the analysis. If you have lock mass turned on and you see two full scans in the wizard, change this setting before running the analysis so that it includes all of the full scans. In this example, change it so that the string includes FTMS + p ESI Full, and remove all of the letters after "Full".
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Alignment
Check the alignment to see whether alignment is needed. You have the option to bypass alignment if needed.
Minimum intensity threshold for alignment.
Alignment correlation bin size.
Maximum retention time shift for the alignment step, in minutes.
The initial size of a tile that correlates base peak alignment.
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Basic Component
Signal-to-background-noise threshold for background correction (subtraction).
Suppress components that do not meet the background signal-to-noise criteria.
Base peak minimum intensity required for a signal to be considered as a component.
Minimum number of scans across a chromatographic peak for it to be considered.
Mass window for XIC in ppm (NOTE: 10 ppm = +/- 5 ppm, not +/- 10 ppm).
Base peak minimum intensity required for a signal to be considered as a component from a targetmzlist experiment.
List of component m/z values to force find.
Algorithm used to determine peaks. ICIS is used because its parameters can be checked and optimized by looking at the raw data in Qual Browser.
Time in minutes from a peak apex to restrict seeking another peak.
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Frame
The condition or trend point that was designated as the control. The control group serves as the denominator for ratio calculations (treatment/control).
Algorithm used for second pass peak integration. The default is NONE, but if ICIS is used for peak detection then ICIS should also be used for peak integration.
Parameters for the different integration methods can be found under the Workspace tab at the top of the software. It is recommended that users look at their data in Qual Browser first, optimize the peak integration parameters (ICIS), and then apply these settings to SIEVE. These parameters are very sensitive to the chromatographic peak shape of the data.
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Advanced Component
Minimum scans should always remain 2. Do not change this setting; it is not the base peak minimum scans.
Skirt minimum base peak intensity: the default should be set to 5 million, but it varies from instrument to instrument (5 million for QE and 2 million for Exactive).
Number of points for data smoothing.
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Global Identification
The charge used for ChemSpider and DBLookup if a charge could not be determined.
The maximum number of frames/components to identify.
Multiple formulas may be assigned to each component. MinFormulaScore is the minimum number of formulas sent to ChemSpider for identification or sent for pathway analysis. The maximum number of formulas reported per component is 10.
Select the search type: ChemSpider, DBLookup, None.
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Accurate Mass Identification
ChemSpider identification only: provide the adduct for mass calculation (+nh, -nh, +K, +Na, +NH4).
ChemSpider databases used for the search. More than one database can be searched at a time; separate them with commas.
Browse for the accurate mass library file (csv).
Use COMPMW to find accurate mass IDs based upon component molecular weight, FRAMEMZ to find accurate mass IDs based upon frame m/z, or FORMULA to identify by formula. For the COMPMW search type, an adduct is not necessary.
Accurate mass m/z tolerance for DBLookup and ChemSpider searches (ppm).
Bold-line parameters are the ones that are changed most often.

Parameter Settings: Optimized
Background SN changed to 10.
SkirtBPInty set to 5 million.
Smoothing set to Points 3.
Max retention time shift for the alignment step, in minutes: the default setting of 0.2 is too large in this case to separate isomers.
BPMinimumCounts changed to 1 million.
Time in minutes from a peak apex to restrict seeking another peak: the default setting of 0.2 is too large to separate isomers.
Peak Integration changed from NONE to ICIS.

Frame Target Wizard: Seed File
Incorporates a search for target ions of interest.
Identify the .csv file in the setup wizard.
Alternatively, identify the .csv file in the parameters table as the frame seed file.

Frame Target Wizard: Create the .csv file
Minimally required columns include MZ, RTStart, and RTStop.
Additional information can be listed in further columns, for example compound name.
The annotated column can be filtered using the frames table filter.
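A seed file with the minimally required columns could be generated as in the sketch below. The MZ, RTStart, and RTStop headers come from the slide; the extra "Name" column, the target values, and the function name are assumptions for illustration, and the exact header spelling SIEVE accepts should be confirmed when assigning columns in the wizard.

```python
import csv
import io


def build_seed_csv(targets):
    """Return CSV text with the required MZ, RTStart, RTStop columns.

    'Name' is an optional, hypothetical annotation column.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["MZ", "RTStart", "RTStop", "Name"],
        lineterminator="\n",
    )
    writer.writeheader()
    for target in targets:
        writer.writerow(target)
    return buffer.getvalue()


# One made-up target ion with a retention time window in minutes.
seed_text = build_seed_csv(
    [{"MZ": 181.0707, "RTStart": 1.2, "RTStop": 1.8, "Name": "Glucose"}]
)
```

Writing `seed_text` to a file with a .csv extension gives a file that can then be selected in the setup wizard.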

Frame Target Wizard: Assign columns within the .csv file
Check the number of entries in the frame parameters table to confirm that the file is successfully identified.

Normalization If normalizing to a selected frame, make sure the desired m/z frame is highlighted The desired frame is then displayed as the frame to normalize to in the normalize tab 86

Alignment and Framing Experiment
Associate data by isotopic cluster.
Select PRElement, PRRoot, and PRSize from the field chooser.
Sort by PRRoot. Tip: filter on PRSize.
PRElement: 0 = 12C, 1 = 13C, 2 = 14C
PRRoot: 12C peak
PRSize: number of frames per cluster

Elemental Composition SIEVE can generate up to 10 possible elemental compositions for a given component Each composition is scored and ranked The top ranked composition is listed in the main frame report table Other possible compositions can be viewed in the Flex View tab 88