Extending Proteome Coverage by Combining MS/MS Methods and a Modified Bioinformatics Platform Adapted for Database Searching of Positive and Negative Polarity 193 nm Ultraviolet Photodissociation Mass Spectra

Sylvester Greer,1 Marshall Bern,2 Christopher Becker,2 Jennifer S. Brodbelt*,1

1 Department of Chemistry, University of Texas at Austin, Austin, TX 78712
2 Protein Metrics, Inc., San Carlos, CA 94070

Supplemental Material

Detailed description of database search parameters and re-analysis of benchmark NUVPD datasets.

Figure S1. MS/MS spectra generated by A) HCD (35 NCE) and B) UVPD (2 pulses, 2.5 mJ per pulse) for peptide HNGAPAIDGIDDTIISDDTAR (3+) from a tryptic digest of proteins extracted from human hepatocytes.
Figure S2. Negative polarity MS/MS spectra generated by A) HCD (35 NCE) and B) UVPD (2 pulses, 2.5 mJ per pulse) for peptide ELEQVCNPIISGLYQGAGGPGPGGFGAQGPK (2-) from a tryptic digest of proteins extracted from human hepatocytes.
Figure S3. Overlap of peptides from HCD, UVPD and NUVPD replicates from tryptic digests of proteins extracted from human hepatocytes.
Figure S4. Overlap of UVPD peptides identified by Byonic and SEQUEST+Percolator from tryptic digests of proteins extracted from human hepatocytes.
Figure S5. A) Portions of peptides identified by NUVPD versus UVPD from tryptic digests of proteins extracted from human hepatocytes. The peptides are ordered based on isoelectric point. B) Distribution of isoelectric point of peptides identified by UVPD and NUVPD.
Figure S6. A-J) Annotated UVPD mass spectra for the top ten peptides found among the peptides identified by Byonic but not SEQUEST+Percolator.
Figure S7. ROC plot for NUVPD data analyzed by MassMatrix (blue) and Byonic (orange) from tryptic digests of proteins extracted from human hepatocytes.
Figure S8. ROC plot comparing 1D and 2D FDR results for UVPD data from tryptic digests of proteins extracted from human hepatocytes. The graphs indicate that the majority of the improvement is based on the search algorithm used, not inclusion of 2D FDR.
Figure S9. ROC plot comparing 1D and 2D FDR results for NUVPD data from tryptic digests of proteins extracted from human hepatocytes. The graphs indicate that the majority of the improvement is based on the search algorithm used, not inclusion of 2D FDR.
Figure S10. Distribution of sequence coverages obtained for proteins by HCD reported by Byonic (blue) and SEQUEST (red) from tryptic digests of proteins extracted from human hepatocytes.
Figure S11. Distribution of sequence coverages obtained for proteins by UVPD reported by Byonic (blue) and SEQUEST (red) from tryptic digests of proteins extracted from human hepatocytes.
Figure S12. Distribution of sequence coverages obtained for proteins by NUVPD reported by Byonic (blue) and MassMatrix (green) from tryptic digests of proteins extracted from human hepatocytes.
Figure S13. Distribution of sequence coverages obtained for HCD, UVPD, and NUVPD from tryptic digests of proteins extracted from human hepatocytes.
Figure S14. A) HCD, B) UVPD, and C) NUVPD of phosphorylated LKDLFDYSPPLHK from a tryptic digest of proteins extracted from human hepatocytes, showing varying degrees of phosphate retention.

Supplemental tables are included in Excel format that summarize all of the results for each MS/MS method.

Table S1. Byonic HCD. List of peptides and proteins identified by Byonic from triplicate HCD data.
Table S2. Sequest HCD. List of peptides and proteins identified by SEQUEST from triplicate HCD data.
Table S3. Sequest UVPD. List of peptides and proteins identified by SEQUEST from triplicate UVPD data.
Table S4. Byonic UVPD. List of peptides and proteins identified by Byonic from triplicate UVPD data.
Table S5. Byonic NUVPD. List of peptides and proteins identified by Byonic from triplicate NUVPD data.
Table S6. MassMatrix NUVPD. List of peptides and proteins identified by MassMatrix from triplicate NUVPD data.
Byonic Parameters

The following parameters were used for searching the LC-MS RAW files in Byonic. The add decoys option was selected. Cleavage sites were set to RK with C-terminal cleavage, digestion specificity was set to fully specific, and up to 2 missed cleavages were allowed. A precursor mass tolerance of 7 ppm was used, and the fragmentation type (UVPD, HCD, or NUVPD) was selected as appropriate. A 15 ppm fragment ion tolerance was used. The following modifications were searched: carbamidomethyl (+57.021464) fixed at cysteine, variable deamidation (+0.984016) of asparagine (common), variable pyroglutamic acid (-17.026549) of glutamine (rare), variable pyroglutamic acid of aspartic acid (-18.010565) (rare), and variable acetylation (+42.010565) at protein N-termini (rare). The maximum number of precursors per scan was set to 2, and the FDR was set to 1%. Unlike SEQUEST and MassMatrix, Byonic uses a protein-aware FDR. When searching for phosphopeptides, the above search parameters were modified to include variable phosphorylation (+79.966331) of serine and threonine (rare).

Proteome Discoverer (SEQUEST) Parameters

The following parameters were used for searching LC-MS RAW files in SEQUEST. Proteome Discoverer version 1.4.1.14 was used, and database searching was performed using SEQUEST HT. Tryptic enzyme specificity was selected, allowing up to two missed cleavages. The maximum delta Cn was set to 0.05. A precursor mass tolerance of 7 ppm and a fragment mass tolerance of 0.02 Da were used. The following dynamic modifications were used: acetylation of N-termini, deamidation (+0.984 Da) of asparagine, pyroglutamic acid (-17.027 Da) of glutamine, and pyroglutamic acid (-18.011 Da) of aspartic acid. Carbamidomethylation (+57.021 Da) of cysteine was treated as a static modification. Result filtering was performed using Percolator with the following parameters: maximum delta Cn of 0.05 and target FDR of 0.01 based on q-value.
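The mass shifts quoted in the Byonic and SEQUEST modification lists can be cross-checked from elemental compositions. The following is a minimal sketch, assuming standard monoisotopic atomic masses; the names MONO, delta_mass, and QUOTED are illustrative only and are not part of Byonic, SEQUEST, or any other search engine. The elemental compositions are the usual ones for each modification type and are an assumption of this sketch, not something stated in the parameter lists.

```python
# Monoisotopic atomic masses in Da (standard tabulated values, assumed here).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163}

def delta_mass(gained, lost=None):
    """Net mass shift (Da) for atoms gained minus atoms lost."""
    lost = lost or {}
    plus = sum(MONO[el] * n for el, n in gained.items())
    minus = sum(MONO[el] * n for el, n in lost.items())
    return plus - minus

# (assumed composition change, value quoted in the search parameters)
QUOTED = {
    "carbamidomethyl (+C2H3NO)": (delta_mass({"C": 2, "H": 3, "N": 1, "O": 1}), 57.021464),
    "deamidation (+O, -NH)": (delta_mass({"O": 1}, {"N": 1, "H": 1}), 0.984016),
    "ammonia loss (-NH3)": (delta_mass({}, {"N": 1, "H": 3}), -17.026549),
    "water loss (-H2O)": (delta_mass({}, {"H": 2, "O": 1}), -18.010565),
    "acetylation (+C2H2O)": (delta_mass({"C": 2, "H": 2, "O": 1}), 42.010565),
    "phosphorylation (+HPO3)": (delta_mass({"H": 1, "P": 1, "O": 3}), 79.966331),
}

for name, (computed, quoted) in QUOTED.items():
    # Report the computed shift next to the quoted one and their difference.
    print(f"{name}: computed {computed:+.6f}, quoted {quoted:+.6f}, "
          f"difference {computed - quoted:+.1e}")
```

Under these assumptions, each computed shift should agree with the quoted value to within rounding at the sixth decimal place.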
The SEQUEST search above was modified to include the ptmRS node for improved phosphopeptide searching. Phosphorylation (+79.966 Da) was added as a variable modification on S and T.

MassMatrix Parameters

The following parameters were used for searching NUVPD data in MassMatrix. Trypsin was selected as the digestion method, and the fragmentation mode was set to UVPD. The following dynamic modifications were selected: acetylation of N-termini, deamidation of asparagine, pyroglutamic acid at glutamic acid, and pyroglutamic acid (-17.027 Da) of glutamine. Iodoacetamide derivatization (carbamidomethyl) of cysteine was set as a fixed modification. The maximum number of missed cleavages was set to 2, the precursor mass tolerance was set to 7 ppm, and the default fragment mass tolerance of 0.05 Da was used. The minimum output score was set to 2, and the minimum pp and pp2 values were set to 4.3. The minimum pp tag was set to 4.0, and the maximum number of PTMs was set to 4.

Re-analysis of benchmark NUVPD data

Byonic NUVPD is one of very few algorithms adapted for database searches of LC-MS scale negative polarity datasets (MassMatrix and OMSSA are the only others familiar to the authors). The greatest gains in protein and peptide identifications were made by Byonic for the NUVPD data. In an effort to normalize instrumental variability, the original NUVPD benchmark dataset (Madsen et al., cited below) was re-analyzed using the newly customized Byonic NUVPD algorithm. The original NUVPD results reported in that reference were obtained using a ThermoFisher Scientific Orbitrap Elite mass spectrometer in which UVPD is performed in the HCD cell, and these data were re-processed using the Byonic algorithm. Madsen et al. identified 659 proteins and 2350 peptides using NUVPD of a tryptic digest of HeLa cell lysate in the 2013 study. Re-analysis of the triplicate data using the Byonic NUVPD algorithm resulted in identification of 1114 proteins and 4417 peptides, respectively, a 69% improvement in protein identifications and an 88% improvement in peptide identifications, demonstrating the enhancement in performance derived from the improvement of the Byonic algorithm.

Madsen, J. A.; Xu, H.; Robinson, M. R.; Horton, A. P.; Shaw, J. B.; Giles, D. K.; Kaoud, T. S.; Dalby, K. N.; Trent, M. S.; Brodbelt, J. S. Mol. Cell. Proteomics 2013, 12 (9), 2604-2614.
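The improvement percentages quoted for the re-analysis follow directly from the identification counts. A minimal sketch of the arithmetic, assuming the improvement is expressed relative to the original counts of Madsen et al.; the function name percent_improvement is illustrative only.

```python
def percent_improvement(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before

# Identification counts quoted in the text: original study vs. Byonic re-analysis.
protein_gain = percent_improvement(659, 1114)
peptide_gain = percent_improvement(2350, 4417)

print(f"Proteins: {protein_gain:.0f}% improvement")
print(f"Peptides: {peptide_gain:.0f}% improvement")
```

Rounded to the nearest percent, these give the 69% and 88% figures stated above.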
Figure S1. MS/MS spectra generated by A) HCD (35 NCE) and B) UVPD (2 pulses, 2.5 mJ per pulse) for peptide HNGAPAIDGIDDTIISDDTAR (3+) from a tryptic digest of proteins extracted from human hepatocytes.
Figure S2. Negative polarity MS/MS spectra generated by A) HCD (35 NCE) and B) UVPD (2 pulses, 2.5 mJ per pulse) for peptide ELEQVCNPIISGLYQGAGGPGPGGFGAQGPK (2-) from a tryptic digest of proteins extracted from human hepatocytes.
Figure S3. Overlap of peptides from HCD, UVPD and NUVPD replicates from tryptic digests of proteins extracted from human hepatocytes.
Figure S4. Overlap of UVPD peptides identified by Byonic and SEQUEST+Percolator from tryptic digests of proteins extracted from human hepatocytes.
Figure S5. A) Portions of peptides identified by NUVPD versus UVPD from tryptic digests of proteins extracted from human hepatocytes. The peptides are ordered based on isoelectric point. B) Distribution of isoelectric point of peptides identified by UVPD and NUVPD.
Figure S6. A-J) Annotated UVPD mass spectra for the top ten peptides found among the peptides identified by Byonic but not SEQUEST+Percolator. Panels: A) UVPD (2+); B) UVPD (2+); C) UVPD (3+); D) UVPD (3+); E) UVPD (2+); F) UVPD (3+); G) UVPD (2+); H) UVPD (3+); I) UVPD (3+); J) UVPD (3+).
Figure S7. ROC plot for NUVPD data analyzed by MassMatrix (blue) and Byonic (orange) from tryptic digests of proteins extracted from human hepatocytes.
Figure S8. ROC plot comparing 1D and 2D FDR results for UVPD data from tryptic digests of proteins extracted from human hepatocytes. The graphs indicate that the majority of the improvement is based on the search algorithm used, not inclusion of 2D FDR.
Figure S9. ROC plot comparing 1D and 2D FDR results for NUVPD data from tryptic digests of proteins extracted from human hepatocytes. The graphs indicate that the majority of the improvement is based on the search algorithm used, not inclusion of 2D FDR.
Figure S10. Distribution of sequence coverages obtained for proteins by HCD reported by Byonic (blue) and SEQUEST (red) from tryptic digests of proteins extracted from human hepatocytes.
Figure S11. Distribution of sequence coverages obtained for proteins by UVPD reported by Byonic (blue) and SEQUEST (red) from tryptic digests of proteins extracted from human hepatocytes.
Figure S12. Distribution of sequence coverages obtained for proteins by NUVPD reported by Byonic (blue) and MassMatrix (green) from tryptic digests of proteins extracted from human hepatocytes.
Figure S13. Distribution of sequence coverages obtained for HCD, UVPD, and NUVPD from tryptic digests of proteins extracted from human hepatocytes.
Figure S14. A) HCD, B) UVPD, and C) NUVPD of phosphorylated LKDLFDYSPPLHK from a tryptic digest of proteins extracted from human hepatocytes, showing varying degrees of phosphate retention.