Development of an Automated Multi-Dimensional Workflow and its Applications in Clinical Proteomics

Development of an Automated Multi-Dimensional Workflow and its Applications in Clinical Proteomics

by Majlinda Kullolli

To The Department of Chemistry and Chemical Biology

In partial fulfillment of the requirements for the degree of Doctor of Philosophy in the field of Chemistry

Northeastern University
Boston, Massachusetts
April

Development of an Automated Multi-Dimensional Workflow and its Applications in Clinical Proteomics

by Majlinda Kullolli

ABSTRACT OF DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemistry in the Graduate School of Arts and Sciences of Northeastern University, April

Abstract

Plasma and serum are attractive biofluid specimens for the detection and identification of disease-specific biomarkers. Because of the complexity and wide dynamic range of protein concentrations in these samples, pre-fractionation is necessary to reduce sample complexity prior to protein identification by mass spectrometry. Lectins are widely distributed in nature and have the ability to recognize carbohydrate structures. Thus, for several decades the specificity of lectin-carbohydrate recognition has been explored in biology and medicine. The major application has been lectin affinity chromatography, in which the specificity of the ligand-biomolecule interaction provides one of the most selective separation methods for isolating glycoproteins. My thesis work focused on the development of a high performance multi-lectin affinity chromatography (HP-M-LAC) support. The support was fully characterized to obtain a lectin affinity HPLC column designed for optimal capture of the plasma glycoproteome. For this purpose the ligand density, immobilization kinetics, and elution conditions were optimized. Because this HPLC support tolerates high flow rates and pressures, the HP-M-LAC has been automated for high-throughput sample fractionation in clinical proteomics. This platform has been applied to a number of clinical proteomics studies, including colon cancer, multiple sclerosis, and obesity and type 2 diabetes after gastric bypass surgery. As part of a biomarker discovery program, we have studied well-defined clinical specimens and have found that, for specific disease mechanisms, proteins that are up- or down-regulated can be identified and verified by ELISA and Western blotting.

ACKNOWLEDGEMENTS

There comes a time when we have the chance to acknowledge all the people who have directly shaped our lives and work. I want to express my deepest gratitude to my family, friends, teachers, and colleagues.

First, I would like to thank my advisors, Dr. Marina Hincapie and Professor William S. Hancock, for giving me the opportunity to join the research group. Dr. Hincapie provided me with more than technical guidance, feedback, and research ideas; her mentoring, support, and advice have helped me become the scientist that I am today.

I sincerely thank all the current and former members of the Barnett Institute who trained me in different analytical techniques. I especially thank Dr. Tomas Rejtar for assisting me with his expertise in the analytical field and bioinformatics, and for his encouragement and friendship. I would also like to thank Dr. Enrique Arrevalo and Russ Constantineau for spending hours training me in mass spectrometry, and Dr. Jonathan Bones for taking the time to read and comment on my thesis.

Looking back on the years I spent at Northeastern University, there are so many people whose help I appreciate, including Jefrey Keilselman, who was my IT support during my Ph.D. years, and members of the Barnett Institute Jana Volf, Bill O'Neil, and Felicia Martin. Many thanks go to the present and former members of Prof. Hancock's and Prof. Karger's groups, who over the years became a most valuable resource, especially Agnes Rafalko, Fateme Tousi, Zhi Zeng, and Dr. Tatiana Plavina.

I am grateful to all my friends from all over the world who crossed my path over these years. I thank you all for always being there for me through the toughest times, and for always offering your kindness, support, and warm friendship: Manjola Tatili Cronstrom, Anjeza Flamuraj, Drilona Baxhaku, Julinda Tatili, Dayana Argoti, Rose Ganthungu, Milena Virranoskwi, and Petrena Papadopoulos.

Finally, my biggest thanks are reserved for my family. I thank my family for the value they placed on education, for their unconditional love, patience, and full encouragement, and for supporting me morally and financially through this long and beautiful journey.

TABLE OF CONTENTS

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables

Chapter 1. Proteomics: Overview of Technologies for Plasma Biomarker Discovery
1.1. Introduction
Clinical Proteomics and Biomarker Discovery
Human Plasma Proteome
Proteomic Methodologies for Global Plasma Protein Profiling
Two-Dimensional Gel Electrophoresis
Protein Profiling
Shotgun Proteomics
Enzymatic Digestion of Proteins
Mass Spectrometry
Protein Identification and Database Search
Quantitative Proteomics
Stable Isotope Labeling
Label-free Quantitation
Removal of Abundant Proteins
1.3.2 Plasma Glycoproteomics
Lectin Affinity Chromatography in Human Plasma Glycoproteome
Plasma Peptidomics
Validation of Results
Background of Proteomics Research in Specified Disease Areas
Multiple Sclerosis
Obesity, Diabetes, and Gastric Bypass Surgery
Conclusions
References

Chapter 2. Preparation of a High Performance Multi Lectin Affinity Chromatography (HP-M-LAC) Adsorbent for the Analysis of Human Plasma Glycoproteins and an Automated Platform for Fractionation of Human Plasma Glycoproteome in Clinical Proteomics
Abstract
Introduction
Experimental
Material and Chemicals
Immobilization Chemistry and Ligand Optimization
Depletion of Abundant Proteins Using Multiple Affinity Removal Column
HP-M-LAC Column Characterization
Automation and Fractionation of Human Plasma
Total Protein Concentration Assay
HP-M-LAC Platform Reproducibility
Sample Desalting
Peptide Separation and Sequencing by nano-LC-MS/MS
Database Searching and Data Analysis
Results
Kinetics and Ligand Density Optimization
Column Evaluation
Column Characterization and Performance
Recovery Studies of HP-M-LAC Platform
Development of a Strategy for Plasma Glycoproteome
Reproducibility Studies of HP-M-LAC Platform
Specificity of the Multi Lectin Column
Conclusion
References

Chapter 3. Investigation of Potential Biomarkers of Multiple Sclerosis Using Proteomics and Peptidomic Analysis of Human Plasma
Abstract
Introduction
Material and Methods
Materials and Reagents
Proteomic Analysis
Trypsin Digestion
Sample Desalting
Peptide Separation and Sequencing by nano-LC-MS/MS
Database Searching and Data Analysis
Peptidomic Analysis
ELISA and Western blot Confirmation of Identified Candidate Markers
Results and Discussion
Background
Proteomic Analysis
Peptidomic Analysis
Apolipoprotein E
Fibrinogen
Cytoskeletal Proteins
Conclusions
References

Chapter 4. Analysis of Plasma Proteome for Obese Subjects and Obese Type 2 Diabetes Subjects After Gastric Bypass Surgery
Abstract
Introduction
Experimental
Materials and Reagents
Study Design and Plasma Samples
Methods
Trypsin Digestion
Evaluation of Trypsin Digestion by HPLC
Peptide Separation and Sequencing by nano-LC-MS/MS
Database Searching and Data Analysis
Results and Discussion
Platform Reproducibility
Trypsin Digestion
Proteomic Analysis
Proteomic Analysis of Individual Patients
Proteins Discriminating Between Diabetic Responders and Non-diabetic versus Diabetic Non-responders
Comparison of Pools versus Individuals
Conclusions
References

Appendix
Additional Publication Related to Research Work Performed During this PhD Graduate Course
Analysis of EDRN-WHI Samples using depletion of the top 12 abundant proteins combined with Multi-Lectin Affinity Chromatography
SeMoP: A New Computational Strategy for the Unrestricted Search for Modified Peptides Using LC MS/MS Data
A HUPO test sample study reveals common problems in mass Spectrometry based proteomics

LIST OF FIGURES

Chapter 1
Figure 1. Overview of Human Plasma Proteome
Figure 2. Overview of Electrospray Mechanism
Figure 3. Illustration of FTICR-MS
Figure 4. Most Abundant Proteins in Plasma
Figure 5. A Schematic Workflow of Plasma Analysis
Figure 6. Data Analysis Workflow for Biomarker Discovery
Figure 7. Illustration of Sandwich ELISA Method

Chapter 2
Figure 1. Ligand Ratio Immobilization
Figure 2. Binding Characteristics
Figure 3. Loading Capacity
Figure 4. HP-M-LAC Platform
Figure 5. Comparison of Agarose and HP-M-LAC Platform
Figure 6. Stability of Antibody Columns
Figure 7. Reproducibility of HP-M-LAC as Measured by 1D SDS-PAGE
Figure 8. Measurement of the Reproducibility of the HP-M-LAC
Figure 9. Specificity of M-LAC Column Shown by Re-chromatography

Chapter 3
Figure 1. A Schematic Diagram for Data Analysis
Figure 2. Proteomic Workflow
Figure 3. Peptidomic Workflow
Figure 4. Desalting of Human Plasma Filtrate using C18 Column
Figure 5. Apolipoprotein E
Figure 6. Fibrinogen Peptidome
Figure 7. Vinculin Proteome Analysis
Figure 8. Gelsolin, Proteomic and Peptidomic
Figure 9. Thymosin beta-4 Peptidome Analysis

Chapter 4
Figure 1. The Typical Chromatogram Obtained during HP-M-LAC
Figure 2. SDS-PAGE Analysis of the Pools
Figure 3. Reproducibility Study
Figure 4. Desalting of Peptides prior to LC-MS/MS Analysis
Figure 5. Trypsin Digestion of the Pools
Figure 6. Trypsin Digestion of the Pools
Figure 7. Data Analysis Design
Figure 8. Acute Phase Reactant Proteins
Figure 9. Lipid Binding Proteins
Figure 9. Hormone Binding Proteins
Figure 10. Analysis of Attractin

LIST OF TABLES

Chapter 1
Table 1. Lectin Specificity

Chapter 2
Table 1. Kinetics Immobilization
Table 2. Quality Control for Each Lectin Column
Table 3. Protein Recoveries for HP-M-LAC
Table 4. Examples of Proteins Identified with Higher Coverage by Acetic Acid
Table 5. Reproducibility and Recovery Study of HP-M-LAC Platform
Table 6. Partial List of Proteins Identified and Glycoprotein Distribution in the HP-M-LAC Fractions

Chapter 3
Table 1. Interesting Proteins Identified in the Plasma of Multiple Sclerosis and Matched Controls
Table 2. Proteins whose Fragments were Identified to be Present at Different Concentration in the Plasma of Multiple Sclerosis
Table 3. Summary of Analysis

Chapter 4
Table 1. Results of the HP-M-LAC Column for both Sets of Pools Run One Month Apart
Table 2. Summary of Digestion Protocols
Table 3. Proteins of Interest that were Seen Up- and Down-Regulated in Disease Samples of the Bound Fraction
Table 4. Proteins of Interest that were Seen Up- and Down-Regulated in Disease Samples of the Unbound Fraction
Table 5. Summary of the Individual Patients Analyzed
Table 6. Results of the HP-M-LAC for Individual Patients

Chapter 1

Proteomics: Overview of Technologies for Plasma Biomarker Discovery

1.1 Introduction

One of the most successful achievements in the life sciences has been the characterization of the human genome. Knowledge of the genetic makeup of human beings has opened new frontiers in gene therapy, diagnostics, and drug discovery. However, the completion of the human genome has brought forward a more challenging project: the characterization of the human proteome, as proteins are the functionally expressed products of the genes. 1 The term proteome was first coined in the early 1990s and was used to describe the set of all proteins expressed by a given genome, cell, tissue, or organism. 2 Proteomics is not without criticism; some have described it as a new name for protein biochemistry. 3 However, this overlooks one of the essential features of proteomics. In contrast with protein biochemistry, which focuses on the detailed chemical characterization of a protein, proteomics takes a broader, more comprehensive view and uses a systematic approach to understand the biology of proteins. 3 A further difference is that proteomics is discovery-driven rather than hypothesis-driven. 3 For example, proteomics aims to discover new disease markers that may reveal the etiology of different diseases, whereas the protein biochemical approach focuses on the characterization of a single protein in detail. 3

The broad application of proteomics is in the identification, characterization, and quantification of proteins in cells, tissues, or body fluids. This has the potential to accelerate our understanding of disease and to facilitate the discovery of diagnostic, prognostic, or therapeutic response markers. The composition of the proteome within a clinical subject varies with external and internal factors such as environment, disease, age, sex, and even the last meal. As a result, new tools and platforms have been developed to study the proteome.

Proteomic technologies have significantly impacted the life sciences over the past 15 years. 4 The rapid development of current proteomic methods is a direct result of advances in technologies for protein and peptide separation, for example multidimensional chromatography and electrophoresis, which are applied to reduce sample complexity and thereby expand the accessible dynamic range. 5 A number of technologies are used in proteomics to identify, quantify, and characterize proteins, such as two-dimensional electrophoresis, liquid chromatography, imaging, mass spectrometry, and bioinformatic tools.

The application of proteomic techniques in the medical field is known as clinical proteomics. Clinical proteomics aims to develop new diagnostic and prognostic tests and to identify new therapeutic targets. The research presented in this thesis is focused on the development of a new automated platform for plasma proteomics, with subsequent application to clinical samples derived from patients with colon cancer, autoimmune disease (multiple sclerosis), and metabolic syndrome diseases (obesity and comorbidities) for biomarker discovery.

Serum and plasma are the most commonly used diagnostic fluids for biomarker discovery. Plasma has several advantages over other biological fluids: it is easily obtained and is the most abundant source of proteins. Blood circulates through the entire organism and is in intimate contact with organs and tissues. It is thought that, in addition to the normal proteins, proteins resulting from pathological changes are released into the bloodstream. 6 The study of the plasma proteome is a challenge because of its complexity and the extremely wide range of protein concentrations, about 10 orders of magnitude. 7 In addition, the plasma proteome is heavily glycosylated (glycoproteome). 8 Glycosylation of proteins is one of the most ubiquitous post-translational modifications observed in eukaryotic organisms; it is estimated that roughly half of the mammalian proteome consists of glycoproteins. 9

A commonly used approach to study the plasma glycoproteome is lectin affinity chromatography, which has been a major analytical tool in the glycoproteomics field. 8 Lectins are proteins that have affinity for particular oligosaccharide structures displayed at the surface of a protein. We previously reported the use of multi-lectin affinity chromatography (M-LAC) for studying the plasma and/or serum glycoproteome. 8 We showed that a multi-lectin affinity column provided comprehensive capture of glycoproteins from biological fluids and was sensitive to changes in glycosylation under altered physiological states.

Clinical Proteomics and Biomarker Discovery

Clinical proteomics has the potential to provide tools for diagnosis and prognosis. 10 With this objective, most clinical proteomic research is focused on biomarker discovery in blood (serum or plasma). The term biomarker has traditionally referred to analytes in biological samples that track changes in disease. 11 Today, biomarker is used as an umbrella term that covers the use and development of tools and technologies, the monitoring of drug discovery and development, and the understanding of the prediction, cause, progression, regression, outcome, diagnosis, and treatment of disease.

Biomarkers are classified according to their intended function into four categories: prognostic, predictive, pharmacodynamic, and surrogate endpoint. Prognostic biomarkers are used to determine disease progression in patients; predictive biomarkers are used to determine the likelihood that a patient will respond to a treatment; pharmacodynamic biomarkers test the biological response after drug treatment; and surrogate endpoint biomarkers are intended to be used as a substitute for a clinical efficacy endpoint. 12, 13 Although there are numerous biomarkers to predict disease or disease course, the number of biomarkers receiving FDA approval has decreased in recent years. This is attributed to the challenges associated with the lack of biomarker validation. 7

For a protein candidate marker identified in plasma to become a biomarker, four phases need to be followed: study design, discovery, identification, and clinical implementation. 14 Study design is the most critical phase of the process: the clinical question must be stated in detail, the types and number of samples to be analyzed must be defined, and the proteomic workflow must be described and understood in order to generate meaningful data for subsequent interpretation. 14 To obtain statistically sound data, a sufficient number and variety of well-defined samples need to be employed. In the discovery phase the objective is to identify a number of potential candidate markers. During the validation phase, the candidate markers identified in the discovery phase are evaluated within a larger and more heterogeneous population. 15 There are two different approaches to biomarker discovery in proteomic analysis. The targeted approach is based on evaluating a specific biomarker, chosen either on the basis of biological rationale or derived from other sources. 16, 17 The other is the de novo discovery approach, which uses different proteomic technologies to discover a biomarker and then validates the potential biomarker candidate. 16 The clinical assay implementation phase entails the development and optimization of assays that are robust, sensitive, and quantitative enough to be used in the clinic. These assays may be either chromatographic or antibody-based.

Human Plasma Proteome

Diagnosing disease outcome based on plasma profiling is a particularly attractive concept. It is believed that plasma is the most informative sample for describing an individual's current state of health. 7 There has long been discussion as to whether plasma or serum should be collected for proteomic analysis. Several studies have shown that a large number of peptides are detected in serum samples but not in plasma samples. 18 This is believed to be due to ex vivo proteolysis and subsequent protein degradation during the clotting procedure, suggesting that plasma may be a better biofluid than serum for biomarker discovery. 19

Plasma samples are collected every day in large amounts. A major challenge with plasma is sample collection and handling. Blood contains a number of proteases, a few of which are associated with the coagulation pathways. 20 Proper sample collection, handling, and storage are required to maintain sample integrity. Samples should be obtained from multiple collections and different collection sites in order to minimize systematic bias; storage of samples at 4 °C and freeze-thaw cycles should be avoided to minimize coagulation of plasma and associated proteolysis. 21, 22 Handling parameters such as overnight fasting time and the time and speed of centrifugation have been shown to have a minor effect on the plasma proteome. However, standardized protocols for plasma/serum collection, handling, and storage are required in order to obtain reproducible data. Lastly, another issue with biomarker discovery using human plasma is biological variability; differences in protein expression between samples from diverse human populations can be very significant because of environmental factors and lifestyle. 23, 24

In addition, the technical challenge of studying the serum/plasma proteome is the sample complexity and the steep dynamic range, which spans from serum albumin at ~50 mg/mL to very low abundance proteins such as cytokines, which circulate at levels of ~1-10 pg/mL (Figure 1). Because of this large dynamic range, a number of techniques are used to decrease the dynamic range of plasma prior to mass spectrometric analysis. 7 For instance, reducing the concentration of the most abundant proteins with affinity-based capture reagents has been shown to improve the discovery and detection of less abundant proteins.

An alternative method to reduce the dynamic range of protein concentration is the ProteoMiner protein enrichment technology. This technology is based on treating complex protein samples with a large and highly diverse library of hexapeptides bound to chromatographic supports. In theory, each unique hexapeptide binds to a unique protein sequence. The beads have a limited binding capacity; high-abundance proteins therefore quickly saturate their binding sites, and excess protein is washed out during the procedure. 27 In contrast, low-abundance proteins are concentrated on their specific ligands, thereby decreasing the dynamic range of proteins in the sample. The limits of this technology include situations where the diversity of the ligand library is not sufficient for all proteins within the sample, where the dissociation constant has a value above the initial concentration, or where the available sample volume is insufficient to provide enough analyte to concentrate beyond the lower limit of detection. 27 Other chromatographic separations, such as strong cation exchange and reversed phase, or enrichment techniques applied prior to mass spectrometry have also proved beneficial. 28

Different proteomic methods have been applied to human plasma. The most commonly used are: a) two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), b) proteomic profiling (SELDI-TOF MS), and c) shotgun proteomics (LC-MS/MS).
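The dynamic range discussed in this section can be checked with simple arithmetic. The short Python sketch below is only an illustration, not part of the thesis workflow: it takes the approximate concentrations quoted in the text (albumin at ~50 mg/mL, cytokines at ~1-10 pg/mL) and expresses their ratio in orders of magnitude. The variable and function names are hypothetical.

```python
import math

# Approximate plasma concentrations quoted in the text, converted to g/mL.
ALBUMIN_G_PER_ML = 50e-3          # ~50 mg/mL serum albumin
CYTOKINE_HIGH_G_PER_ML = 10e-12   # ~10 pg/mL (upper end of the quoted range)
CYTOKINE_LOW_G_PER_ML = 1e-12     # ~1 pg/mL (lower end of the quoted range)


def orders_of_magnitude(high_conc: float, low_conc: float) -> float:
    """Return log10 of the concentration ratio (hypothetical helper)."""
    return math.log10(high_conc / low_conc)


# Span between albumin and the two ends of the quoted cytokine range.
span_low = orders_of_magnitude(ALBUMIN_G_PER_ML, CYTOKINE_HIGH_G_PER_ML)
span_high = orders_of_magnitude(ALBUMIN_G_PER_ML, CYTOKINE_LOW_G_PER_ML)

print(f"Dynamic range: {span_low:.1f} to {span_high:.1f} orders of magnitude")
```

Under these assumed values the ratio works out to roughly 9.7 to 10.7 orders of magnitude, which is consistent with the figure of about 10 orders of magnitude cited for the plasma proteome.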

Figure 1. Overview of the entire dynamic range of the plasma proteome (adapted from reference 10). Labeled regions: Classical Plasma Proteins; Tissue Leakage; Interleukins, etc.

Methodologies for Global Plasma Protein Profiling

Two-Dimensional Polyacrylamide Gel Electrophoresis

Two-dimensional (2D) gel electrophoresis was the first proteomics technique and is still widely used in proteomics research. This technique consists of two dimensions of separation. 29, 30 The first dimension is isoelectric focusing (IEF), in which the proteins in a sample are separated by their isoelectric points (pI) in a strip or tube of acrylamide gel containing an immobilized pH gradient (IPG). Following focusing, the IPG strips containing the proteins are incubated with sodium dodecyl sulfate (SDS), a denaturing detergent that imparts a negative charge to each protein in proportion to its mass. In the second dimension, the proteins are separated by molecular weight using SDS polyacrylamide gel electrophoresis (SDS-PAGE). The proteins can be visualized by various methods, such as staining with reagents that contain a chromophore or fluorophore, or by autoradiography. 31

2D-DIGE is an alternative approach for identifying differentially abundant proteins. Different protein samples are individually labeled with fluorescent dyes (Cy3 and Cy5) that have similar mass and charge but are excited at different wavelengths. The samples are mixed together in equal amounts and run on the same gel, so that the abundance of gel spots can be compared directly. Using a fluorescence scanner, one can visualize both samples on the gel and quantitatively compare the amount of protein in a spot from each sample. The gel spots are then excised and subjected to enzymatic in-gel digestion by trypsin or another sequence-specific protease. The peptides are then analyzed using a mass spectrometer (MS). The mass-to-charge ratios of the peptides are measured, and algorithms such as MASCOT are used to compare them to a database for protein identification. 32, 33 This approach is widely used for separating isoforms of the same protein. Although this method has been applied in various biomarker discovery studies, it has several drawbacks. The main concerns are its limited reproducibility, the time and labor it requires, and the exclusion of proteins with very high or very low pI values and molecular weights. Protein extraction, digestion, and peptide concentration from gel spots are variable and sometimes inefficient. Because of the large dynamic range of proteomic samples, the same spot can contain many proteins. To overcome this problem, a specific fractionation or enrichment step can be employed prior to 2D-PAGE.

Protein Profiling (SELDI-TOF)

Expression profiling requires a very sensitive method to detect small changes between two samples, as well as high throughput to analyze a large number of samples. As mentioned above, 2D-PAGE provides excellent separation of samples but is not a high-throughput method; for this reason, there is a need for more sensitive, less labor-intensive, and higher-throughput methods. Surface-enhanced laser desorption/ionization time-of-flight MS (SELDI-TOF) is a chip-based method in which multiple samples can be analyzed using very small sample aliquots (1 µL for plasma/serum and for cells). 37, 38 There are three major components of the SELDI-TOF technique: a) the protein chip array, b) the mass analyzer, and c) the data analysis software. The protein chip arrays carry different chromatographic surfaces that are used to bind the proteins of interest. The sample is loaded onto the chip, allowed enough time to bind to the chromatographic surface, and then washed with an appropriate solvent to remove unbound proteins. After the chip is allowed to dry, a UV-absorbing matrix such as sinapinic acid is added. The chip is then placed into the mass analyzer to analyze the bound proteins. The mass analyzer is time-of-flight (TOF) based: a laser irradiates the sample, which is ionized and accelerated towards the ion detector. A low-resolution profile of the proteome is obtained from the m/z values detected. The main disadvantage of SELDI is that it gives only spectral features based on m/z values. Thus, SELDI is a good indicator of protein expression, but it lacks the specificity of LC-MS. When the technology was first developed there were great expectations for it in the clinical biomarker discovery field; however, these were not met because of its low resolution and specificity.

Shotgun Proteomics (LC-MS/MS)

The combination of liquid chromatography (LC) with MS has allowed the analysis of the proteome in greater depth. The shotgun approach has been linked to the development and advancement of MS and tandem MS technologies. 39 In a typical shotgun experiment, a protein mixture is first cleaved by proteolytic digestion (the most commonly used protease in proteomics is trypsin, which cleaves on the C-terminal side of arginine and lysine residues), resulting in tens of thousands of peptides. The resulting peptide mixture is separated by multiple chromatographic methods in tandem prior to introduction into the MS. The final step of the analysis is to interface a nano-capillary reversed-phase HPLC column with a mass spectrometer. 40 The shotgun approach using LC-MS/MS has distinct advantages over gel-based techniques in terms of speed, sensitivity, scope of analysis, and dynamic range. 41 Both qualitative and quantitative analysis can be performed using shotgun proteomics. If qualitative analysis is desired, unlabeled samples can be used. Proteins can be identified using database searching algorithms such as SEQUEST, X!Tandem, or MASCOT. If quantitative comparison of two samples is desired, the samples are typically labeled using stable isotope labeling.

Enzymatic Digestion of Proteins

Enzymatic digestion prior to MS is a fundamental step in the proteomic analysis of complex mixtures. This thesis work concentrates on in-solution digestion with trypsin prior to MS. Trypsin is the most commonly used enzyme in shotgun proteomics because of its cleavage specificity (at arginine and lysine), which results in doubly and triply charged peptides that are then analyzed by LC-MS/MS. 39 In-solution trypsin digestion has some drawbacks, such as long digestion times, low throughput, and extensive sample handling. In in-solution digestion, protein folding protects the protein from enzymatic proteolysis, and a denaturing step is therefore necessary. Commonly used denaturing reagents are 8 M urea, 6 M guanidine-HCl, or thiourea. For complex mixtures and large proteins, breaking disulfide bonds is necessary in order to obtain high sequence coverage in MS. Reduction of disulfide bonds is accomplished with reducing agents such as dithiothreitol (DTT) or tris(2-carboxyethyl)phosphine (TCEP). The reduction of disulfide bonds is followed by the alkylation of cysteine residues using iodoacetamide (IAA). As a result, the sample preparation steps for in-solution trypsin digestion cannot yet be fully automated.

Mass Spectrometry (MS)

31 Mass spectrometry (MS) has become the essential tool in proteomics due to its ability to characterize small amounts of biological samples. This sensitivity associated with MS analysis requires very clean samples. The other aspect of this advantage is that even the smallest contaminant can ruin a perfect and well executed analysis. Therefore sample preparation is key in obtaining good mass spectrometric data. In this thesis, all the proteomic analysis were performed using ESI-Ion Trap or ESI-Linear Ion Trap Fourier Transformer MS. In this section, a brief overview of ESI, Ion Trap and FT-MS will be given. Mass Spectrometry was first developed at Cambridge University in 1897 from Thomson and Aston. By the mid 1930 s MS had become an established technology for separation of atomic ions by mass. 42 In 1940 s, MS had spread from the academia to industry, the first commercially available MS was launched in 1948, at this time MS it had a very limited detection range of 300 Da and very limited resolution. 42 This was followed by development of time-of-flight (Wiley and Maclaren) and quadruple mass analyzers (Paul) in The next major development was the coupling of gas-chromatography to MS. 43 Over the last 25 years new isolation techniques were developed, such as fast particle desorption, electrospray ionization and matrix-assisted laser desorption/ionization. Currently, there is a wide variety of mass spectrometers available with various performance capabilities for the detection of proteins and biological molecules by virtue of their sensitivity, speed and easily to use. MS can identify molecules in the attomole/femtomole range and it can provide protein identified in minutes instead 31

32 of hours, advancements that have revolutionized analytical chemistry and facilitated the entire field of proteomics. Any mass spectrometer has four main components, the source where the ionization takes place, the mass analyzer where the ions are sorted by (m/z), the detector where the ion abundance is detected, and system where the data is analyzed. 44 Ionization Methods: There are two types of ionization methods in mass spectrometer soft ionization and hard ionization methods. For soft ionization methods like ESI and MALDI the ionization can occurs either by adding or removing hydrogen to create charged species. The addition of a hydrogen results in producing a positively charged ion [M+H] + in a acid environment. Negatively charged ions are created by the removal of an hydrogen ion, forming a molecule ion [M+H] -, generally in an alkaline environment. 45 Multiple charged ions are created for large biomolecules, as these biomolecules have multiple ionization sites capable. Depending on the number of the hydrogen atoms added, a different m/z ratio will be acquired. 46 The m/z ratio is determined by the following formula For positively charged ions m/z = (M+zm p )/ z For negatively charged ions m/z = (M-zm p )/ z where M is the molecular weight of the analyte in Da, z is the charge and m p is the proton mass presented in Da ( usually 1 or 1.008). 45 ESI is one of the two major ionization techniques currently employed. Compared to other ionization techniques, ESI is able to form single and multiply charged ions. For this ions to be formed, the solution is introduced through a small 32

diameter needle in the presence of a high electric field. 47, 48 This electric field creates an unstable liquid cone (the Taylor cone), and because of this instability droplets are emitted and move towards the counter electrode. As charge builds up at the tip of the Taylor cone, a nebulizing gas, N2, is used to reduce the size of the droplets through evaporation. The net charge of each droplet stays constant while its size decreases; as a result the Rayleigh limit is exceeded, the electrostatic repulsion becomes greater than the surface tension holding the droplet together, and the droplet explodes, expelling gas-phase ions. 49, 50 Driven by the potential difference, the charged ions then travel towards the counter electrode and into the mass spectrometer, as shown in Fig. 2. Furthermore, ESI is the most suitable ion source for coupling MS with liquid chromatography (LC); with this coupling the analysis can be done online or offline. During online analysis the peptides of interest are eluted from the LC column of choice directly into the ESI source and detected by MS. During offline analysis the sample is eluted from the LC column and analyzed by ESI-MS at a later time. ESI has the advantage that the needle can be optimized for different flow rates; the needle orifice can be as small as 1 µm, leading to the formation of very small droplets. 47 The main advantages of smaller droplets are increased sensitivity, higher tolerance for buffer salts, and no need for a sheath gas. Nanospray has a few advantages over microspray, even though the two can be used interchangeably: nanospray delivers the sample at nL/min flow rates, and the amount of solvent used is much smaller than with micropumps. Most modern pumps available for LC-MS have micro and nano flow rate capabilities; the micro pump is used for sample loading and the

nano pump is used for gradient delivery for peptide elution into the MS. In a microLC system the chromatography is performed at 3-10 µL/min, and the LC column is about 10-20 cm long with an inner diameter of 300 µm. Separation under microLC conditions is fast, so multiple samples can be analyzed in a short time, but the sensitivity is low. To achieve better sensitivity and identify peptides at low femtomole levels, the column inner diameter needs to be reduced below 300 µm. As a result of the smaller column diameter, the separation needs to be performed at lower flow rates due to back-pressure limitations. The separation of peptides in a nanoLC system is performed at nL/min flow rates. Coupling of LC with the mass spectrometer has become a universal tool in the life sciences. Even though tandem MS can analyze peptides, the complexity of the peptide mixtures from a proteomics workflow is still a big challenge for MS alone; for this reason the coupling of LC with MS was a requirement in the field. The constraints on performing LC-MS analysis online include the use of volatile solvents and of ion-pairing reagents that do not suppress ionization. 51 The most commonly used solvents in upstream proteomics experiments are water/acetonitrile in the presence of formic acid. NanoLC systems are even more challenging to use, especially for quantitative analysis, as the reproducibility needs to be carefully monitored. The most commonly used source for online LC-MS separation is ESI; this coupling is convenient because both HPLC and ESI operate in the liquid phase. Offline separation is usually required when using MALDI ionization. In this case the peptides eluted from the LC column are spotted onto the MALDI plate and mixed with the

matrix to form crystals. After spotting, the MALDI plate is transferred into the mass spectrometer.

Figure 2. Electrospray mechanism (diagram labels: high voltage, electrospray needle, column eluate, Taylor cone, solvent evaporation, droplet approaching the Rayleigh limit, Coulomb explosion and fission, ion evaporation yielding desolvated ions [M_A + H]+ and [M_B + H]+, capillary, endplate, atmospheric pressure, high vacuum, oxidation, reduction).

Mass Analyzers: The most essential component in MS is the mass analyzer, where the analytes are separated based on their m/z. There are five types of mass analyzers: magnetic sector, Fourier transform ion cyclotron resonance (FT-ICR), time of flight (TOF), quadrupole, and quadrupole ion trap. The characteristics of each mass analyzer include mass resolving power, mass accuracy, mass range, linear dynamic range, precision, efficiency, duty cycle, speed, cost, and size. 52 The remainder of this section focuses only on the linear trap quadrupole (LTQ), which was used for most of the research in this dissertation. An LTQ works on the principle of the quadrupole mass analyzer, which is the most widely used type of mass analyzer. The quadrupole consists of four parallel surfaces in the shape of

cylindrical rods; the ratio of radio frequency (RF) to direct current (DC) potential is maintained constant, while the RF amplitude and DC potential are varied. 53 The ions produced in the ion source travel through the middle of the quadrupole, where they are trapped by the changing polarity of the electrodes. At a certain RF setting, ions with a specific m/z become destabilized and exit to the detector, creating a mass spectrum. 53 Compared with the quadrupole ion trap, the linear ion trap has the advantage that the trap is designed for increased ion storage capacity and faster scan times. Ion traps are the most widely used instruments in proteomics, due to the many advantages associated with them. 53 Ion traps are very robust and not very expensive; since they have the ability to trap ions they can be used for MSn; and they can easily be coupled with other mass analyzers, such as the orbitrap or FT-ICR, to overcome one of the disadvantages of the linear trap, namely its mass accuracy. Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS or FT-MS) employs a magnetic field and is capable of accurate mass measurements from 5 ppm down to 0.5 ppm. An LTQ-FT-ICR hybrid instrument was used for the data presented in Appendix A. Nobel Laureate E. O. Lawrence was the first to introduce the cyclotron, in 1929; in 1974 Alan Marshall and Melvin Comisarow were the first to successfully use FT-ICR. In the FT-ICR the ions are trapped in a cell under a magnetic field with electric trapping plates. 40, 54 The ions are excited to a larger cyclotron radius by an oscillating field perpendicular to the magnetic field. The larger the magnetic field, the greater the number of ions that can be trapped

and the greater the resolution that can be obtained. The mass of the ions is determined from the cyclotron frequency of the ions in a fixed magnetic field following Fourier transformation. The hybrid instrument, LTQ-FT-ICR, uses the advantages of both types of mass analyzers for the identification of peptides and post-translational modifications. While the FT-ICR acquires an accurate mass measurement of the parent ion, the fast-scanning linear ion trap can simultaneously acquire multiple MS/MS datasets. 58, 59

Figure 3. Illustration of FTICR-MS (reprinted from reference 44)

Ion Detection and Data Recording: The key component in ion detection is the dynode. There are two types of dynode detectors: discrete dynode electron multipliers and continuous dynode electron multipliers. 53 The continuous dynode electron multiplier is made of lead-doped glass and has the shape of a trumpet. A voltage in the kV range is applied across the length of the detector. The ions leaving the mass analyzer and entering the detector eject electrons, which move along the

surface ejecting more electrons with each impact. 53 The current generated when the ions hit the first surface of the electron multiplier is greatly amplified (up to 10^6), which makes it large enough to be passed to a recording device. Electron multipliers have a few advantages: they are rugged and very reliable devices, and they are very fast (with nanosecond response times). The main disadvantage of electron multipliers is their short lifetime. Electron multipliers generate analogue signals, which are digitized by an analogue-to-digital converter (ADC), resulting in a series of voltage pulses that are fed to a computer to generate the mass spectrum.

Protein Identification and Database Search

Currently there are two main strategies for protein identification in complex mixtures: first, mapping strategies, which rely mostly on accurate mass and retention time; and second, tandem MS, which is the most common approach for protein identification. 4 The latter approach relies on consecutive MS/MS fragmentation of the peptides. Peptides are first selected from the mass spectrum for fragmentation, either by a targeted approach or in a data-dependent manner, and are then fragmented by one of the following methods: collision-induced dissociation (CID) or electron transfer dissociation (ETD). 4 For the work in this thesis we used data-dependent MSn (the most intense ions are selected for fragmentation) with the CID fragmentation method. In CID, fragmentation occurs at the weakest bond in the peptide upon exposure to increased energy and collision gas; this often leads to problems with peptides that carry labile modifications such as glycosylation and phosphorylation. 4
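The data-dependent selection step described above (the most intense ions in a survey scan are picked for fragmentation) can be illustrated with a minimal sketch. The function name `select_precursors`, the `top_n` parameter, and the peak values are hypothetical and chosen only for illustration; real acquisition software also applies criteria such as dynamic exclusion and charge-state filtering, which are not modeled here.

```python
from typing import List, Tuple

# A peak is represented as (m/z, intensity); this layout is an assumption
# made for the sketch, not an instrument data format.
Peak = Tuple[float, float]


def select_precursors(survey_scan: List[Peak], top_n: int = 3) -> List[Peak]:
    """Return the top_n most intense peaks of a survey (MS1) scan."""
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return ranked[:top_n]


# Hypothetical survey scan with four peaks.
scan = [(445.12, 1.2e5), (523.77, 8.9e5), (610.30, 3.4e4), (785.84, 5.1e5)]
selected = select_precursors(scan, top_n=2)
print(selected)
```

With `top_n=2`, the two most intense peaks of the hypothetical scan would be passed on for CID fragmentation, while the remaining peaks would not be fragmented in that cycle.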

Once the ions have been selected, fragmented, and detected by MS, the spectra are submitted to a database search in which a peptide is assigned to each spectrum. In this search, peptides are generated by an in silico digest of a proteome database, and a theoretical mass spectrum is then predicted for each peptide. 4 The theoretical mass spectrum of each peptide is compared to the experimental spectrum, and a peptide assignment is made on the basis of the best match between the theoretical and experimental spectra. The most commonly used algorithm to interpret MS/MS spectra and assign sequences, SEQUEST, was developed in 1994 by the Yates group. 41 Such algorithms are used to search different databases to identify proteins. Common databases include NCBI and Swiss-Prot. These databases maintain protein and peptide sequences derived from both genomic studies and experimental data. 63, 64 The SEQUEST algorithm assigns peptides based on the cross-correlation score (Xcorr) between the theoretical MS/MS spectrum and the experimental mass spectrum of the peptide. The Xcorr strictly measures the quality of the match between the experimental and theoretical mass spectra. However, peptide assignments can be mismatched; therefore the data are initially filtered by Xcorr to eliminate low-confidence identifications. The Human Proteome Organization (HUPO) has published Xcorr acceptance criteria (singly charged Xcorr > 1.9, doubly charged Xcorr > 2.2, triply charged Xcorr > 3.75) for identifications performed using an LTQ instrument. 18 In addition to filtering with Xcorr, a spectrum is further evaluated using other filtering criteria such as probability-based scoring. Thermo Corporation has incorporated a probability model which works with SEQUEST in

the Bioworks 3.2 software. 65 Peptide Prophet and Protein Prophet offer an alternative probability scoring system. Together, the HUPO Xcorr criteria and a 95% probability score give high-confidence peptide-based protein identifications. In this thesis work we used the SEQUEST algorithm with the Swiss-Prot database, integrated in the Computational Proteomics Analysis System (CPAS). CPAS is a database and analysis tool, developed at the Fred Hutchinson Cancer Research Center (FHCRC), which manages proteomics workflows. CPAS has many tools built into its viewer that help with analyzing the results, including Xcorr filtering, peptide/protein probability, mass, unique peptides, etc. 66, 67

Quantitative Shotgun Proteomics

Proteomics platforms that can efficiently identify and quantify changes in the abundance of proteins or their modifications offer great promise for advancing biomedical research. Shotgun proteomics has become routinely used in proteomics laboratories for the identification of large numbers of proteins, thanks to advances in proteomic platforms, computing power, and bioinformatics. Relative quantitation has become the primary quantitation technique used in proteomic studies. Label-free quantitation and stable isotope labeling are the main methods for relative quantitation.

Stable Isotope Labeling

Labeling of proteins and peptides with heavy isotopes such as deuterium, carbon-13, nitrogen-15, and oxygen-18 has become widely used in quantitative proteomics. Stable isotope labeling is based on the principle that a stable isotope-labeled peptide is chemically identical to its native counterpart

and therefore the two peptides also behave identically during chromatographic and/or mass spectrometric analysis. 68 Many mass spectrometers with accurate mass capabilities can recognize the mass difference between the labeled and unlabeled peptide. Quantitation is achieved by comparing the intensities of the two peptides. 69 The label can be incorporated into proteins either metabolically or by enzymatic or chemical modification. 68 In metabolic labeling, the stable isotope label is introduced into proteins during cell growth and division. This approach is most widely used for comparing cell culture-based experiments, where one of the media is labeled with 15N or 13C and the other medium is unlabeled. 68 The cell lysates are digested and the peptides mixed in equivalent amounts. Quantitation is achieved by comparing the intensity of the labeled peptide with that of the unlabeled peptide. This approach can only be used with cell cultures. In 2002 Mann and co-workers developed a similar approach, stable isotope labeling by amino acids in cell culture (SILAC). 70 This approach uses 13C6-arginine and 13C6-lysine labeling, which ensures that every tryptic peptide carries at least one labeled amino acid, except the C-terminal peptide of the protein. In this approach the cells are grown in two otherwise identical cell cultures, one containing the light and the other the heavy form of the particular amino acid. Since there is no chemical difference between the light and heavy amino acids, the cells behave exactly the same, and the intensities of the labeled versus unlabeled peptides are compared. SILAC is efficient and reproducible, with 100% incorporation of the label into the cells. However, metabolic incorporation has some limitations: some cell lines are sensitive to changes in media composition, and another limitation is the limited

number of labels, which affects the ability to multiplex more than two samples at a time. 68 Enzymatic stable isotope labeling works by introducing the label during enzymatic digestion. The enzymatic labeling can be performed either during the proteolytic digestion by trypsin or Glu-C, or after the digestion during a second incubation step with the protease in the presence of heavy water (H2 18O). 68 The incorporation of the heavy oxygen occurs at the C-terminus of the peptide, resulting in a mass shift of 2 Da per 18O atom. Trypsin and Glu-C introduce two oxygen atoms, resulting in a mass shift of 4 Da. The proteolytic digestion is performed in parallel under normal digestion conditions and under heavy water digestion conditions. 68 The peptides are mixed in equivalent amounts, and the peak areas of the labeled and unlabeled peptides are then measured and compared. A disadvantage of enzymatic label incorporation is that full labeling is rarely achieved and different peptides incorporate the label at different rates, which complicates the data analysis. 68 Numerous strategies have been developed to introduce chemical labels into proteins. Gygi and co-workers in 1999 developed the isotope-coded affinity tag (ICAT) approach, in which the thiol groups of cysteine residues are specifically derivatized with reagents containing zero or eight deuterium atoms as well as a biotin group for affinity purification. 40 The drawback of this approach is that cysteine residues are rare, which makes ICAT unsuitable for quantifying proteins that contain few or no cysteine residues. 71 Another group of labeling reagents targets the amino group at the N-terminus of the peptide and the epsilon-amino

group of lysine residues. This labeling is achieved via N-hydroxysuccinimide (NHS) chemistry or other active esters; an example of this approach is isobaric tags for relative and absolute quantitation (iTRAQ). A different approach that has been used in proteomic analysis is the spiking of isotope-labeled synthetic peptides into a protein digest, known as AQUA. 75 Unlike metabolic labeling, where relative quantitation is performed for a large number of proteins, the addition of a synthetic peptide gives an absolute amount for one or a few particular proteins. This approach is widely used for the analysis and validation of biomarkers in large numbers of clinical samples.

Label-free Quantitation

The label-free quantitation approach allows comparisons between two biological samples based on the relative intensities of extracted ion chromatograms from complex mixtures. This approach therefore does not require extra sample handling steps or chemical/enzymatic reactions on the samples to be compared. However, highly reproducible sample preparation, LC separation, electrospray, and MS performance are required. The principle of label-free quantitation is that there is a correlation between the intensity of the MS signal and the peptide concentration in a biological sample. Several groups have shown this correlation with simple mixtures of a few analytes 77 and with peptides from mixtures of several proteins spiked into serum. 78, 79 Label-free quantitation is based on two types of protein measurement. The first is based on peptide peak areas or peak heights. The correlation of peptide/protein amount with peak area was first shown by Chelius et al. 79 They demonstrated the effect by loading 10

fmol to 100 pmol of myoglobin digest onto a nanoLC-MS system and analyzing the peak areas of the extracted peptides by LC/MS/MS. A linear increase with increasing sample concentration was observed. 79 The correlation between the peak area and the amount injected onto the nanoLC was found to be very strong. This correlation was observed even when the myoglobin peptides were spiked into plasma. Although these studies showed that the quantification of peptides can be achieved directly by peak intensity comparison, this approach has some constraints, such as run-to-run sample differences and chromatographic shifts over the course of multiple sample injections. 80 Peak area quantitation for a particular peptide can be challenging due to the random nature of peptide selection for MS/MS analysis in a complex mixture. However, peak area quantitation is more straightforward with high-resolution mass spectrometers than with lower-resolution instruments. In addition, peak area measurements for a high-abundance protein are more accurate than for a low-abundance protein. An alternative to peak area quantitation is quantitation based on spectral counts. Relative quantitation by spectral count has become widely used in biomarker discovery. This approach is based on comparing the number of MS/MS spectra identified for a protein in each LC/MS/MS run. This is possible because there is a correlation between the abundance of a protein and the number of MS/MS spectra identified for it. Liu et al. demonstrated a strong correlation between protein abundance and spectral counts per protein. Therefore, spectral counts can be used for relative protein

quantification. However, spectral counting has been shown to work well with high- to medium-abundance proteins and not with low-abundance proteins identified by fewer than 4 peptides. Label-free approaches are the least accurate of all the quantitation techniques. Nonetheless, label-free quantitation has no limit on how many samples can be compared, requires no time for preparing synthetic peptides or finding the right labeling reagents, and incurs no additional costs for labeling reagents. 68 This approach has become accepted as a first pass to identify candidate biomarkers using shotgun proteomics. As such, this method was applied throughout the work described in this thesis.

Abundant Protein Depletion and Plasma Fractionation

Protein profiling of biological samples such as plasma and serum is very challenging regardless of the proteomic approach. The plasma proteome is dominated by a population of high-abundance proteins which tend to mask the identification of low-abundance proteins. 7 Reducing the concentration of these abundant proteins in plasma is a necessary step for identifying medium- to low-abundance proteins. Twelve major proteins in human plasma/serum represent approximately 96% of the total protein mass in plasma (Fig. 4). 85 A variety of depletion methods are available for the removal of these proteins from plasma/serum samples. Originally, dye-based separation methods were used to remove albumin (HSA), which represents at least 50% of the plasma protein mass. 86 While this method for the removal of albumin has high capacity, a disadvantage of this approach is its lack of specificity. The removal of immunoglobulins (IgG) is

commonly achieved by immobilizing protein A or protein G onto affinity resins, which bind to the Fc region of IgG. Several multicomponent affinity matrices based on antibody capture of the most abundant plasma proteins have been developed and are commercially available. Examples are the Multiple Affinity Removal System (MARS) 6 and 14 from Agilent and IgY-12 and 20 from Sigma; these columns deplete the top 6, 14, 12, and 20 most abundant proteins, respectively. 90, 91 One concern when affinity-based depletion strategies are used is the loss of potentially important biomarkers due to nonspecific binding to the depleted proteins or nonspecific interactions with the affinity column. 92 For instance, HSA is known as a nonspecific binding protein due to its biological role as a carrier protein. 93

Figure 4. Most Abundant Proteins in Plasma

However, it has been shown that depletion alone is not sufficient for plasma analysis. Further separation or enrichment techniques are needed to detect low-abundance proteins. Other orthogonal fractionation techniques are required to achieve a more comprehensive analysis prior to LC-MS/MS. Different

chromatographic techniques have been used for fractionation at either the peptide or the protein level, and can be integrated with or without depletion of abundant proteins; these include RP-LC, 94 SEC, 95 anion exchange chromatography (AEC), 28 SDS-PAGE (slices of gel bands), 96 liquid-phase IEF, 97 and affinity chromatography. 98 Different laboratories apply different approaches to plasma fractionation; some researchers combine two or more techniques in order to identify more proteins. For instance, in the Integrated Proteomic Analysis (IPAS) approach of Hanash and co-workers, following immunodepletion of abundant proteins the fractions are subjected to intact protein fractionation by anion exchange followed by reversed-phase chromatography. 28 The individual fractions collected from the reversed-phase fractionation are then digested and analyzed by LC-MS/MS. In this approach the extensive fractionation reduces sample complexity. It was demonstrated that this method can identify and quantify proteins over several orders of magnitude. 99 Speicher and co-workers have utilized a different approach to studying the plasma proteome, which consists of immunodepletion of abundant proteins followed by isoelectric focusing and one-dimensional electrophoretic separation. 100 The proteins are then subjected to in-gel digestion and analyzed by LC-MS/MS. This strategy resulted in the identification of a number of low-abundance proteins (low ng/mL to pg/mL range). 100 The drawback of these two platforms is that they require large amounts of plasma for sample fractionation and are labor intensive. Therefore, these techniques are only useful for the first stage of biomarker discovery.

An alternative approach to plasma fractionation is to enrich a subproteome, such as the glycoproteome, phosphoproteome, or cysteinyl subproteome, using affinity chromatography methods, which reduces sample complexity and enhances the identification of medium- and low-abundance proteins. 94 Enrichment of the cysteinyl subproteome uses an affinity method that targets cysteine-containing peptides with a thiol resin. This approach can be coupled with different fractionation techniques, such as strong cation exchange, prior to LC-MS/MS analysis. 94 Protein phosphorylation plays an important role in complex formation, degradation, and protein localization. 101 The methods used for enrichment of phosphoproteins include immobilized metal affinity chromatography (IMAC), using immobilized metal ions such as Fe3+, Ga3+, Al3+, or Zr4+, or the use of titanium dioxide columns, followed by mass spectrometry.

Plasma Glycoproteomics

The most abundant subproteome of human plasma is the glycoproteome. Glycoproteins play major roles in a myriad of cellular and biological functions, including immune defense, cell growth and differentiation, cell-cell adhesion, and others. 8 It is believed that more than half of the mammalian proteome is glycosylated. 8 Changes in protein glycosylation patterns have been associated with a variety of diseases, including cancer and autoimmune disorders. 9 Currently, several glycoproteins are used in clinical diagnostics for cancer, including mucin glycoproteins (e.g. CA125, CA19-9, CA15-3). Glycosylation is the covalent attachment of a monosaccharide to the N atom of asparagine (N-linked) or to the O atom of serine

and threonine (O-linked). 106 Consensus sequences exist which predict where N-glycosylation can occur: Asn-Xaa-Ser, Asn-Xaa-Thr, or Asn-Xaa-Cys, where Xaa can be any amino acid residue except proline. 106 There are no consensus sequences for O-linked glycosylation, a major reason why O-linked analysis is significantly more challenging. N-linked oligosaccharides have a common core structure of five sugars and differ in their branching. N-linked oligosaccharides are classified into three main categories: high mannose, complex, and hybrid. 106 Other known types of glycosylation include C-glycosylation, which occurs on tryptophan residues, and S-linked glycosylation through a sulfur atom on cysteine or methionine. 107, 108 The roles played by the carbohydrate moiety of a glycoprotein include stabilization of the protein structure, protection from degradation by proteases, and control of protein solubility and of protein half-life in the blood. Glycosylation is found to change the hydrophobicity of the protein, due to the influence of the sugar on the solvation of the peptide. 109 A comprehensive analysis of glycoproteins in complex biological samples consists of the enrichment of glycoproteins or glycopeptides, multidimensional protein or peptide separation, tandem mass spectrometric analysis, and bioinformatic data interpretation. Enrichment of glycoproteins from human plasma is analytically challenging due to the enormous complexity of the proteins and the dynamic range of protein concentrations in the sample. 8 The two approaches most widely used for glycoprotein enrichment in biological samples are hydrazide chemistry-based solid-phase extraction methods and lectin affinity chromatography-based enrichment methods.
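The N-glycosylation consensus sequence described above (Asn-Xaa-Ser, Asn-Xaa-Thr, or Asn-Xaa-Cys, with Xaa being any residue except proline) lends itself to a simple sequence scan. The following is a minimal sketch; the function name `find_sequons` and the example sequence are hypothetical. A match only marks a site where N-glycosylation can occur, not a site that is actually occupied.

```python
import re
from typing import List

# Asn, followed by any residue except Pro, followed by Ser, Thr, or Cys.
# The lookahead keeps matches zero-width after the Asn, so overlapping
# sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][STC])")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues in potential N-glycosylation sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide sequence in one-letter code.
print(find_sequons("MNASNPTQNGT"))  # [2, 9]; the Asn at position 5 is followed by Pro
```

The lookahead form was chosen over a plain three-residue pattern so that adjacent candidate sites are not skipped when one sequon begins inside another.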

The hydrazide chemistry-based method has been applied to isolate glycoproteins through the binding of the glycan to beads derivatized with hydrazide groups. 110 In this approach the carbohydrate present on the glycoprotein is first oxidized to form an aldehyde group, which subsequently reacts with the hydrazide groups. The peptides are then cleaved from the glycans using the enzyme PNGase F, which specifically releases an N-glycosylated protein or peptide from its corresponding saccharide group. 111 This approach has been widely used to study the human plasma N-glycoproteome. 110 Smith and co-workers have combined the enrichment of N-glycopeptides with immunoaffinity depletion of abundant proteins and strong cation exchange, followed by LC-MS/MS. Using this platform, various low-abundance proteins were identified, ranging from low µg/mL to pg/mL concentrations. 111 The main disadvantage of this approach is that the information regarding the glycan structure and site is lost when the glycan is released by enzymatic cleavage. A more favorable approach used in glycoproteomics for the enrichment of glycoproteins from biological samples is lectin chromatography.

Lectin Affinity Chromatography for Analysis of Human Plasma Glycoproteome

Lectin affinity chromatography has been a major analytical tool in the glycoproteomics field, due to the affinity of lectins for carbohydrate structures. Lectins are found in plants, animals, and microorganisms and have different affinities for different glycan motifs. 112 Table 1 gives a summary of the most frequently used lectins and their specificities. 113 The binding affinity of glycans to lectins (association constant, in M^-1) is lower than the affinity of the antigen-antibody interaction. 114

This property is advantageous for affinity chromatography, since elution of adsorbed proteins is more efficient and recoveries of bound proteins are generally robust. Several laboratories have reported the use of lectins for clinical samples; differences in lectin binding patterns have been associated with possible differences in glycosylation in disease samples. Lectin affinity chromatography (LAC) has been used for decades to purify glycoproteins from complex mixtures. A lectin affinity column is prepared by first immobilizing the lectins on a solid support; early applications used agarose as the solid support. 116, 117 Agarose is still widely used in glycoproteomics, due to its low cost and ease of use. The main disadvantage of the agarose support is that it cannot withstand high pressure, so the enrichment of glycoproteins needs to be performed under gravity flow. Recently, high-pressure lectin columns have been produced; previous work by Novotny et al. and Regnier et al. showed that lectins can be bound to silica supports to produce lectin columns with good chromatographic properties. 118, 119 However, there are pH limitations associated with silica supports, and nonspecific adsorption can be a problem. Polymeric supports represent perhaps the best alternative, providing a stable surface with low reactivity and high protein recoveries. The work described in Chapter 2 reports on the development and characterization of lectins immobilized on polymeric solid supports, as well as an evaluation of their chromatographic properties for applications in plasma proteomics/glycoproteomics.

Table 1. Some of the most frequently used lectins for affinity chromatography of glycoproteins and oligosaccharides (Adapted from reference 95)

| Trivial abbreviation | Source of lectin | Monosaccharide specificity | Hapten sugar | Comments on binding |
| Concanavalin A | Canavalia ensiformis | α-D-mannose | α-methyl-D-mannoside | Low binding to bi-antennal glycopeptide |
| Pea lectin | Pisum sativum | α-D-mannose | α-methyl-D-mannoside | Bi- and tri-antennal glycopeptide containing L-fucose in the glycan core |
| RCA-1 | Ricinus communis agglutinin | β-D-galactose | Lactose | Glycopeptides with β-(1→4)-linked terminal galactosyl residues |
| L4PHA | Phaseolus vulgaris leukoagglutinin | β-D-galactose | N-acetyl-D-galactosamine | Tri- and tetra-antennal types of glycopeptides |
| E4PHA | Phaseolus vulgaris erythroagglutinin | β-D-galactose | N-acetyl-D-galactosamine | Bi- and tri-antennal types of glycopeptides and oligosaccharides |
| SNA | Sambucus nigra agglutinin | β-D-galactose | Lactose | Sialic acid-containing glycopeptides with NeuAc-α(2→6)Gal terminal sequences |
| LTA (Lotus lectin) | Tetragonolobus purpureus agglutinin | α-D-fucose | Fucose | Glycopeptides with fucose residues at the outer part of the glycan |
| UEA-I | Ulex europaeus I | α-D-fucose | Fucose | Oligosaccharides with outer α-L-fucose residues |
| HPA | Helix pomatia agglutinin | N-α-acetyl-D-galactosamine | N-acetyl-D-galactosamine | N- and O-linked oligosaccharides with terminal α-GalNAc, with high affinity |
| Jacalin lectin | Artocarpus integrifolia | α-D-galactose | α-D-methylgalactoside | Glycopeptides and oligosaccharides with O-α-galactosyl linked terminals |
| PNA | Arachis hypogaea (peanut) | β-Gal(1→3)GalNAc | Galactose | Glycopeptides and oligosaccharides with O-β-galactosyl linked terminals |
| AAL | Aleuria aurantia | α-D-fucose | L-fucose | Fucose linked (α-1,6) to N-acetylglucosamine or fucose linked (α-1,3) to N-acetyllactosamine |
| MAH, MAA, MAL | Maackia amurensis | Sialic acid, defined trisaccharide structure | Lactose | MAH II appears to bind only particular carbohydrate structures that contain sialic acid or a defined trisaccharide structure |

In lectin affinity chromatography, a complex protein mixture is applied to an immobilized lectin bed. Proteins with no affinity for the ligand are washed out, while the bound glycoproteins are eluted either by competitive desorption or with a low-pH elution buffer. Lectin affinity chromatography of plasma has been performed with single, serial, or multi-lectin approaches. Novotny and coworkers used concanavalin A (ConA) to enrich high-mannose glycoconjugates from complex mixtures in combination with immunodepletion. 120 ConA is the most commonly used and best characterized lectin. At neutral pH, ConA exists as a tetramer of 26,000 Da subunits. ConA binds high-mannose sugars through a complex formed between the metal ions Mn2+ or Ca2+ and the hydroxyl groups of the sugar; ConA therefore requires Mn2+ or Ca2+ for saccharide binding. These metal ions bind to the side-chain carboxyl groups of the glutamic acid and aspartic acid residues of ConA. Protonation of these residues at low pH removes the metal and changes the conformation of the binding cavities. A neutral pH is therefore important to maintain the saccharide-binding activity of ConA. 121, 122 For a more complete capture of the glycoproteome, serial lectin affinity chromatography (S-LAC) was introduced. 123 As its name indicates, S-LAC involves a series of lectin columns with different specificities. Endo and co-workers were the first group to introduce this approach. 124 They used ConA (affinity for high-mannose sugars) in the first affinity step, followed by Aleuria aurantia lectin (affinity for L-fucose) in the second step, and Datura stramonium agglutinin (affinity for β-1,4-linked N-acetylglucosamine) in the last step. 124, 125 This approach was applied to

oligosaccharide mixtures. 124 The S-LAC approach was later adopted by Regnier and co-workers and applied to human plasma. 126 In their application, glycopeptides from human plasma were enriched using Sambucus nigra agglutinin (SNA) and ConA to select N-linked glycoforms. 126 The glycopeptide fractions were then isotopically labeled with heavy and light tags and mixed prior to deglycosylation with PNGase F. The deglycosylated samples were further fractionated on a reversed-phase column coupled with ESI-MS. The key advantage of S-LAC over a single lectin is that a broader range of glycoproteins can be isolated, and several glycoprotein subsets can be selected from a complex mixture based on the specificity of the lectins. Lectin-monosaccharide interactions are weak (millimolar range), so binding of lectins to carbohydrates may require affinity enhancement. This can be achieved in several ways: through ligand multivalency, through an extended sugar structure that interacts with other areas of the protein surface, or through a combination of both. 122 To enhance the affinity of the lectins for the plasma glycoproteome, we developed the multi-lectin affinity chromatography (M-LAC) approach, which uses mixtures of lectins to give rise to multivalent association with plasma glycoproteins, resulting in better capture of the plasma glycoproteome. 8 We previously reported the use of M-LAC for studying the plasma and/or serum proteome. 8 The originally developed M-LAC medium consisted of ConA, jacalin (JAC), and wheat germ agglutinin (WGA) cross-linked to a soft agarose support. Each lectin has affinity for different glycoproteins. ConA recognizes α-mannose residues in the N-glycan core structure. However, ConA has no affinity or very low

affinity for tri- and tetra-antennary N-glycans. 129 Jacalin is isolated from the seeds of jackfruit (Artocarpus integrifolia) and is composed of four subunits, two of approximately 10,000 Da and two of 16,000 Da each. 130 Jacalin (JAC) has been used to isolate O-glycosidically linked oligosaccharides, preferring the structure galactosyl (β-1,3) N-acetylgalactosamine. 131 Wheat germ agglutinin (WGA) is isolated from Triticum vulgaris and is a 36 kDa protein consisting of two identical subunits. 132 WGA exhibits affinity for oligosaccharides containing terminal N-acetylglucosamine or chitobiose, structures that are common to many serum and membrane glycoproteins. 133, 134 Previously, our laboratory demonstrated that the M-LAC column could be used to improve the specificity/selectivity of glycoprotein isolation from complex samples, with excellent enrichment capabilities. 135, 136 We have also demonstrated that the M-LAC approach can be used to detect changes in glycosylation. 98 In a study by Wang et al., a linear ion trap/FT-MS instrument was used to characterize structural changes in the glycosylation motifs on captured plasma proteins. 9, 98 In addition, Plavina et al. demonstrated that immunodepletion of albumin and IgG prior to M-LAC fractionation/enrichment is an effective method for identifying proteins of differential abundance between disease and healthy control samples in a biomarker discovery study. The workflow of this method is shown in Fig 5. The method consisted of depletion of the two most abundant proteins, followed by M-LAC fractionation into unbound and bound (glycosylated) fractions, followed by nanoLC-MS/MS analysis of the tryptic digests of the fractions. This platform provided better depth of analysis of human plasma and enabled the identification of medium abundant

proteins at the µg/ml level. 136 This platform has been successfully applied to various clinical proteomic biomarker studies. For example, Plavina et al. applied it to discover potential biomarkers for psoriasis. 136 That study found cytoskeleton/cell adhesion proteins, proteases/protease inhibitors, lipoproteins, complement proteins, and Ca-binding proteins that were present at significantly different levels in normal and psoriatic samples. 137 Furthermore, this platform was used in studies of other diseases, including obesity, diabetes, and hypertension, 138 rheumatoid arthritis, 139 and cancers such as breast cancer. 140, 141

Figure 5. A schematic workflow of plasma analysis. (Adapted from reference 129) [Workflow labels: plasma with internal standard; removal of abundant proteins; multi-lectin column (M-LAC); non-bound proteins and glycosylated proteins; trypsin digestion; peptide fractionation; 1D nanoLC-MS/MS (LTQ MS) with data-dependent acquisition; data analysis with label-free quantitation. Quality control check-points: total protein recovery; peak area at 280/214 nm; retention time; quantitation of internal standard.]

This platform was shown to be reproducible; however, low protein recoveries (75%) were obtained due to sample manipulation. In Chapter 2, we discuss the migration from the agarose support to a high-performance solid support, as well as the automation of this platform. In Chapters 3 and 4, we describe the application of the automated platform to autoimmune and metabolic diseases.

Plasma Peptidomics

The low molecular weight (LMW) proteins of plasma or serum are referred to as the plasma or serum peptidome. The LMW components can be small proteins (15 kDa); molecules such as hormones, cytokines, and growth factors; or peptides that are released from larger protein precursors during protein processing or that result from degradation of protein precursors by proteolytic activity. 142 Human blood is attractive for diagnostic purposes; however, the use of peptidomic markers for diagnosis is controversial. The major problems in peptidomics are sample stability and the effects of sample collection, processing, and storage. 143 Another concern is the sample type, that is, whether to use plasma or serum for peptidomic analysis. 143 Some groups have suggested that serum is preferable to plasma for peptidomics; 143 however, when the Human Proteome Organization (HUPO) performed a study to describe the differences between plasma and serum, a large number of peptides were identified only in the serum samples and were not detectable in plasma. 18 These findings were not surprising, since the coagulation process that leads to blood clotting is a multiprotease cascade in which many proteins are cleaved at each step. 18 Many artificial processes therefore occur during the preparation of serum from blood. 18, 142, 144 Many investigators perform their peptidomic analyses on plasma samples. When plasma is prepared, the proteolytic cascade is suppressed, and therefore only endogenous proteolytic activity is detected. 18 Schulz-Knappe and co-workers showed that different inhibitors added to the plasma led to different peptide patterns. 145 They showed that platelet depletion of plasma avoids platelet activation and the accompanying release of proteins; platelet-depleted plasma, prepared

by gentle removal of the platelets, was strongly recommended for peptidomic analysis. 145 Several strategies have been developed to study the LMW proteome of plasma, including reversed-phase capture, precipitation, and ultrafiltration. The first study of the plasma peptidome was performed by Petricoin et al. using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). The study identified peptide patterns that differed between ovarian carcinoma patients and matched controls. Villanueva and co-workers employed reversed-phase supermagnetic C8 silica beads to capture peptides from serum and analyzed them by MALDI-TOF MS. They detected a total of 400 peptides from 50 µl of sample. The main disadvantage of this method is that the binding specificity of the C8 beads is very low. A different approach was used by Tirumalai and coworkers. In their method, a solvent was used to disrupt protein-protein interactions so that the LMW components were free to pass through molecular weight cutoff filters during ultrafiltration; the LMW components were then subjected to tryptic digestion and SCX fractionation prior to LC-MS/MS. 152 This approach resulted in the identification of 341 human plasma proteins. 152 The Tirumalai approach was adapted and modified by our group. Briefly, the sample is diluted with 20% acetonitrile to disrupt protein-protein interactions, subjected to ultrafiltration using filters with a 10 kDa molecular weight cutoff, and cleaned up prior to analysis by nanoLC-FT-MS/MS. The main advantage of this approach compared with the Tirumalai approach is the elimination of the enzymatic digestion step, which allows measurement of solely endogenous

proteolytic activity. 153 This method was successfully applied to different biomarker studies in our lab and was used in the biomarker discovery work described later in this thesis.

Validation of Results

The word validation has completely different connotations depending on whether one is a mass spectrometrist, a medical researcher, or a clinical chemist. In recent years, only 5 protein markers have been approved by the Food and Drug Administration for measurement in plasma or serum samples. 154 This is due to high false discovery rates in proteomic methods, combined with the lack of robust methods for verifying biomarkers in large clinical sample sets. 155 In principle, antibody-based measurements are the best assays to use in the clinic; however, developing a new antibody assay is expensive and time consuming (1-1.5 years), antibodies may lack specificity, and multiple antibodies are typically required. The proteomics data presented in this thesis were verified by ELISA and Western blot; the workflow of our data analysis is shown in Fig 6. Nevertheless, a brief overview of other validation techniques is provided below.

[Figure 6 box labels: LC/MS identifications; selection of proteins of interest; strictly filtered IDs; aggregate replicate data sets; manual validation of peptide identification(s) and peak area quantitation; remove one-hit wonders; validation with ELISA/Western blot; quantitative comparison of protein lists based on spectral counts.]

Figure 6. Data Analysis Workflow for Biomarker Discovery

Mass spectrometry-based methods have emerged for the verification of candidate biomarkers, using stable isotope dilution multiple reaction monitoring (MRM) mass spectrometry. 156 This approach is based on selecting a few peptides per candidate protein; a synthetic, stable isotope-labeled analogue of each peptide is used as an internal standard. 156 The protein concentration is then measured by comparing the signals from the labeled and unlabeled peptides. This approach has some advantages over antibody-based assays such as ELISA and Western blotting: the analyte detected by the mass spectrometer can be characterized, several proteins can be analyzed in one run (multiplexing), and only small amounts of plasma are required for all measurements. 156 A different approach used for verification of candidate markers is stable isotope standards with capture by anti-peptide antibodies (SISCAPA). 150 This approach combines the advantages of specific immunoaffinity enrichment and targeted MRM. In SISCAPA, peptides are selected from the discovery phase, and anti-peptide antibodies are generated against them. After tryptic digestion of the plasma, a known amount of stable isotope-labeled peptide is added; both the added peptide and the sample-derived peptide are enriched, and their amounts are measured by MRM. 157, 158 Here, enzyme-linked immunosorbent assay (ELISA) was used for verification of proteins of interest. ELISA is the most commonly used approach for verification and clinical validation. The advantages of ELISA are high

sensitivity and specificity, permitting the quantification of proteins in human plasma at low concentrations, in the pg/ml range. 159, 160 Validation of biomarkers in crude plasma is accomplished with the sandwich ELISA method. In this assay, the absolute amount of an antigen in an unknown sample can be determined down to low pg/ml levels. The assay requires two antibodies. The first (capture) antibody is bound to a microtiter plate well, and the well is then blocked with a protein solution to minimize nonspecific binding. Following blocking, the antigen (sample) is added and incubated to allow binding to the capture antibody. Unbound proteins are then washed off before the labeled second antibody is added. The antigen-antibody complex is then detected with a conjugate directed against the species of the second antibody and quantitated with a colorimetric substrate (Fig 7). The concentration of the antigen in the samples is determined from the calibration curve constructed from the values obtained for the standards. 161, 162 However, this assay has a few disadvantages, such as the limited availability of antibodies, limited multiplexing, and the effort required for antibody development. When ELISA development is not feasible, Western blotting may serve as an alternative method to verify candidate markers.

Figure 7. Illustration of Sandwich ELISA Method [Labels: capture antibody; antigen; biotin-labeled detection antibody; avidin-HRP; TMB substrate; colored product.]
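
The calibration-curve step of the sandwich ELISA can be illustrated with a short sketch. The example assumes piecewise log-log interpolation between bracketing standards, which is only one of several possible curve models (commercial assays often use a four-parameter logistic fit instead); it is not the procedure used in this thesis. The function name and the standard values are hypothetical and chosen only for illustration.

```python
import math


def interpolate_concentration(standards, absorbance):
    """Estimate antigen concentration (pg/ml) from an ELISA standard curve.

    standards: list of (concentration_pg_per_ml, absorbance) pairs measured
               for the calibrators; all values must be positive
               (blank-subtracted) because logarithms are taken.
    absorbance: blank-subtracted reading of the unknown sample.

    Assumption: the curve is approximated by straight segments on a
    log-log scale between neighbouring standards.
    """
    # Order the calibrators by signal so bracketing pairs can be found.
    points = sorted(standards, key=lambda p: p[1])
    lowest, highest = points[0][1], points[-1][1]
    if not (lowest <= absorbance <= highest):
        # Extrapolating beyond the standards is unreliable, so refuse.
        raise ValueError("absorbance outside the range of the standards")

    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo <= absorbance <= a_hi:
            if a_hi == a_lo:
                return c_lo
            # Fractional position of the reading between the two standards.
            frac = (math.log10(absorbance) - math.log10(a_lo)) / (
                math.log10(a_hi) - math.log10(a_lo)
            )
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    # Only reached when a single standard was supplied and matched exactly.
    return points[0][0]


# Hypothetical calibrators: (concentration in pg/ml, absorbance at 450 nm).
example_standards = [(10.0, 0.1), (100.0, 1.0), (1000.0, 2.0)]
sample_concentration = interpolate_concentration(example_standards, 0.5)
```

Readings outside the range of the standards raise an error rather than being extrapolated, which mirrors the usual practice of diluting and re-measuring such samples.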

Western blotting is a widely used analytical technique to detect a known protein in a complex mixture and to obtain qualitative and semi-quantitative data about the protein of interest. In Western blotting, a complex sample is run on SDS-PAGE to separate the proteins in the mixture by size. The separated proteins are then transferred electrophoretically from the gel to a thin membrane (nitrocellulose or PVDF). The membrane is incubated with a blocking solution to minimize nonspecific binding and then with the primary antibody against the protein of interest. The bound primary antibody is then detected with an anti-immunoglobulin antibody conjugated to an enzyme such as horseradish peroxidase.

Background on Proteomics Research in Specified Disease Areas

Multiple Sclerosis

Multiple sclerosis (MS) is an autoimmune disease that affects the brain and spinal cord (central nervous system). MS is a chronic disease characterized by multiple areas of inflammatory demyelination, axonal degradation and glial sclerosis. MS has very heterogeneous clinical presentations and courses. 166 There are three different types of MS: relapsing-remitting MS (RRMS), secondary progressive MS (SPMS), and primary progressive MS (PPMS). Even though MS has been known for a very long time, the pathology of the disease is still unclear. Currently, there is no biomarker available for the diagnosis of multiple sclerosis. 167 A pressing need exists to identify and develop novel candidate markers for multiple sclerosis. 167 Biomarkers in multiple sclerosis can be divided into different categories,

such as biomarkers reflective of antigen processing and presentation, which have been suggested to differentiate between RRMS and SPMS; changes in cellular subpopulations, markers that were shown to have an important immunoregulatory role in animal models; biomarkers of axonal/neuronal damage; biomarkers of blood-brain barrier disruption; and biomarkers of oxidative stress. 167 Multiple proteomic studies carried out in the last few years have reported potential candidate markers. Proteomic approaches have included 2-DE and peptide fingerprinting, LC-ESI-MS/MS 168 and SELDI. 115 Rithdech et al. studied plasma from children who met the international definition of pediatric MS. They analyzed plasma rather than cerebrospinal fluid for several reasons: plasma is easily obtained and is highly sensitive to pathological changes, 169 and disruption of the blood-brain barrier is associated with MS. 166 As an initial step in studying plasma samples from affected children and healthy controls, they applied 2-DE in combination with mass spectrometry. Using this approach, they identified a subset of proteins with significant differences between pediatric MS patients and healthy controls. 170 In Chapter 3, we describe a study of plasma from patients with multiple sclerosis using our proteomic and peptidomic methods.

Obesity, Diabetes, and Gastric Bypass Surgery

Obesity has emerged as one of the major global epidemics and is now reaching alarming proportions worldwide. 171 Obesity is a complex disease, but the term specifically refers to an excess amount of body fat. Obese patients have a decreased quality of life and are at higher risk of developing diseases such as diabetes mellitus and cardiovascular disease. 171 Most of the obesity proteomic

studies reported to date have focused on protein profiling of adipose tissue and the adipocyte secretome. 172 Diabetes mellitus is one of the most common metabolic diseases, and the majority of patients have type 2 diabetes. Type 2 diabetes is characterized by hyperglycemia due to a disturbance in insulin function. 172 Numerous studies have been undertaken to identify biomarkers for type 2 diabetes; most concentrated on genetic defects and protein-level changes during the progression of the disease. 172 In these studies, the most commonly applied proteomic technologies were 2-DE and MALDI-TOF MS. 172 We have previously reported on applying depletion combined with M-LAC to the analysis of plasma from diabetic patients, obese patients, and patients with hypertension. 138, 171 Currently, the only durable treatment for both excess body weight and the medical conditions associated with obesity is bariatric surgery. Gastric bypass surgery can be performed in different ways: (i) by creating a small, thumb-sized pouch from the upper stomach, with bypass of the remaining stomach; or (ii) by reconstructing the GI tract to allow bile and pancreatic enzymes to enter the esophagus from the small intestine. 173 A rare side effect of the surgery in obese patients is the development of postoperative hypoglycemia 1-2 years after surgery. Four out of five diabetic patients who undergo gastric bypass surgery recover fully from diabetes. 174 These observations led to the belief that insulin resistance is not the cause of diabetes, but rather that diabetes is a defense mechanism. Proteomic technologies have been used to study obesity and diabetes following gastric bypass surgery. 171, 174

Chapter 4 describes a study designed by Johnson & Johnson in which 13 Caucasian women (8 diabetic and 5 non-diabetic patients), selected within defined age and BMI ranges, underwent gastric bypass surgery. This was a longitudinal study; blood was drawn from the patients 2-4 weeks before surgery, 7-12 days after surgery, and three months after surgery. The main goal of the study was to identify novel plasma biomarkers by comparing obese non-diabetic patients with obese diabetic patients, before and after gastric bypass surgery.

1.5 Conclusions

In summary, this thesis focused on developing and optimizing methods to address some of the challenges of clinical proteomics, such as sample handling, manipulation, and method automation. In Chapter 2, we describe a high-performance M-LAC prefractionation/enrichment platform coupled with nanoLC-MS/MS that improves sample preparation in proteomic analysis by minimizing sample handling and manipulation. M-LAC has been shown to improve the dynamic range and to capture the glycoproteome reproducibly and with high recoveries. Using this technology, our group has performed studies to identify candidate markers in multiple sclerosis patients (Chapter 3) and in obese patients with type 2 diabetes undergoing gastric bypass surgery (Chapter 4). In these studies, we identified potential markers and attempted to verify their changes in abundance by immuno-based assay techniques.

1.6 References

1. Stein, L. D., Human genome: end of the beginning. Nature 2004, 431, (7011).
2. Wilkins, M. R.; Pasquali, C.; Appel, R. D.; Ou, K.; Golaz, O.; Sanchez, J. C.; Yan, J. X.; Gooley, A. A.; Hughes, G.; Humphery-Smith, I.; Williams, K. L.; Hochstrasser, D. F., From proteins to proteomes: large scale protein identification by two-dimensional electrophoresis and amino acid analysis. Biotechnology (N Y) 1996, 14, (1).
3. W., S. D., Proteome Analysis: Interpreting the Genome. Elsevier B. V.: 2004; Vol. 1.
4. Mallick, P.; Kuster, B., Proteomics: a pragmatic perspective. Nat Biotechnol 28, (7).
5. Matt, P.; Carrel, T.; White, M.; Lefkovits, I.; Van Eyk, J., Proteomics in cardiovascular surgery. J Thorac Cardiovasc Surg 2007, 133, (1).
6. Meng, Z.; Veenstra, T. D., Proteomic analysis of serum, plasma, and lymph for the identification of biomarkers. Proteomics Clin Appl 2007, 1, (8).
7. Anderson, N. L.; Anderson, N. G., The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1, (11).
8. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. J Chromatogr A 2004, 1053, (1-2).
9. Wang, Y.; Wu, S. L.; Hancock, W. S., Approaches to the study of N-linked glycoproteins in human plasma using lectin affinity chromatography and nano-HPLC coupled to electrospray linear ion trap--Fourier transform mass spectrometry. Glycobiology 2006, 16, (6).
10. Apweiler, R.; Aslanidis, C.; Deufel, T.; Gerstner, A.; Hansen, J.; Hochstrasser, D.; Kellner, R.; Kubicek, M.; Lottspeich, F.; Maser, E.; Mewes, H. W.; Meyer, H. E.; Mullner, S.; Mutter, W.; Neumaier, M.; Nollau, P.; Nothwang, H. G.; Ponten, F.; Radbruch, A.; Reinert, K.; Rothe, G.; Stockinger, H.; Tarnok, A.; Taussig, M. J.; Thiel, A.; Thiery, J.; Ueffing, M.; Valet, G.; Vandekerckhove, J.; Verhuven, W.; Wagener, C.; Wagner, O.; Schmitz, G., Approaching clinical proteomics: current state and future fields of application in fluid proteomics. Clin Chem Lab Med 2009, 47, (6).

11. Naylor, S., Biomarkers: current perspectives and future prospects. Expert Rev Mol Diagn 2003, 3, (5).
12. Ariazi, E. A.; Ariazi, J. L.; Cordera, F.; Jordan, V. C., Estrogen receptors as therapeutic targets in breast cancer. Curr Top Med Chem 2006, 6, (3).
13. Donegan, W. L., Tumor-related prognostic factors for breast cancer. CA Cancer J Clin 1997, 47, (1).
14. Jacobs, J. M.; Adkins, J. N.; Qian, W. J.; Liu, T.; Shen, Y.; Camp, D. G., 2nd; Smith, R. D., Utilizing human blood plasma for proteomic biomarker discovery. J Proteome Res 2005, 4, (4).
15. Azad, N. S.; Rasool, N.; Annunziata, C. M.; Minasian, L.; Whiteley, G.; Kohn, E. C., Proteomics in clinical trials and practice: present uses and future promise. Mol Cell Proteomics 2006, 5, (10).
16. Veenstra, T. D., Global and targeted quantitative proteomics for biomarker discovery. J Chromatogr B Analyt Technol Biomed Life Sci 2007, 847, (1).
17. Schiess, R.; Wollscheid, B.; Aebersold, R., Targeted proteomic strategy for clinical biomarker discovery. Mol Oncol 2009, 3, (1).
18. Omenn, G. S.; States, D. J.; Adamski, M.; Blackwell, T. W.; Menon, R.; Hermjakob, H.; Apweiler, R.; Haab, B. B.; Simpson, R. J.; Eddes, J. S.; Kapp, E. A.; Moritz, R. L.; Chan, D. W.; Rai, A. J.; Admon, A.; Aebersold, R.; Eng, J.; Hancock, W. S.; Hefta, S. A.; Meyer, H.; Paik, Y. K.; Yoo, J. S.; Ping, P.; Pounds, J.; Adkins, J.; Qian, X.; Wang, R.; Wasinger, V.; Wu, C. Y.; Zhao, X.; Zeng, R.; Archakov, A.; Tsugita, A.; Beer, I.; Pandey, A.; Pisano, M.; Andrews, P.; Tammen, H.; Speicher, D. W.; Hanash, S. M., Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly available database. Proteomics 2005, 5, (13).
19. Tammen, H.; Schulte, I.; Hess, R.; Menzel, C.; Kellmann, M.; Mohring, T.; Schulz-Knappe, P., Peptidomic analysis of human blood specimens: comparison between plasma specimens and serum by differential peptide display. Proteomics 2005, 5, (13).
20. Steyn, J. M.; Muller, F. O., Comparative effects of two commercially available blood collection tubes on plasma concentrations of drugs. S Afr Med J 1980, 57, (4).
21. Raijmakers, M. T.; Menting, C. H.; Vader, H. L.; van der Graaf, F., Collection of blood specimens by venipuncture for plasma-based coagulation assays: necessity of a discard tube. Am J Clin Pathol 133, (2).

22. D'Alesandro, M. M.; Gruber, D. F.; Reed, H. L.; O'Halloran, K. P.; Robertson, R., Effects of collection methods and storage on the in vitro stability of canine plasma catecholamines. Am J Vet Res 1990, 51, (2).
23. Nissum, M.; Foucher, A. L., Analysis of human plasma proteins: a focus on sample collection and separation using free-flow electrophoresis. Expert Rev Proteomics 2008, 5, (4).
24. Zakaria, M.; Brown, P. R., Investigation of clinical methodology for sample collection and processing prior to the reversed-phase liquid chromatographic determination of UV-absorbing plasma constituents. Anal Biochem 1982, 120, (1).
25. Fortis, F.; Guerrier, L.; Areces, L. B.; Antonioli, P.; Hayes, T.; Carrick, K.; Hammond, D.; Boschetti, E.; Righetti, P. G., A new approach for the detection and identification of protein impurities using combinatorial solid phase ligand libraries. J Proteome Res 2006, 5, (10).
26. Fortis, F.; Guerrier, L.; Righetti, P. G.; Antonioli, P.; Boschetti, E., A new approach for the removal of protein impurities from purified biologicals using combinatorial solid-phase ligand libraries. Electrophoresis 2006, 27, (15).
27. Guerrier, L.; Thulasiraman, V.; Castagna, A.; Fortis, F.; Lin, S.; Lomas, L.; Righetti, P. G.; Boschetti, E., Reducing protein concentration range of biological samples using solid-phase ligand libraries. J Chromatogr B Analyt Technol Biomed Life Sci 2006, 833, (1).
28. Wang, H.; Clouthier, S. G.; Galchev, V.; Misek, D. E.; Duffner, U.; Min, C. K.; Zhao, R.; Tra, J.; Omenn, G. S.; Ferrara, J. L.; Hanash, S. M., Intact-protein-based high-resolution three-dimensional quantitative analysis system for proteome profiling of biological fluids. Mol Cell Proteomics 2005, 4, (5).
29. Celis, J. E.; Rasmussen, H. H.; Olsen, E.; Madsen, P.; Leffers, H.; Honore, B.; Dejgaard, K.; Gromov, P.; Hoffmann, H. J.; Nielsen, M.; et al., The human keratinocyte two-dimensional gel protein database: update. Electrophoresis 1993, 14, (11).
30. Appel, R. D.; Sanchez, J. C.; Bairoch, A.; Golaz, O.; Miu, M.; Vargas, J. R.; Hochstrasser, D. F., SWISS-2DPAGE: a database of two-dimensional gel electrophoresis images. Electrophoresis 1993, 14, (11).
31. Schrattenholz, A.; Groebe, K., What does it need to be a biomarker? Relationships between resolution, differential quantification and statistical validation of protein surrogate biomarkers. Electrophoresis 2007, 28, (12).

32. Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C.; Grimley, C.; Watanabe, C., Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc Natl Acad Sci U S A 1993, 90, (11).
33. Unlu, M.; Morgan, M. E.; Minden, J. S., Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 1997, 18, (11).
34. Baker, M. A.; Hetherington, L.; Aitken, R. J., Identification of SRC as a key PKA-stimulated tyrosine kinase involved in the capacitation-associated hyperactivation of murine spermatozoa. J Cell Sci 2006, 119, (Pt 15).
35. Bergen, H. R., 3rd; Muddiman, D. C.; O'Brien, J. F.; Hoyer, J. D., Normalization of relative peptide ratios derived from in-gel digests: applications to protein variant analysis at the peptide level. Rapid Commun Mass Spectrom 2005, 19, (19).
36. Peng, J.; Gygi, S. P., Proteomics: the move to mixtures. J Mass Spectrom 2001, 36, (10).
37. Paweletz, C. P.; Trock, B.; Pennanen, M.; Tsangaris, T.; Magnant, C.; Liotta, L. A.; Petricoin, E. F., 3rd, Proteomic patterns of nipple aspirate fluids obtained by SELDI-TOF: potential for new biomarkers to aid in the diagnosis of breast cancer. Dis Markers 2001, 17, (4).
38. Emmert-Buck, M. R.; Gillespie, J. W.; Paweletz, C. P.; Ornstein, D. K.; Basrur, V.; Appella, E.; Wang, Q. H.; Huang, J.; Hu, N.; Taylor, P.; Petricoin, E. F., 3rd, An approach to proteomic analysis of human tumors. Mol Carcinog 2000, 27, (3).
39. Kelleher, N. L., Top-down proteomics. Anal Chem 2004, 76, (11), 197A-203A.
40. Gygi, S. P.; Han, D. K.; Gingras, A. C.; Sonenberg, N.; Aebersold, R., Protein analysis by mass spectrometry and sequence database searching: tools for cancer research in the post-genomic era. Electrophoresis 1999, 20, (2).
Yates, N.; Wislocki, D.; Roberts, A.; Berk, S.; Klatt, T.; Shen, D. M.; Willoughby, C.; Rosauer, K.; Chapman, K.; Griffin, P., Mass spectrometry screening of combinatorial mixtures, correlation of measured and predicted electrospray ionization spectra. Anal Chem 2001, 73, (13).

43. McLafferty, R. S. G. a. F. W., Early gas chromatography/mass spectrometry. J. Am. Soc. Mass Spectrom. 1993, 4, (5).
44. Glish, G. L.; Vachet, R. W., The basics of mass spectrometry in the twenty-first century. Nat Rev Drug Discov 2003, 2, (2).
45. University, B., ESI Mechanism. In.
46. McLuckey, S. A.; Goeringer, D. E.; Glish, G. L., Collisional activation with random noise in ion trap mass spectrometry. Anal Chem 1992, 64, (13).
47. Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M., Electrospray ionization for mass spectrometry of large biomolecules. Science 1989, 246, (4926).
48. Cech, N. B.; Enke, C. G., Practical implications of some recent studies in electrospray ionization fundamentals. Mass Spectrom Rev 2001, 20, (6).
49. Daniel C. Taflin, T. L. W., E. James Davis, Electrified droplet fission and the Rayleigh limit. Langmuir 1989, 5, (2).
50. Sparkman, J. T. W. O. D., The mass Spectrometer, in Introduction to Mass Spectrometry. John Wiley & Sons, Ltd: West Sussex: 2007.
51. Dodbiba, E.; Xu, C.; Payagala, T.; Wanigasekara, E.; Moon, M. H.; Armstrong, D. W., Use of ion pairing reagents for sensitive detection and separation of phospholipids in the positive ion mode LC-ESI-MS. Analyst 136, (8).
52. McLuckey, S. A.; Wells, J. M., Mass analysis at the advent of the 21st century. Chem Rev 2001, 101, (2).
53. Ion Trap Mass Spectrometer. In Ed.
54. Haas, W.; Faherty, B. K.; Gerber, S. A.; Elias, J. E.; Beausoleil, S. A.; Bakalarski, C. E.; Li, X.; Villen, J.; Gygi, S. P., Optimization and use of peptide mass measurement accuracy in shotgun proteomics. Mol Cell Proteomics 2006, 5, (7).
55. Winger, B. E.; Campana, J. E., Characterization of combinatorial peptide libraries by electrospray ionization Fourier transform mass spectrometry. Rapid Commun Mass Spectrom 1996, 10, (14).

71 56. Senko, M. W.; Canterbury, J. D.; Guan, S.; Marshall, A. G., A highperformance modular data system for Fourier transform ion cyclotron resonance mass spectrometry. Rapid Commun Mass Spectrom 1996, 10, (14), Comisarow, M. B.; Marshall, A. G., The early development of Fourier transform ion cyclotron resonance (FT-ICR) spectroscopy. J Mass Spectrom 1996, 31, (6), Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S., Fourier transform ion cyclotron resonance mass spectrometry: a primer. Mass Spectrom Rev 1998, 17, (1), Website, F. I Aebersold, R.; Mann, M., Mass spectrometry-based proteomics. Nature 2003, 422, (6928), Cantin, G. T.; Venable, J. D.; Cociorva, D.; Yates, J. R., 3rd, Quantitative phosphoproteomic analysis of the tumor necrosis factor pathway. J Proteome Res 2006, 5, (1), Adam, B. L.; Qu, Y.; Davis, J. W.; Ward, M. D.; Clements, M. A.; Cazares, L. H.; Semmes, O. J.; Schellhammer, P. F.; Yasui, Y.; Feng, Z.; Wright, G. L., Jr., Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Res 2002, 62, (13), Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R., Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 2002, 74, (20), Tabb, D. L.; MacCoss, M. J.; Wu, C. C.; Anderson, S. D.; Yates, J. R., 3rd, Similarity among tandem mass spectra from proteomic experiments: detection, significance, and utility. Anal Chem 2003, 75, (10), Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R., A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 2003, 75, (17), Rauch, A.; Bellew, M.; Eng, J.; Fitzgibbon, M.; Holzman, T.; Hussey, P.; Igra, M.; Maclean, B.; Lin, C. W.; Detter, A.; Fang, R.; Faca, V.; Gafken, P.; Zhang, H.; Whiteaker, J.; States, D.; Hanash, S.; Paulovich, A.; McIntosh, M. 
W., Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res 2006, 5, (1),

72 67. Cottingham, K., CPAS: a proteomics data management system for the masses. J Proteome Res 2006, 5, (1), Bantscheff, M.; Schirle, M.; Sweetman, G.; Rick, J.; Kuster, B., Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem 2007, 389, (4), Everley, P. A.; Krijgsveld, J.; Zetter, B. R.; Gygi, S. P., Quantitative cancer proteomics: stable isotope labeling with amino acids in cell culture (SILAC) as a tool for prostate cancer research. Mol Cell Proteomics 2004, 3, (7), Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M., Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002, 1, (5), Li, J.; Steen, H.; Gygi, S. P., Protein profiling with cleavable isotope-coded affinity tag (cicat) reagents: the yeast salinity stress response. Mol Cell Proteomics 2003, 2, (11), Overgaard, A. J.; Thingholm, T. E.; Larsen, M. R.; Tarnow, L.; Rossing, P.; McGuire, J. N.; Pociot, F., Quantitative itraq-based Proteomic Identification of Candidate Biomarkers for Diabetic Nephropathy in Plasma of Type 1 Diabetic Patients. Clin Proteomics 6, (4), Sun, C. Y.; Xia, G. W.; Xu, K.; Ding, Q., [Application of itraq in proteomic study of prostate cancer]. Zhonghua Nan Ke Xue 16, (8), Unwin, R. D., Quantification of proteins by itraq. Methods Mol Biol 658, Gerber, S. A.; Rush, J.; Stemman, O.; Kirschner, M. W.; Gygi, S. P., Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc Natl Acad Sci U S A 2003, 100, (12), Kettenbach, A. N.; Rush, J.; Gerber, S. A., Absolute quantification of protein and post-translational modification abundance with stable isotope-labeled synthetic peptides. Nat Protoc 6, (2), Froehlich, J. W.; Chu, C. S.; Tang, N.; Waddell, K.; Grimm, R.; Lebrilla, C. 
B., Label-free liquid chromatography-tandem mass spectrometry analysis with automated phosphopeptide enrichment reveals dynamic human milk protein phosphorylation during lactation. Anal Biochem 408, (1), Voyksner, R. D.; Lee, H., Investigating the use of an octupole ion guide for ion storage and high-pass mass filtering to improve the quantitative performance of 72

73 electrospray ion trap mass spectrometry. Rapid Commun Mass Spectrom 1999, 13, (14), Chelius, D.; Bondarenko, P. V., Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry. J Proteome Res 2002, 1, (4), Fatima, N.; Chelius, D.; Luke, B. T.; Yi, M.; Zhang, T.; Stauffer, S.; Stephens, R.; Lynch, P.; Miller, K.; Guszczynski, T.; Boring, D.; Greenwald, P.; Ali, I. U., Label-free global serum proteomic profiling reveals novel celecoxibmodulated proteins in familial adenomatous polyposis patients. Cancer Genomics Proteomics 2009, 6, (1), Liu, H.; Sadygov, R. G.; Yates, J. R., 3rd, A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem 2004, 76, (14), Dicker, L.; Lin, X.; Ivanov, A. R., Increased power for the analysis of labelfree LC-MS/MS proteomics data by combining spectral counts and peptide peak attributes. Mol Cell Proteomics 9, (12), Sadygov, R. G.; Liu, H.; Yates, J. R., Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases. Anal Chem 2004, 76, (6), Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G., Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics 2005, 4, (10), Issaq, H. J.; Xiao, Z.; Veenstra, T. D., Serum and plasma proteomics. Chem Rev 2007, 107, (8), Steel, L. F.; Trotter, M. G.; Nakajima, P. B.; Mattu, T. S.; Gonye, G.; Block, T., Efficient and specific removal of albumin from human serum samples. Mol Cell Proteomics 2003, 2, (4), Fu, Q.; Bovenkamp, D. E.; Van Eyk, J. E., A rapid, economical, and reproducible method for human serum delipidation and albumin and IgG removal for proteomic analysis. Methods Mol Biol 2007, 357, Olver, C. S.; Webb, T. L.; Long, L. J.; Scherman, H.; Prenni, J. E., Comparison of methods for depletion of albumin and IgG from equine serum. 
Vet Clin Pathol 39, (3),

74 89. Fu, Q.; Garnham, C. P.; Elliott, S. T.; Bovenkamp, D. E.; Van Eyk, J. E., A robust, streamlined, and reproducible method for proteomic analysis of serum by delipidation, albumin and IgG depletion, and two-dimensional gel electrophoresis. Proteomics 2005, 5, (10), Sitnikov, D.; Chan, D.; Thibaudeau, E.; Pinard, M.; Hunter, J. M., Protein depletion from blood plasma using a volatile buffer. J Chromatogr B Analyt Technol Biomed Life Sci 2006, 832, (1), Huang, L.; Harvie, G.; Feitelson, J. S.; Gramatikoff, K.; Herold, D. A.; Allen, D. L.; Amunngama, R.; Hagler, R. A.; Pisano, M. R.; Zhang, W. W.; Fang, X., Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the needs of proteomic sample preparation and analysis. Proteomics 2005, 5, (13), Omenn, G. S., Strategies for plasma proteomic profiling of cancers. Proteomics 2006, 6, (20), Petricoin, E. F.; Liotta, L. A., Clinical applications of proteomics. J Nutr 2003, 133, (7 Suppl), 2476S-2484S. 94. Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Fang, R.; Moore, R. J.; Smith, R. D.; Xiao, W.; Davis, R. W.; Tompkins, R. G., Ultra-high-efficiency strong cation exchange LC/RPLC/MS/MS for high dynamic range characterization of the human plasma proteome. Anal Chem 2004, 76, (4), Zhang, J.; Xu, X.; Gao, M.; Yang, P.; Zhang, X., Comparison of 2-D LC and 3-D LC with post- and pre-tryptic-digestion SEC fractionation for proteome analysis of normal human liver tissue. Proteomics 2007, 7, (4), Frazer, G. S.; Bucci, D. M., SDS-PAGE characterization of the proteins in equine seminal plasma. Theriogenology 1996, 46, (4), Jin, Y.; Manabe, T., Performance of agarose IEF gels as the first dimension support for non-denaturing micro-2-de in the separation of high-molecular-mass plasma proteins and protein complexes. Electrophoresis 2009, 30, (6), Yang, Z.; Hancock, W. S., Monitoring glycosylation pattern changes of glycoproteins using multi-lectin affinity chromatography. 
J Chromatogr A 2005, 1070, (1-2), Wang, H.; Hanash, S., Intact-protein based sample preparation strategies for proteome analysis in combination with mass spectrometry. Mass Spectrom Rev 2005, 24, (3),

75 100. Tang, H. Y.; Ali-Khan, N.; Echan, L. A.; Levenkova, N.; Rux, J. J.; Speicher, D. W., A novel four-dimensional strategy combining protein and peptide separation methods enables detection of low-abundance proteins in human plasma and serum proteomes. Proteomics 2005, 5, (13), Hu, L.; Zhou, H.; Li, Y.; Sun, S.; Guo, L.; Ye, M.; Tian, X.; Gu, J.; Yang, S.; Zou, H., Profiling of endogenous serum phosphorylated peptides by titanium (IV) immobilized mesoporous silica particles enrichment and MALDI-TOFMS detection. Anal Chem 2009, 81, (1), Lv, Y.; Liu, Q.; Bao, X.; Tang, W.; Yang, B.; Guo, S., Identification and characteristics of iron-chelating peptides from soybean protein hydrolysates using IMAC-Fe3+. J Agric Food Chem 2009, 57, (11), Carrascal, M.; Ovelleiro, D.; Casas, V.; Gay, M.; Abian, J., Phosphorylation analysis of primary human T lymphocytes using sequential IMAC and titanium oxide enrichment. J Proteome Res 2008, 7, (12), Corthals, G. L.; Wasinger, V. C.; Goodlett, D. R., Offline Micro-IMAC Enrichment of Phosphoproteins. CSH Protoc 2007, 2007, pdb prot Lindgren, G. E., Immobilized metal affinity chromatography (IMAC). Am Biotechnol Lab 1994, 12, (7), Spiro, R. G., Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology 2002, 12, (4), 43R- 56R Hofsteenge, J.; Muller, D. R.; de Beer, T.; Loffler, A.; Richter, W. J.; Vliegenthart, J. F., New type of linkage between a carbohydrate and a protein: C- glycosylation of a specific tryptophan residue in human RNase Us. Biochemistry 1994, 33, (46), Stepper, J.; Shastri, S.; Loo, T. S.; Preston, J. C.; Novak, P.; Man, P.; Moore, C. H.; Havlicek, V.; Patchett, M. L.; Norris, G. E., Cysteine S-glycosylation, a new post-translational modification found in glycopeptide bacteriocins. FEBS Lett 585, (4), Cheng, S.; Edwards, S. A.; Jiang, Y.; Grater, F., Glycosylation enhances peptide hydrophobic collapse by impairing solvation. 
Chemphyschem 11, (11), Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R., Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol 2003, 21, (6),

76 111. Liu, T.; Qian, W. J.; Gritsenko, M. A.; Camp, D. G., 2nd; Monroe, M. E.; Moore, R. J.; Smith, R. D., Human plasma N-glycoproteome analysis by immunoaffinity subtraction, hydrazide chemistry, and mass spectrometry. J Proteome Res 2005, 4, (6), Lee, A.; Nakano, M.; Hincapie, M.; Kolarich, D.; Baker, M. S.; Hancock, W. S.; Packer, N. H., The lectin riddle: glycoproteins fractionated from complex mixtures have similar glycomic profiles. Omics 14, (4), Monzo, A.; Olajos, M.; De Benedictis, L.; Rivera, Z.; Bonn, G. K.; Guttman, A., Boronic acid lectin affinity chromatography (BLAC). 2. Affinity micropartitioningmediated comparative glycosylation profiling. Anal Bioanal Chem 2008, 392, (1-2), Satish, P. R.; Surolia, A., Exploiting lectin affinity chromatography in clinical diagnosis. J Biochem Biophys Methods 2001, 49, (1-3), Irani, D. N.; Anderson, C.; Gundry, R.; Cotter, R.; Moore, S.; Kerr, D. A.; McArthur, J. C.; Sacktor, N.; Pardo, C. A.; Jones, M.; Calabresi, P. A.; Nath, A., Cleavage of cystatin C in the cerebrospinal fluid of patients with multiple sclerosis. Ann Neurol 2006, 59, (2), Spivak, J. L.; Small, D.; Hollenberg, M. D., Erythropoietin: isolation by affinity chromatography with lectin-agarose derivatives. Proc Natl Acad Sci U S A 1977, 74, (10), Glenney, J. R., Jr.; Walborg, E. F., Jr., Lectin affinity chromatography of cell surface proteins of Novikoff tumor cells. J Supramol Struct 1979, 11, (4), Flurer, C.; Borra, C.; Beale, S.; Novotny, M., Fused silica packed microcolumns as micropreparative tools in protein analytical studies. Anal Chem 1988, 60, (17), Drake, P. M.; Schilling, B.; Niles, R. K.; Braten, M.; Johansen, E.; Liu, H.; Lerch, M.; Sorensen, D. J.; Li, B.; Allen, S.; Hall, S. C.; Witkowska, H. E.; Regnier, F. E.; Gibson, B. W.; Fisher, S. J., A lectin affinity workflow targeting glycositespecific, cancer-related carbohydrate structures in trypsin-digested human plasma. 
Anal Biochem 408, (1), Madera, M.; Mann, B.; Mechref, Y.; Novotny, M. V., Efficacy of glycoprotein enrichment by microscale lectin affinity chromatography. J Sep Sci 2008, 31, (14), Friberg, S.; Hammarstrom, S., Concanavalin A and other lectins in the study of tumor cell surface organization. Adv Exp Med Biol 1975, 55,

77 122. Lis, N. S. a. H., Lectins Durham, M.; Regnier, F. E., Targeted glycoproteomics: serial lectin affinity chromatography in the selection of O-glycosylation sites on proteins from the human blood proteome. J Chromatogr A 2006, 1132, (1-2), Endo, T., Fractionation of glycoprotein-derived oligosaccharides by affinity chromatography using immobilized lectin columns. J Chromatogr A 1996, 720, (1-2), Endo, Y.; Tsuchida, Y.; Miyazaki, J.; Kaneko, M., [Analysis of lectin-affinity of alpha fetoprotein-diagnostic approach]. Gan To Kagaku Ryoho 1983, 10, (2 Pt 2), Qiu, R.; Regnier, F. E., Use of multidimensional lectin affinity chromatography in differential glycoproteomics. Anal Chem 2005, 77, (9), Jung, K.; Cho, W.; Regnier, F. E., Glycoproteomics of plasma based on narrow selectivity lectin affinity chromatography. J Proteome Res 2009, 8, (2), Xiong, L.; Regnier, F. E., Use of a lectin affinity selector in the search for unusual glycosylation in proteomics. J Chromatogr B Analyt Technol Biomed Life Sci 2002, 782, (1-2), Brewer, C. F.; Bhattacharyya, L., Specificity of concanavalin A binding to asparagine-linked glycopeptides. A nuclear magnetic relaxation dispersion study. J Biol Chem 1986, 261, (16), Young, N. M.; Johnston, R. A.; Watson, D. C., The amino acid sequences of jacalin and the Maclura pomifera agglutinin. FEBS Lett 1991, 282, (2), Houles Astoul, C.; Peumans, W. J.; van Damme, E. J.; Barre, A.; Bourne, Y.; Rouge, P., The size, shape and specificity of the sugar-binding site of the jacalin-related lectins is profoundly affected by the proteolytic cleavage of the subunits. Biochem J 2002, 367, (Pt 3), Nagata, Y.; Burger, M. M., Wheat germ agglutinin. Isolation and crystallization. J Biol Chem 1972, 247, (7), Nagata, Y.; Goldberg, A. R.; Burger, M. M., The isolation and purification of wheat germ and other agglutinins. Methods Enzymol 1974, 32, (Part B), Nagata, Y.; Burger, M. M., Wheat germ agglutinin. 
Molecular characteristics and specificity for sugar binding. J Biol Chem 1974, 249, (10),

78 135. Yang, Z.; Harris, L. E.; Palmer-Toy, D. E.; Hancock, W. S., Multilectin affinity chromatography for characterization of multiple glycoprotein biomarker candidates in serum from breast cancer patients. Clin Chem 2006, 52, (10), Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6, (2), Plavina, T.; Hincapie, M.; Wakshull, E.; Subramanyam, M.; Hancock, W. S., Increased plasma concentrations of cytoskeletal and Ca2+-binding proteins and their peptides in psoriasis patients. Clin Chem 2008, 54, (11), Dayarathna, M. K.; Hancock, W. S.; Hincapie, M., A two step fractionation approach for plasma proteomics using immunodepletion of abundant proteins and multi-lectin affinity chromatography: Application to the analysis of obesity, diabetes, and hypertension diseases. J Sep Sci 2008, 31, (6-7), Zheng, X.; Wu, S. L.; Hincapie, M.; Hancock, W. S., Study of the human plasma proteome of rheumatoid arthritis. J Chromatogr A 2009, 1216, (16), Zeng, Z.; Hincapie, M.; Haab, B. B.; Hanash, S.; Pitteri, S. J.; Kluck, S.; Hogan, J. M.; Kennedy, J.; Hancock, W. S., The development of an integrated platform to identify breast cancer glycoproteome changes in human serum. J Chromatogr A 1217, (19), Orazine, C. I.; Hincapie, M.; Hancock, W. S.; Hattersley, M.; Hanke, J. H., A proteomic analysis of the plasma glycoproteins of a MCF-7 mouse xenograft: a model system for the detection of tumor markers. J Proteome Res 2008, 7, (4), Schulz-Knappe, P.; Zucht, H. D.; Heine, G.; Jurgens, M.; Hess, R.; Schrader, M., Peptidomics: the comprehensive analysis of peptides in complex biological mixtures. Comb Chem High Throughput Screen 2001, 4, (2), Diamandis, E. P., Peptidomics for cancer diagnosis: present and future. J Proteome Res 2006, 5, (9), Schrader, M.; Schulz-Knappe, P., Peptidomics technologies for human body fluids. 
Trends Biotechnol 2001, 19, (10 Suppl), S Schulz-Knappe, P.; Schrader, M.; Zucht, H. D., The peptidomics concept. Comb Chem High Throughput Screen 2005, 8, (8), Fredolini, C.; Meani, F.; Luchini, A.; Zhou, W.; Russo, P.; Ross, M.; Patanarut, A.; Tamburro, D.; Gambara, G.; Ornstein, D.; Odicino, F.; Ragnoli, M.; 78

79 Ravaggi, A.; Novelli, F.; Collura, D.; D'Urso, L.; Muto, G.; Belluco, C.; Pecorelli, S.; Liotta, L.; Petricoin, E. F., 3rd, Investigation of the ovarian and prostate cancer peptidome for candidate early detection markers using a novel nanoparticle biomarker capture technology. Aaps J 12, (4), Petricoin, E. F.; Belluco, C.; Araujo, R. P.; Liotta, L. A., The blood peptidome: a higher dimension of information content for cancer biomarker discovery. Nat Rev Cancer 2006, 6, (12), Liotta, L. A.; Petricoin, E. F., Serum peptidome for cancer detection: spinning biologic trash into diagnostic gold. J Clin Invest 2006, 116, (1), Villanueva, J.; Philip, J.; DeNoyer, L.; Tempst, P., Data analysis of assorted serum peptidome profiles. Nat Protoc 2007, 2, (3), Villanueva, J.; Shaffer, D. R.; Philip, J.; Chaparro, C. A.; Erdjument- Bromage, H.; Olshen, A. B.; Fleisher, M.; Lilja, H.; Brogi, E.; Boyd, J.; Sanchez- Carbayo, M.; Holland, E. C.; Cordon-Cardo, C.; Scher, H. I.; Tempst, P., Differential exoprotease activities confer tumor-specific serum peptidome patterns. J Clin Invest 2006, 116, (1), Robbins, R. J.; Villanueva, J.; Tempst, P., Distilling cancer biomarkers from the serum peptidome: high technology reading of tea leaves or an insight to clinical systems biology? J Clin Oncol 2005, 23, (22), Tirumalai, R. S.; Chan, K. C.; Prieto, D. A.; Issaq, H. J.; Conrads, T. P.; Veenstra, T. D., Characterization of the low molecular weight human serum proteome. Mol Cell Proteomics 2003, 2, (10), Zheng, X.; Baker, H.; Hancock, W. S., Analysis of the low molecular weight serum peptidome using ultrafiltration and a hybrid ion trap-fourier transform mass spectrometer. J Chromatogr A 2006, 1120, (1-2), Diamandis, E. P., Cancer biomarkers: can we turn recent failures into success? J Natl Cancer Inst 102, (19), Gutman, S.; Kessler, L. G., The US Food and Drug Administration perspective on cancer biomarker development. Nat Rev Cancer 2006, 6, (7), Anderson, L.; Hunter, C. 
L., Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics 2006, 5, (4), Anderson, N. L.; Anderson, N. G.; Haines, L. R.; Hardie, D. B.; Olafson, R. W.; Pearson, T. W., Mass spectrometric quantitation of peptides and proteins using 79

80 Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J Proteome Res 2004, 3, (2), Whiteaker, J. R.; Zhao, L.; Zhang, H. Y.; Feng, L. C.; Piening, B. D.; Anderson, L.; Paulovich, A. G., Antibody-based enrichment of peptides on magnetic beads for mass-spectrometry-based quantification of serum biomarkers. Anal Biochem 2007, 362, (1), Lequin, R. M., Enzyme immunoassay (EIA)/enzyme-linked immunosorbent assay (ELISA). Clin Chem 2005, 51, (12), Yalow, R. S.; Berson, S. A., Immunoassay of endogenous plasma insulin in man. J Clin Invest 1960, 39, Itoh, K.; Suzuki, T., Antibody-guided selection using capture-sandwich ELISA. Methods Mol Biol 2002, 178, Dixit, C. K.; Vashist, S. K.; O'Neill, F. T.; O'Reilly, B.; MacCraith, B. D.; O'Kennedy, R., Development of a high sensitivity rapid sandwich ELISA procedure and its comparison with the conventional approach. Anal Chem 82, (16), Burnette, W. N., "Western blotting": electrophoretic transfer of proteins from sodium dodecyl sulfate--polyacrylamide gels to unmodified nitrocellulose and radiographic detection with antibody and radioiodinated protein A. Anal Biochem 1981, 112, (2), Ochiai, H., [Electric transfer of peptides from acrylamide gels to nitrocellulose sheets: Procedure and some applications of "Western blotting" (author's transl)]. Seikagaku 1982, 54, (2), Sharma, P.; Ganguly, N. K.; Sehgal, R.; Srivastava, R. K., Western blotting. Trop Gastroenterol 1989, 10, (1), Kermode, A. G.; Thompson, A. J.; Tofts, P.; MacManus, D. G.; Kendall, B. E.; Kingsley, D. P.; Moseley, I. F.; Rudge, P.; McDonald, W. I., Breakdown of the blood-brain barrier precedes symptoms and other MRI signs of new lesions in multiple sclerosis. Pathogenetic and clinical implications. Brain 1990, 113 ( Pt 5), Bielekova, B.; Martin, R., Development of biomarkers in multiple sclerosis. Brain 2004, 127, (Pt 7), Noben, J. 
P.; Dumont, D.; Kwasnikowska, N.; Verhaert, P.; Somers, V.; Hupperts, R.; Stinissen, P.; Robben, J., Lumbar cerebrospinal fluid proteome in multiple sclerosis: characterization by ultrafiltration, liquid chromatography, and mass spectrometry. J Proteome Res 2006, 5, (7),

81 169. Anderson, N. L.; Polanski, M.; Pieper, R.; Gatlin, T.; Tirumalai, R. S.; Conrads, T. P.; Veenstra, T. D.; Adkins, J. N.; Pounds, J. G.; Fagan, R.; Lobley, A., The human plasma proteome: a nonredundant list developed by combination of four separate sources. Mol Cell Proteomics 2004, 3, (4), Rithidech, K. N.; Honikel, L.; Milazzo, M.; Madigan, D.; Troxell, R.; Krupp, L. B., Protein expression profiles in pediatric multiple sclerosis: potential biomarkers. Mult Scler 2009, 15, (4), Brandacher, G.; Golderer, G.; Kienzl, K.; Werner, E. R.; Margreiter, R.; Weiss, H. G., Potential applications of global protein expression analysis (proteomics) in morbid obesity and bariatric surgery. Obes Surg 2008, 18, (7), Hwang, H.; Bowen, B. P.; Lefort, N.; Flynn, C. R.; De Filippis, E. A.; Roberts, C.; Smoke, C. C.; Meyer, C.; Hojlund, K.; Yi, Z.; Mandarino, L. J., Proteomics analysis of human skeletal muscle reveals novel abnormalities in obesity and type 2 diabetes. Diabetes 59, (1), Sheipe, M., Breaking through obesity with gastric bypass surgery. Nurse Pract 2006, 31, (10), 12-4, 17, 18, 21; quiz Hall, T. C.; Pellen, M. G.; Sedman, P. C.; Jain, P. K., Preoperative factors predicting remission of type 2 diabetes mellitus after Roux-en-Y gastric bypass surgery for obesity. Obes Surg 20, (9),

Chapter 2

Preparation of a High Performance Multi-Lectin Affinity Chromatography (HP-M-LAC) Adsorbent for the Analysis of Human Plasma Glycoproteins and an Automated Platform for Fractionation of Human Plasma Glycoproteome in Clinical Proteomics

Publications

1. Kullolli, M.; Hancock, W. S.; Hincapie, M. J Sep Sci 2008, 31,
2. Ralin, W. D. D., C.S.; Silver, E.J.; Travis, C.J.; Kullolli, M.; Hancock, W.S.; Hincapie, M. Clinical Proteomics
3. Kullolli, M.; Hancock, W. S.; Hincapie, M. Anal. Chem. 2010, 82(1),

Abstract

In this chapter we report on the preparation of an improved multi-lectin affinity support for high performance liquid chromatography separations. We combined the selectivity of three different lectins: concanavalin A (Con A), wheat germ agglutinin (WGA) and jacalin (JAC). Each lectin was first covalently immobilized onto a polymeric matrix, and the three lectin media were then combined in equal ratios. The beads were packed into a column to produce a mixed-bed multi-lectin HPLC column (HP-M-LAC) for fast chromatographic affinity separations. The support was characterized with respect to kinetics of immobilization, ligand density and binding capacity for human plasma glycoproteins. A high lectin density (15 mg/ml of beads) was found to be optimal for the binding of glycoproteins from human plasma. A single clinical sample can be fractionated in less than 10 min per run, making this a useful sample preparation tool for proteomics/glycoproteomics studies of disease-associated abnormalities.

Furthermore, we describe the development of an automated platform for the study of the plasma glycoproteome. The method consists of targeted depletion in-line with glycoprotein fractionation. A key element of this platform is that it enables high throughput sample processing in a manner that minimizes analytical bias in a clinical sample set. The system, named High Performance Multi Lectin Affinity Chromatography (HP-M-LAC), is composed of a depletion column connected in series with a multi-lectin affinity chromatography (M-LAC) column, which contains a mixture of three lectins: concanavalin A (Con A), jacalin (JAC) and wheat germ agglutinin (WGA). We have demonstrated that this platform gives high recoveries for the fractionation of the plasma proteome (95%) and excellent stability (over 200 runs). In addition, glycoproteomes isolated using the HP-M-LAC platform were shown to be highly reproducible and glycan specific, as demonstrated by rechromatography of selected fractions and proteomic analysis of the unbound (glycoproteome 1) and bound (glycoproteome 2) fractions.

2.1 Introduction

Glycosylation of proteins is one of the most ubiquitous post-translational modifications observed in eukaryotic organisms; it is estimated that roughly half of the mammalian proteome consists of glycoproteins. In addition, glycoproteins play major roles in a myriad of cellular and biological functions, including immune defense, cell growth and differentiation, cell-cell adhesion and others. 1 Given this important role, a number of analytical tools have been developed over the years, and continue to emerge, to study protein-carbohydrate interactions. In recognition of the importance of glycosylation, many proteomics researchers have turned their attention to the glycoproteome in search of disease diagnostic, prognostic and treatment biomarkers. 2 The field of proteomics has expanded rapidly, driven by advances in analytical technologies such as sample preparation, nanoscale liquid chromatography and high-sensitivity mass spectrometry. Despite these technological advances, identifying proteins of interest from a complex biological sample still poses a significant analytical challenge due to dynamic range limitations and the degree of complexity of biological samples. Therefore, sample pre-fractionation before proteomic analysis to enrich specific subproteomes, such as glycoproteins, increases the likelihood of identifying proteins of interest. In particular, lectin affinity chromatography (LAC) is effective for simplifying a complex sample prior to proteomic analysis and for targeting certain types of glycoproteins.

Lectins are widely distributed in nature and have the ability to recognize carbohydrates on the surface of proteins, and studies have shown that the affinity of lectins for sugars is lower than that of the corresponding antibody-antigen interaction. 6 This property is advantageous for affinity chromatography, since elution of adsorbed proteins is more efficient and the recoveries of the bound proteins are generally robust. 7-9 Thus, for several decades the specificity of lectin-carbohydrate recognition has been explored in biology and medicine using lectins as affinity ligands. 10 A major application has been in affinity chromatography, where the unique specificity of the ligand-biomolecule interaction makes it one of the most selective separation methods available for isolating specific classes of proteins, such as glycoproteins. 11, 12 Several laboratories have reported on the use of lectins for clinical samples; differences in lectin binding patterns have been associated with possible differences in glycosylation in disease samples. A large number of glycoproteomic workflows employ lectin enrichment and mass spectrometric analysis of purified proteins or glycopeptides. The choice of lectins depends on the hypothesis about the glycoproteome that underlies the project being studied. Both serial lectin affinity chromatography (S-LAC) and multi-lectin affinity chromatography have been explored as tools to investigate glycosylation changes. Alterations in glycosylation patterns have been linked to the progression of several diseases, most notably cancer. Commonly used lectins, such as concanavalin A (ConA) or wheat germ agglutinin (WGA), have overlapping affinities for a broad range of glycan structure types.

Therefore it is a challenge to select the appropriate lectin for the affinity selection of a given glycan or glycoprotein and achieve complete binding of the targeted analyte. It was for these reasons that we developed the multi-lectin affinity approach (M-LAC), which uses admixtures of lectins and gives rise to multivalent association with plasma glycoproteins, resulting in better capture of the plasma glycoproteome. 15 There is a need for well characterized lectin-affinity supports for selective isolation of plasma glycoproteins. However, most clinical applications have utilized lectins immobilized on agarose-based media. The major drawback of this matrix is that chromatography conditions are severely constrained by flow rate and back pressure limitations. This significantly increases analysis time and sample handling, which can lead to sample losses and thus affects the proteomics analysis. More recently, high-pressure lectin columns have been produced; previous work by Novotny et al. and Regnier et al. showed that lectins could be bound to silica supports to produce lectin columns with good chromatographic properties. 13, 16, 17 However, there are pH limitations associated with silica supports, and nonspecific adsorption can be a problem. Other supports that have been used for affinity chromatography are Toyopearl resins. Toyopearl AF-Tresyl-650M resin is highly reactive with primary amines and thiol groups. This resin has good characteristics for affinity work, with large pore size (1000 Å) and µm particle size; however, this support cannot withstand high pressure (45 psi). 18 Therefore, polymeric supports represent perhaps the best alternative, providing a stable surface with low reactivity and high protein recoveries.
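As a rough illustration of why ligand density and run time matter for a high-pressure support, the figures quoted in the chapter abstract (15 mg of lectin per ml of beads, three lectin media mixed in equal ratios, and less than 10 min per run) can be turned into simple planning numbers. The sketch below is only a back-of-the-envelope calculation: the function names and the example bed volume are hypothetical, and it assumes that each lectin medium is prepared at the same density and that runs follow one another without idle time.

```python
def lectin_load_mg(bed_volume_ml, density_mg_per_ml=15.0, n_lectins=3):
    """Estimate lectin mass in a mixed-bed column.

    Assumes (not stated in the text) that every lectin medium is prepared at
    the same ligand density and that the media are mixed in equal bed volumes.
    Returns (total lectin mass, mass of each individual lectin) in mg.
    """
    if bed_volume_ml <= 0 or n_lectins <= 0:
        raise ValueError("bed volume and number of lectins must be positive")
    total_mg = bed_volume_ml * density_mg_per_ml
    return total_mg, total_mg / n_lectins


def max_runs_per_day(run_minutes=10.0, hours_available=24.0):
    """Upper bound on back-to-back runs, ignoring any idle time between runs."""
    if run_minutes <= 0:
        raise ValueError("run time must be positive")
    return int(hours_available * 60 // run_minutes)


if __name__ == "__main__":
    # Hypothetical 1.0 ml column bed; the real column volume is not given here.
    total, per_lectin = lectin_load_mg(1.0)
    print(f"total lectin: {total} mg, per lectin: {per_lectin} mg")
    print(f"upper bound on runs per day: {max_runs_per_day(10.0)}")
```

Because the abstract gives the run time as an upper limit ("less than 10 min"), the run count computed this way is a conservative estimate of what back-to-back operation could achieve.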

88 We reported previously on the use of multi-lectin affinity chromatography (M-LAC) for studying the plasma and/or serum proteome. The originally developed M-LAC media consisted of lectins cross-linked to a soft agarose support. We have now extended the M-LAC technology to the development of a high performance multi lectin column (HP-M-LAC) containing ConA, WGA and JAC covalently attached to a styrene-divinylbenzene support matrix coated with a cross-linked polyhydroxylated polymer (POROS ) with an active aldehyde functional group. This chapter will report on the development and characterization of HP-M-LAC support as well as evaluation of the chromatographic properties for applications in plasma proteomics/glycoproteomics. The field of clinical glycoproteomics has dramatically intensified, and an effort has focused on glycoproteins because of their biological significance and relevance to disease. The plasma glycoproteome has significant clinical value as a source of biomarkers, and proteins in plasma are predominantly glycosylated The complexity of plasma proteome, the wide dynamic range and glycoprotein heterogeneity has been the major obstacles for the discovery of clinical biomarkers. Nonetheless, there is compelling motivation in continuing to study this biofluid. There is general agreement that to detect candidate biomarkers present at moderate to low protein concentration, it is necessary to first remove high abundance proteins. The most commonly used method for simplification of the proteome utilizes affinity based techniques; these are advantageous because of their high selectivity. Among these, the so called immunodepletion columns are widely used. Monoclonal and polyclonal antibodies are a promising choice for 88

89 removal of high abundance proteins, due to their specificity, but they may not recognize all forms of the proteins. Major manufacturers of these immunoaffinity depletion columns are Agilent, Genway and Sigma. 22 Furthermore, we have previously reported on the combination of abundant protein depletion with M-LAC and have shown a deeper mining of the plasma glycoproteome. 23, 24 The HP-M-LAC can be easily integrated with abundant protein depletion or other chromatography modes for multidimensional sample fractionation. As is becoming more appreciated in the proteomics community, effective sample preparation is an essential step in comparative proteomics studies. Thus our interest has been the development of a sample fractionation workflow that minimizes the number of sample handling steps and the resultant losses, ex vivo proteolysis or chemical modifications. In this chapter we report that the HP-M-LAC approach has been effectively integrated with protein depletion prior to the glycoprotein enrichment step and with in-line sample concentration/desalting before trypsin digestion and mass spectrometry analysis. To this end, we report on the development of a robust and reproducible high performance automated platform for plasma fractionation that allows high throughput sample processing for clinical proteomics.

2.2 Experimental Section

Materials and Chemicals. Aldehyde POROS 20 AL (20 µm beads, Å pore size) cat number and lot number AL , Protein A resin (POROS PA 50 µm), POROS-R1-50 resin and POROS anti-HSA

90 (2.0 ml) column were purchased from Life Technologies (Carlsbad, CA). Unconjugated lectins, concanavalin A (ConA), jacalin (JAC) and wheat germ agglutinin (WGA), were purchased from Vector Laboratories (Burlingame, CA). Sodium cyanoborohydride, sodium sulfate, sodium chloride, ultrapure tris(hydroxymethyl)aminomethane hydrochloride, sodium azide, glycine, guanidine hydrochloride, dithiothreitol (DTT), tris(2-carboxyethyl)phosphine hydrochloride (TCEP-HCl), ammonium bicarbonate, iodoacetamide, manganese chloride, calcium chloride and Ponceau S were purchased from Sigma (St. Louis, MO). PEEK columns were purchased from Isolation Technology (Milford, MA). Bradford protein assay kit, trifluoroacetic acid (TFA), formic acid, glacial acetic acid, HPLC grade acetonitrile, HPLC grade water and trypsin were purchased from Thermo Scientific (Waltham, MA). Plasma was purchased from Bioreclamation Inc. (Long Island, NY). LC-MS columns (150 mm x 75 µm i.d.) were purchased from New Objective (Woburn, MA); reversed-phase Magic C18 (bead size 5 µm, pore size 300 Å) was purchased from Michrom Bioresources (Auburn, CA). Coomassie blue, SDS-PAGE gels and the Pro-Q Emerald glycoprotein staining kit were purchased from Invitrogen (Carlsbad, CA). A custom-made MARS six-protein depletion column (10 mm x 10 mm) and the binding and elution buffers (buffer A and buffer B) were purchased from Agilent (Palo Alto, CA).

Immobilization Chemistry and Ligand Optimization

Covalent attachment of lectins was accomplished using Schiff base chemistry following the manufacturer's protocol with minor optimizations. Briefly, lectins were reacted with the POROS 20 AL support. Formation of a stable

91 amine bond was accomplished through reductive amination with cyanoborohydride. The ligand density and kinetics of the immobilization were optimized to obtain an affinity support for effective capture of plasma glycoproteins. The solid support was prepared by measuring g of the POROS 20 AL support and resuspending it in 2 ml of 100 mM HEPES buffer, pH 8.2. The beads were washed 3 times with 5 ml of HEPES buffer using centrifugation (4,000 x g for 5 min) to sediment the beads; each time the supernatant was removed and discarded. After the last wash, a 50% slurry was made by adding 1 ml of 100 mM HEPES buffer, pH 8.2. The lectin solutions were prepared as follows: 25 mg of ConA was dissolved in 1 ml of 100 mM HEPES, pH 8.2, 0.2 M methyl-α-D-mannopyranoside, 0.2 M methyl-α-D-glucopyranoside, 1 mM CaCl2 and 1 mM MnCl2, and 400 µl of 2 M sodium sulfate was added very slowly (final sulfate concentration 600 mM); 25 mg of WGA was dissolved in 2 ml of 100 mM HEPES, pH 8.2, 1 M sodium sulfate, 0.5 M N-acetylglucosamine; and 25 mg of JAC was dissolved in 2 ml of 100 mM HEPES, pH 8.2, 1 M sodium sulfate, 0.1 M melibiose. The concentration of sulfate for the reaction was optimized for each lectin. An aliquot of the starting material (20 µl) was removed and added to 980 µl of water to determine the initial protein concentration using a spectrophotometer. The ultraviolet (UV) absorbance at 280 nm was measured and the protein concentration was calculated using the coefficient based on the optical density (O.D.) of a 0.1 % solution measured at 280 nm; the values provided by Vector Laboratories were ConA (1.2), WGA (1.46) and JAC (1.5). Lectin binding to the support was carried out at 4 different ligand densities; the

92 corresponding volume of each lectin was added to a constant amount of POROS 20 AL support to obtain ratios of 7.5, 15, 30 and 45 mg/ml of beads. Sodium cyanoborohydride was added to a final concentration of 20 mM, and the reaction was carried out at room temperature (RT) with constant mixing of the beads on a rocking station. The kinetics of ligand immobilization were followed at 2.5, 5.0, 7.0 and 24 hr by removing 5.0 µl of the lectin-support media and separating the supernatant from the beads by centrifugation for 30 sec in a microcentrifuge; 1.5 µl of the supernatant was spotted onto a small piece of nitrocellulose ("spot test") to obtain a qualitative assessment of the completion of the coupling reaction. The nitrocellulose membrane was stained with a 0.1 % solution of Ponceau S for 30 sec and excess stain was removed with several quick washes of water. Coupling was followed by visualizing the decrease in lectin concentration compared to the initial lectin solution. At 24 hr the reaction was quenched with Tris buffer, pH 7.5, at a final concentration of 0.5 M. A second aliquot (20 mM) of sodium cyanoborohydride was added to reduce the secondary imines; the solution was mixed at RT for an additional 2 hr and centrifuged for 5 min at 4,000 x g. The supernatant was removed and saved for protein concentration measurement. The beads were washed 4 times with M-LAC binding buffer (25 mM Tris, 0.5 M sodium chloride, 1 mM CaCl2, 1 mM MnCl2 and 0.05% sodium azide, pH 7.4). The protein concentrations of the reaction supernatant and the washes were measured by UV absorbance at 280 nm to calculate the coupling efficiency. Four different HP-M-LAC columns, each containing a different ligand to matrix ratio

93 of ConA, WGA and JAC, were prepared by mixing equal volumes (1:1:1). The beads were packed under high pressure into a PEEK column (4.6 mm x 30 mm) using a self-packing device (Applied Biosystems) and following the manufacturer's instructions. The HP-M-LAC column was equilibrated with 5 ml of binding buffer at 4 ml/min, and plasma was slowly loaded at ml/min for 3 min; the flow rate was then set to 4 ml/min and unbound (flow-through) protein was washed off with 5 ml of binding buffer. The bound glycoproteins were eluted with 5 ml of elution buffer (100 mM acetic acid) or saccharide mixture.

Evaluation of Column Performance Using a Protein Standard

Quality Control. Newly made columns were tested with 50 µl of reference plasma (approximately 2.1 mg of human plasma protein); the WGA and M-LAC columns were also tested with a fetuin standard. The fetuin standard was made by dissolving 20 mg in 2 ml of M-LAC binding buffer for a final concentration of 10 mg/ml; 70 µg was loaded onto each of the columns using the method described above.

Binding Characteristics. To compare the binding characteristics of M-LAC with those of single lectin columns, three columns were packed as follows: two single lectin columns containing immobilized ConA and WGA, respectively, and one column containing the mixture of the three lectins (ConA, WGA and JAC) combined in a 1:1:1 molar equivalent ratio (23 mg : 8.0 mg : 11 mg) to make the M-LAC column. The amount of ConA and WGA in each single lectin affinity column was the same as in the M-LAC support.
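The column geometry and standard loadings described above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the bed volume is the simple geometric volume of the 4.6 mm x 30 mm column, ignoring frits and packing density.

```python
import math


def bed_volume_ml(inner_diameter_mm: float, length_mm: float) -> float:
    """Geometric volume of a cylindrical column bed, in millilitres."""
    radius_cm = inner_diameter_mm / 20.0  # mm diameter -> cm radius
    length_cm = length_mm / 10.0          # mm -> cm
    return math.pi * radius_cm ** 2 * length_cm  # cm^3 is the same as ml


def injection_volume_ul(mass_ug: float, conc_mg_per_ml: float) -> float:
    """Volume (µl) of a stock solution that contains the requested mass.

    1 mg/ml equals 1 µg/µl, so µg divided by mg/ml gives µl directly.
    """
    return mass_ug / conc_mg_per_ml


if __name__ == "__main__":
    cv = bed_volume_ml(4.6, 30.0)
    print(f"Bed volume of a 4.6 mm x 30 mm column: {cv:.2f} ml")
    print(f"A 5 ml wash corresponds to about {5.0 / cv:.0f} column volumes")
    print(f"70 µg of fetuin at 10 mg/ml: {injection_volume_ul(70, 10):.1f} µl")
```

Under these assumptions the bed volume comes out close to 0.5 ml, so the 5 ml equilibration, wash and elution steps each correspond to roughly ten column volumes.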

94 Thyroglobulin was prepared by dissolving 20 mg in 2 ml of M-LAC binding buffer for a final concentration of 10 mg/ml; 5 mg of the thyroglobulin solution was injected onto each of the ConA and M-LAC columns. To increase the residence time, the protein was loaded slowly (0.05 ml/min) for 5 min. The flow rate was then increased to 2.5 ml/min. Unbound protein was washed off with 10 column volumes (CV) of binding buffer and the bound protein was eluted with 11 CV of 100 mM acetic acid, pH 3.8. The lectin columns were neutralized with 11 CV of neutralization buffer (0.5 M Tris, 1 M sodium chloride and 0.05% sodium azide, pH 7.4) and equilibrated with 10 CV of binding buffer.

Depletion of Abundant Proteins Using the Multiple Affinity Removal Column (MARS)

Chromatography was performed on a BioCad chromatography workstation (Applied Biosystems, Foster City, CA). The MARS column was equilibrated with 3 column volumes (CV) of binding buffer A at 2.5 ml/min. One hundred µl of plasma was diluted 1:5 with buffer A and loaded onto the column at 0.5 ml/min for 7 min. The flow rate was then set to 2.5 ml/min and the depleted plasma was washed off with 3 CV of buffer A and collected. The bound proteins were released using 5 CV of elution buffer B. The column was then neutralized with 5 CV of neutralization buffer (0.5 M Tris-HCl, 1 M sodium chloride, 0.05% sodium azide, pH 7.5) and re-equilibrated with 3 CV of buffer. A total of 3 ml of crude plasma (30 injections, 0.1 ml/run) was processed to obtain sufficient material for characterization of the lectin affinity support. The volume of the depleted plasma was reduced using Amicon 5 kDa molecular weight cut-off filters

95 (Millipore, MA). The total protein concentration was measured using the Bradford protein assay; 12.0 mg of depleted plasma was obtained at a concentration of 2.9 mg/ml. This material was labeled "reference depleted plasma" and was aliquoted and stored at -70 ºC.

HP-M-LAC Column Characterization

The characterization of the HP-M-LAC support was performed using the reference depleted plasma obtained as described above, and column capacity was studied by injecting different amounts from µg of plasma. Desorption of bound glycoproteins was evaluated using two different buffer conditions: either competitive saccharide displacement (25 mM Tris, 0.5 M sodium chloride, 0.2 M methyl-α-D-mannopyranoside, 0.2 M methyl-α-D-glucopyranoside, 0.5 M N-acetylglucosamine, 0.1 M melibiose) or low pH elution (100 mM acetic acid at pH 3.8). The amount of protein eluted was measured by Bradford protein assay and recoveries were calculated. Fractions were subjected to trypsin digestion and LC-MS using the method described below.

Automation and Fractionation of Human Plasma

Protein fractionation was carried out using automated abundant protein depletion in-line with HP-M-LAC and sample desalting/concentration on a reversed-phase column (RP-trap). Chromatography was performed using a Prominence 2D HPLC (Shimadzu) equipped with 2 pumps, a sample injection valve and three additional auxiliary valves, which allowed operation of multiple columns. Columns were switched on- or off-line for sample binding, column

96 washing and elution purposes; each valve was controlled independently from the software through contact closures.

Plasma Enrichment and Fractionation

The depletion columns used in this study were Protein A (4.6 mm x 50 mm) and a POROS anti-HSA column, to reduce the plasma concentration of immunoglobulin G (IgG) and albumin, respectively. The M-LAC column was prepared by cross-linking the lectins to the high pressure support following the protocol above. The reversed-phase trap (RP-trap) used was a POROS-R1 resin. The columns (Protein A, M-LAC and RP-trap) were packed under high pressure into PEEK columns using a self-packing device (Life Technologies). The order of the columns in the platform was as follows: Protein A was placed in front of the anti-albumin column, and the two columns were connected with a union and placed on the first HPLC valve, followed by M-LAC on valve 2 and the RP-trap on valve 3. The configuration of this process is shown in Fig. 4. The sample (50 µl) was diluted 1:4 with binding buffer and loaded onto the protein depletion columns. The sample was introduced to the columns at a flow rate of 0.5 ml/min and ml of binding buffer was passed through the depletion columns, which gave enough volume to transfer the depleted plasma onto the HP-M-LAC column. The HP-M-LAC column was taken off-line and the protein depletion columns were eluted with 100 mM glycine, pH 2.5, at 5.0 ml/min. The protein depletion columns were then washed with 5 column volumes (CV) of neutralization buffer (0.25 M Tris, 1 M NaCl, pH 7.5) at 5.0 ml/min and equilibrated with 5 CV of binding buffer at 5.0 ml/min. Proteins with no affinity for the HP-M-LAC

97 column (unbound HP-M-LAC fraction) were washed with 5 CV of binding buffer at 5.0 ml/min and directly captured by the RP-trap, which had previously been equilibrated with 5 CV of aqueous buffer (0.1 % TFA in 5 % acetonitrile, 95 % water). The protein was then eluted with 70% organic phase (0.1% TFA in acetonitrile) at 5 ml/min and the column was immediately equilibrated with the aqueous phase. The protein peak from the RP-trap was collected, immediately diluted with water to bring the concentration of acetonitrile to 30 %, and neutralized with 20 µl of 1 M Tris buffer, pH 9.0. The enriched glycoproteins bound to the HP-M-LAC column were then eluted with 5 CV of elution buffer (100 mM acetic acid, pH 3.8) at 5.0 ml/min and again captured on the RP-trap column; the trap was washed and the glycoproteins were eluted, diluted and neutralized as described above for the unbound HP-M-LAC fraction. The volume of the unbound and bound HP-M-LAC fractions eluted from the RP-trap was reduced to 100 µl using vacuum centrifugation.

Total Protein Concentration Assay

Total protein concentrations were measured using the Bradford protein assay 25 as per the manufacturer's instructions. Protein recoveries for both the depletion and M-LAC fractionation steps were determined.

M-LAC Column Reproducibility

To validate the workflow of the HP-M-LAC platform, repetitive runs were performed. The unbound and bound HP-M-LAC samples were run on 1D SDS-PAGE and stained with Coomassie blue dye to visualize the separated proteins; glycoproteins were stained with a Schiff-base reagent according to the instructions in the manufacturer's reagent kit (Pro-Q Emerald 300, Invitrogen). Fractions were

98 subjected to in-solution trypsin digestion and LC-MS analysis using previously described methods. 23 Briefly, proteins were denatured with 6 M guanidine. The samples were reduced with 5 mM TCEP for 15 min at room temperature and alkylated with 15 mM iodoacetamide for 25 min at room temperature in the dark. The reaction was then quenched by addition of 5 mM DTT. After the samples were diluted with 100 mM ammonium bicarbonate, pH 8.0, to bring the guanidine-HCl concentration down to 1.2 M, trypsin (1:40 w/w) was added and the samples were incubated for 16 hr at 37 °C.

Sample Desalting

Prior to LC-MS the unbound and bound HP-M-LAC samples were desalted using reversed-phase HPLC. The peptides were separated from the non-digested or partially digested proteins using a POROS R1 reversed-phase column. Mobile phase A was composed of 0.1% TFA in water, and mobile phase B was 0.1% TFA in HPLC grade acetonitrile. The digested proteins were loaded on the column at 2% solvent B and washed for 3 min to remove salts and other reagents from the trypsin digestion. The bound peptides were eluted with a step gradient: 28% solvent B for 3 min, to collect the peptides to be analyzed by nano-LC-MS/MS, and then 95% solvent B for 5 min, to elute larger peptides and partially digested and/or non-digested proteins. The separation was performed at 3.0 ml/min and monitored at both 280 nm and 214 nm. Peptides eluted with 28 % B were concentrated down to 50 µl using speed vacuum concentration.

Peptide Separation and Sequencing by nano-LC-MS/MS

99 Nano-LC-MS/MS was performed using an Eksigent system (Dublin, CA) interfaced with an LTQ linear ion trap mass spectrometer (Thermo Fisher Scientific). Solvent A was 0.1 % (v/v) formic acid in water and solvent B was 0.1 % (v/v) formic acid in HPLC grade acetonitrile. Concentrated peptide samples were injected using an autosampler onto a C18 reversed-phase capillary column (150 mm x 75 µm i.d.) packed in-house with Magic C18. The flow rate was 300 nl/min; the gradient ran from 5 % to 40 % solvent B over 105 min, then from 40 % to 60 % solvent B over 20 min, then from 60 % to 90 % solvent B over 5 min, and was held isocratic at 90 % solvent B for 15 min. The mass spectrometry method was the same as described previously. 23 Briefly, the electrospray conditions were: temperature of the ion transfer tube, 245 °C; spray voltage, 2.0 kV; normalized collision energy, 35%. Data-dependent MS/MS analysis was carried out using MS acquisition software (Xcalibur 2.0, Thermo Fisher Scientific). Each MS full scan was acquired in profile mode in the mass range between m/z 400 and 2000, followed by 7 MS/MS scans of the 7 most intense peaks. Dynamic exclusion was applied for a duration of 2 min.

Database Searching and Data Analysis

Protein/peptide identifications were obtained through a database search against a human proteomic database using the Sequest search engine (ver. 3.0) and stored in CPAS. 26 The database search was conducted against the human Swiss-Prot protein database (release 52 with protein sequences). The databases

100 consist of normal and reversed protein sequences to facilitate estimation of the false positive rate. 27 Trypsin was specified as the digestion enzyme with up to two missed cleavages, and carboxyamidomethylation was designated as a fixed modification of cysteine. To minimize the level of false positive identifications, criteria that would yield an overall confidence of over 95% for peptide identification were established for filtering raw peptide identifications. This was achieved by filtering the Sequest results using the so-called HUPO criteria 28 (ΔCn ≥ 0.10 and Xcorr ≥ 1.9, 2.2 and 3.75 for singly, doubly and triply charged ions, respectively), followed by validation using PeptideProphet analysis 28 with a cutoff of 0.95 peptide probability to eliminate low confidence identifications.

Results and Discussion

A rapid chromatographic affinity separation of plasma proteins was developed in order to increase the throughput of sample processing. This is important in clinical proteomics, where comparative analysis of disease and control samples requires large sample sets and technical replicates for meaningful statistical analysis. Sample fractionation that relies on long chromatography runs can lead to an increase in ex vivo proteolysis. Therefore, minimizing sample processing time while maximizing the number of samples analyzed can have a great impact on the quality and quantity of sample preparation for biomarker studies. For lectin immobilization we chose POROS 20 AL media. The mechanical and chemical stability and the binding properties of the support are key in the preparation of robust affinity chromatographic media. The hydrophilic coating of the resin minimizes nonspecific binding, and the very large pores and particle size (20 µm)

101 are suitable for coupling of large molecular weight ligands. These characteristics significantly reduce column pressure, so it is possible to operate these columns at high flow rates for very fast chromatographic separations. This chromatographic support provides high mass transfer and allows immobilization of ligand at high densities. Optimization of conditions led to a higher binding capacity for plasma glycoproteins compared to the original agarose-based M-LAC.

Kinetics and ligand density optimization. We evaluated the ligand density and reaction time by immobilizing lectins at 4 different ligand concentrations (from 7.5 to 45 mg/ml of beads) and coupling for up to 24 hr. The results are shown in Fig. 1 and indicate that saturation of the beads with the lectins occurs at ~30 mg/ml. The saturation curve was nearly identical for the 3 lectins, suggesting that the large pore size of this support can easily accommodate molecules of various sizes (106 kDa (ConA), 66 kDa (JAC) and 36 kDa (WGA)). To analyze the influence of the ligand density of the support on glycoprotein adsorption, crude plasma (80 µl, approximately 3.36 mg of protein) was applied onto four HP-M-LAC columns, each containing a different lectin density (wt/vol). Plasma was separated into an unbound and a bound fraction. Protein recoveries were calculated for each fraction; the mass balance is presented in Table 1 and shows that maximum binding of plasma glycoproteins occurs at a lectin density of 15 mg/ml of packed beads. The data indicate that the highest concentrations of lectins on the surface of the support decreased the binding capacity for plasma glycoproteins. This is not surprising, since the immobilization chemistry used in

102 this study couples the lectins directly onto the bead. With no spacer arm, high ligand density most likely caused steric hindrance effects and reduced the dynamic binding capacity. Following on from these results, we prepared small batches of immobilized lectins to study the kinetics of the immobilization and followed lectin coupling at 2.5, 5.0, 7.0 and 24 hr. The attachment of the lectins to the support was determined by the nitrocellulose spot test. At 2.5 and 5.0 hr coupling was incomplete, as judged from protein staining on the membrane. Based on these observations, we then repeated the coupling reaction for 7 hr at two ligand densities (15 and 30 mg/ml). Columns were packed and evaluated using crude plasma as described above. As seen in Table 1, higher adsorption of plasma glycoproteins was observed for 7.0 hr than for 24 hr of reaction time, which confirmed that optimal adsorption occurs with a ligand density of 15 mg/ml of packed beads. This amount corresponds to at least twice the lectin concentration (mg/ml) found in the commercially available agarose conjugates (ConA 6 mg/ml, JAC 4 mg/ml and WGA 7 mg/ml; Vector Laboratories). Furthermore, these data clearly show that prolonged incubation at room temperature affects glycoprotein capture, probably through loss of available active ligand. We concluded that, in order to adsorb the maximum amount of plasma glycoproteins, the support must contain a ligand density of 15 mg/ml with immobilization allowed to proceed for 7.0 hr. Using these conditions, we then prepared a large batch of HP-M-LAC material for further characterization. Lectin coupling efficiency was high and typically reached greater than 95 %.

Figure 1. Ligand ratio immobilization
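As a rough illustration of how a coupling efficiency such as the >95 % quoted above can be derived from the A280 measurements described in the Experimental Section, the sketch below converts an absorbance reading into a lectin concentration and then computes the fraction of lectin that left solution. The A280 reading and milligram amounts in the example are hypothetical, and the function names are illustrative only. The 0.1 % absorbance values are those quoted from Vector Laboratories in the text, and a 1 cm path length is assumed.

```python
# Absorbance at 280 nm of a 0.1 % (1 mg/ml) solution, as quoted in the text.
# Assumption: readings are taken in a 1 cm path length cuvette.
A280_OF_0_1_PERCENT = {"ConA": 1.2, "WGA": 1.46, "JAC": 1.5}


def lectin_conc_mg_per_ml(a280: float, lectin: str,
                          dilution_factor: float = 50.0) -> float:
    """Concentration of the undiluted lectin solution in mg/ml.

    The default dilution factor of 50 corresponds to 20 µl of sample
    added to 980 µl of water, as in the protocol.
    """
    return a280 / A280_OF_0_1_PERCENT[lectin] * dilution_factor


def coupling_efficiency(initial_mg: float, unbound_mg: float) -> float:
    """Percentage of lectin coupled to the beads.

    unbound_mg is the total lectin recovered in the reaction
    supernatant plus all washes.
    """
    return 100 * (initial_mg - unbound_mg) / initial_mg


if __name__ == "__main__":
    # Hypothetical reading for a diluted ConA aliquot.
    print(f"{lectin_conc_mg_per_ml(0.30, 'ConA'):.1f} mg/ml ConA")
    # Hypothetical mass balance: 25 mg offered, 1 mg left unbound.
    print(f"{coupling_efficiency(25.0, 1.0):.0f} % coupled")
```

The calculation treats everything that disappears from the supernatant and washes as coupled, which is the usual mass-balance simplification.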

103 [Figure 1: plot titled "Ligand density optimization"; axes: immobilized lectin (mg/ml) and initial lectin concentration (mg/ml); series: ConA, JAC, WGA.]

Figure 1. Four different amounts of each lectin were immobilized on POROS 20 AL to find the best ligand ratio; this was done by keeping the amount of beads constant and changing the amount of lectin during immobilization. As shown in the figure, the beads were saturated at 30 mg/ml.

Table 1. Kinetics of immobilization
Columns: concentration of lectins immobilized to the beads (mg/ml); 7 hours reaction time: Bound (%)*, Unbound (%)*, Total (%)*; 24 hours reaction time: Bound (%)*, Unbound (%)*, Total (%)*
*Average of duplicate runs

Column Evaluation

The support was packed into a PEEK column with a bed volume of ~0.5 ml (4.6 x 30 mm). Each single lectin column was made and tested, and the supports were then combined to make the M-LAC column. The reproducibility of different cross-linking batches was tested, as seen in Table 2. We established column performance by injecting 2.1 mg of reference plasma onto each of the columns. The percentage of bound glycoprotein was calculated from the peak area and corresponded to approximately 8 % for ConA,

104 10 % for JAC, 25 % for WGA and 18 % for M-LAC of the total amount of plasma injected. We tested the WGA and M-LAC columns using bovine fetuin. Bovine fetuin contains nearly 30% carbohydrate and possesses three asparagine-linked bi- and tri-antennary carbohydrate chains with terminal sialic acid residues, and is thus recognized by WGA. 29 We studied column reproducibility by loading 70 µg of fetuin onto the WGA and M-LAC columns and calculated the peak areas of the bound and unbound fractions, as seen in Table 2.1.

Table 2. Quality Control for Each Individual Lectin Column
Columns: Unbound, Bound
ConA-Column-1 ConA-Column-2 92 ± ± ± ± 1.3
Jacalin-Column-1 Jacalin-Column ± ± ± ± 0.8
WGA-Column-1 WGA-Column ± ± ± ± 1.6
M-LAC-Column-1 M-LAC-Column ± ± ± ± 0.6
Table 2. Quality control for each of the columns tested with 50 µl of human plasma. The percentage of bound glycoprotein was calculated from the peak area percentage.

Table 2.1. Quality Control for WGA and M-LAC Column
Columns: Unbound, Bound
WGA-Column-1 WGA-Column ± ± ± ± 0.25
M-LAC-Column-1 M-LAC-Column ± ± ± ± 0.03
Table 2.1. Quality control for WGA and M-LAC columns tested with 70 µg of fetuin standard. The percentage of bound glycoprotein was calculated from the peak area percentage.

Kinetic Analysis Using Multiple and Single Lectin

105 As recalled from the introduction, the hypothesis is that combining specific lectins (ConA, JAC and WGA) in one column can lead to functional advantages, due to multivalency and an increase in binding strength over single lectins. To examine the binding characteristics of the M-LAC versus single lectin columns, we used a prototype imaging instrument (LFIRE) along with our affinity method. Label-Free Internal Reflection Ellipsometry (LFIRE) is based on the principle of ellipsometry, a widely used optical technique for measuring the thickness and optical parameters of ultra-thin films, which can be calibrated directly to the amount of material bound on a wide range of surfaces (analysis performed by our collaborator Maven Biotechnologies). 30, 31 Affinity chromatography was performed as described in materials and methods. To examine the binding affinity of the single lectin columns (ConA and WGA) and the multi-lectin column (M-LAC) we used porcine thyroglobulin. Thyroglobulin has 14 N-linked glycosylation sites, 4 of which carry α-mannose and thus display specificity for ConA; the other 10 glycosylation sites carry GlcNAc and are thus recognized by WGA. 32 The results are shown in Fig. 2 and indicate that the amount of thyroglobulin bound to M-LAC (55%) is significantly higher than that bound to ConA (1.6%) or WGA (21%). The increased binding observed with the M-LAC is most likely due to the multi-site attachment of different glycan structures to the different lectins on the M-LAC column. Lectin-carbohydrate interactions are relatively weak ( M⁻¹); however, in vivo, lectins exhibit high affinity and specificity for glycoproteins on the

106 surface of cells. It has been postulated that multiple protein-carbohydrate interactions act synergistically to bring about avidity and specificity. The kinetic measurements are shown in Fig. 2d. The binding affinity of thyroglobulin for the mixture of lectins (Ka = 2.1 × 10⁶ M⁻¹) was found to be about 6 and 15 times higher than for ConA ( M⁻¹) and WGA ( M⁻¹), respectively.

Figure 2. Binding characteristics

107 Figure 2a-2d: a), b) and c) show the affinity chromatography of thyroglobulin on M-LAC, ConA and WGA, respectively. Chromatographic conditions are described in materials and methods. Bound thyroglobulin was eluted with solvent B (100 mM acetic acid, pH 3.8) and protein elution was monitored at 280 nm. Quantitation was performed using peak area, with 1.6%, 21% and 55% bound to ConA, WGA and M-LAC, respectively. d) Kinetic fit results show a binding affinity about six times higher for the mixed lectin interaction than for ConA and over an order of magnitude higher than for WGA.

Column Characterization and Performance

Since HP-M-LAC was designed for plasma proteomic/glycoproteomic analysis, the binding capacity of the column was measured using depleted plasma to reflect typical experimental conditions. It is now well accepted that removal of abundant proteins is necessary in proteomic analysis in order to increase the dynamic range for biomarker discovery. Thus the binding capacity measurement of the HP-M-LAC was performed using plasma depleted of 6 abundant plasma proteins. Depletion was accomplished using the MARS column to remove

108 albumin, immunoglobulin G, immunoglobulin A, transferrin, alpha-1-antitrypsin and haptoglobin using the method described in materials and methods. Depletion of these proteins has typically resulted in bound protein recoveries of about 80 %, and the remaining 20 % of the proteins are then fractionated using the HP-M-LAC. To evaluate the binding capacity of the column, different amounts of depleted plasma were injected (250, 500, 750, 1000 and 2000 µg). As seen in Fig. 3, at loads above 500 µg of depleted plasma the HP-M-LAC started to show glycoprotein saturation. Thus, using a 0.5 ml HP-M-LAC column, approximately 500 µg of plasma depleted of 6 abundant proteins can be fractionated; this corresponds to approximately 100 µl of crude plasma, which is compatible with most commercially available depletion columns. The immobilization can easily be scaled up to produce bigger HP-M-LAC columns with greater binding capacity for the enrichment of low level glycoproteins from plasma. Column reproducibility was examined using the reference depleted plasma. This sample was analyzed repeatedly (5 times) and the protein concentration in individual fractions was determined using the Bradford protein assay; Table 3 summarizes the results. It can be seen that on average 60 % of the glycoproteins in plasma bound to the HP-M-LAC in a reproducible manner, with good protein recoveries (93%) and coefficients of variation (CV) of approximately 7.9 % and 6.8 % for the unbound and bound fractions, respectively. Compared to our previous work using lectin-agarose conjugates, this support has yielded better overall recovery: from 80% with soft gels to greater than 90 %.

Figure 3: Loading Capacity
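The recovery and coefficient of variation figures reported for the replicate runs are simple summary statistics. A minimal sketch of the calculation is given below; the replicate values are hypothetical placeholders rather than the thesis data, the function names are illustrative, and the use of the sample (n-1) standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation, CV in percent)."""
    m = mean(values)
    s = stdev(values)  # sample SD with n-1 in the denominator (assumed)
    return m, s, 100 * s / m


def recovery_percent(unbound_ug: float, bound_ug: float,
                     loaded_ug: float) -> float:
    """Total protein recovered in both fractions as a percentage of the load."""
    return 100 * (unbound_ug + bound_ug) / loaded_ug


if __name__ == "__main__":
    # Hypothetical bound-fraction percentages from five replicate runs.
    bound = [58.0, 61.5, 63.0, 57.0, 60.5]
    m, s, cv = summarize(bound)
    print(f"bound: mean {m:.1f} %, SD {s:.1f}, CV {cv:.1f} %")
    # Hypothetical mass balance for a single 500 µg injection.
    print(f"recovery: {recovery_percent(165.0, 300.0, 500.0):.0f} %")
```

In this form the CV is the standard deviation expressed as a percentage of the mean, which is how values such as 7.9 % and 6.8 % are normally read.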

109 [Figure 3: plot titled "Loading Capacity"; axes: peak area and amount of depleted plasma.]

Figure 3. Different amounts of depleted plasma were injected onto the column to determine the loading capacity.

Table 3. Protein recoveries obtained from HP-M-LAC
Columns (N=5): Unbound (%), Bound (%), Total (%); rows: Mean, Std dev, CV
Table 3. Five replicate runs were performed using the depleted plasma to establish column performance.

Typically, desorption of glycoproteins from lectin affinity columns is accomplished using competitive saccharide elution. For throughput and economic reasons we decided to investigate the use of a low pH elution buffer. Release of glycoproteins was tested using a mixture of saccharides and using 100 mM acetic acid. A comparison of the 2 elution methods indicated that at least 10% more glycoproteins are recovered when elution is performed using

110 low pH. We performed a proteomic analysis to determine whether the higher protein recoveries observed with acidic elution could yield identifications of more proteins. The analysis of the bound glycoproteins led to the identification of approximately 120 proteins from crude plasma using conservative filtering criteria; of these, 75 % were found to be glycoproteins. This number is based on the Swiss-Prot database, since there is no fully annotated glycoprotein database available. Compared to previous studies using agarose-immobilized lectins, 15 the HP-M-LAC column was found to have the same oligosaccharide specificity. A partial list of the proteins identified is shown in Table 4; the data show that the two elution conditions led to the identification of roughly the same proteins. However, higher peptide coverage was observed with acidic elution. This could be beneficial for increasing the confidence of identification of low level proteins from complex samples, such as plasma, where low abundance proteins are typically identified by 1-2 peptides. On the other hand, the use of acetic acid to elute the bound fraction from the HP-M-LAC column could decrease column lifetime; however, we have been able to perform up to 150 runs of crude plasma. Although the chemistry used for the immobilization of the lectins forms a very stable bond, the possibility that lectin leaching could occur cannot be disregarded. The number of runs that can be performed on a column suggests that leaching of the lectins, if it occurs, is insignificant. A disadvantage of using acetic acid is that information regarding glycosylation changes could be lost. Immobilized lectins with different specificities have been used for structural characterization of carbohydrate changes, such as changes in glycosylation patterns related to disease. Therefore the

choice of elution buffer will greatly depend on the intended use of the HP-M-LAC column. For global proteomic studies of depleted plasma, where both the unbound and bound fractions are analyzed, acetic acid (low pH) elution is preferred. This allows very fast sample preparation for profiling comparative protein abundances of tens or hundreds of disease and control samples. The cycle time for the HP-M-LAC is less than 10 min per sample without sacrificing protein selectivity. Sugar elution is less economical, and chromatographic separations may need to be carried out at lower flow rates to avoid high HPLC pressure due to the viscosity of the sugars. However, these minor issues are outweighed by the benefits associated with saccharide elution in glycoproteomics studies aimed at studying disease-associated glycosylation changes.

Table 4. Examples of proteins identified with higher coverage by using acetic acid elution
Name of the protein                     100 mM Acetic acid          Monosaccharide elution
                                        (average # of peptides)     (average # of peptides)
Alpha-2-macroglobulin
Complement C4-A
Complement factor H
Complement factor B
Alpha-1B-glycoprotein
Afamin
Angiotensinogen
N-acetylmuramoyl-L-alanine amidase
Serum amyloid P-component
Immunoglobulin J
CD5 antigen-like

Table 4. Proteins identified from LC-MS/MS runs performed to investigate the difference between the two elution buffers. As shown in the table, the coverage of proteins identified using 100 mM acetic acid as elution buffer is higher than that of proteins identified using saccharide elution.

We have produced an HP-M-LAC packing material using an application-driven design, in which the ligand density, immobilization kinetics and elution conditions were carefully optimized for human plasma and/or serum. The affinity adsorbent has binding properties comparable to those of the original M-LAC published by Yang et al., 2004, but with better protein recoveries and throughput for sample preparation in clinical proteomics/glycoproteomics studies. Due to the favorable pressure/flow characteristics of the POROS 20-AL material, the HP-M-LAC was easily integrated into existing proteomics workflows for multidimensional sample fractionation and deeper mining of the plasma proteome.

Recovery Studies of HP-M-LAC Platform

This platform consists of sequential, multidimensional HPLC fractionation of plasma. The crude plasma is diluted with binding buffer (25 mM Tris, 0.5 M NaCl, 1 mM MnCl2, 1 mM CaCl2, pH 7.4, 0.05% sodium azide); the buffer composition is compatible with all the affinity columns in the platform. In the first dimension,

albumin and immunoglobulin G were depleted from plasma using the corresponding affinity columns; the sample was then loaded onto the HP-M-LAC column, where the unbound fraction was washed off and concentrated/desalted on the reversed-phase trap (RP-trap) column. The sample was concentrated using the speed vacuum apparatus. The bound fraction from HP-M-LAC was eluted and concentrated on the trap column. A diagram of the workflow of our approach is shown in Fig. 4. In this manner, the plasma sample is loaded onto the depletion and HP-M-LAC enrichment columns and emerges ready for trypsin digestion, minimizing sample manipulations that could introduce losses of low-level proteins.

We evaluated the performance of in-line protein depletion and M-LAC fractionation using normal human plasma. Five independent loadings of the human plasma sample were performed on the HP-M-LAC platform. The total protein concentration in each fraction (unbound, bound and depleted plasma) was measured using the Bradford protein assay, and Table 5 summarizes the data. On average the recovery was 96% with an SD of < 3, with a major fraction of the plasma proteins being removed in the depletion step (approximately 80%). The two glycoprotein fractions were roughly equivalent (approximately 10% of the load each), with an overall SD of 3 for all fractions.

Figure 4. HP-M-LAC workflow. [Schematic; legible labels: syringe loading, sample loop, depletion, waste.]

Figure 4. Diagrammatic representation of the automated platform, indicating the flow and valve connections for the depletion step and glycoproteome fractionation, with in-line desalting after both the depletion and the lectin step. The performance of each chromatographic operation was monitored with a UV detector.

A side-by-side comparison was performed between our previously described immunodepletion and M-LAC method 6 and the HP-M-LAC platform. Figure 5 gives a partial list of the proteins identified on both platforms and the spectral counts for each protein in the unbound and bound fractions. As seen in Figure 5, the number of spectral counts obtained on the HP-M-LAC platform is higher than that obtained using the M-LAC agarose platform. The total number of spectral counts is shown in the inset graphs for the unbound and bound fractions on both platforms.

Table 5. Reproducibility and recovery study of HP-M-LAC platform a,b
Columns: Bound; Depleted plasma; Unbound M-LAC; Bound M-LAC; Total Recovery (%)
Mean

a Number of replicates, N=5
b The protein concentration in each sample was measured using the Bradford Protein Assay

Figure 5. Comparison between agarose and HP-M-LAC platforms. [Bar charts of spectral counts identified per protein for the bound fractions (BD-2P-MLAC-Inline vs. BD-Agarose-MLAC) and the unbound fractions (UNB-2P-MLAC-Inline-2 vs. UNB-Agarose-MLAC). Proteins shown: A1BG_HUMAN, AACT_HUMAN, AFAM_HUMAN, ANT3_HUMAN, APOC2_HUMAN, APOH_HUMAN, CERU_HUMAN, CFAH_HUMAN, CLUS_HUMAN. Inset graphs: total spectral counts identified.]

Fig 5. Side-by-side comparison of the two platforms (agarose and HP-M-LAC). The figure shows a partial list of the proteins identified on both platforms and the spectral counts for each protein. The inset graphs show the total number of spectral counts identified on both platforms.

Development of a Strategy for Plasma Glycoproteome

As shown previously, integrating depletion of abundant plasma proteins with M-LAC fractionation can improve the depth of analysis of the plasma proteome. There are a variety of commercially available affinity capture columns, such as protein A, protein G, protein L, and anti-human serum albumin. 23 There are also a number of multicomponent immunoaffinity matrices that target the most abundant plasma proteins, for instance MARS-7 and MARS-14 (Agilent), IgY12 (GenWay), and Top 20 (Sigma), which deplete the 7, 14, 12, and 20 most abundant proteins, respectively. One concern when using affinity-based depletion strategies is the loss of potentially important biomarkers due to binding to depleted proteins or to non-specific interactions with the affinity column. For instance, HSA is known as a nonspecific binding protein due to its biological role as a carrier protein. 33 Furthermore, depletion when used alone is not sufficient to detect low-abundance proteins due to the complexity of plasma samples and

thus further fractionation is required. A successful clinical proteomics study requires an appropriate balance between several constraints, and in this study we focus on the issues of bias, throughput, recovery, reproducibility and depth of analysis. For example, the combination of a depletion column, such as a two-protein (2P) column, with a chromatographic step (reversed phase, strong cation exchange, gel electrophoresis, or IEF separation) will generate multiple fractions. As a result, one has to overcome the issues of high cost, extended length of the study and, most importantly, large sample requirements, sample manipulation, reproducibility, and recoveries. As a solution, we developed a method that combines 2P depletion with HP-M-LAC and yields only two fractions per clinical sample. Our results suggested that 2P depletion combined with M-LAC gave an appropriate balance between two important constraints in a clinical proteomics study, namely throughput and depth of analysis. Moreover, we illustrate the issue of potential bias with the use of affinity-depletion columns and the concern about the stability of the antibodies over time. Fig. 6 shows examples of trend analysis of the leak-through of two proteins following depletion of the 12 most abundant proteins. These data were obtained from an independent clinical study, in which 42 plasma samples were randomized and depleted using a 12P depletion column. While the amount of protein leakage was small (typically a few %), the trend analysis suggested that some of the depleted proteins, such as α-1 acid glycoprotein 1 and transferrin, showed increased levels later in the study. In addition, there are several other factors that present barriers to widespread use of multi-antibody depletion columns, e.g. the cost of the depletion column ($150 per

analysis), limited loading capacity ( µl), and buffers that are incompatible with other chromatographic steps (e.g. the MARS column).

Figure 6. Stability of antibody columns. [Two panels: spectral counts for transferrin and spectral counts for alpha-1 acid glycoprotein, each plotted as spectral count versus number of runs.]

Figure 6. Trend analysis for depleted proteins (transferrin and alpha-1 acid glycoprotein 1) from a 42 plasma sample set in which a 12-protein depletion column containing the corresponding antibodies was used.

Reproducibility Studies of HP-M-LAC Platform

The reproducibility of HP-M-LAC was evaluated in triplicate by 1D SDS-PAGE with Coomassie Blue and Schiff-base glyco staining. As seen in Fig. 7, the protein patterns observed across replicates are reproducible. The same pattern of bands was observed with glyco staining (see Fig. 7b for representative results), indicating that the unbound and bound fractions from the HP-M-LAC were predominantly glycosylated. These results suggest that HP-M-LAC fractionates the plasma proteome into two glycoproteomes. The composition of the

glycoproteome 1 (unbound fraction) and glycoproteome 2 (bound fraction) and their glycan specificity will be the subject of a future publication. Studies in our laboratory are now focusing on determining the glycan specificity of the M-LAC. The absence of detectable amounts of albumin and immunoglobulins in the M-LAC fractions shown in Fig. 7 demonstrated the effectiveness of the depletion column (unbound and bound HP-M-LAC fractions: run 1, lanes b,c; run 2, lanes e,f; run 3, lanes h,k, respectively).

To demonstrate the reproducibility of individual glycoprotein fractionation on the M-LAC column, we performed both physical (independent plasma fractionations) and analytical replicates. The resulting glycoprotein fractions were digested with trypsin as previously described 23 and analyzed by LC-MS/MS. Figure 8 shows the reproducibility of the HP-M-LAC platform; the plot gives the correlation of two independent runs analyzed on the entire platform. As seen in the figure, the correlation coefficient for the two independent runs is and 0.957, for the unbound and bound fractions, respectively, showing good reproducibility of the platform. In addition, we have demonstrated complete removal of albumin (as measured by spectral counts in the M-LAC fractions) and constant high-level removal of IgG ( 95%). As previously described, we used acetic acid to elute glycoproteins bound to the M-LAC column, and thus we decided to monitor the lifetime of the column. We performed an independent run of the same plasma sample three months after the column was first packed and tested in the HP-M-LAC platform (approximately 200 runs), and again good reproducibility was observed with a

correlation coefficient of . These results demonstrated the stability of the HP-M-LAC platform using a mild acid elution step.

Figure 7. Reproducibility of HP-M-LAC as measured by 1D SDS-PAGE. [Gel images, panels a and b; lanes MW and a through l, grouped as Run 1, Run 2 and Run 3; arrows mark albumin and heavy-chain IgG.]

Figure 7. Three replicate runs of HP-M-LAC (runs 1, 2 and 3) were analyzed by 1D SDS-PAGE stained with Coomassie Blue. Lane MW contains the MW standards; lane a is the starting plasma sample. Runs 1, 2 and 3 represent replicates of the same plasma sample run consecutively on the same HP-M-LAC column (lanes b,c represent unbound and bound M-LAC for run 1). Run 2 (lanes e,f) and run 3 (lanes h,k) are shown in the same order. Lanes d, g and l represent the fraction bound to and then eluted from the depletion column. The arrows on both sides of the figure indicate the expected migration of albumin and heavy-chain IgG. Fig. 7b shows the glyco staining for lanes a through d.

Figure 8. Measurement of the reproducibility of the HP-M-LAC platform. [Panel a: number of spectral counts identified per protein in one run plotted against run 1, with a linear fit and R² value.]

[Panel b: spectral counts for the new sample plotted against spectral counts for the old samples, with a linear fit and R² value.]

In Fig. 8, panel a shows the measurement of the reproducibility of the HP-M-LAC platform, and panel b shows the reanalysis of a fresh plasma sample on the HP-M-LAC platform after three months. Each point represents a protein; the x and y axes compare the number of peptides identified (spectral counts) per protein in run 1 with the number identified for the same protein in run 2. The correlation values are shown above.

Specificity of the Multi Lectin Column

To show the specificity of the HP-M-LAC column for the glycan subpopulation of a given glycoprotein, we re-injected the unbound and the bound fractions onto the M-LAC column (see Fig. 9) and demonstrated the reproducibility of the chromatographic process. We also evaluated the specificity of the column by analyzing the levels of individual glycoproteins present in the unbound and bound M-LAC fractions by LC-MS/MS (see methods section). The glycoprotein identification and relative distribution were highly reproducible between the 4 replicates (see Table 6; SD ≤ 6 as measured by spectral counts). Table 6 shows examples of the identified proteins, all of which are known to be glycosylated. Some glycoproteins, such as ceruloplasmin, afamin and AMBP protein, were present in similar amounts in both fractions (unbound and bound). Some of the proteins

were identified only in the bound M-LAC fraction (for instance kininogen, lumican and plasma retinol-binding protein) or only in the unbound fraction (talin). In conclusion, the number of peptides identified per glycoprotein in the unbound and bound fractions was highly consistent over the reproducibility study (Table 6). Furthermore, the demonstrated consistency of the HP-M-LAC platform is an important parameter when studying glycosylation changes in disease vs. control samples.

Figure 9. Specificity of the M-LAC column shown by re-chromatography. [Chromatogram panels; legible labels: loading, flow-through (unbound M-LAC), elution (bound M-LAC), neutralization, equilibration, re-loading of the unbound fraction, re-loading.]

In Fig. 9, the peaks in panel a represent the elution at each step of the HP-M-LAC platform. Panels b and c show the re-injection of the unbound and bound M-LAC fractions collected in panel a, demonstrating the specificity of the M-LAC column.

Table 6. Partial list of proteins identified and glycoprotein distribution in the HP-M-LAC fractions
Protein Name                                    Bound M-LAC     Unbound M-LAC
Complement C4-A                                 61 a ± 4 b      129 a ± 2 b
Ceruloplasmin                                   59 ± 3          72 ± 3
Transthyretin                                   25 ±            ± 2
Kininogen                                       27 ± 2          ND
Afamin                                          25 ± 6          28 ± 3
Inter-alpha-trypsin inhibitor heavy chain H4    20 ± 3          ND
Plasma retinol-binding protein                  19 ± 2          ND
Clusterin                                       21 ± 4          3 ± 1
Complement C9                                   10 ± 2          16 ± 1
AMBP Protein                                    12 ± 3          9 ± 2
N-acetylmuramoyl-L-alanine amidase              11 ± 1          ND
Vitronectin                                     10 ± 1          25 ± 2
Complement factor I                             7 ± 1           4 ± 1
Lumican                                         4 ± 0.5         ND
Talin                                           ND              3 ± 1
a Average of four replicate runs
b Standard deviation of the replicates

2.4 Conclusion

The importance of sample preparation in clinical proteomics is becoming more widely appreciated in the proteomics community; effective sample preparation is an essential step in comparative proteomics studies. Thus, the goal of this chapter was to develop a multidimensional plasma fractionation workflow that minimizes the number of sample handling steps and the resultant losses, ex vivo proteolysis, and chemical modifications. To achieve this goal, we first produced an HP-M-LAC packing material using an application-driven design, in which the ligand density, immobilization kinetics and elution conditions were carefully optimized for human plasma and/or serum. The affinity adsorbent has binding properties comparable to those of the original M-LAC published by Yang et al., 2004, but with better protein recoveries and throughput for sample preparation in clinical proteomics/glycoproteomics studies. Due to the favorable pressure/flow characteristics of the POROS 20-AL material, the HP-M-LAC can be easily integrated into existing proteomics workflows for multidimensional sample fractionation and deeper mining of the plasma proteome. Furthermore, HP-M-LAC

can be used in combination with other lectins of narrower specificities to study glycosylation changes. The immobilization and characterization procedures developed in this study can be easily applied to other lectins. There is a wide variety of lectins, many of which are commercially available, waiting to be explored in glycoproteomics research. The automated sample processing for glycoproteomic analysis presented in this thesis greatly minimizes upstream variation and sample losses during sample preparation. Improvements in any of the steps in a proteomics study increase the reliability of the analytical method and have a direct effect on the quality of the data and the depth/protein coverage of the analysis. Moreover, spectral counting is becoming widely used as a standard method for protein semi-quantitation; in this strategy, the number of peptides identified for a given protein is used as an initial survey to compare abundance differences among the clinical samples. Therefore, the reproducibility of the analytical measurement is crucial in comparative clinical proteomics, and it requires the run-to-run precision demonstrated on this platform.

2.5 References

1. Apweiler, R.; Hermjakob, H.; Sharon, N., On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1999, 1473, (1).
2. Ono, M.; Hakomori, S., Glycosylation defining cancer cell motility and invasiveness. Glycoconj J 2004, 20, (1).
3. Drake, P. M.; Schilling, B.; Niles, R. K.; Braten, M.; Johansen, E.; Liu, H.; Lerch, M.; Sorensen, D. J.; Li, B.; Allen, S.; Hall, S. C.; Witkowska, H. E.; Regnier, F. E.; Gibson, B. W.; Fisher, S. J., A lectin affinity workflow targeting glycosite-specific, cancer-related carbohydrate structures in trypsin-digested human plasma. Anal Biochem 408, (1).

4. Drake, P. M.; Cho, W.; Li, B.; Prakobphol, A.; Johansen, E.; Anderson, N. L.; Regnier, F. E.; Gibson, B. W.; Fisher, S. J., Sweetening the pot: adding glycosylation to the biomarker discovery equation. Clin Chem 56, (2).
5. Sharon, N.; Lis, H., Lectins: cell-agglutinating and sugar-specific proteins. Science 1972, 177, (53).
6. Sharon, N.; Lis, H., Lectins--proteins with a sweet tooth: functions in cell recognition. Essays Biochem 1995, 30.
7. Sharon, N., Lectins: past, present and future. Biochem Soc Trans 2008, 36, (Pt 6).
8. Sharon, N., A life with lectins. Cell Mol Life Sci 2005, 62, (10).
9. Sharon, N., Lectins: carbohydrate-specific reagents and biological recognition molecules. J Biol Chem 2007, 282, (5).
10. Weis, W. I.; Drickamer, K., Structural basis of lectin-carbohydrate recognition. Annu Rev Biochem 1996, 65.
11. Hage, D. S., Affinity chromatography: a review of clinical applications. Clin Chem 1999, 45, (5).
12. Lis, H.; Sharon, N., Lectins as molecules and as tools. Annu Rev Biochem 1986, 55.
13. Qiu, R.; Regnier, F. E., Use of multidimensional lectin affinity chromatography in differential glycoproteomics. Anal Chem 2005, 77, (9).
14. Yang, Z.; Hancock, W. S., Monitoring glycosylation pattern changes of glycoproteins using multi-lectin affinity chromatography. J Chromatogr A 2005, 1070, (1-2).
15. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. J Chromatogr A 2004, 1053, (1-2).
16. Madera, M.; Mann, B.; Mechref, Y.; Novotny, M. V., Efficacy of glycoprotein enrichment by microscale lectin affinity chromatography. J Sep Sci 2008, 31, (14).
17. Mechref, Y.; Madera, M.; Novotny, M. V., Glycoprotein enrichment through lectin affinity techniques. Methods Mol Biol 2008, 424.
18. BIOSCIENCE, T., Affinity Chromatography.
19. Madera, M.; Mechref, Y.; Novotny, M. V., Combining lectin microcolumns with high-resolution separation techniques for enrichment of glycoproteins and glycopeptides. Anal Chem 2005, 77, (13).
20. Madera, M.; Mechref, Y.; Klouckova, I.; Novotny, M. V., High-sensitivity profiling of glycoproteins from human blood serum through multiple-lectin affinity chromatography and liquid chromatography/tandem mass spectrometry. J Chromatogr B Analyt Technol Biomed Life Sci 2007, 845, (1).
21. Madera, M.; Mechref, Y.; Klouckova, I.; Novotny, M. V., Semiautomated high-sensitivity profiling of human blood serum glycoproteins through lectin preconcentration and multidimensional chromatography/tandem mass spectrometry. J Proteome Res 2006, 5, (9).
22. Echan, L. A.; Tang, H. Y.; Ali-Khan, N.; Lee, K.; Speicher, D. W., Depletion of multiple high-abundance proteins improves protein profiling capacities of human serum and plasma. Proteomics 2005, 5, (13).
23. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6, (2).
24. Dayarathna, M. K.; Hancock, W. S.; Hincapie, M., A two step fractionation approach for plasma proteomics using immunodepletion of abundant proteins and multi-lectin affinity chromatography: Application to the analysis of obesity, diabetes, and hypertension diseases. J Sep Sci 2008, 31, (6-7).
25. Bradford, M. M., A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 1976, 72.
26. Rauch, A.; Bellew, M.; Eng, J.; Fitzgibbon, M.; Holzman, T.; Hussey, P.; Igra, M.; Maclean, B.; Lin, C. W.; Detter, A.; Fang, R.; Faca, V.; Gafken, P.; Zhang, H.; Whiteaker, J.; States, D.; Hanash, S.; Paulovich, A.; McIntosh, M. W., Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res 2006, 5, (1).
27. Haas, W.; Faherty, B. K.; Gerber, S. A.; Elias, J. E.; Beausoleil, S. A.; Bakalarski, C. E.; Li, X.; Villen, J.; Gygi, S. P., Optimization and use of peptide mass measurement accuracy in shotgun proteomics. Mol Cell Proteomics 2006, 5, (7).
28. Omenn, G. S.; States, D. J.; Adamski, M.; Blackwell, T. W.; Menon, R.; Hermjakob, H.; Apweiler, R.; Haab, B. B.; Simpson, R. J.; Eddes, J. S.; Kapp, E. A.; Moritz, R. L.; Chan, D. W.; Rai, A. J.; Admon, A.; Aebersold, R.; Eng, J.; Hancock, W. S.; Hefta, S. A.; Meyer, H.; Paik, Y. K.; Yoo, J. S.; Ping, P.; Pounds, J.; Adkins, J.; Qian, X.; Wang, R.; Wasinger, V.; Wu, C. Y.; Zhao, X.; Zeng, R.; Archakov, A.; Tsugita, A.; Beer, I.; Pandey, A.; Pisano, M.; Andrews, P.; Tammen, H.; Speicher, D. W.; Hanash, S. M., Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 2005, 5, (13).
29. Johnson, W. V.; Heath, E. C., Evidence for posttranslational O-glycosylation of fetuin. Biochemistry 1986, 25, (19).
30. Azzam RMA, B. N., Ellipsometry and Polarized Light. Elsevier: 1987.
31. J, H., Polarized Light and Ellipsometry. William Andrew Publishing: New York, 2005.
32. Rawitch, A. B.; Pollock, H. G.; Yang, S. X., Thyroglobulin glycosylation: location and nature of the N-linked oligosaccharide units in bovine thyroglobulin. Arch Biochem Biophys 1993, 300, (1).
33. Jacobs, J. M.; Adkins, J. N.; Qian, W. J.; Liu, T.; Shen, Y.; Camp, D. G., 2nd; Smith, R. D., Utilizing human blood plasma for proteomic biomarker discovery. J Proteome Res 2005, 4, (4).

Chapter 3

Investigation of Potential Biomarkers of Multiple Sclerosis Using Proteomics and Peptidomic Analysis of Human Plasma

This study was funded by Biogen Idec and has been prepared as a manuscript for publication.

Abstract

As part of our collaboration with Biogen Idec, we conducted an exploratory study on defined clinical samples from patients with multiple sclerosis (M. Scler) to identify candidate protein or peptide biomarkers of pathology and potentially predictive biomarkers of the response to therapeutic intervention. For the discovery phase of the study, ten samples from each of four different groups (relapsing-remitting M. Scler (RRMS), secondary progressive M. Scler (SPMS), primary progressive M. Scler (PPMS) and matched healthy controls) were analyzed using two mass spectrometry-based comparative methods. The first method surveyed the plasma proteome to evaluate changes in the levels of proteins. In this method, removal of abundant plasma proteins and multi-lectin affinity chromatography enabled enrichment of plasma glycoproteins. The second method allowed evaluation of the low molecular weight (LMW) component of plasma to identify endogenous LMW proteins or peptide fragments (peptidomics) that arise from in vivo cleavage of intact proteins due to dysregulation of proteolytic activity.

3.1 Introduction

Multiple sclerosis (M. Scler) is an autoimmune disease that affects the brain and spinal cord (central nervous system). 1, 2 M. Scler is a chronic disease characterized by multiple areas of inflammatory demyelination, axonal degradation and glial sclerosis. M. Scler has very heterogeneous clinical presentations and courses; there are three different types of M. Scler: relapsing-remitting M. Scler (RRMS), secondary progressive M. Scler (SPMS) and primary progressive M. Scler (PPMS). 3 RRMS is characterized by exacerbations followed by periods of complete or incomplete recovery that can lead to step-wise accumulation of disability over time. In the initial stages of M. Scler, about 80-90% of individuals have relapsing disease. SPMS is a stage of M. Scler in which gradual neurological deterioration occurs with or without relapses, always preceded by an initial RRMS phase. Approximately 50% of the patients who have RRMS will enter the SPMS phase within 10 to 15 years of disease onset. PPMS is characterized by a steady progression of disability with few or no exacerbations. Approximately 10% of individuals are diagnosed with primary progressive disease. Even though M. Scler has been known for a very long time, the pathology of the disease is still unclear 2, 3. Proteomics and genomics have been used for discovering potential biomarkers in M. Scler. In this study, we focused on proteomics for biomarker discovery because it is necessary to study the expression of proteins to fully understand the signaling mechanisms in the development of human disease 4. Most of the biomarker studies in M. Scler have been performed in cerebrospinal fluid (CSF), due to its proximity

to inflammatory lesions in the central nervous system. However, CSF collection is one of the challenges for proteomic analysis because CSF production is not uniform over the day: maximum production occurs around 2:00 a.m. and minimum production around 6:00 p.m., so the time of collection is important. In addition, CSF collection is an invasive procedure and is performed only for selected patients. For these reasons there is a need for biomarkers of multiple sclerosis in human plasma. The major goal of this study was to perform proteomic analysis of human plasma from multiple sclerosis patients 5. In recent years, proteomic analyses have led to the identification of candidate markers for different diseases. In this study we conducted an exploratory study on defined clinical samples from patients with M. Scler to identify candidate protein or peptide biomarkers of pathology and potentially predictive biomarkers of the response to therapeutic intervention. We combined two proteomics technologies that were previously developed in our laboratory. 6, 7 The first method consists of immunodepletion in-line with Multi Lectin Affinity Chromatography. 8, 9 The combination of the immunodepletion column and the M-LAC column has previously been used in biomarker discovery and has been shown to identify proteins changing in disease vs. control. 10, 11 The second method, peptidomics, allowed evaluation of the low molecular weight (LMW) component of plasma to identify endogenous LMW proteins or peptide fragments that arise from in vivo cleavage of intact proteins due to dysregulation of proteolytic activity.

3.2 Materials and Methods

3.2.1 Materials and Chemicals

Aldehyde POROS-20 AL (20 µm beads), POROS Protein A (PA) (POROS-PA50 resin), POROS-R1-50 resin and a POROS anti-HSA (2.0 ml) column were purchased from Applied Biosystems (Foster City, CA). Unconjugated lectins (concanavalin A (ConA), jacalin (JAC) and wheat germ agglutinin (WGA)) were purchased from Vector Laboratories (Burlingame, CA). Sodium cyanoborohydride, sodium azide, sodium sulfate, sodium chloride, ultra pure tris(hydroxymethyl)aminomethane hydrochloride, glycine, guanidine hydrochloride, dithiothreitol, ammonium bicarbonate, iodoacetamide, manganese chloride, calcium chloride, and a C18 guard column were purchased from Sigma (St. Louis, MO). PEEK columns were purchased from Isolation Technology (Milford, MA). The following items required for Western blot were purchased from Invitrogen (Carlsbad, CA): NuPAGE MES SDS running buffer, 4-12% Bis-Tris gel, NuPAGE LDS 4x sample buffer, 20x transfer buffer, nitrocellulose paper, filter paper, and western blot molecular weight standards. Bradford protein assay kit, trifluoroacetic acid (TFA), formic acid, glacial acetic acid, HPLC grade acetonitrile, HPLC grade water, and trypsin were purchased from ThermoFisher Scientific (Waltham, MA). Plasma was purchased from Bioreclamation Inc (Long Island, NY). Centrifugal filters with a MWCO of 10,000 were obtained from Millipore (Billerica, MA). LC-MS columns (150 mm x 75 µm i.d.) were purchased from New Objective (Woburn, MA); reversed phase Magic C18 packing (bead size 5 µm, pore size 300 Å) was purchased from Michrom BioResources (Auburn, CA). ELISA kits for thymosin β4 (Thb4) and apolipoprotein E were from ALPCO Diagnostics. Vinculin monoclonal antibody was purchased from Chemicon

(Temecula, CA).

Proteomic Analysis

The workflow used for proteomic analysis of the samples was previously developed and qualified. 6 Plasma samples were depleted of abundant proteins, and the depleted plasma was further fractionated into a bound fraction (glycoproteins) and an unbound fraction (mostly non-glycosylated) on an in-house prepared M-LAC column containing 3 immobilized lectins (concanavalin A, wheat germ agglutinin and jacalin). The samples were desalted and concentrated using an R1 reversed-phase trap, as described in Chapter 2. These steps were carried out in an automated mode using a multidimensional HPLC workstation equipped with automatic valve switching that can be controlled directly from the software. This set-up minimizes sample losses and increases sample throughput. Briefly, a total of 110 µl of plasma was spiked with 2 proteins (bovine fetuin) as internal standards, and each plasma sample was diluted 5-fold with binding buffer (20 mM Tris-HCl, 0.5 M NaCl, pH 7.2, 1 mM CaCl2, 1 mM MgCl2). Sample injection was done manually; during sample loading, all columns were in-line. The flow-through (depleted plasma) from the depletion columns was directly captured by the M-LAC support. The retained abundant proteins were stripped (into waste) from the depletion column using low pH (0.1 M glycine, pH 2.5). The depletion column was immediately neutralized, regenerated and re-equilibrated with binding buffer. The M-LAC column was placed back in-line with the RP-trap column, and the unbound (mostly non-glycosylated) fraction was washed off and trapped by the RP column. Proteins

were immediately eluted from the RP-trap using 90% acetonitrile. The bound glycoproteins were displaced from the M-LAC column using 100 mM acetic acid, pH 3.5, captured on the RP-trap column and then eluted. The M-LAC affinity support was then neutralized, regenerated and re-equilibrated with binding buffer. To monitor column performance with regard to fractionation and reproducibility, we determined the protein concentration of the unbound and bound fractions using the Bradford total protein assay. After M-LAC fractionation, both the unbound and bound fractions were subjected to trypsin digestion, and the resulting peptides were analyzed by nano-LC-MS/MS in the data-dependent mode. Each step in the workflow was monitored by quality control assays to assess the reproducibility of the sample preparation process and the mass spectrometry (MS) analysis.

Trypsin Digestion

Fifty µl of each unbound and bound sample was denatured using 6.0 M GuCl in 0.1 M ammonium bicarbonate, pH 8.0 (1:5 dilution). Reduction was achieved by adding 5 mM TCEP and incubating at room temperature. Proteins were alkylated with 15 mM iodoacetamide for 30 minutes at room temperature in the dark; the reaction was quenched with 5 mM DTT for 5 minutes. The sample was diluted with 100 mM ammonium bicarbonate, pH 8.0, to bring the GuCl concentration down to 1.5 M. Proteins were digested with trypsin at a 1:40 (w/w) ratio, and samples were incubated overnight at 37 °C. The digestion was terminated by the addition of formic acid to a final concentration of 1%.

Sample Desalting

Prior to LC-MS, the unbound and bound HP-M-LAC samples were desalted using reversed phase HPLC. The peptides were separated from undigested or partially digested proteins using a POROS R1 reversed phase column. Mobile phase A was 0.1% TFA in water, and mobile phase B was 0.1% TFA in HPLC grade acetonitrile. The digested proteins were loaded on the column at 2% solvent B and washed for 3 min to remove salts and other reagents from the trypsin digestion. The bound peptides were eluted with a step gradient: 28% solvent B for 3 min, to collect the peptides to be analyzed by nano-LC-MS/MS, and then 95% solvent B for 5 min, to elute larger peptides and partially digested and/or undigested proteins. The separation was performed at 3.0 ml/min and monitored at both 280 nm and 214 nm. Peptides eluted with 28% B were concentrated down to 50 µl by vacuum concentration.

Peptide separation and sequencing by nano-LC-MS/MS

Nano-LC-MS/MS was performed using an Eksigent system (Dublin, CA) interfaced with an LTQ linear ion trap mass spectrometer (ThermoFisher Scientific). Solvent A was 0.1% (v/v) formic acid in water and solvent B was 0.1% (v/v) formic acid in HPLC grade acetonitrile. Concentrated peptide samples were injected using an autosampler onto a C18 capillary column (150 mm x 75 µm i.d.) packed in-house with Magic C18. The flow rate was 300 nl/min; the gradient ran from 5% to 40% solvent B over 105 min, then from 40% to 60% solvent B over 20 min, then from 60% to 90% solvent B over 5 min, and was held isocratic at 90% solvent B for 15 min. The mass spectrometry method was the same as described previously.
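The nano-LC gradient stated above can be written down as a short table of segments. The following Python sketch is illustrative only: it assumes that each segment is a linear ramp, and the names (GRADIENT_SEGMENTS, percent_b, total_volume_ul) are hypothetical and do not correspond to any instrument control software. It returns the programmed %B at a given time and the run time and mobile phase volume implied by the stated flow rate.

```python
"""Illustrative sketch of the nano-LC gradient program described in the text."""

# (duration in min, %B at segment start, %B at segment end), values from the text.
# Treating every segment as a linear ramp is an assumption of this sketch.
GRADIENT_SEGMENTS = [
    (105.0, 5.0, 40.0),
    (20.0, 40.0, 60.0),
    (5.0, 60.0, 90.0),
    (15.0, 90.0, 90.0),  # isocratic hold at 90% B
]
FLOW_RATE_NL_PER_MIN = 300.0  # flow rate from the text


def total_run_time_min(segments=GRADIENT_SEGMENTS):
    """Sum of all segment durations, in minutes."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t_min, segments=GRADIENT_SEGMENTS):
    """Programmed %B at time t_min, by linear interpolation within a segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    segment_start = 0.0
    for duration, b_start, b_end in segments:
        segment_end = segment_start + duration
        if t_min <= segment_end:
            fraction = (t_min - segment_start) / duration
            return b_start + (b_end - b_start) * fraction
        segment_start = segment_end
    raise ValueError("time is beyond the end of the gradient")


def total_volume_ul(segments=GRADIENT_SEGMENTS, flow_nl_per_min=FLOW_RATE_NL_PER_MIN):
    """Mobile phase volume delivered over the whole program, in microliters."""
    return total_run_time_min(segments) * flow_nl_per_min / 1000.0


if __name__ == "__main__":
    print("run time (min):", total_run_time_min())
    print("%B at 115 min:", percent_b(115.0))
    print("volume (ul):", total_volume_ul())
```

Under these assumptions the program lasts 145 min and delivers 43.5 µl of mobile phase at 300 nl/min.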

Briefly, the electrospray conditions were: ion transfer tube temperature, 245 °C; spray voltage, 2.0 kV; normalized collision energy, 35%. Data-dependent MS/MS analysis was carried out using the MS acquisition software (Xcalibur 2.0, ThermoFisher Scientific). Each full MS scan was acquired in profile mode over the mass range m/z 400 to 2000, followed by 7 MS/MS scans of the 7 most intense peaks. Dynamic exclusion was applied for a duration of 2 min.

Database searching and data analysis

Protein/peptide identifications were obtained through a database search against a human proteomic database using the Sequest Cluster search engine (ver. 3.0) and stored in CPAS. The search was conducted against the human Swiss-Prot protein database (release 52). The database consists of normal and reversed protein sequences to facilitate estimation of the false positive rate. Trypsin was specified as the digestion enzyme with up to two missed cleavages, and carboxyamidomethylation was designated as a fixed modification of cysteine. To minimize false positive identifications, raw peptide identifications were filtered using criteria that yield an overall confidence of over 95% for peptide identification. This was achieved by filtering the Sequest results using the so-called HUPO criteria (ΔCn ≥ 0.10; Xcorr ≥ 1.9, 2.2, and 3.75 for singly, doubly, and triply charged ions, respectively), followed by validation using PeptideProphet analysis with a cutoff of 0.95 peptide probability to eliminate low confidence identifications. The identified protein list was then filtered to exclude any proteins that were identified with fewer than two peptides; these proteins were again filtered

by identifying the proteins that differed by two-fold or more between the normal and the disease samples. The proteins in this category were selected for peak area quantitation. Peak area quantitation was performed manually using at least two peptides per protein. Figure 1 gives a schematic diagram of the data analysis.

Figure 1: A schematic diagram for data analysis
1. Protein identification by SEQUEST using MS and MS/MS data
2. Filtration using peptide probability and HUPO criteria
3. Selection of proteins identified with 2 or more unique peptides
4. Selection of proteins that have a difference of at least two-fold between the normal and disease samples
5. Quantification of peak area
6. Verification of candidate marker by ELISA or Western blot
7. Potential candidate marker

Peptidomic analysis

Peptidomic analyses were performed using the platform previously developed by Zheng et al. (2006). Briefly, the filters were pre-rinsed with deionized water, and 50 µl of plasma sample was diluted with 350 µl of deionized water and

acetonitrile (20% v/v). Samples were then incubated at room temperature for 10 min. The samples were spun down at x g for 10 min to pellet any undissolved material in the plasma. The diluted human plasma was transferred to a Microcon centrifugal filter with a molecular weight cut-off of 10,000 Da and spun at 1500 x g for 60 min. Approximately 300 µl of filtrate was collected and stored at -20 °C; this process was repeated twice, and the filtrates were combined and concentrated down to 50 µl using a SpeedVac concentrator. The collected filtrate was desalted using a reversed-phase C18 column cartridge. The nano-LC-MS/MS method conditions were the same as those used in the proteomic analysis.

ELISA and Western blot confirmation of identified candidate markers

We measured plasma concentrations of thymosin beta-4 (Thb4) and apolipoprotein E (ApoE) with ELISA kits according to the manufacturers' instructions. Briefly, 200 µl of calibrators, controls and samples were aliquoted into the designated wells. Twenty-five µl of biotinylated antibody was aliquoted into each well, and then 25 µl of enzyme-labeled antibody was added to each well. The 96-well plate was covered with aluminum foil, placed on an orbital shaker and rotated for 2 hours at room temperature. The fluid was aspirated from each well and each well was washed 5 times with washing solution. Then 150 µl of TMB substrate was added to each well and incubated for 30 minutes at room temperature. Finally, 100 µl of stopping solution was added and the plate was read at 450 nm.

SDS-PAGE-Western Blotting

Two microliters of human plasma from multiple sclerosis patients and healthy controls were loaded onto each 4-12% gradient SDS-PAGE gel and

electrophoresed under reducing conditions. The gel was run for 2 hours at 120 V, and the proteins were then transferred to a nitrocellulose membrane for 1 hr at 30 V, blocked with 10% nonfat powdered milk and probed with anti-vinculin mouse antibody at 0.1 µg/mL (1:1000 dilution) for 1 hr. The blot was washed, and mouse anti-mouse IgG coupled with horseradish peroxidase (HRP) (Jackson Immunolabs, West Grove, PA) was added at a 1:1000 dilution and incubated for 1 hr at room temperature, followed by washing of the blot with PBS. SuperSignal West Dura Extended Duration Substrate (Thermo Fisher Scientific, Waltham, MA) was added and the blot was developed for 3 minutes.

Results and Discussion

Background

Multiple sclerosis is an autoimmune disease of the central nervous system and the leading cause of neurological disorders in young adults. 12 Multiple sclerosis has a very heterogeneous clinical presentation; there are three different types of multiple sclerosis with different etiologies. 13 In this study, we analyzed the plasma of multiple sclerosis patients rather than CSF for several reasons: plasma is easily obtained, is the most abundant source of proteins, and proteins resulting from pathological changes throughout the body may be released into the blood stream. 14 Recently, proteomic analysis of human plasma has led to the identification of a variety of candidate markers for different diseases. However, plasma biomarker discovery is extremely complex due to the wide dynamic range of plasma proteins. 15 The typical plasma protein concentration is estimated to be in the mg/ml range, of which 99% by mass is accounted for by the 22 most abundant proteins.

The remaining 1% of the plasma proteome consists of low abundance proteins, low molecular weight proteins, and peptides that could be shed into circulation as a result of in vivo proteolytic digestion. Several of these plasma-based proteolytically derived peptides are already commercially available as diagnostic markers, such as C-peptide (diabetes), beta-amyloid (Alzheimer's disease), gastrin (ulcer), and brain natriuretic peptide (cardiovascular diseases). 17 In this study, plasma was used instead of serum for proteomic and peptidomic analysis, to minimize any ex vivo proteolysis during serum preparation. In a previous study we demonstrated that our protocol minimized ex vivo proteolysis. 6 A variety of studies have shown that autoimmune diseases are typically associated with inflammation; therefore, acute phase proteins could be of diagnostic value. Moreover, inflammation is generally associated with increased proteolytic activity. 18 Previously, we have shown that a combination of proteomic and peptidomic analysis is an effective way to follow the complex mechanisms of altered protein secretion, release and proteolysis in disease. Thus, in this study we performed an integrated analysis of both the proteome and the peptidome of multiple sclerosis samples.

Proteomic Analysis

In earlier studies we have shown that the dynamic range of plasma proteomic studies is significantly increased by removal of the most abundant proteins followed by fractionation of the glycoproteins using multi-lectin affinity chromatography (M-LAC). Subsequently the bound and the unbound fraction of

the M-LAC were subjected to trypsin digestion and analysis by nano-LC-ESI-MS/MS (proteomic workflow shown in Figure 2). The bound fraction predominantly contains glycoproteins with a high affinity for the 3 lectins used in the M-LAC column, while the unbound fraction contains either nonglycosylated proteins or glycoproteins with low affinity for the lectins. Proteomic analysis of the plasma sample resulted in the identification of approximately 150 proteins in the bound fraction and 140 proteins in the unbound fraction with at least two peptide identifications and a high confidence level (Xcorr of 1.9, 2.7 and 3.75 for +1, +2 and +3 charged peptides, respectively, 75% PeptideProphet and FDR 5). The full data set is described in the supplemental information section.

For preliminary screening we used spectral counts as a semiquantitative method, with the following criteria for early comparison of the data set: + for proteins identified with 2-5 peptides, ++ for proteins identified with 6-10 peptides, +++ for proteins identified with more than 10 peptides, and ++++ for proteins identified with more than 100 peptides (see Table 1). We have found in previous studies 10, 11 that the functional categories of complement, lipid transport proteins (lipoproteins), protease and protease inhibitors, and cytoskeletal associated proteins show changes related to disease progress, and we have used the same classification in this study. In general these categories captured the majority of the changes observed in this study. In the proteomic analysis our initial goal was to identify plasma proteins that were able to differentiate multiple sclerosis from controls. We, therefore, show in Table 1 an aggregate of the proteomic studies on the three categories of multiple sclerosis (RRMS, SPMS, and PPMS as

described in the introduction). This study was performed in duplicate on a pool of 10 patients from each type of multiple sclerosis. In this comparison the most significant changes in concentration were observed for the following proteins: apolipoprotein E, fibrinogen alpha and beta chain, vinculin, gelsolin, and thymosin beta-4.

Figure 2. Proteomic workflow used to measure changes in protein concentration of disease vs. controls. (Adapted from Plavina, T., 2007)
Workflow: Plasma -> Removal of Abundant Proteins (6 Protein Depletion) [QC point: Concentration/Prot. Recovery] -> Multi-Lectin Column [QC point: Concentration/Prot. Recovery] -> Non-Bound Proteins / Glycosylated Proteins -> Trypsin Digestion/sample clean-up [QC point: UV trace of peptide clean-up] -> Nano LC-MS/MS (LTQ-FT), Data Dependent Acquisition -> Spectral Counts

Table 1. Interesting proteins identified in the plasma of multiple sclerosis and matched controls (spectral counts)

Protein Group                  Protein                               Swiss-Prot   M-LAC Fraction   Multiple Sclerosis a,b   Control
Transporter                    Apolipoprotein CII                    P02655       Bound            +                        +
                               Apolipoprotein E                      P02649       Bound            +                        +
                               Apolipoprotein L1                     O14791       Bound            +                        ND
                               Retinol Binding Protein 4             P02753       Bound            ++                       +
Complement                     Complement 8 Alpha chain              P07357       Unbound          +                        +
                               Complement Factor I                   P05156       Unbound          +                        ND
Coagulation                    Prothrombin                           P00734       Bound
                               Fibrinogen Alpha chain                P02671       Unbound          +++ c                    +++
                               Fibrinogen Beta chain                 P02675       Unbound
Protease/Protease Inhibitors   Inter-alpha trypsin Inhibitor H1      P19827       Bound            +                        +
                               Inter-alpha trypsin Inhibitor H4      Q14624       Bound            +                        +
                               Alpha-1-antitrypsin                   P01009       Unbound
                               Kininogen                             P01042       Bound            +                        +
Cytoskeletal                   Vinculin                              P18206       Unbound          +                        ND
                               Zyxin                                 Q15942       Unbound          +                        ND
                               Actin, aortic smooth muscle           P62736       Bound            +                        ND
                               Gelsolin                              P06396       Unbound          ++                       +

a Compilation of biological replicates, each with 2 analyses.
b Combination of all three different types of multiple sclerosis (primary progressive, relapsing remitting and secondary progressive), duplicate runs in each type, N=6.
c Differences were observed in specific disease types (see later).
+ Identified with 1-5 spectral counts
++ Identified with 6-10 spectral counts
+++ Identified with more than 10 spectral counts

Peptidomic Analysis

In previous publications we described the development of an approach to isolate the peptidome fraction from plasma that minimized artifactual proteolysis as well as losses due to nonspecific binding of the peptide component to the ultrafiltration membrane and to carrier proteins. The plasma sample was diluted with acetonitrile (20% v/v), and the peptide component was isolated from plasma using a 10 kDa molecular weight cut-off ultrafiltration membrane, then desalted on a C18 column and analyzed by nano-LC-MS/MS (peptidomic workflow shown in Fig 3). The elution profiles of the 15 disease patients and controls are shown in Figure 4 a, and the peak area measurements for the peptide peaks at 214 nm are shown in Fig 4 b.

In our laboratory we have adopted the approach of pooling sample sets for proteomic studies to minimise individual variability. As will be shown later in the discussion, the peptidome showed significant variability between individuals, and we therefore decided to perform duplicate analyses of the individual samples from 15 disease patients (5 RRMS, 5 SPMS, and 5 PPMS) and 5 matched controls. A total of 121, 120, 190 and 298 peptides were identified for RRMS, SPMS, PPMS and healthy controls, respectively. In most cases the proteins were identified by the presence of numerous peptides that originated from a larger parent peptide and differ by one to several amino acid residues, forming the so-called peptide ladders; this name refers to progressive N- and C-terminal amino acid cleavages by exoproteases. In total we observed 82 peptide ladders that were derived from 17 protein precursors, and the details of the peptide constituents are given in the

supplementary data. In addition to peptide ladders, the peptidome analysis also showed differences in proteins belonging to the categories of complement, coagulation, lipid transport proteins (lipoproteins), protease and protease inhibitors, and cytoskeletal associated proteins (see Table 2). Except for the cytoskeletal proteins, all the other families showed reduced proteolysis compared to the control group; this will be discussed in detail in a later section. These data suggest that the activity of proteases and/or protease inhibitors is dysregulated in multiple sclerosis. Altered proteolytic activity, demonstrated by the presence of endogenous peptides for several proteins, was seen in the EAE model. 1

Figure 3. Peptidomic workflow used to study the peptidome compartment of the plasma proteome (adapted from Zheng, X.; 2006)
Workflow: Plasma -> Break up Protein-Protein/Peptide Interactions Using 20% Acetonitrile -> Isolate Peptidome, Ultrafiltration (< 10 kDa) -> Sample Clean-up, Wide Pore C18 preparative column
Quality control check-points: Total protein concentration; Peak area at 214 nm

Figure 4 a. Desalting of human plasma filtrate using C18 column (panel labels: Peptides, Normal, Relapsing Remitting)

Figure 4. Desalting of human plasma filtrate using a C18 reversed phase column. Fig 4 a shows the elution profiles for all individuals. A fast 15 minute gradient was used to remove the salts and separate the peptides from the high molecular weight proteins. Fig. 4 b shows the peak area quantitation of the peptide peak for each individual.

Table 2. Proteins whose fragments were identified to be present at different concentrations in the plasma of multiple sclerosis and matched controls

Protein Group                  Protein                               Swiss-Prot   Spectral Counts (Primary Progressive, Secondary Progressive, Relapsing Remitting, Control)
Transporter                    Apolipoprotein AI                     P            ND
                               Apolipoprotein E                      P            ND ND
                               Apolipoprotein A IV                   P            ND ND +++
                               Apolipoprotein A II                   P            ND
Complement                     Complement C IV Alpha chain           P0C0L
                               Complement C III                      P            ND
Coagulation                    Fibrinogen Gamma chain                P02679       ND ND ND +++
                               Fibrinogen Alpha chain                P
                               Fibrinogen Beta chain                 P            ND
Protease/Protease Inhibitors   Inter-alpha trypsin Inhibitor H2      P19823       ND ND ND ++
                               Inter-alpha trypsin Inhibitor H4      Q            ND ND +
                               Kininogen                             P01042       ND ND ND +
Cytoskeletal                   Thymosin beta 4                       P            ND
                               Gelsolin                              P            ND ND
                               Tubulin alpha A1                      Q71U36       + ND ND ND

+ Identified with 1-5 spectral counts
++ Identified with 6-10 spectral counts
+++ Identified with more than 10 spectral counts
++++ Identified with more than 100 spectral counts

In the order presented in Table 1, the five most interesting proteins, which exhibit significant changes at the protein and peptide level, will be discussed.

Apolipoprotein E

As shown in Fig 5, both the proteome and peptidome analysis of apolipoprotein E (ApoE) clearly distinguished multiple sclerosis from controls, but

ApoE does not show significant differences between the different types (RRMS, SPMS, and PPMS). The proteome analysis indicated that ApoE was upregulated in the SPMS group (7 peptides) compared to the healthy controls (3 peptides); the numbers of peptides identified for PPMS (4 peptides) and RRMS (4 peptides) were similar to the control group. To confirm the proteomics results, we measured the concentration of ApoE by ELISA. As seen in Fig 5 c, the concentration of ApoE was upregulated 3-fold in primary progressive, 2-fold in relapsing remitting and 1.5-fold in secondary progressive multiple sclerosis compared to matched controls. Contrary to the proteomic results, the ELISA data indicated an upregulation of ApoE in all disease types. The discrepancy between the proteomic analysis and the ELISA result is probably due to the low number of peptides detected for ApoE, where semi-quantitative comparisons using spectral counts tend to be less accurate. To achieve a more accurate quantitation, we measured the peak area of the extracted ion chromatogram for 2 different peptides of ApoE; as shown in Fig 5 a and b, in agreement with the ELISA results, both peptides showed an upregulation of ApoE in all disease types. Peak area quantitation was also performed on the peptides of apolipoprotein E identified in the peptidome; as seen in Fig 5 d and e, there is up to a 10-fold decrease in the peptidomic analysis and up to a 2.5-fold decrease in one of the peptides identified in the peptidomic analysis of apolipoprotein E. In addition, apolipoprotein E has been shown to be significantly increased in plasma samples and cerebrospinal fluid (CSF) of patients with progressive multiple sclerosis and has been proposed as a potential marker of axonal damage.
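The ApoE peptide counts quoted above show why a screen based on small spectral counts can disagree with ELISA. The Python sketch below is an illustration, not the procedure used in this work: it applies the two-fold selection rule from the data analysis section directly to the peptide counts, which is an assumption made here for the sake of the example, and the function names are hypothetical.

```python
"""Illustrative two-fold screen on the ApoE peptide counts quoted in the text."""

TWO_FOLD = 2.0

# Number of ApoE peptides identified per group, as quoted in the text.
APOE_PEPTIDES = {"SPMS": 7, "PPMS": 4, "RRMS": 4}
CONTROL_PEPTIDES = 3


def fold_change(disease_count, control_count):
    """Ratio of disease to control counts; the control count must be positive."""
    if control_count <= 0:
        raise ValueError("control count must be positive")
    return disease_count / control_count


def passes_two_fold(ratio, threshold=TWO_FOLD):
    """True when the ratio is at least two-fold up or two-fold down."""
    return ratio >= threshold or ratio <= 1.0 / threshold


def screen(counts, control_count):
    """Map each disease group to whether it passes the two-fold rule."""
    return {
        group: passes_two_fold(fold_change(count, control_count))
        for group, count in counts.items()
    }


if __name__ == "__main__":
    for group, count in APOE_PEPTIDES.items():
        ratio = fold_change(count, CONTROL_PEPTIDES)
        print(group, round(ratio, 2), passes_two_fold(ratio))
```

With these counts only SPMS (7 vs. 3 peptides) exceeds a two-fold ratio, while PPMS and RRMS (4 vs. 3 peptides) do not. A difference of a single peptide changes the ratio substantially at such low counts, which is consistent with the use of peak area quantitation and ELISA for confirmation.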

Furthermore, ApoE has been shown to be involved in neurodegenerative diseases; the most well known association is between ApoE ε4 and Alzheimer's disease. 20 Increased levels of ApoE have been observed in the CSF of RRMS patients. 22

Figure 5. Apolipoprotein E (Transporter Protein), Proteome Analysis

Figure 5 panels (x axes: RRMS, PPMS, SPMS, Controls):
a. Apolipoprotein E K.VQAAVGTSAAPVPSDNH.- (y axis: peak area of EIC)
b. Apolipoprotein E R.AATVGSLAGQPLQER.A (y axis: peak area of EIC)
c. Apolipoprotein E ELISA (y axis: concentration, ng/mL)
Peptidome Analysis
d. Apolipoprotein E Peptidome (y axis: # sequencing events)
e. Apolipoprotein E A.KVEQAVETEPEPELR.Q (y axis: peak area of EIC)

Figure 5. An examination of the proteome analysis (top) and peptidome analysis (bottom). Fig. 5 a and b show the relative concentration of selected peptides observed in the bound fraction of M-LAC. The x axes are the three different multiple sclerosis types and matched controls, and the y axes are the peak area for the selected peptides. Fig. 5 c shows the relative concentration of apolipoprotein E by ELISA in the multiple sclerosis plasma samples vs. matched controls; the y axis shows the relative concentration in ng/ml. Fig. 5 d shows the relative concentration of apolipoprotein E peptides found in multiple sclerosis vs. matched controls; the y axis shows the relative amount as measured by MS sequencing events. Fig. 5 e shows the relative concentration of a selected peptide in the plasma samples; the y axis shows the peak area measured for the selected peptide.

Fibrinogen

Another family of proteins whose concentration significantly decreased in

peptidomic analysis was the fibrinogen alpha and beta chains. The numbers of peptide ladders for the fibrinogen alpha chain and beta chain are shown in Figure 6 a and b. As seen in the figure, more peptide ladders were found in the healthy controls than in the disease samples for both proteins. To further confirm our findings we performed peak area quantitation. As seen in Figure 6 c, peptide Y.HTEKLVTSKGDKEL.R was only observed in the control samples and not in the disease patients. Fig 6 d and e show the peak area quantitation of two peptides that were identified in both the normal and disease samples; one of the peptides was downregulated 400-fold in all the disease types and the second peptide 30-fold in the disease samples. Similar results have been previously reported in studies of psoriasis, 10 rheumatoid arthritis, 11 and thyroid cancer patients. These similarities may suggest that an alteration in proteolysis of fibrinogen reflects common inflammatory mechanisms.

Figure 6. Peptidomic analysis of Fibrinogen

Figure 6 panels a and b: Peptides identified for fibrinogen beta chain; Peptides identified for fibrinogen alpha chain (y axes: # sequencing events; x axes: PPMS, RRMS, SPMS, Controls)
Figure 6 panel c: Y.HTEKLVTSKGDKEL.R (traces: Normal, Primary Progressive, Relapsing Remitting, Secondary Progressive)

Figure 6 panels d and e: Fibrinogen alpha chain Y.HTEKLVTSKGDKEL.R; Fibrinogen alpha chain A.DSGEGDFLAEGGGVR.G (y axes: peak area of EIC; x axes: RRMS, PPMS, SPMS, Controls)

Fig. 6 a and b show the relative concentration of the peptides of the fibrinogen alpha chain and beta chain in the disease samples vs. matched controls. The x axes are the three different multiple sclerosis types and matched controls, and the y axes are the sequence hits for the total peptides identified. Fig. 6 c shows the peak area of one of the peptides identified only in the normal samples and not in the disease patients. Fig 6 d and e give the relative concentration of selected peptides in the plasma samples. The x axes are the three different multiple sclerosis types and matched controls, and the y axes are the peak area for the selected peptides.

Cytoskeletal Proteins

The cytoskeletal protein family was another category of proteins listed in Table 1 that showed changes in both the proteome and the peptidome of multiple sclerosis samples. Table 1 shows the spectral counts for both proteomic fractions from the M-LAC column (bound and unbound). As seen in this table, the spectral counts for all the cytoskeletal proteins identified with high confidence (vinculin, zyxin, actin, and gelsolin) are higher in the disease samples. In addition, Table 2 shows the peptidome of gelsolin and thymosin beta-4. Thymosin beta-4 was not identified in the proteomic analysis. This family plays an important role in cell-cell and cell-matrix adhesion, cell proliferation and migration. The source of these proteins in plasma is unknown, but it could possibly be shedding of the proteins from damaged cells or platelets.
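The comparison of the cytoskeletal entries in Table 1 can be made explicit with a small Python sketch. It is illustrative only: the bins follow the legend printed under Table 1 (which starts the lowest bin at 1 spectral count, whereas the screening criteria in the text start it at 2 peptides), treating a count of zero as ND is an assumption of this sketch, and all names are hypothetical.

```python
"""Illustrative encoding of the semi-quantitative spectral count scale of Table 1."""

# Ordering of the symbols used in Table 1, from not detected to most abundant.
RANK = {"ND": 0, "+": 1, "++": 2, "+++": 3, "++++": 4}


def count_to_symbol(spectral_count):
    """Map a spectral count to the symbol defined in the Table 1 legend."""
    if spectral_count < 0:
        raise ValueError("spectral count must be non-negative")
    if spectral_count == 0:
        return "ND"  # assumption: zero counts are reported as not detected
    if spectral_count <= 5:
        return "+"
    if spectral_count <= 10:
        return "++"
    if spectral_count <= 100:
        return "+++"
    return "++++"


# (multiple sclerosis, control) symbols for the cytoskeletal group, from Table 1.
CYTOSKELETAL = {
    "Vinculin": ("+", "ND"),
    "Zyxin": ("+", "ND"),
    "Actin, aortic smooth muscle": ("+", "ND"),
    "Gelsolin": ("++", "+"),
}


def higher_in_disease(disease_symbol, control_symbol):
    """True when the disease symbol ranks above the control symbol."""
    return RANK[disease_symbol] > RANK[control_symbol]


if __name__ == "__main__":
    for protein, (disease, control) in CYTOSKELETAL.items():
        print(protein, disease, control, higher_in_disease(disease, control))
```

Applied to the Table 1 entries, every cytoskeletal protein ranks higher in the disease pool than in the controls. Because the scale is coarse, a change within a single bin is not visible with this kind of comparison.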

Vinculin

The plasma levels of vinculin were higher in multiple sclerosis relative to the controls (Fig 7), as measured in the unbound fraction from the M-LAC column. Vinculin is a non-glycosylated protein; therefore it was not detected in the bound fraction. In addition, the levels of vinculin were higher in PPMS and SPMS compared to RRMS. We confirmed the mass spectrometry results by Western blotting using a commercial antibody and also demonstrated that the results for individual patients (9 RRMS, 9 SPMS, 9 PPMS, and 9 matched controls) were consistent with the proteomic result (Fig 7 b).

Figure 7. Vinculin Proteome Analysis
Panel a: Vinculin Unbound Proteome (y axis: # sequencing attempts; x axis: RRMS, PPMS, SPMS, Controls)
Panel b: Western blot (lane groups: Control, Primary Progressive, Secondary Progressive, Relapsing Remitting)

Fig. 7 a shows the relative concentration of vinculin identified in the unbound M-LAC fraction in the pools of each type of multiple sclerosis and matched controls. The x axis presents the three different disease types and the controls, and the y axis presents the sequence attempts identified for the protein in each pool. Fig. 7 b shows the Western blot performed on 36 individual patients to further confirm the observations from the mass spectrometry data.

Gelsolin

In this study, we found gelsolin to be upregulated in both the proteomic and peptidomic analysis (as seen in Fig. 8 a and b). Gelsolin, a non-glycosylated protein, was essentially found in the unbound fraction, and we observed about a 2-fold upregulation in all three types of multiple sclerosis compared to matched controls. In the peptidomic analysis (Fig. 8 b), there was a significant increase in the number of peptides identified in PPMS and SPMS compared to RRMS. Gelsolin is an actin-binding protein; its exact role and function are not clearly defined, but it is critical for central nervous system (CNS) myelination, and overexpression of gelsolin may lead to actin depolymerization. 2

Figure 8. Proteomic and peptidomic analysis of Gelsolin
Panel a: Gelsolin Unbound Proteome (y axis: # sequencing attempts; x axis: RRMS, PPMS, SPMS, Controls)
Panel b: Gelsolin Peptidome (y axis: # sequencing attempts; x axis: RRMS, PPMS, SPMS, Controls)
Panel c: Gelsolin K.MDYPKQTQVSVLPEGGETPLF.K (y axis: peak area of EIC; x axis: RRMS, PPMS, SPMS, Controls)

Figure 8 a and b show the relative concentration of the peptides of gelsolin in the disease samples vs. matched controls in the unbound M-LAC and peptidomic fractions. The x axes are the three different multiple sclerosis types and matched controls, and the y axes are the sequence hits for the total peptides identified. Fig. 8 c shows the relative concentration of a selected peptide in the plasma samples. The x axis is the three different multiple sclerosis types and matched controls, and the y axis is the peak area for the selected peptide.

Thymosin Beta-4

Interestingly, one of the proteins that belongs to this family, thymosin beta-4, was identified only in the peptidomic analysis. We did not observe thymosin beta-4 in the proteomic analysis, which is presumably due to the low plasma levels of this protein as measured by ELISA. However, the peptidome of thymosin beta-4 showed an increase in the concentration of the peptides found in the multiple sclerosis patients compared to matched controls. The increased level in the peptidome was confirmed by peak area measurement of the two most abundant peptides (Fig 9 c and d). As seen in Fig 9 a, the peptidome of thymosin beta-4 is not detected in the matched control samples. To investigate whether the increase in the peptidome was matched by an increase in the plasma levels of the precursor protein, we used an ELISA to measure thymosin beta-4 in all 30 disease patients (10 RRMS, 10 SPMS, and 10 PPMS) and the 10 matched controls. The thymosin beta-4 concentration as measured by ELISA was increased 10-fold in all disease patients (Fig 9 b). This increase is comparable to our peptidomic data, where thymosin beta-4 was not observed in the control samples. At this stage the mechanism for the increase in the disease-related peptidome is not clear, but it could be due to disease-associated proteases or to the binding of components of the peptidome to carrier proteins such as albumin or fibrinogen. 3, 4

Figure 9.
Panel a: Thymosin beta-4 Peptidome (y axis: # sequencing attempts; x axis: RRMS, PPMS, SPMS, Controls)
Panel b: Thymosin beta-4 ELISA (y axis: concentration, ug/ml; x axis: individuals, grouped as RRMS, PPMS, SPMS, Controls)
Panel c: Thymosin beta-4 E.TQEKNPLPSKETIEQEKQAGES (y axis: peak area of EIC; x axis: RRMS, PPMS, SPMS, Controls)
Panel d: Thymosin beta-4 K.TETQEKNPLPSKETIEQEKQAGES (y axis: peak area of EIC; x axis: RRMS, PPMS, SPMS, Controls)

Fig. 9 a shows the relative concentration of the peptides of thymosin beta-4 identified in the peptidomic analysis in the disease samples vs. matched controls. The x axis is the three different multiple sclerosis types and matched controls, and the y axis is the sequence hits for the total peptides identified. Fig. 9 b shows the relative concentration of thymosin beta-4 by ELISA in the multiple sclerosis plasma samples vs. matched controls. Fig. 9 c and d show the relative concentration of selected peptides in the plasma samples. The x axes are the three different multiple sclerosis types and matched controls, and the y axes are the peak area for the selected peptides.

3.4 Conclusion

Multiple sclerosis is the leading non-traumatic cause of neurological disorders in young adults. Therefore, there is a need for good biomarkers not only to detect the disease but also to differentiate between the three different types of multiple sclerosis. Currently known potential biomarkers for multiple sclerosis include apolipoprotein E, complement C3, complement C4, amyloid A protein, actin, tubulin, vinculin, and gelsolin, of which apolipoprotein E, vinculin and gelsolin were observed in this study. 5 As described previously, a problem with plasma proteomic studies of autoimmune diseases, which are often associated with inflammation and dysregulation of proteases and protease inhibitors, is that changes in protein synthesis and/or secretion can be obscured by altered levels of proteolysis. 5, 6 Therefore, a combined proteomic and peptidomic analysis, as described here, is of value in the study of such diseases.

Our study showed an increase in apolipoprotein E protein, which is consistent with previous reports. 7 Interestingly, the peptidome data for gelsolin showed a distinct difference between relapsing remitting and the other two types of disease (secondary progressive and primary progressive), while the intact protein analysis of vinculin in the M-LAC unbound fraction showed lower levels in relapsing remitting compared to the other two types of multiple sclerosis. The mechanistic difference may be related to the secretion of specific proteases in the progression of multiple sclerosis, as was suggested previously. This study demonstrated that proteomic and peptidomic methods can be useful in identifying candidate biomarkers in multiple sclerosis plasma samples. A combination of these methods gives a more detailed and comprehensive analysis and understanding of human plasma.

Table 3. Summary of the Analysis

Proteins | Proteome | Peptidome
Apolipoprotein E
Fibrinogen | - a
Gelsolin
Vinculin | ND
Thymosin-Beta-4 | ND b
a No changes observed in the proteomic analysis
b ND not detected in proteomic analysis

3.5 Reference:

1. Harris, V. K.; Sadiq, S. A., Disease biomarkers in multiple sclerosis: potential for use in therapeutic decision making. Mol Diagn Ther 2009, 13, (4),

2. Gonen, O.; Moriarty, D. M.; Li, B. S.; Babb, J. S.; He, J.; Listerud, J.; Jacobs, D.; Markowitz, C. E.; Grossman, R. I., Relapsing-remitting multiple sclerosis and whole-brain N-acetylaspartate measurement: evidence for different clinical cohorts initial observations. Radiology 2002, 225, (1),
3. Kantarci, O.; Wingerchuk, D., Epidemiology and natural history of multiple sclerosis: new insights. Curr Opin Neurol 2006, 19, (3),
4. Hueber, W.; Tomooka, B. H.; Batliwalla, F.; Li, W.; Monach, P. A.; Tibshirani, R. J.; Van Vollenhoven, R. F.; Lampa, J.; Saito, K.; Tanaka, Y.; Genovese, M. C.; Klareskog, L.; Gregersen, P. K.; Robinson, W. H., Blood autoantibody and cytokine profiles predict response to anti-tumor necrosis factor therapy in rheumatoid arthritis. Arthritis Res Ther 2009, 11, (3), R
5. Giovannoni, G.; Green, A. J.; Thompson, E. J., Are there any body fluid markers of brain atrophy in multiple sclerosis? Mult Scler 1998, 4, (3),
6. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6, (2),
7. Zheng, X.; Baker, H.; Hancock, W. S., Analysis of the low molecular weight serum peptidome using ultrafiltration and a hybrid ion trap-Fourier transform mass spectrometer. J Chromatogr A 2006, 1120, (1-2),
8. Kullolli, M.; Hancock, W. S.; Hincapie, M., Automated platform for fractionation of human plasma glycoproteome in clinical proteomics. Anal Chem 82, (1),
9. Kullolli, M.; Hancock, W. S.; Hincapie, M., Preparation of a high-performance multi-lectin affinity chromatography (HP-M-LAC) adsorbent for the analysis of human plasma glycoproteins. J Sep Sci 2008, 31, (14),
10. Plavina, T.; Hincapie, M.; Wakshull, E.; Subramanyam, M.; Hancock, W. S., Increased plasma concentrations of cytoskeletal and Ca2+-binding proteins and their peptides in psoriasis patients. Clin Chem 2008, 54, (11),
11. Zheng, X.; Wu, S. L.; Hincapie, M.; Hancock, W. S., Study of the human plasma proteome of rheumatoid arthritis. J Chromatogr A 2009, 1216, (16),
12. Boiko, A.; Vorobeychik, G.; Paty, D.; Devonshire, V.; Sadovnick, D., Early onset multiple sclerosis: a longitudinal study. Neurology 2002, 59, (7),
13. Sadiq, S. A., Multiple Sclerosis. 11th ed.; Philadelphia, 2005; p

14. Rithidech, K. N.; Honikel, L.; Milazzo, M.; Madigan, D.; Troxell, R.; Krupp, L. B., Protein expression profiles in pediatric multiple sclerosis: potential biomarkers. Mult Scler 2009, 15, (4),
15. Anderson, N. L.; Anderson, N. G., The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1, (11),
16. Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G., Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol Cell Proteomics 2002, 1, (12),
17. Harper, R. G.; Workman, S. R.; Schuetzner, S.; Timperman, A. T.; Sutton, J. N., Low-molecular-weight human serum proteome using ultrafiltration, isoelectric focusing, and mass spectrometry. Electrophoresis 2004, 25, (9),
18. Hilliquin, P., Biological markers in inflammatory rheumatic diseases. Cell Mol Biol (Noisy-le-grand) 1995, 41, (8),
19. Jacobs, A. D. S. a. D. H., MS and cognition and ApoE: The ongoing conundrum about biomarkers. Neurology 2008, 70,
20. Pinholt, M.; Frederiksen, J. L.; Andersen, P. S.; Christiansen, M., Apo E in multiple sclerosis and optic neuritis: the apo E-epsilon4 allele is associated with progression of multiple sclerosis. Mult Scler 2005, 11, (5),
21. Guerrero, A. L.; Laherran, E.; Gutierrez, F.; Martin-Polo, J.; Iglesias, F.; Alcazar, C.; Peralta, J.; Rostami, P., Apolipoprotein E genotype does not associate with disease severity measured by Multiple Sclerosis Severity Score. Acta Neurol Scand 2008, 117, (1),
22. Chiasserini, D.; Di Filippo, M.; Candeliere, A.; Susta, F.; Orvietani, P. L.; Calabresi, P.; Binaglia, L.; Sarchielli, P., CSF proteome analysis in multiple sclerosis patients by two-dimensional electrophoresis. Eur J Neurol 2008, 15, (9),

Chapter 4

Analysis of Plasma Proteome for Obese Subjects and Obese Type 2 Diabetes Subjects After Gastric Bypass Surgery

This study was funded by Johnson and Johnson and is being prepared as a manuscript for publication.

Abstract

As part of our collaboration with Johnson and Johnson, we conducted an exploratory study on defined clinical samples from obese women and obese diabetic women undergoing gastric bypass surgery to identify candidate proteins that show abundance changes compared to lean controls. For the discovery phase we used pooled samples (analytical replicates and biological

replicates) of each sample as well as 43 individual patients. The proteomic platform surveyed the plasma proteome to evaluate changes in protein levels. The method combined removal of the 14 most abundant plasma proteins with multi-lectin affinity chromatography, which enriches plasma glycoproteins. This study led to the identification of attractin as a candidate marker for type 2 diabetes.

Introduction

Obesity refers to an excess amount of body fat and is defined by a body mass index (BMI) of over 30. BMI is a measure of a person's body weight relative to the person's size. Obesity has been shown to be associated with numerous diseases, including type II diabetes, high blood pressure, high cholesterol, coronary heart disease, stroke, and sleep apnea. 1-4 There are three major types of diabetes: gestational diabetes, type I diabetes, and type II diabetes. Gestational diabetes develops in the late stages of pregnancy and often

disappears after the birth of the baby; however, women who have had gestational diabetes have a 40-60% chance of developing type II diabetes later in life. Type I diabetes is an autoimmune disease in which the body attacks the beta cells that produce the hormone insulin. Type II diabetes, also known as insulin-resistant diabetes, is the most common type: 90-95% of people with diabetes have type II diabetes (insulin is the hormone that delivers glucose to the cells of the body). 5 A high percentage of patients (55%) diagnosed with type II diabetes are obese, and studies show that the increase in type II diabetes is directly proportional to the increase in obesity. 6-8 This is a result of fat cells being more resistant to insulin than muscle cells: if there are more fat cells than muscle cells in the body, insulin becomes less effective, and glucose remains in circulation instead of being taken up by the cells. 9 The World Diabetes Foundation has reported that approximately 285 million people worldwide, almost 7% of the population of the world, have diabetes or diabetic complications. Unless addressed, the disease will increase up to 430 million people by [...]. Current medical therapies are effective in treating obesity but lack long-term durability. 11 Bariatric surgery has been demonstrated to be the only effective long-term treatment for obesity. 12 There are different types of bariatric surgery; the most common is the Roux-en-Y gastric bypass (RYGB) procedure. This procedure is associated with rapid weight loss and improvement in medical conditions associated with obesity, such as type 2 diabetes. The most remarkable result associated with gastric bypass surgery is the reversal of type 2 diabetes immediately following the surgery. For this reason it has been a medical challenge

to understand the immediate changes due to this surgery. Recent studies have shown that hormonal and metabolic changes caused by gastric bypass surgery, and not only the weight loss, could be responsible for the reversal of type 2 diabetes. 13 However, not all obese diabetic patients who undergo gastric bypass surgery respond to it (non-responders), although all obese diabetic patients lose up to 60-80% of their body weight. In this work, proteomic technology was applied to human plasma to profile the plasma proteome of the individuals who do not respond to gastric bypass surgery, with the hope of identifying candidate markers. Currently, there is no technology that can distinguish the obese diabetic patients who will respond successfully to the surgery from the non-responders. Moreover, gastric bypass surgery is not without complications, which include electrolyte abnormalities, nutrient deficiencies, kidney stones, and osteoporosis; therefore there is a need for candidate biomarkers that predict the outcome of the surgery before it is undertaken. 13 A second goal of this study was to compare obese patients with and without diabetes, in an attempt to identify candidate markers that could be used to determine which patients are at risk of developing diabetes later in life. Protein biomarker discovery using proteomic methods has traditionally been performed with a small number of samples to profile as many proteins as possible. Pooling has many advantages: it reduces the biological variability between individuals, it is attractive when there is limited material for analysis, and it reduces the number of samples analyzed for discovery of candidate

markers, thereby reducing the cost of the analysis. 17 However, the main concern with pooling is the dilution of low-abundance circulating proteins. The proteome analysis was performed using the platform already discussed in Chapter 3, with minor changes. Briefly, we applied the combination of fourteen-protein depletion and multi-lectin affinity chromatography (M-LAC) as an enrichment and fractionation method. The bound and unbound fractions of M-LAC were subjected to trypsin digestion and nanoLC-MS/MS analysis.

Experimental

Materials

All chemicals and reagents were purchased from Sigma Aldrich (St. Louis, MO) and were of the highest quality available. PEEK columns were purchased from Isolation Technology (Milford, MA). Bradford protein assay kit, trifluoroacetic acid (TFA), formic acid, glacial acetic acid, HPLC grade acetonitrile, HPLC grade water, and trypsin were purchased from Thermo Fisher Scientific (Waltham, MA). LC-MS columns (150 mm x 75 µm i.d.) were purchased from New Objectives (Woburn, MA); reversed phase C18 Magic beads (bead size 5 µm, pore size 300 Å) were

purchased from Microm BioResource (Auburn, CA). The following items required for 1D SDS-PAGE were purchased from Invitrogen (Carlsbad, CA): NuPAGE MES SDS running buffer, 4-12% Bis-Tris gel, and NuPAGE LDS 4x sample buffer.

Study Design and Plasma Samples

Human plasma samples were received from J&J on dry ice. Because this was a longitudinal study, plasma was withdrawn 2-4 weeks before the surgery (PreSurgery), on days 7, 10 and 12 following the surgery (Surgery), and 3 months after the surgery (PostSurgery). This study consisted of thirteen Caucasian women: 8 diabetic, of whom 5 responded to the surgery and 3 did not, and 5 euglycemic; and 5 leans. The subjects were between [...] years of age, with a BMI score of [...]. All patients underwent gastric bypass surgery. We received 250 µl of plasma from each individual for the three time points (presurgery, surgery and postsurgery) and from the controls (leans). The strategy used in this study was to first pool the samples within each of the seven groups, in order to optimize our platform before analyzing the individuals for biomarker discovery. Each group consisted of five individuals, and a plasma pool was made for each of the seven groups by combining 100 µl from each of the five individuals within the group. Pools were divided into 100 µl aliquots and frozen at -75 °C for further use. The pooled samples were separated into three sets, each set consisting of one 100 µl aliquot of each pool (7 pools per set). The sample processing for each set was performed at the same time.

HP-M-LAC Platform Evaluation Using the Pool Samples

The platform was evaluated by processing two different sets of pools one month apart. The pools were randomized prior to

processing to minimize any variability or bias; each sample consisted of 100 µl. The platform used in this study differed only slightly from the platform described in Chapter 3. Briefly, the platform consists of sequential, multidimensional HPLC fractionation of plasma. A total of 100 µl of plasma from each sample was diluted 4-fold with binding buffer (20 mM Tris-HCl, 0.5 M NaCl, pH 7.2, 1 mM CaCl2, 1 mM MgCl2); each sample was first depleted of the fourteen most abundant plasma proteins using the depletion column. The depleted plasma was further fractionated into a flow-through (unbound) fraction and a bound fraction using the M-LAC column, and each fraction was then desalted and concentrated with a polymeric R1 reversed-phase column (RP-trap). The volumes of the unbound and bound HP-M-LAC fractions eluted from the RP-trap were reduced to 100 µl by speed vacuum centrifugation prior to Bradford assay and trypsin digestion. This platform was used to analyze the pool samples as well as the individual samples.

Trypsin Digestion of the Pools and Individual Samples

Method 1. Fifty µl of the unbound and bound samples were denatured using 6.0 M GnHCl in 0.1 M ammonium bicarbonate, pH 8.0 (1:5 dilution). Reduction was achieved by adding 5 mM TCEP and incubating at room temperature. Proteins were alkylated with 15 mM iodoacetamide for 30 minutes at room temperature in the dark; the reaction was quenched with 5 mM DTT and incubated for 5 minutes. The sample was diluted with 100 mM ammonium bicarbonate, pH 8.0, to bring the GnHCl concentration down to 0.9 M. Proteins were digested with trypsin at a 1:40 (w/w) ratio; samples were incubated overnight at 37 °C. The digestion was terminated by the addition of formic acid to

a final concentration of 1%.

Method 2. Fifty µl of the unbound and bound samples were denatured using 6.0 M GnHCl in 0.1 M ammonium bicarbonate, pH 8.0 (1:5 dilution). Reduction was achieved by adding 5 mM TCEP and incubating at room temperature. Proteins were alkylated with 15 mM iodoacetamide for 30 minutes at room temperature in the dark; the reaction was quenched with 5 mM DTT and incubated for 5 minutes. The GnHCl was removed from the sample using an R1 reversed-phase column. Mobile phase A was 0.1% TFA in water, and mobile phase B was 0.1% TFA in HPLC grade acetonitrile. The proteins were loaded on the column at 2% solvent B for 3 min at 4 ml/min to remove the guanidine and the other reagents. The proteins were eluted with a step gradient to 70% B for 3 min at 4 ml/min and monitored at 214 nm and 280 nm. The eluted sample was concentrated down to 50 µl using a speed vacuum concentrator. The sample was diluted 1:5 with 100 mM ammonium bicarbonate at pH 8.3 to adjust the pH prior to adding the trypsin. Proteins were digested with trypsin at a 1:40 (w/w) ratio; samples were incubated overnight at 37 °C. The digestion was terminated by the addition of formic acid to a final concentration of 1%.

Evaluation of Trypsin Digestion by HPLC

Prior to analyzing the samples by LC-MS, the unbound and bound HP-M-LAC samples were desalted using reversed-phase HPLC. The peptides were separated from the undigested or partially digested proteins using a POROS R1 reversed-phase column as described previously (Chapter 3). Peak area quantitation of the peptide peak at 214 nm was performed to evaluate trypsin digestion efficiency.

Nano-LC-MSMS Method and Bioinformatics

The LC-MS/MS method and bioinformatics analysis were the same as the methods described in Chapter [...].

Results and Discussion

Platform Reproducibility

The Capture Select fourteen-protein depletion affinity removal system was used to efficiently remove the fourteen most abundant proteins in human plasma (IgG, IgA, IgM, transferrin, fibrinogen (alpha, beta and gamma chains), apolipoprotein A-I, apolipoprotein A-II, haptoglobin, α1-antitrypsin, orosomucoid (α-1 acid glycoprotein) and α-2 macroglobulin). After depletion of the high-abundance proteins, the flow-through of the depletion column, consisting of the low- and medium-abundance proteins, was loaded directly (online) onto the M-LAC column for further fractionation and enrichment of glycoproteins. Figure 1 gives an example of the typical chromatogram obtained during the depletion and fractionation/enrichment of the plasma. Very good chromatographic consistency was observed across all the runs. The protein concentration of the three fractions was measured using the Bradford total protein assay; Table 1 summarizes the results obtained from the platform. Of the total plasma protein loaded onto the platform, about 6% was recovered in the flow-through of the M-LAC column and approximately 10% bound to the M-LAC column. As seen in Table 1, the recoveries from the platform are very good and consistent for both sets of pools

run one month apart. We further analyzed these samples by SDS-PAGE to demonstrate the consistency of the columns (depletion and M-LAC). The samples were normalized by volume, and the same volume was loaded onto the SDS-PAGE gel. As seen in Figure 2 a, regardless of the sample type, the behavior of the depletion column is consistent for all the pools. In addition, differences were observed in the unbound and bound fractions of the M-LAC (Fig 2 b and c). In the bound M-LAC fraction (Fig 2 b), there is less protein for the diabetic postsurgery pool in lane 3 (D-Pool-(+)) compared to the other pools. This difference was also observed for the same sample in the unbound fraction (lane 3, Fig 2 c) and is consistent with the recoveries from the HP-M-LAC (Table 1).

[Figure 1 peak labels: Unbound M-LAC (Peak 1), Bound High-abundant Proteins (Peak 2), Bound M-LAC (Peak 3)]
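The recovery figures quoted for the platform (the share of loaded protein found in each fraction) can be sketched as a short calculation. This is only an illustration: the function name `percent_recovery` and all protein masses below are hypothetical values chosen for the example, not measurements from this study.

```python
def percent_recovery(loaded_ug, fraction_ug):
    """Return each fraction's share of the loaded protein, plus the total, in percent."""
    if loaded_ug <= 0:
        raise ValueError("loaded_ug must be positive")
    shares = {name: 100.0 * mass / loaded_ug for name, mass in fraction_ug.items()}
    # The right-hand side is evaluated before "total" is added to the dict.
    shares["total"] = sum(shares.values())
    return shares


# Hypothetical Bradford results in micrograms (made-up illustration values).
loaded = 7000.0
fractions = {
    "depletion_bound": 5600.0,  # Peak 2
    "mlac_bound": 700.0,        # Peak 3
    "mlac_unbound": 420.0,      # Peak 1
}

recovery = percent_recovery(loaded, fractions)
for name, pct in recovery.items():
    print(f"{name}: {pct:.1f} %")
```

Expressing every fraction against the same loaded amount makes the pools directly comparable, which is the comparison Table 1 is meant to support.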

Figure 1: The typical chromatogram obtained on the HP-M-LAC platform.

Table 1: Results of the HP-M-LAC column for both sets of pools run one month apart
Sample Name | Sample Labeling | % Protein Bound to Depletion Column (Peak 2) | % Bound M-LAC (Peak 3) | % Unbound M-LAC (Peak 1) | % Total
Pool-Diabetic PreSurgery | Pool-D-(-) | 73.3 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 4.0
Pool-Diabetic Surgery | Pool-D-(S) | 67.7 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 0.7
Pool-Diabetic Post-Surgery | Pool-D-(+) | 93.9 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 9.0
Pool-Non Diabetic PreSurgery | Pool-ND-(-) | 76.2 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 9.5
Pool-Non Diabetic Surgery | Pool-ND-(S) | 82.8 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 10.1
Pool-Non Diabetic PostSurgery | Pool-ND-(+) | 94.9 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 8.4
Leans | Control-Pool | 93.3 ± [...] | [...] ± [...] | [...] ± [...] | [...] ± 1.4

[Figure 2 gel images, panels a to c. Lanes: ND Pool-(-), ND Pool-(S), ND Pool-(+), D Pool-(-), D Pool-(S), D Pool-(+), Control Pool. Molecular weight markers from 10 kDa to 260 kDa.]

Figure 2. SDS-PAGE of all three fractions from the HP-M-LAC. Fig 2 a shows the bound fraction of the depletion column, b shows the bound fraction of the M-LAC column, and c the unbound fraction of the M-LAC column.

To assess the reproducibility of the mass spectrometry, we plotted the spectral counts of one of the pool samples run in duplicate on the mass spectrometer for each of the sets (Fig 3 a and b). The correlation coefficient is very good: R² = [...] for the pool of the first set and R² = [...] for the pool of the second set. In Fig 3 c, we illustrate the robustness of this platform with the reproducibility graph obtained after running the same sample one month apart; a good correlation coefficient (R² = [...]) was observed.

[Figure 3 plots: MS Reproducibility Between Duplicate Runs, Set 1; MS Reproducibility Between Duplicates, Set 2. Axes: peptide ID per protein for each run, with linear fit equation and R².]
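The reproducibility check described above, a linear fit of per-protein spectral counts from one run against a second run with R² as the summary statistic, can be sketched as follows. The helper `linear_fit_r2` and the spectral counts are hypothetical; the study's own values are those plotted in Figure 3.

```python
def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variance has no defined R^2")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical spectral counts per protein for two runs of the same pool.
run_1 = [120, 85, 40, 22, 10, 5]
run_2 = [118, 90, 38, 25, 9, 6]

slope, intercept, r_squared = linear_fit_r2(run_1, run_2)
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

The same function applies unchanged to the set 1 versus set 2 comparison, where the two series come from the same pool processed one month apart.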

Figure 3. Study reproducibility. Fig 3 a shows the mass spectrometer reproducibility between duplicates for set 1: the x axis shows the peptide identifications (spectral counts) for run 1, and the y axis shows the peptide identifications for the second run. Fig 3 b shows the mass spectrometer reproducibility for set 2. Fig 3 c shows the platform reproducibility, plotting the spectral counts of one of the pools of set 1 (x axis) against the same pool in set 2 (y axis).

Trypsin Digestion

Plasma is a very complex biofluid, and protein-based biomarker discovery relies on the quantitation of the peptides identified per protein. A key assumption of this approach is that the abundance of the peptide represents the abundance of the protein. Thus, any variation in the digestion efficiency will affect the analysis. In this study, for digestion of the pools we first used a commonly accepted protocol for trypsin digestion of plasma, method 1 (see method section). After tryptic digestion, the peptides were separated from any undigested or partially digested proteins; both 214 nm and 280 nm were monitored during the desalting process (Figure 4). Peak area quantitation of the peptide fraction was performed for both the unbound and bound M-LAC fractions (Fig 5 a and b). As seen in Figure 5 a, the digestion of the bound M-LAC fraction is very consistent and reproducible between all pools, for both sets, with an average trypsin digestion efficiency of approximately 70%. However, the same behavior was not observed

with the unbound M-LAC fraction. The unbound fraction showed large variability between pools and, in addition, very low trypsin digestion efficiency, with an average of approximately 50%. This inconsistency was reproducible within sets. These data suggested that the protein composition of the unbound fraction was different from that of the bound fraction; therefore, the commonly used trypsin digestion (method 1) could not be applied to the unbound fraction in this study. Thus, a major part of this study was devoted to developing a new trypsin digestion method that efficiently and reproducibly digests both fractions (unbound and bound M-LAC).

[Figure 4 chromatogram labels: Salts; Peptides; Undigested/Partially Digested proteins; traces at 214 nm and 280 nm]

Figure 4: The typical chromatogram obtained during the desalting of samples prior to LC-MS/MS analysis.

Figure 5. Trypsin digestion for the pool samples. Fig 5 a shows the digestion reproducibility for the bound M-LAC fraction, and the unbound fraction is shown in Fig 5 b.
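A minimal sketch of how digestion efficiency and its pool-to-pool consistency might be summarized from the desalting chromatograms is given below. It assumes that efficiency is the peptide peak area expressed as a percentage of the summed peptide and undigested-protein peak areas at 214 nm; the text reports a "% peptide peak area" but does not spell out the denominator, so that definition, the function names and all peak areas are assumptions made for illustration.

```python
from statistics import mean, stdev


def digestion_efficiency(peptide_area, protein_area):
    """Peptide peak area as a percentage of peptide + undigested protein area.

    Assumption: this denominator is not stated in the text.
    """
    total = peptide_area + protein_area
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * peptide_area / total


def summarize(efficiencies):
    """Return the mean efficiency and its coefficient of variation (% CV)."""
    average = mean(efficiencies)
    return average, 100.0 * stdev(efficiencies) / average


# Hypothetical (peptide, undigested protein) peak areas for three pools.
bound_areas = [(700.0, 300.0), (720.0, 280.0), (690.0, 310.0)]
unbound_areas = [(300.0, 700.0), (650.0, 350.0), (550.0, 450.0)]

bound_mean, bound_cv = summarize([digestion_efficiency(p, u) for p, u in bound_areas])
unbound_mean, unbound_cv = summarize([digestion_efficiency(p, u) for p, u in unbound_areas])

print(f"bound:   mean {bound_mean:.1f} %, CV {bound_cv:.1f} %")
print(f"unbound: mean {unbound_mean:.1f} %, CV {unbound_cv:.1f} %")
```

Reporting the CV next to the mean separates the two problems noted for the unbound fraction: low average efficiency and large variability between pools.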

Optimization of Trypsin Digestion

To optimize the trypsin digestion protocol, different digestion procedures were tested, varying the denaturing agent, the reducing agent, and the temperature conditions. Table 2 summarizes the digestion protocols tested. The efficiency of trypsin digestion was evaluated by peak area quantitation of the 214 nm peptide peak, shown in the last column of Table 2.

Table 2. Summary of Digestion Protocols
Denaturing Agent | Reducing Agent | Reducing Time (min) | Temperature (°C) | Alkylating Agent (mM IAA) | Alkylation Time (min) | Dilution (100 mM ABC) | Trypsin Ratio | % Peptide Peak Area (214 nm)
6 M GnHCl | 5 mM DTT | [...] | [...] | [...] | [...] | 1:7 | 1:[...] | [...]
6 M GnHCl | 5 mM DTT | [...] | [...] | [...] | [...] | 1:7 | 1:[...] | [...]
6 M GnHCl | 5 mM TCEP | [...] | RT | [...] | [...] | 1:7 | 1:[...] | [...]
8 M Urea | 5 mM TCEP | [...] | RT | [...] | [...] | 1:9 | 1:[...] | [...]
8 M Urea | 5 mM TCEP | [...] | RT | [...] | [...] | 1:9 | 1:[...] | [...]
Thio Urea | 5 mM TCEP | [...] | RT | [...] | [...] | 1:9 | 1:[...] | [...]

In this scheme, the temperature, the reducing agent, and the reduction time did not have a direct impact on the digestion efficiency. As seen in Table 2, the denaturing agent played a significant role in the digestion efficiency: under the first three conditions the digestion efficiency is about 70%, compared to

approximately 55% when urea was used as the denaturing agent. Guanidine hydrochloride (GnHCl) is one of the strongest denaturing agents; however, the concentration of GnHCl (6 M) required for protein denaturation reduces the activity of trypsin. Thus, the concentration of GnHCl was reduced to 0.9 M prior to adding trypsin to the sample, as suggested by Promega. However, the digestion efficiency for the unbound M-LAC fraction under these conditions was inconsistent, as previously described. Because of this inconsistency, a different method was employed to achieve complete removal of the denaturing agent from the sample prior to adding the trypsin. As described in digestion protocol 2 (see methods), a complete desalting was performed using an R1 reversed-phase column with a step gradient. The protein recoveries from the R1 reversed-phase trap column were very high and reproducible. The improved digestion protocol was applied to the individual samples for the biomarker discovery. As seen in Figure 6, the digestion efficiency of the bound and unbound fractions is similar, with an average efficiency higher than 75%. Comparing the unbound fraction of the pool samples (Fig 5 b) to the unbound M-LAC fraction of the individuals (Fig 6 b), not only the efficiency of the digestion but also its consistency between samples was improved.

[Figure 6 plots: panels a and b]
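The dilution step in method 1 follows from C1·V1 = C2·V2. Going from 6 M to 0.9 M GnHCl requires a dilution factor of about 6.7, which is in line with the roughly seven-fold dilution used for the GnHCl protocols. The sketch below works this out together with the trypsin amount for a 1:40 (w/w) ratio; the starting volume and protein mass are hypothetical, and the function names are introduced only for this illustration.

```python
def diluent_volume(initial_volume_ul, initial_molar, target_molar):
    """Volume of buffer (µl) to add so the denaturant drops to target_molar."""
    if not 0 < target_molar <= initial_molar:
        raise ValueError("target must be positive and not exceed the initial concentration")
    # C1 * V1 = C2 * V2, so V2 = V1 * C1 / C2
    final_volume_ul = initial_volume_ul * initial_molar / target_molar
    return final_volume_ul - initial_volume_ul


def trypsin_mass(protein_ug, protein_per_enzyme=40.0):
    """Trypsin mass (µg) for an enzyme:protein ratio of 1:protein_per_enzyme (w/w)."""
    return protein_ug / protein_per_enzyme


# Hypothetical sample: 50 µl at 6 M GnHCl containing 100 µg of protein.
dilution_factor = 6.0 / 0.9
buffer_ul = diluent_volume(50.0, 6.0, 0.9)
enzyme_ug = trypsin_mass(100.0)

print(f"dilution factor: {dilution_factor:.2f}")
print(f"add {buffer_ul:.1f} µl of 100 mM ammonium bicarbonate")
print(f"add {enzyme_ug:.2f} µg of trypsin")
```

Method 2 avoids this calculation entirely by removing the GnHCl on the reversed-phase column before the trypsin is added.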

Figure 6. Trypsin digestion of the individual patients. Fig 6 a shows the bound M-LAC fraction for all patients, and Fig 6 b shows the unbound fraction.

Proteomic Analysis

Proteomic Analysis for the Pools

We have shown that the combination of depletion of abundant proteins followed by fractionation of glycoproteins using multi-lectin affinity chromatography (M-LAC) increases the dynamic range of plasma proteome analysis. 18, 19 In this study, a 14-protein depletion column followed by fractionation/enrichment with M-LAC was used. Subsequently, the bound and unbound fractions of the M-LAC were subjected to trypsin digestion and analysis by nanoLC-ESI-MS/MS. Protein analysis of the plasma proteome resulted in the identification of 228 proteins; a spectral counting quantitative approach was used for the analysis of the pools and individuals. These identified proteins were validated using the following criteria: 75% peptide probability; Xcorr of 1.9, 2.7 and 3.75 for +1, +2 and +3 charged peptides, respectively; and an FDR of 5%. Compared to the method used in Chapter 3, the proteins in this study were identified with a higher number of spectral counts, which increases the confidence level of the identifications. From these 228 proteins, those with a two-fold or greater difference between the leans and the disease samples were selected for further analysis. Tables 3 and 4 show partial lists of the proteins that were either up- or down-regulated in the disease pools relative to the lean pools.
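The validation criteria and the two-fold selection described above can be sketched as two small filters. Several details are assumptions made for this illustration and are not stated in the text: the Xcorr and probability values are treated as minimum cutoffs, charge states without a listed cutoff are rejected, and a pseudocount of 1 is added to the spectral counts so that a protein absent from one group still yields a finite ratio. All names and counts are hypothetical.

```python
# Cutoffs quoted in the text; treating them as minimum values is an assumption.
XCORR_MIN = {1: 1.9, 2: 2.7, 3: 3.75}
MIN_PEPTIDE_PROBABILITY = 0.75


def passes_validation(charge, xcorr, probability):
    """Apply the charge-dependent Xcorr cutoff and the peptide probability cutoff."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False  # assumption: charge states without a listed cutoff are rejected
    return xcorr >= threshold and probability >= MIN_PEPTIDE_PROBABILITY


def fold_change(disease_count, lean_count, pseudocount=1.0):
    """Spectral count ratio, disease over lean; the pseudocount is an assumption."""
    return (disease_count + pseudocount) / (lean_count + pseudocount)


def is_two_fold(disease_count, lean_count):
    """True when the protein is at least two-fold up or down relative to the leans."""
    ratio = fold_change(disease_count, lean_count)
    return ratio >= 2.0 or ratio <= 0.5


# Hypothetical spectral counts: (disease pool, lean pool).
proteins = {"protein_A": (9, 4), "protein_B": (5, 4), "protein_C": (1, 9)}
selected = [name for name, (d, l) in proteins.items() if is_two_fold(d, l)]
print(selected)
```

The FDR criterion is not reproduced here because it depends on the decoy search output rather than on a per-peptide cutoff.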

Table 3: Proteins of interest that were seen up or down regulated in disease samples of the bound M-LAC fraction
Protein Name | Up or down regulation compared to leans (columns: D-Pool (-), D-Pool (S), D-Pool (+), ND-Pool (-), ND-Pool (S), ND-Pool (+))
Alpha-1-antichymotrypsin | Up, Up, Up
Apolipoprotein E | Up, Up, Up, Up, Up
Angiotensinogen | Down, Down, Down
Attractin | Down, Up, Up
Pregnancy zone protein | Down, Down, Down, Down, Down, Down
Thyroxine binding globulin | Down, Down, Down, Down
Afamin | Down
Ceruloplasmin | Down, Down, Down
Corticosteroid-binding globulin | Down, Down
Pigment epithelium-derived factor | Up, Up, Up, Up
Complement factor B | Up, Up, Up, Up
Apolipoprotein C-II | Up, Up, Up
Fibronectin | Up, Up
Apolipoprotein A-IV | Down, Down
Prothrombin | Down, Down, Down

Table 4: Proteins of interest that were seen up or down regulated in disease samples of the unbound M-LAC fraction
Protein Name | Up and down regulation compared to leans (columns: D-Pool (-), D-Pool (S), D-Pool (+), ND-Pool (-), ND-Pool (S), ND-Pool (+))
Gelsolin | Up, Up
Apolipoprotein C-III | Up
Complement C-III | Up, Up, Up
Plasminogen | Up, Up, Up
Complement C9 | Up, Up, Up, Up, Up
Complement C4-A | Up, Up, Up, Up
Tetranectin | Up, Down, Up, Up
Vitamin D-binding protein | Down, Down
Actin, aortic smooth muscle | Down, Down
Vitamin K-dependent protein S | Up, Up, Up

Proteomic Analysis of Individual Patients

In our laboratory we have adopted the approach of pooling sample sets in proteomic studies to minimize individual variability; in this study, however, the individual patients were also studied to investigate the dilution effect of sample pooling. The optimized proteomic platform was applied to the individual samples. For the analysis of the individual samples, five patients from each group (5 lean, 5 diabetic and 5 non-diabetic) and 3 obese diabetic non-responder patients were processed through the proteomic platform. In this longitudinal study, three time points were obtained from each patient (presurgery, surgery and postsurgery). This resulted in a total of 43 samples for proteomic analysis (Table 5). Table 6 shows the results of the HP-M-LAC platform for the individual samples; the total recovery is very good (96.5% of the total amount of plasma protein loaded).

Table 5: Summary of the individual samples analyzed
Groups | Patients, PreSurgery time point | Patients, Surgery time point | Patients, PostSurgery time point
Obese Non-diabetic | P05, P08, P13, P34, P35 | P05, P08, P13, P34, P35 | P05, P08, P13, P34, P35
Obese Diabetic Responders | P01, P06, P11, P20, P30 | P01, P06, P11, P20, P30 | P01, P06, P11, P20, P30
Obese Diabetic Non-responders | P10, P17, P33 | P10, P17, P33 | P17, P33

Leans | C03*, C03, C04, P27, P28

Table 6: Results of the HP-M-LAC for the individual patients (N=43)
Sample | Measurement | Sample Loading (mg/ml) | % Protein Bound to Depletion Column | % Bound M-LAC | % Unbound M-LAC | % Total
Individual Patients | mean | [...] | [...] | [...] | [...] | [...]
Individual Patients | Standard deviation | [...] | [...] | [...] | [...] | [...]
Individual Patients | % CV | [...] | [...] | [...] | [...] | [...]

Proteomic analyses of the individual patients were performed in duplicate at the mass spectrometry level, resulting in approximately 216 mass spectrometric files. In this study we needed to perform biologically based comparisons to identify candidate markers that differentiate the obese non-diabetic group from the obese diabetic group before and after gastric bypass surgery, as well as obese diabetic responders from obese diabetic non-responders. To facilitate these comparisons, we developed the data analysis design shown in Figure 7. The comparison of the obese non-diabetic group with the obese diabetic group resulted in twenty-five different comparisons (for each time point per M-LAC fraction); out of these twenty-five comparisons, proteins that

showed a two-fold or higher change in at least eight of the comparisons were considered candidate proteins (Fig 7 a). The same rule was applied to the comparison of obese non-diabetic patients with the leans and of obese diabetic responders with the leans (Fig 7 b and c). Furthermore, when the obese diabetic non-responders at the presurgery time point were compared with the obese non-diabetic patients, the obese diabetic responders and the leans, proteins that showed an abundance change in 5 comparisons were considered candidate markers (Fig 7 d, e and f) for both the unbound and bound M-LAC fractions. For the comparison of the obese diabetic non-responders at the postsurgery time point with the obese non-diabetic patients, the obese diabetic responders and the leans, proteins that had a two-fold or higher change in at least 4 comparisons were considered candidate markers for both the unbound and bound M-LAC fractions.

Figure 7. Data Analysis Design

[Figure 7, panels a to c (unbound and bound fractions). a: Non-Diabetic Responders Pre-S & Post-S vs Diabetic Responders (P01, P06, P11, P20, P30). b: Non-Diabetic Responders Pre-S & Post-S vs Leans (C03*, C03, C04, P27, P28). c: Diabetic Responders Pre-S & Post-S vs Leans (C03*, C03, C04, P27, P28).]
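The selection rule described above, five patients against five patients giving twenty-five pairwise comparisons with a protein kept when at least eight of them show a two-fold or higher change, can be sketched as follows for a single protein, time point and M-LAC fraction. The sketch counts a change in either direction and treats "detected in one patient, absent in the other" as a change; both choices are assumptions, as are the function names and the spectral counts.

```python
from itertools import product


def count_changed_pairs(group_a, group_b, min_fold=2.0):
    """Count patient pairs whose spectral counts differ by at least min_fold."""
    changed = 0
    for count_a, count_b in product(group_a, group_b):
        high, low = max(count_a, count_b), min(count_a, count_b)
        if high == 0:
            continue  # protein absent in both patients: no change
        # Assumption: detected in one patient and absent in the other counts as a change.
        if low == 0 or high / low >= min_fold:
            changed += 1
    return changed


def is_candidate(group_a, group_b, min_pairs=8):
    """Apply the 'at least min_pairs changed comparisons' rule."""
    return count_changed_pairs(group_a, group_b) >= min_pairs


# Hypothetical spectral counts of one protein in five patients per group.
non_diabetic = [10, 12, 9, 11, 10]
diabetic = [25, 8, 30, 11, 22]

n_changed = count_changed_pairs(non_diabetic, diabetic)
print(n_changed, "of", len(non_diabetic) * len(diabetic), "comparisons changed")
print("candidate:", is_candidate(non_diabetic, diabetic))
```

For the non-responder comparisons the same function could be called with `min_pairs=5` or `min_pairs=4`, matching the thresholds quoted for the presurgery and postsurgery time points.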

Figure 7 (continued).
d) Diabetic Non-Responders Pre-S vs Leans (Unbound and Bound); leans C03*, C03, C04, P27, P28.
e) Diabetic Responders Pre-S vs Diabetic Non-responders (Unbound and Bound).
f) Non-Diabetic Responders Pre-S vs Diabetic Non-responders (Unbound and Bound).
g) Diabetic Non-Responders Post-S vs Leans (Unbound and Bound); leans C03*, C03, C04, P27, P28.
h) Diabetic Responders Post-S vs Diabetic Non-responders (Unbound and Bound).
j) Non-Diabetic Responders Post-S vs Diabetic Non-responders (Unbound and Bound).

The findings from this study revealed a number of acute phase reactant proteins which differentiated the disease (obese and type 2 diabetic) patients from the leans. This result is in agreement with the underlying inflammation known to be associated with obesity; the association of obesity with the inflammation pathway has been studied since the early 1990s. 20 Various studies in this area have confirmed that obesity is a state of inflammation.

Moreover, Crook et al. and Pickup et al. were the first to propose that type 2 diabetes was also an inflammatory condition. 22, 23

Figure 8: Acute phase reactant proteins. Fig 8 a and b show the bound M-LAC SAA-4 and unbound M-LAC SAA-4 in the individual patients at the presurgery, surgery and postsurgery time points.

One of the acute phase proteins most often reported to be associated with obesity is the class of proteins known as serum amyloid A (SAA). In our study we identified serum amyloid A-4 (SAA-4), which is a member of the SAA family and is known to respond to inflammatory stimuli. 24 In this study SAA-4 was found to be over-expressed at the presurgery time point in the obese and diabetic (responders and non-responders) patients compared to the leans. Decreased levels of SAA-4 were observed at the postsurgery time points for all the patients (Fig 8 a and b). Our results are in agreement with the literature and suggest that massive weight loss will reverse the inflammatory response. 24, 25 Another family of proteins observed to change in these patients was the lipid binding proteins. Altered levels of apolipoprotein C-III (ApoC3) were observed in the obese diabetic and non-diabetic patients compared to the leans. Others have shown increased levels of ApoC3 in the plasma of individuals with type 1 diabetes and that ApoC3 may contribute to the increased cardiovascular risk. 26 This result may

also suggest that there might be a relationship between ApoC3 and obesity and type 2 diabetes. 26

Figure 9: Lipid binding proteins. Fig 9 shows the bound M-LAC for apolipoprotein C-III in the individual patients at the presurgery, surgery and postsurgery time points.

Sex hormone binding globulin (SHBG) and corticosteroid binding globulin (CBG) showed significantly decreased levels in the obese non-diabetic and obese diabetic (responders and non-responders) patients prior to weight loss surgery and postsurgery (Fig 10 a and b). The hormone transporters CBG (cortisol) and SHBG (testosterone and estradiol) are the major transporters for steroid hormones circulating in blood. These proteins have been proposed to be negative acute phase reactant proteins. 27 Our study also found the levels of these proteins to be decreased prior to surgery and, as previously described, this could be a result of insulin inhibition of these proteins. 28 The levels of both of these proteins (Fig. 10 a and b) remained lower after the gastric bypass surgery; this could be explained by a decrease in production of these hormone transporting proteins with lower calorie intake.

Figure 10: Hormone binding proteins. Fig 10 a and b show the relative concentration of the peptides of sex hormone binding globulin and corticosteroid binding globulin vs matched leans. The x axis shows the different comparisons and the y axis the sequence hits for total peptides identified.

Proteins Discriminating Between Obese Diabetic Responders and Obese Non-diabetic versus Diabetic Non-responders

The main goal of this study was to identify candidate markers differentiating between the different groups (obese non-diabetic and obese diabetic) and, most importantly, differentiating the obese diabetic non-responder group from the obese diabetic responders. The data shown in Fig 10 indicate that attractin is a potential candidate marker of obese diabetic patients who do not respond to the surgery. The levels of attractin in the obese non-diabetic patients compared to the leans showed no difference in abundance. However, the levels of attractin in obese non-diabetic patients compared to obese diabetic patients showed a significant difference at the presurgery time point. This is an interesting finding, suggesting that attractin can be followed as a marker to predict whether obesity could develop into type 2 diabetes. In addition, the levels of attractin for the obese diabetic patients who responded to the surgery had a concentration level (based on spectral counts) comparable to that of the leans and obese non-diabetic patients three months after the surgery. However, for the obese diabetic

patients that did not respond to gastric bypass surgery, the levels of the attractin protein remained significantly higher after the surgery, indicating that this protein could also be used as a marker to predict who will respond to gastric bypass surgery.

Figure 10: Proteins differentiating between diabetic responders and non-responders (attractin protein). Fig 10 a shows the relative concentrations of the peptides of attractin compared to leans, obese diabetic responders and obese non-diabetic. The x axis shows the different comparisons and the y axis the sequence hits for total peptides identified.

Comparison of Pools versus Individuals

In this study, to investigate the dilution effect of the pooling approach, only the bound M-LAC fraction of the pooled samples was compared to the bound M-LAC fraction of the individuals, due to the inconsistency in the trypsin digestion of the unbound M-LAC fraction. Interestingly, corticosteroid binding globulin and sex hormone binding globulin were found to be down-regulated in the pooled samples as well as in the individual samples. The same patterns were observed for the acute phase reactant proteins in the pools and in the individual samples. Therefore, we have evidence that, at least in these studies, sample pooling could be used for biomarker discovery in plasma samples.

4.4 Conclusion and Future Work

In this work we have demonstrated that proteomic technology was successfully applied to obese patients with and without diabetes (responders and

non-responders) that underwent gastric bypass surgery. The study identified potential protein markers that can be used to predict which obese patients will go on to develop type 2 diabetes and which obese diabetic patients will successfully respond to gastric bypass surgery. Using our proteomic approach we identified proteins that have been shown to be associated with obesity, such as ceruloplasmin, 29 and proteins associated with obesity, diabetes and gastric bypass surgery, such as sex hormone binding globulin and corticosteroid binding globulin. 30 While it is difficult to draw definite conclusions, due to the small number of patients, we will concentrate on verifying these findings using orthogonal techniques such as ELISA, Western blotting or MRM. In this chapter we also discussed the pooling approach in proteomics; the proteins found to differ in abundance in the individuals behaved the same way in the pooled samples. Pooling is a helpful approach in proteomic analysis for biomarker discovery in plasma, because the techniques used in plasma proteomics require massive fractionation to identify low abundance proteins, resulting in a large number of fractions and a large data analysis set. Pooling not only gives a smaller number of samples to analyze but also reduces the cost of analysis per sample.

4.5 References

1. Cho, M.; Park, J. S.; Nam, J.; Kim, C. S.; Nam, J. H.; Kim, H. J.; Ahn, C. W.; Cha, B. S.; Lim, S. K.; Kim, K. R.; Lee, H. C.; Huh, K. B., Association of abdominal obesity with atherosclerosis in type 2 diabetes mellitus (T2DM) in Korea. J Korean Med Sci 2008, 23, (5),
2. Nguyen, N. T.; Magno, C. P.; Lane, K. T.; Hinojosa, M. W.; Lane, J. S., Association of hypertension, diabetes, dyslipidemia, and metabolic syndrome with obesity: findings from the National Health and Nutrition Examination Survey, 1999 to J Am Coll Surg 2008, 207, (6),
3. Bloomgarden, Z. T., Diabetes and obesity: part 2. Diabetes Care 2008, 31, (1),
4. LeRoith, D.; Novosyadlyy, R.; Gallagher, E. J.; Lann, D.; Vijayakumar, A.; Yakar, S., Obesity and type 2 diabetes are associated with an increased risk of developing cancer and a worse prognosis; epidemiological and mechanistic evidence. Exp Clin Endocrinol Diabetes 2008, 116 Suppl 1, S
5. Overview, D
6. Nyamdorj, R.; Qiao, Q.; Lam, T. H.; Tuomilehto, J.; Ho, S. Y.; Pitkaniemi, J.; Nakagami, T.; Mohan, V.; Janus, E. D.; Ferreira, S. R., BMI compared with central obesity indicators in relation to diabetes and hypertension in Asians. Obesity (Silver Spring) 2008, 16, (7),
7. Nyamdorj, R.; Qiao, Q.; Soderberg, S.; Pitkaniemi, J. M.; Zimmet, P. Z.; Shaw, J. E.; Alberti, K. G.; Pauvaday, V. K.; Chitson, P.; Kowlessur, S.; Tuomilehto, J., BMI compared with central obesity indicators as a predictor of diabetes incidence in Mauritius. Obesity (Silver Spring) 2009, 17, (2),
8. Hjartaker, A.; Langseth, H.; Weiderpass, E., Obesity and diabetes epidemics: cancer repercussions. Adv Exp Med Biol 2008, 630,
9. Rutter, G. A.; Parton, L. E., The beta-cell in type 2 diabetes and in obesity. Front Horm Res 2008, 36,
10. Foundation, W. D., 35.htm. In
11. Jantz, E. J.; Larson, C. J.; Mathiason, M. A.; Kallies, K. J.; Kothari, S. N., Number of weight loss attempts and maximum weight loss before Roux-en-Y laparoscopic gastric bypass surgery are not predictive of postoperative weight loss. Surg Obes Relat Dis 2009, 5, (2),
12. Schernthaner, G.; Morton, J. M., Bariatric surgery in patients with morbid obesity and type 2 diabetes. Diabetes Care 2008, 31 Suppl 2, S
13. Butner, K. L.; Nickols-Richardson, S. M.; Clark, S. F.; Ramp, W. K.; Herbert, W. G., A review of weight loss following Roux-en-Y gastric bypass vs restrictive bariatric surgery: impact on adiponectin and insulin. Obes Surg 20, (5),
14. Hall, T. C.; Pellen, M. G.; Sedman, P. C.; Jain, P. K., Preoperative factors predicting remission of type 2 diabetes mellitus after Roux-en-Y gastric bypass surgery for obesity. Obes Surg 20, (9),
15. Rand, C. S.; Macgregor, A.; Hankins, G., Gastric bypass surgery for obesity: weight loss, psychosocial outcome, and morbidity one and three years later. South Med J 1986, 79, (12),
16. Laferrere, B.; Teixeira, J.; McGinty, J.; Tran, H.; Egger, J. R.; Colarusso, A.; Kovack, B.; Bawa, B.; Koshy, N.; Lee, H.; Yapp, K.; Olivan, B., Effect of weight loss by gastric bypass surgery versus hypocaloric diet on glucose and incretin levels in patients with type 2 diabetes. J Clin Endocrinol Metab 2008, 93, (7),
17. Diz, A. P.; Truebano, M.; Skibinski, D. O., The consequences of sample pooling in proteomics: an empirical study. Electrophoresis 2009, 30, (17),
18. Plavina, T.; Hincapie, M.; Wakshull, E.; Subramanyam, M.; Hancock, W. S., Increased plasma concentrations of cytoskeletal and Ca2+-binding proteins and their peptides in psoriasis patients. Clin Chem 2008, 54, (11),
19. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6, (2),
20. Spiegelman, B. M.; Hotamisligil, G. S., Through thick and thin: wasting, obesity, and TNF alpha. Cell 1993, 73, (4),
21. Dandona, P.; Aljada, A.; Bandyopadhyay, A., Inflammation: the link between insulin resistance, obesity and diabetes. Trends Immunol 2004, 25, (1),
22. Crook, M. A.; Tutt, P.; Simpson, H.; Pickup, J. C., Serum sialic acid and acute phase proteins in type 1 and type 2 diabetes mellitus. Clin Chim Acta 1993, 219, (1-2),
23. Pickup, J. C.; Mattock, M. B.; Chusney, G. D.; Burt, D., NIDDM as a disease of the innate immune system: association of acute-phase reactants and interleukin-6 with metabolic syndrome X. Diabetologia 1997, 40, (11),
24. Scheja, L.; Heese, B.; Zitzer, H.; Michael, M. D.; Siesky, A. M.; Pospisil, H.; Beisiegel, U.; Seedorf, K., Acute-phase serum amyloid A as a marker of insulin resistance in mice. Exp Diabetes Res 2008, 2008,
25. O'Brien, K. D.; Brehm, B. J.; Seeley, R. J.; Bean, J.; Wener, M. H.; Daniels, S.; D'Alessio, D. A., Diet-induced weight loss is associated with decreases in plasma serum amyloid A and C-reactive protein independent of dietary macronutrient composition in obese subjects. J Clin Endocrinol Metab 2005, 90, (4),
26. Dayarathna, M. K.; Hancock, W. S.; Hincapie, M., A two step fractionation approach for plasma proteomics using immunodepletion of abundant proteins and multi-lectin affinity chromatography: Application to the analysis of obesity, diabetes, and hypertension diseases. J Sep Sci 2008, 31, (6-7),
27. Manco, M.; Fernandez-Real, J. M.; Valera-Mora, M. E.; Dechaud, H.; Nanni, G.; Tondolo, V.; Calvani, M.; Castagneto, M.; Pugeat, M.; Mingrone, G., Massive weight loss decreases corticosteroid-binding globulin levels and increases free cortisol in healthy obese patients: an adaptive phenomenon? Diabetes Care 2007, 30, (6),
28. Fernandez-Real, J. M.; Grasa, M.; Casamitjana, R.; Pugeat, M.; Barret, C.; Ricart, W., Plasma total and glycosylated corticosteroid-binding globulin levels are associated with insulin secretion. J Clin Endocrinol Metab 1999, 84, (9),
29. Kim, O. Y.; Shin, M. J.; Moon, J.; Chung, J. H., Plasma ceruloplasmin as a biomarker for obesity: A proteomic approach. Clin Biochem 44, (5-6),
30. Lewis, J. G.; Shand, B. I.; Elder, P. A.; Scott, R. S., Plasma sex hormone-binding globulin rather than corticosteroid-binding globulin is a marker of insulin resistance in obese adult males. Diabetes Obes Metab 2004, 6, (4),

Appendix A

Additional Research Work Performed During this PhD Graduate Course

Analysis of colon cancer samples using depletion of the top 12 abundant proteins combined with Multi-Lectin Affinity Chromatography (M-LAC)

Publications

Baumgartner, C.; Rejtar, T.; Kullolli, M.; Akella, L. M.; Karger, B. L., SeMoP: a new computational strategy for the unrestricted search for modified peptides using LC-MS/MS data. J Proteome Res 2008, 7, (9),


investigate changes in the plasma glycoproteome from postmenopausal women diagnosed with colon cancer within 18 months after sample collection. The analysis was performed on 42 samples (20 cases and 22 matched controls). The identity of the samples was blinded throughout the execution of the study. The proteomics platform consisted of a two-dimensional HPLC fractionation approach. In the first dimension, 12 abundant plasma proteins were depleted using the IgY column. The depleted plasma was captured directly by a high pressure M-LAC column to enrich for glycoproteins. Two fractions were collected: an unbound, mostly non-glycosylated fraction and a bound, glycosylated protein fraction. In this study, we focused on the analysis of the plasma glycoproteome using the bound glycosylated M-LAC protein fraction only. Samples were digested with trypsin and desalted on a C18 trap column, and the peptides were separated on a nano C18 reversed phase column using LC-MS/MS. Data generated on an LTQ-FT instrument were processed using the Sequest search engine and subsequently quantified using label-free comparative spectral counting.

3.1 Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer and third leading cause of cancer deaths worldwide, with approximately 655,

deaths per year. 1, 2 If diagnosed and treated early, CRC has a survival rate as high as 90% and is therefore one of the most curable types of cancer. 1 Diagnostic methods for CRC include the fecal occult blood test (FOBT), which is used to check for blood in the stool resulting from either gastrointestinal or lower gastrointestinal bleeding. 3 FOBT is not a specific method for CRC because it is also used in the diagnosis of gastric cancer; thus further investigation with a more specific method such as colonoscopy is usually performed as a follow-up. Colonoscopy is an invasive test, in which a tube with a flexible fiber optic camera or charge-coupled device (CCD) camera is inserted through the anus. As a result many patients are resistant to undergoing the procedure. 4-6 Thus, non-invasive methodologies for CRC detection that are more specific and suitable for general cancer screening in the wider population are required. Currently, there is increased interest in proteomics based technologies as a non-invasive method for early detection of colon cancer. Biomarkers are important tools for detection, diagnosis, treatment, monitoring, and prognosis of a specific disease. Identifying sensitive and specific candidate biomarker proteins for early stage diagnosis of cancer is critical because early diagnosis gives a higher chance of a positive prognosis. Proteomics has become the most powerful and efficient methodology in recent years for biomarker discovery in biological fluids. One of the most studied biological samples in biomarker discovery for early detection of cancer is plasma. 7 Plasma has several advantages over other biological fluids: it is easily obtained, it is the most abundant source of proteins, and proteins resulting from pathological changes in the body are released into the blood stream due to the intimate contact

of the blood with tissue. 8 Studying the plasma proteome means studying alterations in protein abundance, identification of new proteins in disease samples, or alteration of post-translational modifications in a given disease state. 9 The range of protein concentrations present in plasma dramatically complicates such studies, as ten abundant proteins together make up 90% of the total plasma proteome. To address this complexity, removal of abundant proteins can beneficially increase the dynamic range of protein levels in plasma. 10 Glycosylation of proteins is the most ubiquitous post-translational modification observed in eukaryotic organisms. It is estimated that roughly half of the mammalian proteome is glycosylated. Glycoproteins are involved in a myriad of cellular and biological functions, including immune defense, cell growth and differentiation, and cell-cell adhesion, to name but a few. 11, 12 Furthermore, many current clinical biomarkers for different cancers are glycoproteins, for example Her2/neu in breast cancer, CA 125 in ovarian cancer, prostate-specific antigen (PSA) in prostate cancer and carcinoembryonic antigen (CEA) in bladder, lung, breast, pancreatic, and colon cancer. 13 As a result of the important role of glycoproteins, a number of analytical tools have been developed over the past few years, and more continue to emerge, to study the plasma glycoproteome. Lectin affinity chromatography has been a major analytical tool in glycoproteomics due to the specific affinity of lectins for glycoproteins. Currently, the two main analytical tools to study glycoproteins are hydrazide coupling of N-linked glycoproteins and immobilized lectin chromatography. The main disadvantage of the hydrazide chemistry method is that the glycan structure is destroyed due to the harsh

oxidative conditions required for hydrazide coupling. 14 Consequently, the use of lectins has been shown to be more favorable for glycoprotein analysis. Research presented in this chapter describes a collaborative biomarker discovery study between ten laboratories. Each laboratory applied its own proteomic technologies to the same set of samples to identify proteins that may be derived from tumor cells, or protein changes associated with the response that occurs during tumor development, including inflammation, angiogenesis and infiltration. Pre-diagnostic samples were used to avoid identifying candidate markers that may be altered only in the symptomatic disease state or that may reflect inflammation. Our method consisted of enrichment of glycoproteins using a multi-lectin affinity chromatography (M-LAC) column. We have shown that an M-LAC column provided a comprehensive capture of glycoproteins from biological fluids and was sensitive to alterations in the glycosylation of human plasma proteins. In this study, we focused on the analysis of the plasma glycoproteome, and therefore the proteomic analysis targeted the bound glycosylated protein fraction only.

Experimental Methods and Materials

Materials and Chemicals

Aldehyde POROS-20 AL (20 µm beads), POROS Protein A (PA) (POROS-PA50 resin) and POROS-R1-50 resin were purchased from Applied Biosystems (Foster City, CA). Unconjugated lectins, concanavalin A (ConA), jacalin (JAC) and wheat germ agglutinin (WGA), were purchased from Vector

Laboratories (Burlingame, CA). The IgY12-LC10 column, 10 ml (currently sold as the ProteomeLab IgY-12 partitioning system), was obtained from Beckman Coulter. All chemicals and reagents were purchased from Sigma Aldrich (St. Louis, MO) and were of the highest quality available. PEEK columns were purchased from Isolation Technology (Milford, MA). Bradford protein assay kit, trifluoroacetic acid (TFA), formic acid, glacial acetic acid, HPLC grade acetonitrile, HPLC grade water, and trypsin were purchased from Thermo Fisher Scientific (Waltham, MA). LC-MS columns, 150 mm x 75 µm i.d., were purchased from New Objective (Woburn, MA); reversed phase C18 Magic beads, size 5 µm, pore size 300 Å, were purchased from Michrom Bioresources (Auburn, CA).

Plasma Collection and Pooling

This study was conducted using 0.5 ml of plasma collected with EDTA present from each sample of 100 colon cancer cases diagnosed within 18 months following the 3rd year blood draw ( ). These cases were individually matched on age (within 2 years), self-reported race/ethnicity, and date of 3-year blood draw (within 2 months) to a randomly selected control who was without a colon cancer diagnosis at the time of initiation of this pilot project. The samples were provided by the Fred Hutchinson Cancer Research Center. Blood samples were collected from women in the range of years old. All women were postmenopausal and had no medical condition associated with a predicted survival of less than 3 years. In the colon cancer samples, only patients that had been diagnosed with colon cancer within a period following the year 3 blood draw

were eligible for inclusion in the present study. The control samples were matched to colon cancer cases in terms of age, self-reported race/ethnicity and body mass. An additional 10 controls having matching variables to the case group were selected and pair matched to yet additional controls, to allow control-control comparisons for quality assurance analyses within each laboratory. The samples were pooled by combining 5 individuals and subdivided into 125 µl aliquots. The pooled samples were labeled A or B; however, not all A's have the same disease status, nor do all B's have the same disease status.

Multi-Dimensional Sample Fractionation for High-Throughput Proteomic Analysis

Pre-diagnostic colon cancer and matched control plasma samples were received frozen on dry ice; samples were stored at -75 °C until analysis. Throughout the whole procedure, specific attention was given to sample handling, protein recovery, reproducibility and consistency of the platform. To enrich for low abundance glycoproteins, we depleted plasma of 12 highly abundant proteins (albumin, total IgG, α1-antitrypsin, IgA, IgM, transferrin, haptoglobin, α1-acid glycoprotein, α2-macroglobulin, apolipoproteins A-I & A-II) using the IgY-12 LC10 column from Beckman. The depleted plasma was further fractionated into a bound fraction (glycoproteins) and an unbound fraction (mostly non-glycosylated) by an in-house prepared M-LAC column containing 3 immobilized lectins (concanavalin A, wheat germ agglutinin and jacalin). The samples were desalted and concentrated using an R1 reversed-phase trap, as described in Chapter 2. These steps were accomplished in an automated mode using a multi-dimensional HPLC workstation

(BioCAD, PerSeptive Biosystems), equipped with automatic valve switching capabilities which can be directly controlled from the software. This set-up minimizes sample losses and increases sample throughput. Briefly, a total of 110 µl of plasma was spiked with 2 proteins (bovine fetuin and glucose oxidase from A. niger) as internal standards, and each plasma sample was diluted 5-fold with binding buffer (20 mM Tris-HCl, 0.5 M NaCl, pH 7.2, 1 mM CaCl2, 1 mM MgCl2). Sample injection was done manually; during sample loading, all columns were in-line. The flow-through (depleted plasma) from the IgY column was directly captured by the M-LAC support. The retained 12 abundant proteins were stripped (into waste) from the depletion column using low pH (0.1 M glycine, pH 2.5). The IgY immuno-affinity support was immediately neutralized, regenerated and re-equilibrated with binding buffer. The M-LAC column was placed back in-line with the RP-trap column, and the unbound (mostly non-glycosylated) fraction was washed off and trapped by the RP column. Proteins were immediately eluted from the RP-trap using 90% acetonitrile. The bound glycoproteins were displaced from the M-LAC column using 100 mM acetic acid, pH 3.5, captured on the RP-trap column and then eluted. The M-LAC affinity support was then neutralized, regenerated and re-equilibrated with binding buffer. To monitor the quality of column performance with regard to fractionation and reproducibility, we determined the protein concentration of the unbound and bound fractions using the Bradford total protein assay.
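The recovery monitoring described above amounts to converting the Bradford concentration of each fraction into a protein mass and expressing it as a percentage of the loaded protein. The following is a minimal sketch of that arithmetic; the function names and all numeric values are hypothetical and are not measurements from this study.

```python
from typing import Dict


def fraction_mass_ug(conc_mg_per_ml: float, volume_ul: float) -> float:
    """Protein mass in µg from a concentration (mg/ml) and a volume (µl).

    1 mg/ml equals 1 µg/µl, so the product is directly in µg.
    """
    return conc_mg_per_ml * volume_ul


def percent_of_load(loaded_ug: float, fractions_ug: Dict[str, float]) -> Dict[str, float]:
    """Express each fraction as a percentage of the loaded protein mass.

    A "total" entry is added as the sum of the individual percentages.
    """
    if loaded_ug <= 0:
        raise ValueError("loaded protein mass must be positive")
    result = {name: 100.0 * mass / loaded_ug for name, mass in fractions_ug.items()}
    result["total"] = sum(result.values())
    return result


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    loaded = fraction_mass_ug(10.0, 100.0)               # 1000 µg loaded
    fractions = {
        "bound_mlac": fraction_mass_ug(0.5, 100.0),      # 50 µg
        "unbound_mlac": fraction_mass_ug(0.7, 100.0),    # 70 µg
    }
    for name, pct in percent_of_load(loaded, fractions).items():
        print(f"{name}: {pct:.1f}% of load")
```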

The unbound fraction was stored at -75 °C, and the volume of the glycoprotein (bound protein) fraction was further reduced by vacuum concentration to roughly 100 µl prior to trypsin digestion.

Trypsin Digestion

Each sample was denatured using 6.0 M guanidine hydrochloride (GnHCl) in 0.1 M ammonium bicarbonate, pH 8.0 (1:5 dilution). Reduction was achieved by addition of 5 mM TCEP and incubation at 60 °C for 15 minutes. Proteins were alkylated with 15 mM iodoacetamide in the dark for 25 minutes, and the reaction was quenched by adding 5 mM DTT and incubating for an additional 5 minutes. The sample was diluted to a final concentration of 1.5 M GnHCl using 100 mM ammonium bicarbonate (pH 8.0). Trypsin was added to the sample at a 1:40 w/w ratio. The sample was incubated overnight at 37 °C. The digestion was terminated by the addition of formic acid to a final concentration of 1%.

Sample Desalting

Prior to LC-MS analysis, samples were desalted using a reversed-phase column. The peptides were separated from the non-digested proteins and contaminants on an R1 POROS (PerSeptive Biosystems) column. Mobile phase A was composed of 0.1% TFA in water, and mobile phase B was 0.1% TFA in HPLC grade acetonitrile. The digested proteins were loaded on the column using 2% solvent B and washed for 3 min to remove salts and other reagents from the trypsin digestion. The bound peptides were eluted with a step gradient: 30% solvent B for 3 min, to collect the peptides to be analyzed by nano-LC-MS/MS, and then 95% solvent B for 5 min, to elute larger peptides and partially digested and non-digested proteins.
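Two steps of the digestion protocol above are simple arithmetic: diluting the denaturant from its starting concentration to 1.5 M, and adding trypsin at a 1:40 w/w enzyme-to-protein ratio. A minimal sketch follows, assuming the usual dilution relation C1·V1 = C2·V2. The function names, the 100 µl sample volume, the 80 µg protein amount and the assumption that the sample is at 6.0 M GnHCl before dilution are illustrative and not taken from the text.

```python
def diluent_volume_ul(sample_volume_ul: float, c_start_molar: float, c_final_molar: float) -> float:
    """Volume of diluent (µl) needed to bring a solution from c_start to c_final.

    Uses C1 * V1 = C2 * (V1 + V_diluent), which assumes additive volumes.
    """
    if sample_volume_ul <= 0 or c_final_molar <= 0 or c_final_molar > c_start_molar:
        raise ValueError("require volume > 0 and 0 < c_final <= c_start")
    return sample_volume_ul * (c_start_molar / c_final_molar - 1.0)


def trypsin_mass_ug(protein_ug: float, protein_to_enzyme: float = 40.0) -> float:
    """Trypsin mass (µg) for a 1:protein_to_enzyme (w/w) enzyme-to-protein ratio."""
    if protein_ug < 0 or protein_to_enzyme <= 0:
        raise ValueError("require protein_ug >= 0 and protein_to_enzyme > 0")
    return protein_ug / protein_to_enzyme


if __name__ == "__main__":
    # Hypothetical sample: 100 µl at 6.0 M GnHCl, 80 µg of protein.
    print(f"add {diluent_volume_ul(100.0, 6.0, 1.5):.0f} µl of ammonium bicarbonate buffer")
    print(f"add {trypsin_mass_ug(80.0):.1f} µg of trypsin")
```

If the actual denaturant concentration in the sample is lower than the stock concentration (for example because of the 1:5 mixing step), that lower value should be passed as `c_start_molar`.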

The separation was performed at 2.5 ml/min and monitored at both 280 nm and 214 nm. Peptides eluted with 30% B were concentrated down to 50 µl using a vacuum concentrator. Recoveries from the reversed-phase clean-up were good and reproducible.

NanoLC-LTQ-FT Analysis

The LC-MS/MS experiments were performed on a Dionex UltiMate 3000 HPLC system (LC Packings-Dionex) coupled to a hybrid linear ion trap FT-ICR mass spectrometer (Thermo Electron, San Jose, CA). The column used for LC-MS/MS analysis (150 mm x mm) was from New Objective (Woburn, MA) and was slurry packed in house with 5 µm, 200 Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA). The flow rate for sample separation was 200 nl/min. Mobile phase A was 0.1% formic acid in water and mobile phase B was 0.1% formic acid in acetonitrile. Routinely, 2.5 µl of each sample, corresponding to approximately 2.0 µg of total protein, was injected onto the column. The ion transfer tube of the linear ion trap was held at 245 °C; the normalized collision energy for MS/MS was set to 30%. The spray voltage was 2.0 kV. The mass spectrometer was operated in the data-dependent mode to switch automatically between MS and MS/MS acquisitions using the MS acquisition software (Xcalibur 2.0, Thermo Electron, San Jose, CA). Each MS full scan (mass range of m/z 400 to m/z 1600) was followed by 10 MS/MS scans of the ten most intense peaks.

Results and Discussion

Tremendous progress has been made in identifying candidate biomarkers for early detection of cancer. Despite this progress, cancer still remains the second

leading cause of death in the world. Collaborative approaches have been shown to be a successful way of studying plasma proteins due to the multiplicity of proteomic approaches. 22, 23

Sample Preparation

The proteomic platform that we applied to the study is shown in Fig. 1. Our proteomic platform consisted of a two-dimensional HPLC fractionation approach. In the first dimension, 12 abundant proteins in plasma were depleted using the IgY column; the depleted plasma was captured directly by a high pressure M-LAC column to enrich for glycoproteins. 21 We evaluated the performance of our in-line protein depletion and M-LAC fractionation using a model sample (human plasma obtained from Sigma). This sample was analyzed repeatedly (5 times) and the protein concentration of the individual fractions was determined using the Bradford protein assay, as shown in Table 1. It can be seen that on average the recovery was 98.6% with a coefficient of variation (CV) of approximately 10%, except for the non-bound fraction, for which the CV was < 20%. The overall recovery (unbound and bound fractions) of the IgY12-M-LAC sample fractionation step for the plasma samples averaged 87% with a CV of 16%. It has long been recognized that structural changes in glycosylation are associated with cancer and other diseases. In the analysis of the case (colon cancer) and control samples, we observed that the amount of protein in the bound fraction of M-LAC was higher in the colon cancer samples than the amount of protein in the matched control samples (Fig 2 a). The amount of

protein in the M-LAC flow-through was lower compared to the matched controls, as shown in Fig 2 b, suggesting glycosylation changes in colon cancer patients compared to the matched control samples. We carefully observed the performance of our 12 protein depletion column during the study. The study consisted of a large number of runs; to check the performance of our platform, a blank run was performed at the beginning of the day and at the end of the day. Figure 3 shows a trend analysis of the leak-through for 9 out of the 12 most abundant proteins, which raises a concern regarding the stability of the depletion column over the course of the study.

Figure 1. Proteomic workflow. Workflow steps shown: plasma with internal standard; removal of abundant proteins (12 protein depletion); multi-lectin column, giving non-bound proteins and glycosylated proteins (concentration/protein recovery); trypsin digestion/sample clean-up (UV trace of peptide clean-up); nano LC-MS/MS (LTQ-FT) with data-dependent acquisition and spectral sampling. A systematic workflow of the proteomic platform used to analyze the samples. The methods involved immunodepletion of abundant proteins, glycoprotein enrichment using M-LAC and analysis by LC-MS/MS.
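The reproducibility figures quoted for the platform (mean, standard deviation and coefficient of variation over replicate runs) can be computed as in the minimal sketch below. It assumes the sample standard deviation (n - 1 denominator) is used, which the text does not specify; the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, float]:
    """Return mean, sample standard deviation and %CV of replicate measurements.

    Assumption: CV is defined as 100 * (sample standard deviation / mean).
    """
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero, CV is undefined")
    s = stdev(values)
    return {"mean": m, "std_dev": s, "cv_percent": 100.0 * s / m}


if __name__ == "__main__":
    # Hypothetical protein amounts (µg) for one fraction over 5 replicate runs.
    replicates = [52.0, 47.5, 55.1, 49.8, 50.6]
    stats = summarize(replicates)
    print(f"mean={stats['mean']:.1f}  sd={stats['std_dev']:.1f}  CV={stats['cv_percent']:.1f}%")
```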

Table 1. Platform reproducibility study (N = 5)

            Depletion   Unbound   Bound    Total
Mean
Std dev
CV          9.2%        17.6%     11.9%    7.0%

Figure 2. Bound and unbound M-LAC fractions. Panel a: bound M-LAC fraction, mean of case vs. control. Panel b: unbound M-LAC fraction, mean of case vs. control.

Figure 2. M-LAC fractions: a) the mean of the bound M-LAC fraction for the 20 case samples (colon cancer) and 22 matched control samples; b) the mean of the unbound fraction for the 20 case samples and 22 matched controls.

Figure 3. Stability of the depletion column during the study. Plot of normalized spectral counts against analysis order for A2M, ALB, FGA, FGG, FGB, ORM1, APOA1, TF and HP.

Figure 3. Abundance of 9 out of the 12 depleted proteins in the samples, estimated from the number of spectral counts.

Database Searching

LC/MS data generated with the LTQ-FT instrument were processed using the Sequest Cluster search engine (ver. 3.0) and stored in CPAS. 24, 25 The database search was conducted against two human protein databases, Swiss-Prot (release 52 with protein sequences) and IPI (release 3.23 with sequences), with appended sequences of the standard proteins spiked into the samples (bovine fetuin and glucose oxidase from A. niger). Both databases consisted of normal and reversed protein sequences to facilitate estimation of the false positive rate. Trypsin was

specified as the digestion enzyme with up to two missed cleavages, and carboxyamidomethylation was designated as a fixed modification of cysteine. In order to minimize the level of false positive identifications while retaining correctly identified lower-abundance proteins, two sets of filtering criteria were used:

1) A high confidence list of proteins, generated using combined stringent criteria: PeptideProphet probability ≥ 0.90, fractional mass accuracy of ±50 ppm and the so-called HUPO criteria (ΔCn ≥ 0.10, RSp>5, Xcorr ≥ 1.9, 2.2 and 3.75 for singly, doubly and triply charged ions, respectively), followed by ProteinProphet analysis with a cutoff at 0.95 probability. These criteria led to the identification of roughly 130 proteins per sample with a false positive rate of 3%. This protein list was used primarily to assign protein annotation, for example for evaluation of M-LAC performance using known protein glycosylation status.

2) An intermediate confidence list, generated using PeptideProphet probability greater than 0.7, fractional mass accuracy within ±50 ppm and ProteinProphet probability greater than 0.9. Protein probability was calculated without readjustment of peptide probabilities based on the number of sibling peptides. These criteria increased the total number of identified peptides per protein and at the same time excluded proteins with single-peptide identifications and probability lower than 0.9 (at least one peptide per protein must have a probability greater than 0.9). They led to the identification of approximately 230 proteins per sample with 40% false positive rate. 26

It should be noted that the IPI database contains a significant number of highly homologous protein sequences, which leads to ambiguity in protein assignment

especially when comparing multiple samples. ProteinProphet often generates non-identical protein groups for virtually identical sets of peptides found in different samples, which makes multi-sample comparison challenging. To alleviate this problem, the LC-MS data were also searched against the SwissProt database. Compared to IPI, SwissProt contains fewer protein sequences, but most of the sequences can be linked to a unique gene name, which is advantageous for multi-sample comparison. To estimate how many proteins could potentially be lost by using SwissProt instead of the IPI database, a list of proteins identified exclusively in IPI (i.e. not in the SwissProt database) was generated using the following strategy: all peptides identified using the high-confidence criteria in the SwissProt searches were pooled and excluded from the individual IPI search results. The resulting filtered files were then reprocessed using ProteinProphet to generate a list of proteins unique to IPI. In the majority of cases, proteins identified exclusively in the IPI database were hypothetical proteins annotated only by molecular mass. It was estimated that for this dataset, on average, less than 10% additional proteins were identified in the IPI search compared to the proteins found in SwissProt alone. This number was considered negligible compared to the changes in the number of identified proteins that result from applying slightly different identification criteria.

Semi-Quantitative Analysis

Protein abundance in cases and controls was estimated using the total number of peptides identified per protein in a particular sample, i.e. so-called spectral counting. In order to minimize the influence of proteins identified randomly in individual samples (the intermediate confidence list was used), a protein could

qualify for semi-quantitation only if it was identified in at least 5 samples (regardless of sample status, i.e. case or control). Applying this criterion reduced the accumulated number of identified proteins in all samples to 246 (see Table 2). It should be noted that when a particular protein was not identified in a given sample, the number of peptides corresponding to this protein was set to 0.

Table 2. Number of identified glycoproteins and false positive (FP) rate.

                        High confidence              Intermediate confidence
                        No. of proteins   FP rate    No. of proteins   FP rate
Average per sample      130               4%                           %
Total (≥5 samples)      169               3%                           %

Analysis of the data showed that a number of the identified proteins are known from the literature to be associated with cancer; these are presented in Table 3.

Table 3. Selected glycoproteins observed in at least five case samples

Inositol hexakisphosphate kinase 2
Neural cell adhesion molecule 1*
Protein DRA (down-regulated in adenoma)
IgGFc-binding protein

Breast cancer antigen NY-BR-1
Serine/Threonine-protein kinase Nek5
Hepatocyte growth factor activator
Plasma serine protease inhibitor
Hepatocyte growth factor-like protein*
Tetranectin*
Gelsolin*
Extracellular matrix protein*

*Also observed in plasma studies of MCF-7 xenograft

Statistical analysis

Statistical analysis was performed to find proteins that could discriminate between cases and controls. The biomarker identification (BMI) algorithm ranks proteins according to a specific quality measure (BMI score), which combines three parameters: discriminatory performance in separating the two groups (cases vs. controls), relative concentration/count changes, and variance of the data in each class. 27 To estimate the statistical significance of proteins showing a difference in the number of spectral counts between cases and controls, permutation analysis was applied. Using the dataset in Table 2, the sample labels (case and control) were randomly permuted across all 42 samples, and Wilcoxon p-values and BMI-scores were calculated. The procedure was repeated 200 times, and the resulting values were taken as the expected random values, i.e. the null distributions. By comparing the calculated p-values and BMI-scores with the null distributions, the false discovery rate

(FDR) was estimated. The total number of peptides identified per protein for case and control samples was used for semi-quantitative analysis. The BMI-score was calculated, and the results are presented in Table 4. Positive BMI values represent greater abundance in cancer patients; negative BMI values indicate greater abundance in controls. Using permutation analysis, it was determined that proteins with a BMI value greater than 30 correspond to an FDR of roughly 0.9 and those with a value greater than 50 to roughly 0.7. Using the above criteria, a total of 10 proteins with significant BMI-scores were identified in colon cancer patients (4 up-regulated and 6 down-regulated). It was noted that 4 out of the 6 proteins found to be decreased are immunoglobulins. Most likely this reflects a systemic response due to inflammation.

Table 4. Proteins found differentially abundant using BMI-score

Gene name   Protein                            Biomarker index score
HV304       Ig heavy chain V-III region TIL    30
CPN1        Carboxypeptidase N1                20
C9          Complement C9                      20
RPS6KA1     Ribosomal Protein                  20
KV402       Ig kappa                           -20
DOK7        Protein Dok-7                      -20
HV320       Ig heavy chain                     -30
IgJ         IgJ                                -40
FCN3        Ficolin

IgH2        Ig alpha

Conclusions

The collaborative colon cancer study presented in this appendix was novel in that it was based on a unique source of pre-diagnostic plasma samples (collected prior to the diagnosis of cancer), and ten different proteomic platforms were applied to profile the plasma. To avoid bias, sample collection and storage were uniform, and the raw data were centralized and processed under identical conditions. The results of this study clearly demonstrated that no single platform can fully analyze the complexity of proteomes such as those of biological fluids, and they therefore support the need for collaborative studies. The analyses led to a rich data set of candidate markers shown to be at higher levels in cases than in controls. These candidates belong primarily to two classes: one class represents inflammatory markers, and the second represents proteins involved in the immune system. These two classes of proteins are likely to be up-regulated in many pathological conditions, such as cancer, autoimmune and metabolic disease; solely on the basis of changes in protein abundance, therefore, these proteins lack the specificity and utility required of disease biomarkers.
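The permutation analysis described under Statistical analysis can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: the BMI score is not defined in this appendix, so a simple difference in mean spectral counts is used as a stand-in statistic, and the FDR at a given threshold is estimated as the mean number of proteins passing the threshold under permuted labels divided by the number passing with the true labels. The function names and the toy spectral counts are hypothetical.

```python
import random
from statistics import mean


def score(case_counts, control_counts):
    """Stand-in statistic (NOT the BMI score): difference in mean spectral counts."""
    return mean(case_counts) - mean(control_counts)


def count_hits(counts, labels, threshold):
    """Number of proteins whose absolute score reaches the threshold."""
    hits = 0
    for values in counts.values():
        case = [v for v, lab in zip(values, labels) if lab == "case"]
        ctrl = [v for v, lab in zip(values, labels) if lab == "control"]
        if abs(score(case, ctrl)) >= threshold:
            hits += 1
    return hits


def permutation_fdr(counts, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR at `threshold` by permuting the case/control labels."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    observed = count_hits(counts, labels, threshold)
    null_hits = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # keeps group sizes, breaks any real association
        null_hits.append(count_hits(counts, shuffled, threshold))
    if observed == 0:
        return observed, None  # FDR is undefined when nothing passes
    return observed, min(1.0, mean(null_hits) / observed)


# Toy data: spectral counts per sample for two hypothetical proteins.
labels = ["case"] * 4 + ["control"] * 4
counts = {
    "protein_A": [10, 10, 10, 10, 0, 0, 0, 0],  # clearly higher in cases
    "protein_B": [5, 5, 5, 5, 5, 5, 5, 5],      # no difference
}
observed, fdr = permutation_fdr(counts, labels, threshold=10)
print(observed, fdr)
```

A protein absent from a sample contributes a count of 0, consistent with the convention stated for the semi-quantitative analysis; 200 permutations matches the number quoted in the text.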

3.5 References

1. Parkin, D. M.; Bray, F.; Ferlay, J.; Pisani, P., Global cancer statistics, CA Cancer J Clin 2005, 55, (2),
2. Sriamporn, S.; Parkin, D. M.; Pisani, P.; Vatanasapt, V.; Suwanrungruang, K.; Kamsa-ard, P.; Pengsaa, P.; Kritpetcharat, O.; Pipitgool, V.; Vatanasapt, P., A prospective study of diet, lifestyle, and genetic factors and the risk of cancer in Khon Kaen Province, northeast Thailand: description of the cohort. Asian Pac J Cancer Prev 2005, 6, (3),
3. Engwegen, J. Y.; Helgason, H. H.; Cats, A.; Harris, N.; Bonfrer, J. M.; Schellens, J. H.; Beijnen, J. H., Identification of serum proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser desorption ionisation-time of flight mass spectrometry. World J Gastroenterol 2006, 12, (10),
4. Burt, R. W., Colon cancer screening. Gastroenterology 2000, 119, (3),
5. Ransohoff, D. F., Colon cancer screening in 2005: status and challenges. Gastroenterology 2005, 128, (6),
6. Kung, J. W.; Levine, M. S.; Glick, S. N.; Lakhani, P.; Rubesin, S. E.; Laufer, I., Colorectal cancer: screening double-contrast barium enema examination in average-risk adults older than 50 years. Radiology 2006, 240, (3),
7. Anderson, N. L.; Anderson, N. G., The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 2002, 1, (11),
8. Makarov, D. V.; Carter, H. B., The discovery of prostate specific antigen as a biomarker for the early detection of adenocarcinoma of the prostate. J Urol 2006, 176, (6 Pt 1),
9. Srivastava, S., Cancer biomarker discovery and development in gastrointestinal cancers: early detection research network-a collaborative approach. Gastrointest Cancer Res 2007, 1, (4 Suppl 2), S
10. Tang, H. Y.; Ali-Khan, N.; Echan, L. A.; Levenkova, N.; Rux, J. J.; Speicher, D. W., A novel four-dimensional strategy combining protein and peptide separation methods enables detection of low-abundance proteins in human plasma and serum proteomes. Proteomics 2005, 5, (13),
11. Hakomori, S., Tumor-associated carbohydrate antigens defining tumor malignancy: basis for development of anti-cancer vaccines. Adv Exp Med Biol 2001, 491,
12. Hakomori Si, S. I., The glycosynapse. Proc Natl Acad Sci U S A 2002, 99, (1),
13. Fletcher, R. H., Carcinoembryonic antigen. Ann Intern Med 1986, 104, (1),
14. Zhang, H.; Li, X. J.; Martin, D. B.; Aebersold, R., Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol 2003, 21, (6),
15. Qiu, R.; Regnier, F. E., Comparative glycoproteomics of N-linked complex-type glycoforms containing sialic acid in human serum. Anal Chem 2005, 77, (22),
16. Madera, M.; Mechref, Y.; Klouckova, I.; Novotny, M. V., Semiautomated high-sensitivity profiling of human blood serum glycoproteins through lectin preconcentration and multidimensional chromatography/tandem mass spectrometry. J Proteome Res 2006, 5, (9),
17. Zhou, Y.; Aebersold, R.; Zhang, H., Isolation of N-linked glycopeptides from plasma. Anal Chem 2007, 79, (15),
18. Plavina, T.; Wakshull, E.; Hancock, W. S.; Hincapie, M., Combination of abundant protein depletion and multi-lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery. J Proteome Res 2007, 6, (2),
19. Yang, Z.; Hancock, W. S., Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. J Chromatogr A 2004, 1053, (1-2),
20. Kullolli, M.; Hancock, W. S.; Hincapie, M., Preparation of a high-performance multi-lectin affinity chromatography (HP-M-LAC) adsorbent for the analysis of human plasma glycoproteins. J Sep Sci 2008, 31, (14),
21. Kullolli, M.; Hancock, W. S.; Hincapie, M., Automated platform for fractionation of human plasma glycoproteome in clinical proteomics. Anal Chem 82, (1),
22. Adkins, J. N.; Monroe, M. E.; Auberry, K. J.; Shen, Y.; Jacobs, J. M.; Camp, D. G., 2nd; Vitzthum, F.; Rodland, K. D.; Zangar, R. C.; Smith, R. D.; Pounds, J. G., A proteomic study of the HUPO Plasma Proteome Project's pilot samples using an accurate mass and time tag strategy. Proteomics 2005, 5, (13),
23. Bell, A. W.; Deutsch, E. W.; Au, C. E.; Kearney, R. E.; Beavis, R.; Sechi, S.; Nilsson, T.; Bergeron, J. J., A HUPO test sample study reveals common problems in mass spectrometry-based proteomics. Nat Methods 2009, 6, (6),
24. Cottingham, K., CPAS: a proteomics data management system for the masses. J Proteome Res 2006, 5, (1),
25. Rauch, A.; Bellew, M.; Eng, J.; Fitzgibbon, M.; Holzman, T.; Hussey, P.; Igra, M.; Maclean, B.; Lin, C. W.; Detter, A.; Fang, R.; Faca, V.; Gafken, P.; Zhang, H.; Whiteaker, J.; States, D.; Hanash, S.; Paulovich, A.; McIntosh, M. W., Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res 2006, 5, (1),
26. Robinson, C. V.; Gross, M. L., Focus on proteomics in honor of Ruedi Aebersold, 2002 Biemann Awardee. J Am Soc Mass Spectrom 2003, 14, (7),
27. Baumgartner, C.; Baumgartner, D., Biomarker discovery, disease classification, and similarity query processing on high-throughput MS/MS data of inborn errors of metabolism. J Biomol Screen 2006, 11, (1),

SeMoP: A New Computational Strategy for the Unrestricted Search for Modified Peptides Using LC-MS/MS Data

Christian Baumgartner*, Tomas Rejtar, Majlinda Kullolli, Lakshmi Manohar Akella, and Barry L. Karger*

Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts
Research Group for Clinical Bioinformatics, Institute of Biomedical Engineering, University for Health Sciences, Medical Informatics and Technology, Hall in Tyrol, Austria

Abstract

A novel computational approach, termed Search for Modified Peptides

(SeMoP), for the unrestricted discovery and verification of peptide modifications in shotgun proteomic experiments using low resolution ion trap MS/MS spectra is presented. Various peptide modifications, including post-translational modifications, sequence polymorphisms, and sample handling-induced changes, can be identified using this approach. SeMoP utilizes a three-step strategy: (1) a standard database search to identify proteins in a sample; (2) an unrestricted search for modifications using a newly developed algorithm; and (3) a second standard database search targeted to the specific modifications found using the unrestricted search. This targeted approach provides verification of the discovered modifications and, due to increased sensitivity, a general increase in the number of peptides found with a specific modification. The feasibility of the overall strategy was first demonstrated in the analysis of 65 plasma proteins. Various sample handling-induced modifications, such as β-elimination of disulfide bridges and pyrocarbamidomethylation, as well as biologically induced modifications, such as phosphorylation and methylation, were detected. A subsequent targeted Sequest search was used to verify selected modifications, and a 4-fold increase in the number of modified peptides was obtained. In a second application, 1367 proteins of a cervical cancer cell line were processed, leading to the detection of several novel amino acid substitutions. Conducting the search against a database of peptides derived from proteins with decoy sequences gave a false discovery rate of less than 5% for the unrestricted search. SeMoP is shown to be an effective

and easily implemented approach for the discovery and verification of peptide modifications.

Keywords

mass spectrometry; post-translational modifications; unrestricted search; shotgun proteomics

Introduction

Most proteins undergo post-translational modifications (PTMs) that alter their physical and chemical properties, as well as their three-dimensional structure and stability, and such changes are often key regulators of protein function. 1,2 Currently, several hundred PTMs are known, 3,4 though only a few, such as phosphorylation, acetylation, methylation, and glycosylation, are generally targeted in a proteomic analysis. 5 In addition, various sequence polymorphisms attributed to alternative splicing or single nucleotide polymorphisms are commonly observed. 6 Moreover, a large number of peptide modifications can be introduced during sample preparation, and these modifications are often ignored or misinterpreted. Shotgun proteomics is a widely used tool for the global analysis of protein modifications; 7 however, reliable and comprehensive identification of unknown modifications is one of the biggest challenges in experimental bioinformatics. In a typical LC-MS/MS experiment, hundreds of thousands of tandem mass spectra are collected; however, only 10-20% of these spectra are interpreted using current approaches, 8 and it is expected that a number of the unassigned MS/MS spectra correspond to modified peptides. Regrettably, standard

database search programs, such as Sequest or Mascot, 9,10 are suitable only for the identification of expected, user-specified modifications. Analysis of sequence tags 11 or full de novo sequencing has been utilized for the identification of modified peptides. 12 Recently, a new set of algorithms that search for modifications in an unrestricted manner has been developed. However, to reduce the search complexity and thus the search time, most of the unrestricted search algorithms target modifications only for proteins identified in the sample rather than for all potentially present proteins. 13 ModifiComb, for example, allows the identification of substoichiometrically modified peptides that elute from an RPLC gradient in close proximity to the unmodified peptides. 14,15 Another algorithm, termed TwinPeaks, uses spectral convolution, similar to the Sequest search algorithm, to compute the cross-correlation between predicted and experimental spectra in a search for a bimodal pattern in the cross-correlation function. Such a pattern is indicative of the presence of fragment ions with a constant mass shift relative to the corresponding unmodified peptide. 16 P-Mod calculates mass differences between search peptide sequences and a precursor and uses p-value statistics for estimating a sequence-to-MS/MS spectrum match. 17 In another approach, PTM Explorer works in conjunction with a standard database search algorithm but includes all modifications from UniMod 3 or other similar databases to facilitate the identification of modified peptides. 18 Recently, Tsur et al. proposed a Smith-Waterman-based spectrum alignment algorithm that is applicable to de novo peptide sequencing, assembly of

protein sequences from shotgun proteomic data, or an unrestricted search for PTMs. 19,20 An extended framework targeting peptide modifications based on this alignment approach, termed PTM Finder, 21 as well as a strategy of selectively excluded mass screening analysis, SEMSA, 22 have very recently been published.

This paper presents a new approach to the discovery of peptide modifications that is readily implemented in shotgun proteomics using low resolution MS/MS spectra. The approach couples an unrestricted search for peptide modifications with well-established methods, allowing a broad search for modifications. SeMoP (search for modified peptides) is based on a three-step strategy: (1) a standard database search to identify proteins in a sample; (2) a comprehensive, unrestricted search, using only the identified proteins, to discover modified peptides within a specified mass range (±200 Da) of the corresponding unmodified peptides; and (3) a targeted search using a standard database search to verify selected modifications found in step 2. A new algorithm is utilized for the unrestricted search for peptide modifications. The algorithm relies on the detection of a constant shift between experimental fragment ions and the fragment ions predicted for unmodified peptides. The search generates a histogram, termed a ΔM plot, that displays this constant shift for matches of experimental fragment ions. It is important to emphasize that the strategy requires neither the identification of unmodified peptides nor the specification of the modifications before the search. Furthermore, the algorithm can identify multiple modifications per peptide, allows selection of the mass range for the search for

modifications, and, most importantly, is simple to apply, since the only parameter that must be specified is the mass range with respect to the unmodified peptide. It should be noted that the unrestricted algorithm is used for the detection of potential modifications present in the sample rather than for the detection and validation of modified peptides. The result of the unrestricted search (step 2) is a list of modifications that are then targeted in a standard database search to verify peptides with these modifications (step 3).

To demonstrate the SeMoP strategy, we initially focused on highly abundant glycosylated proteins in human plasma samples. It was anticipated that such proteins would yield high sequence coverage and thus have a high potential for the observation of modified peptides. Tryptic digest samples were analyzed by LC-MS/MS, and a set of 65 glycoproteins, identified in all samples in the first discovery step, was selected for the unrestricted search for modified peptides. In this experiment, various modifications introduced during sample preparation, such as desulfurization of cysteine, biologically induced modifications, such as phosphorylation and methylation, and several amino acid substitutions were identified. In a second experiment, 1367 proteins obtained from SiHa cells (cervical cancer cell line) were processed using the SeMoP approach, leading to the detection of novel amino acid substitutions. The approach is shown to be promising as a tool to find modified peptides.

SeMoP Protocol

The SeMoP computational protocol consists of three steps, schematically shown in

Figure 1. First, a standard database search is conducted using Sequest to identify proteins in the sample (step 1). Then, an unrestricted search for modifications is performed using an in-house developed algorithm (step 2). A list of all peptides generated by in silico digestion of the identified proteins is used as the input for the algorithm. As a result, modifications of peptides already identified in step 1, as well as additional modified peptides, can be found. When substoichiometric modifications are targeted, where both the modified and unmodified versions of the same peptide are expected in the sample, the unrestricted search is performed only for peptides identified in step 1, which significantly speeds up the analysis. It should be noted that the unrestricted search is performed only on selected experimental MS/MS spectra, i.e. high quality spectra that were not assigned to peptides in step 1. The result of the unrestricted search is a ΔM plot that reveals candidate modifications. Specific modifications are selected for a targeted standard database search, again using Sequest (step 3), and significant matches are verified using well-established protocols. Importantly, the use of established tools to determine the significance of assignments of MS/MS spectra to modified peptides alleviates the need to develop a new scoring algorithm for the unrestricted search. Furthermore, this final step typically leads to the detection of additional peptides with the targeted modification.

Unrestricted Search for Peptide Modifications

The algorithm for unrestricted searching relies on a comparison of the unmodified fragment list for the sequences with the fragment list created from the experimental MS/MS spectra. Overlapping fragments are counted and interpreted as a measure of similarity via a ΔM plot. A workflow diagram of the algorithm is presented in

Figure 2, and the basic principles of the algorithm are demonstrated in Figure 3.

Step A, Figure 2. For each unmodified peptide, the algorithm calculates the b-ion $\{b_1, \ldots, b_L\}$ and y-ion $\{y_1, \ldots, y_L\}$ fragments with a charge state up to that of the precursor ion and then uses the m/z values of the fragment ions to create a list of mass differences of the form $\Delta b_{ij} = b_j - b_i$ and $\Delta y_{ij} = y_j - y_i$, where $i$ runs from 1 to $L-1$ and $j$ from 2 to $L$, with the condition $j > i$. $L$ is the peptide length. The indices of the spectral peaks $b_i$ and $y_i$ are retained to define the exact location of the generated peptide fragments $\Delta b_{ij}$ and $\Delta y_{ij}$ in the predicted spectrum. Only experimental MS/MS spectra whose precursor ion lies within a specified window (±200 Da) are selected for further comparison, which substantially reduces the number of matching operations. Each experimental spectrum is first baseline corrected and denoised with a Hann window 23 so that peaks can be distinguished from background noise, and peaks are then detected using a quadratic polynomial fit.

Step B, Figure 2. The algorithm uses a prescreening procedure to eliminate all peptide sequences that are unlikely to match. It counts the number of exact matches, within a mass tolerance of ±0.7 Da, between the m/z values of the b- and y-ions in the predicted spectrum of an unmodified peptide and the m/z values found in the denoised spectrum. The unmodified fragment list for the sequences is matched to each MS/MS spectrum regardless of the mass difference between the peptide and the precursor ion. Only spectral candidates with at least a predefined number $N$ of matching ions, taken as a measure of spectral similarity, are selected, while spectra not fulfilling this criterion are discarded. We set $N = 4$ for short peptides (8-10 residues), 15 and linearly adjust $N$ for longer peptides (see Supporting

Information A, eq 1). The procedure strongly reduces the number of spectra processed in an extended search for modifications. This data reduction is necessary to reduce the computer processing time, since CPU-intensive operations are then conducted only for spectra with a high likelihood of a match. Alternatively, when a large number of nodes is available, this filtering procedure could be skipped, leading to improved sensitivity; on the other hand, skipping it may also increase the false discovery rate.

Next, an extended list of m/z values, defined by $\Delta p_{ij} = p_j - p_i$, where $p$ represents a spectral peak in the denoised MS/MS spectrum and $j > i$, is generated. Again, the indices of the spectral peaks $p_i$ of the selected peptide fragments $\Delta p_{ij}$ are retained. Analogous to spectral convolution, a comparison of the mass lists from the predicted and experimental spectra, of the form $\{\Delta b_{ij}, \Delta y_{ij};\ b_i, y_i\} \leftrightarrow \{\Delta p_{ij};\ p_i\}$, is performed for all masses within a specified mass range $[-\Delta M, +\Delta M]$. Fragments found to coincide in the experimental and predicted spectra within each bin, defined by the mass tolerance, are counted, and the accumulated number of fragment hits per bin is then plotted as the ΔM plot (see Figure 3). It should be noted that since low resolution ion trap MS/MS spectra are examined, a bin size of 1 Da is selected. For high resolution MS/MS spectra, the bin size could be adjusted to account for the improved mass accuracy. The ΔM plot is then analyzed for the occurrence of major peaks using a Boolean acceptance criterion $f_{hits}$ based on four ΔM plot-specific parameters: (1) the number and (2) the height of the major peaks, (3) their ΔM locations (the presence of a peak at ΔM = 0 is required) and (4) the median of the number of fragment hits within the specified search window $[-\Delta M, +\Delta M]$, to assign a match (for the definition of $f_{hits}$ see

Supporting Information A, eq 2).

Step C, Figure 2. Spectra passing the $f_{hits}$ criterion are considered at this point as candidate matches. The mass difference between the experimental precursor and the predicted peptide mass is calculated and compared to the ΔM plot. Only when a peak is present at the same mass difference in the ΔM plot is a peptide match considered significant and reported in the final list of hits.

The algorithm is implemented in LabView 8.0 (National Instruments, Austin, TX) and compiled into an executable file. To facilitate rapid processing, the algorithm is written in a form suitable for distributed computing, which is highly scalable and does not require additional software for operation. Thus, the algorithm can be deployed on networked computers running, for example, an MS Windows operating system. The searches in this paper were carried out on a 12-CPU computer cluster; processing the 1670 peptides against the MS/MS spectra collected from the human plasma sample required about 60 h, and the 4393 peptides and the spectra of the cancer cell line sample were processed in 145 h. Because the algorithm was designed for distributed computing, implementation on a larger computer cluster would lead to a significant reduction in processing time. The program is freely available upon request.

Experimental Section

Sample Preparation and LC-MS

Human Plasma Sample. Six normal plasma samples were obtained from

Bioreclamation (Hicksville, NY) and were initially used to assess the SeMoP strategy. Samples were analyzed using the workflow described elsewhere. 24 Briefly, twelve highly abundant plasma proteins were depleted using a commercial immunoaffinity column (Beckman, Fullerton, CA), with subsequent enrichment of glycosylated proteins by an in-house developed Multi-Lectin Affinity Column (M-LAC). 25 Following denaturation, reduction, and alkylation with iodoacetamide, the samples were digested with trypsin. After desalting on a reversed phase column, the digested samples were analyzed on a 75 µm i.d. RPLC column with a one hour gradient at a flow rate of 200 nL/min, using a hybrid linear ion trap FT mass spectrometer (Thermo Fisher, San Jose, CA). It should be noted that the LTQ FT instrument was operated in the high resolution MS mode (full profile mode), while the MS/MS spectra were obtained at low resolution (centroid mode). To demonstrate compatibility with broadly available LTQ instruments, the high mass accuracy of the precursor ions was not utilized in the SeMoP strategy.

SiHa Cell Line. Sample preparation and analysis were described in detail elsewhere. 26 Briefly, an amount of total protein corresponding to a lysate obtained from roughly 10,000 SiHa cells (cervical cancer cell line) was separated by SDS-PAGE. The SDS-PAGE lane was divided into three sections that were individually in-gel digested and analyzed on a 75 µm i.d. RPLC column using a one hour gradient at a flow rate of 200 nL/min with a hybrid linear ion trap FT mass spectrometer (Thermo Fisher).

Determination of the False Discovery and Identification Success

228 Rate

To determine the false discovery rate of modifications from the unrestricted search, a series of searches was conducted against a database consisting of sequences of the 65 selected plasma proteins and against a separate decoy database consisting of randomized sequences of these 65 proteins. All hits in the randomized database were considered false identifications [27,28]. To determine the identification success rate (sensitivity) of our algorithm for an unrestricted search, selected peptide modifications were specified in a subsequent Sequest database search. The results provided by Sequest were filtered using XCorr values greater than 1.9, 2.2, and 3.8 for 1+, 2+, and 3+ charged ions, respectively. Because only a very small database (65 proteins) was used in this calculation, it was expected that, with these strict criteria, the level of random identifications would be very low. Significant hits from the Sequest search were then compared with results obtained using our in-house algorithm for an unrestricted search to calculate the identification success rate as [# of hits from the unrestricted search]/[# of hits from the Sequest search] [28].

Results and Discussion

In this study, we applied a new computational protocol, SeMoP, for an unrestricted investigation of peptide modifications from 64 human and 1 bovine highly abundant plasma glycoproteins and 1367 proteins of a cervical cancer cell line. Roughly MS/MS spectra of the plasma and MS/MS spectra of the cell line sample derived from a shotgun proteomic experiment were searched for modifications using sequences of unmodified peptides. Both expected and

229 unusual modifications were detected. In particular, the human plasma data set served as a model for demonstrating the feasibility of our approach and for estimating the identification success and false discovery rates of the method.

Analysis of Human Plasma Proteins

Following step one of our protocol (Figure 1), which identifies the proteins in the samples to be used for the unrestricted search, a Sequest Cluster search engine within a CPAS system [29] was used for the database search of approximately MS/MS spectra collected in the LC/MS analysis of 6 plasma samples. The search was conducted with a precursor mass tolerance of 2.5 Da and a fragment mass tolerance of 1 Da against the Swiss-Prot human protein database (release 52 with protein sequences, downloaded in April 2007), appended with reversed protein sequences to facilitate the estimation of the false discovery rate. Trypsin was specified as the digestion enzyme with up to 2 missed cleavages. Carbamidomethylation of cysteines was specified as the only modification in this search. Matches were accepted with PeptideProphet peptide probabilities greater than 0.9 and XCorr values greater than 1.9, 2.2, and 3.8 for 1+, 2+, and 3+ charged ions, respectively. On average, roughly 100 proteins were identified per sample. Out of the identified proteins, 64 highly abundant plasma glycoproteins plus the spiked internal standard, bovine fetuin, identified in all 6 samples, were selected for the subsequent unrestricted search. The list of these selected proteins is provided in Supporting Information B. An in-house Perl script was used to generate a total of 1670 fully tryptic peptides (no missed cleavages) within a mass range from 700 to 3500 Da, and

230 these peptide sequences were selected as input for the unrestricted search (Figure 1, steps 1 and 2). Out of a total of 185,000 MS/MS spectra, 56,065 in the initial Sequest search yielded cross-correlation values greater than 1.0, 1.5, and 2.0 for 1+, 2+, and 3+ charged ions, respectively, and these spectra were selected for the unrestricted search. This procedure allowed preferential selection of MS/MS spectra corresponding to peptide fragmentation. The MS/MS spectra of peptides identified in the Sequest search included peptides with carbamidomethylated cysteines, which were retained in the data set to facilitate estimation of the sensitivity of the unrestricted search.

To illustrate the new search algorithm, the ΔM plots (1 Da resolution, mass range ±200 Da) for MS/MS spectra corresponding to the unmodified and modified peptide MATTMIQSK are shown in Figure 4. The single major peak at ΔM = 0 in Figure 4A indicates correspondence between the predicted and observed fragments for the unmodified peptide, as the majority of high-intensity peaks in the experimental MS/MS spectrum could be directly explained by b- and y-ion fragments. Minor peaks in this plot could be attributed to noise (see the quality parameter for a peptide match in Supporting Information A, eq 3). (Random matches of peptide fragments could be suppressed using high mass accuracy MS/MS spectra.) On the other hand, the ΔM plot for an MS/MS spectrum corresponding to the modified peptide, shown in Figure 4B, exhibited two peaks, one at ΔM = 0 and the other at ΔM = −48 Da. The second peak clearly shows the presence of fragment ions in the experimental MS/MS spectrum that have a constant shift with respect to the predicted fragments of the unmodified peptide, indicative of a

231 modification. Further examination revealed that the mass shift can be interpreted as a dethiomethylation of oxidized methionine, a relatively uncommon modification reported in the UniMod database [3]. Out of MS/MS spectra processed, 2076 were reported as modified peptides (Step 2, Figure 1), as summarized in Figure 5 and, in more detail, Table 1. Despite the low mass accuracy of the LTQ instrument, a unique assignment was achieved for the majority of modifications, including expected PTMs, missed peptide cleavages, and sequence polymorphisms. The most common modifications were on cysteine residues and were introduced during sample preparation: carbamidomethylation (+57 Da) from reduction and alkylation, the rarely reported N-terminal carboxyamidomethyl-cys cyclization (+40 Da) [30], and desulfurization (−34 Da) [31], an unusual cysteine modification. It should be noted again that carbamidomethylated peptides were not excluded from the significant hits, since the high number of matches for this modification was found to be useful for estimating the identification success and false discovery rates after the targeted database search. Among biological modifications found in these samples, methylation and phosphorylation were the most common. Interestingly, several glutamic acid, isoleucine, and alanine substitutions were identified as well.

To evaluate the third step of the SeMoP strategy (Figure 1), we next selected several frequent modifications from the list of all identified modifications and conducted a targeted search using Sequest. Examples of modifications identified in the unrestricted search and a comparison with the results of the subsequent Sequest search are presented below.
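As an illustration of the ΔM plot discussed above, the following Python sketch histograms the integer mass differences between experimental peaks and the predicted singly charged b- and y-ion masses of an unmodified peptide. It is a simplified stand-in and not the LabVIEW implementation: it counts peaks instead of weighting them by intensity, it omits the quality parameters of Supporting Information A, and the function names, the residue-mass table, and the synthetic spectrum (a hypothetical −48 Da shift placed on the N-terminal methionine of MATTMIQSK) are assumptions made for the example.

```python
from collections import Counter

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def fragment_masses(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide.

    The list alternates b- and y-ions: even indices are b-ions, odd are y-ions.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    fragments = []
    for i in range(1, len(peptide)):
        fragments.append(sum(masses[:i]) + PROTON)          # b-ion of length i
        fragments.append(sum(masses[i:]) + WATER + PROTON)  # complementary y-ion
    return fragments


def delta_m_plot(peptide, peaks, max_shift=200):
    """Count peaks at each integer mass shift (1 Da bins) from a predicted fragment."""
    plot = Counter()
    for predicted in fragment_masses(peptide):
        for observed in peaks:
            shift = round(observed - predicted)
            if abs(shift) <= max_shift:
                plot[shift] += 1
    return plot


peptide = "MATTMIQSK"
predicted = fragment_masses(peptide)
# Synthetic spectrum (assumption): every b-ion carries a -48 Da shift,
# as if the N-terminal methionine were modified; y-ions are unshifted.
peaks = [m - 48.0 if k % 2 == 0 else m for k, m in enumerate(predicted)]
plot = delta_m_plot(peptide, peaks)
print(plot.most_common(2))
```

With this synthetic input, the unshifted y-ions fall into the ΔM = 0 bin and the b-ions, all of which contain the shifted residue, fall into the ΔM = −48 bin, which mirrors the two-peak pattern described for Figure 4B.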

232 Evaluation of Modifications Identified Using Unrestricted Search for Plasma Sample

For evaluation of the performance of the unrestricted search algorithm, MS/MS spectra identified in step 1 were retained in the data set. As a result, the majority of modifications in the unrestricted search (1171 of 2076) were found to be carbamidomethylations (+57 Da) of cysteine. In total, 97.7% of these peptides were modified singly, 2.3% doubly, and several triply. Importantly, roughly 1% of the cysteine-containing peptides were found without any modification, indicating a low level of peptides with unreacted cysteines. Alkylation of N-terminal residues or histidines was not observed.

MS/MS spectra assigned to multiple carbamidomethylations can be used to illustrate the ability of the unrestricted search algorithm to identify multiple modifications per peptide. As an example, Figure 6A presents a ΔM plot for a peptide with three cysteines modified by carbamidomethylation. As can be seen, the three major peaks reveal the presence of fragment ions carrying one, two, and three modifications (+57, +114, +171 Da). In addition, a peak at zero mass shift is observed, corresponding to fragments that contain only unmodified residues. In silico MS/MS spectra of the peptide with three mass shifts of 57 Da introduced at various positions were generated and the ΔM plots recalculated. The result is shown in Figure 6B, in which the modifications are positioned on the cysteine residues, leading to a single high-intensity peak with zero mass shift and indicating agreement between the experimental and the predicted MS/MS spectra. Similar examples of

233 multiple modifications were also found for combinations of modifications with different masses, such as carbamidomethylation of cysteines and methionine oxidation. Interestingly, Figure 5 reveals two other highly abundant cysteine modifications in this data set. One is N-terminal cyclization of carbamidomethylated cysteine (+40 Da), observed in 3.2% of the modified peptides, a known but rarely reported alteration [30]. The second is an unusual loss of 34 Da from cysteine, found in 1.7% of all modifications. This latter change could be explained by the conversion of cysteine to dehydroalanine through β-elimination of disulfide bridges and is likely a result of heating during sample preparation.

Two of the three identified cysteine modifications, (1) carbamidomethylation (+57 Da) and (2) desulfurization (−34 Da), were subjected to a targeted database search using Sequest to verify the identified peptides and to assess the identification success rate (sensitivity) of our unrestricted search algorithm (step 3). The targeted search led to identification of 4703 peptides modified by carbamidomethylation, a roughly 3-fold increase compared to the number of peptides identified in the unrestricted search with the same modification (1171). Importantly, all carbamidomethylated peptides found in the unrestricted search were also identified in the targeted search, demonstrating a high specificity of the unrestricted search. Similarly, all 36 peptides identified by the unrestricted search with desulfurized cysteine were among the 141 peptides found by the targeted search, demonstrating a similar gain in sensitivity for the targeted search and again a high specificity of the unrestricted search. The relatively low sensitivity of the

234 unrestricted algorithm for both modifications is likely a result of conservative filtering to minimize the false discovery rate, and thus the number of modifications to be validated (see next section).

Besides cysteine modifications, several peptides in Table 1 were detected with a +80 Da mass shift associated with serine residues, indicating phosphorylation. A subsequent targeted database search using Sequest with potential phosphorylation of S, T, and Y residues confirmed the phosphorylation sites found by the unrestricted search for 4 peptides of ITIH2_HUMAN and 3 peptides of FETUA_BOVIN. The targeted search led to identification of 5 additional phosphorylation sites: two in LAC_HUMAN, one in HEP2_HUMAN, one in KAC_HUMAN, and one in ITIH2_HUMAN. Importantly, well-known phosphorylation sites in ITIH2_HUMAN and FETUA_BOVIN were found, thus confirming correct matches. In addition, an increase in the number of modified peptides was achieved using the targeted database search.

Other modifications found using the unrestricted search include loss of ammonia, which was detected for 17 different peptides from 69 total MS/MS spectra. The loss of ammonia can be explained by formation of pyroglutamic acid from glutamine. A targeted Sequest search for this modification led to detection of 25 different peptides (47% increase) with a total of 106 matches (54% increase). Interestingly, several amino acid substitutions were detected as well (see Table 1). Peptide MYYSAVDPTK, derived from the copper-binding oxidoreductase ceruloplasmin, was detected with a D→E substitution, a known mutation that may present a higher risk for Parkinson's disease [34]. In another example, peptide

235 FNKPFVFLMIEQNTK, belonging to α-1-antitrypsin, was found with an E→D substitution, which is a known variant of this protein [35]. In summary, a variety of peptide modifications, including sample preparation-induced modifications as well as biologically relevant modifications such as phosphorylation and sequence polymorphisms, were identified using an unrestricted search followed by a targeted database search. The result illustrates the usefulness of the SeMoP approach for detecting various classes of expected as well as unusual modifications.

Estimation of the False Discovery Rate (FDR) of the Unrestricted Search

Compared to the standard database search, the unrestricted search often leads to an increased rate of false identifications. Thus, care must be exercised when establishing the acceptance criteria for significant matches. We analyzed all significant matches reported by the unrestricted algorithm and found that, out of 2076 detected modified peptides, 1890 could be assigned to known types of modifications by manual evaluation of the MS/MS spectra. Next, we investigated the remaining pool of 186 matches (9%) that could not be confidently identified. Roughly half of these spectra could be attributed to previously identified peptides whose MS/MS spectra were of insufficient quality to allow unambiguous identification, while the other half were attributed to random matches (90 out of 2076 detected modified peptides). Thus, an empirical false discovery rate was estimated to be roughly 5% of the total number of significant matches.
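The arithmetic behind this empirical estimate, together with the kind of sequence reshuffling used to build a decoy database of randomized protein sequences, can be sketched in Python as follows. The function names and the fixed random seed are assumptions made for illustration; only the counts (90 random matches among 2076 reported modified peptides) come from the text.

```python
import random


def false_discovery_rate(false_hits, total_hits):
    """Fraction of reported matches that are judged to be false."""
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    return false_hits / total_hits


def shuffled_decoy(sequence, seed=0):
    """Return a randomly reshuffled copy of a protein sequence.

    The decoy keeps the amino acid composition and length of the original.
    The seed is a hypothetical parameter that makes the example reproducible.
    """
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


# Counts quoted in the text: 90 random matches among 2076 modified peptides.
empirical_fdr = false_discovery_rate(90, 2076)
print(f"empirical FDR = {empirical_fdr:.1%}")  # prints "empirical FDR = 4.3%"

# A decoy entry built from an arbitrary example sequence.
print(shuffled_decoy("MATTMIQSK"))
```

The computed value of 4.3% is what the text rounds to "roughly 5%". For a decoy search, the same function would be called with the number of decoy hits in place of the manually classified random matches.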

236 To determine the number of randomly assigned peptides more accurately, a decoy database search was employed [27]. Analogous to the standard database search, the unrestricted search was repeated against a database of peptides derived from proteins with random sequences. These peptides were generated by in silico tryptic digestion of randomly reshuffled protein sequences; random rather than reversed protein sequences were used in this case to minimize the chances of creating homologous peptides. All positive matches in the decoy database were considered to be incorrect, and their number was taken as a measure of the false discovery rate [28]. This search returned 98 significant hits, which is 4.7% of the total peptide matches, in agreement with our estimated empirical false discovery rate of 5%. For more information on the performance of our unrestricted search algorithm, see Supporting Information A, Tables 4A-B. In summary, the empirical and the statistically estimated false discovery rates were found to be in good agreement. Moreover, a false discovery rate below 5% allows for high-confidence identification of modified peptides without the need for further validation of matches.

Evaluation of Modifications Identified Using Unrestricted Search for SiHa Cells

To evaluate further the performance of the SeMoP strategy, we investigated peptide modifications derived from a cervical cancer cell line (SiHa). Initially, all collected MS/MS spectra were analyzed by Sequest using the same filtering criteria as for the plasma protein samples, leading to the identification of a total of 1367 proteins (step 1). Only a selected set of MS/MS spectra with specific XCorr values was utilized for the unrestricted search ( out of MS/MS spectra), and

237 MS/MS spectra already assigned to peptides using the Sequest search were retained in the data set. In contrast to the plasma analysis, where all potential peptides corresponding to the identified proteins were considered, only substoichiometric modifications were targeted in this experiment; thus, only peptides identified by the Sequest algorithm (a total of 4393 peptides) with molecular weights ranging from 700 to 3500 Da were used in the unrestricted search. Table 2A summarizes the modifications identified using the unrestricted search, most of which are expected, such as loss of ammonia. Nevertheless, several unusual modifications were found. For example, two peptides, VSFELFADKVPK from peptidyl-prolyl cis-trans isomerase A, a protein accelerating protein folding, and ISGLIYEETR from the histone H4 family, showed a +26 Da modification on a serine residue, which could be interpreted as a substitution of serine by leucine or isoleucine. A +26 Da modification of these peptides has not yet been reported, but interestingly both serines are listed in the UniProt database as potentially phosphorylated, possibly suggesting adduct formation after beta elimination of the phosphate group. It is important to note that more modifications could be found by analyzing a greater number of MS/MS spectra with less stringent criteria; the latter would, however, increase the false discovery rate.

Following the SeMoP strategy, a targeted database search (step 3) of selected modifications, listed in Table 2B, was conducted to verify the detected modifications and to search for further peptide candidates with these modifications. All 10 selected peptide candidates, corresponding to 7 different modifications, e.g.

238 tryptophan oxidation and formation of pyroglutamic acid, were confirmed by the targeted database search. As expected, additional peptides with the same modifications, on average 5 times as many, were detected as well, demonstrating the usefulness of the targeted step. For example, cyclization of N-terminal carbamidomethylated cysteine was detected only once using the unrestricted database search, compared to the assignment of six MS/MS spectra to the same modified peptides in the targeted search. Annotated MS/MS spectra of all peptides listed in Table 2B are provided in Supporting Information C.

Several proteins found in this study were homologous, and thus several peptides identified in the initial Sequest search differed by only a single amino acid residue. Since the algorithm for unrestricted search treats an amino acid substitution simply as a modification, a single MS/MS spectrum should be assigned to all homologous peptides, each with the modification corresponding to its amino acid substitution. We tested the ability of the unrestricted search to find such homologous peptides. For example, an MS/MS spectrum that was matched directly to peptide IMNTFSVMPSPK from tubulin beta-2a chain with no modification was also matched to the modified peptide IMNTFSVM(−32)PSPK. Such a modification could be explained by an M→V substitution (a difference of 32 Da), and indeed peptide IMNTFSVVPSPK can be found in a homologous protein, tubulin beta-2c chain, also identified in the sample. Table 3 summarizes all identified homologous peptides found in this data set using the unrestricted search, denoting modification sites and mass shifts of the detected amino acid substitutions. The results show that all matches determined using the

239 unrestricted database search were verified by homologous peptides. In addition, the exact modification sites of all amino acid substitutions could be determined, demonstrating the ability of the approach to identify single amino acid modifications. In summary, these results further confirm the usefulness of our unrestricted search algorithm.

Conclusions

In this paper, we have presented a straightforward strategy, SeMoP, for the discovery and verification of peptide modifications from LC-MS/MS data in shotgun proteomics data processing. SeMoP couples a standard database search, which identifies the proteins present in the sample, with a new algorithm for an unrestricted search of peptide modifications, followed by a second standard database search using the modifications discovered by the unrestricted search. Importantly, since the standard database search is used for identification of modified peptides, well-established algorithms can be applied to determine the significance of matches. In addition, due to the high sensitivity of the standard database search, SeMoP leads to identification of a greater number of modified peptides than the unrestricted search alone. SeMoP was applied to data sets of human plasma proteins and a cervical cancer cell line, detecting a number of expected as well as some unusual peptide modifications.

The SeMoP approach utilizes a user-specified mass range, ±200 Da in this work, to search for peptide modifications. This mass range can be modified based on the application and the type of MS instrumentation used. We have demonstrated the feasibility of the approach using, as the mass spectrometer, a widely

240 employed low mass resolution ion trap. High mass accuracy in the MS and, even more importantly, in the MS/MS mode could provide a significant improvement, permitting not only discrimination between modifications with a similar mass but also stricter criteria that decrease the number of false matches [36]. A simple procedure employing a targeted database search was used to assess the sensitivity of the unrestricted algorithm. In addition, the false discovery rate of the unrestricted search was estimated using a search against a database with random protein sequences. Furthermore, the SeMoP strategy could be readily adapted to novel fragmentation techniques such as ETD. In summary, our experimental results demonstrate that SeMoP is a useful and simple tool for global analysis of protein modifications in shotgun proteomics, with the potential to be extended to high mass accuracy MS/MS data.

Acknowledgements

We thank the NIH (GM15847) for support of this work and Mr. D. Wang of the Barnett Institute for providing SiHa cell LC-MS/MS data. C.B. thanks the Max Kade Foundation and the Austrian Genome Program (GEN-AU, BIN II) for support of a fellowship. Contribution 920 from the Barnett Institute.

241 References
1. Witze ES, Old WM, Resing KA, Ahn NG. Mapping protein post-translational modifications with mass spectrometry. Nat Methods 2007;4(10):
2. Pang CN, Hayen A, Wilkins MR. Surface accessibility of protein posttranslational modifications. J Proteome Res 2007;6(5):
3. Creasy DM, Cottrell JS. Unimod: Protein modifications for mass spectrometry. Proteomics 2004;4(6):
4. Farriol-Mathis N, Garavelli JS, Boeckmann B, Duvaud S, Gasteiger E, Gateau A, Veuthey AL, Bairoch A. Annotation of post-translational modifications in the Swiss-Prot knowledge base. Proteomics 2004;4(6):
5. Larsen MR, Trelle MB, Thingholm TE, Jensen ON. Analysis of posttranslational modifications of proteins by tandem mass spectrometry. Biotechniques 2006;40(6):
6. Roth MJ, Forbes AJ 2nd, Kim YB, Robinson DE, Kelleher NL. Precise and parallel characterization of coding polymorphisms, alternative splicing, and modifications in human proteins by mass spectrometry. Mol Cell Proteomics 2005;4(7):
7. MacCoss MJ, McDonald WH, Saraf A, Sadygov S, Clark JM, Tasto JJ, Gould KL, Wolters D, Washburn M, Weiss A, Clark JI, Yates JR. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc Natl Acad Sci USA 2002;99(12):
8. MacCoss MJ. Computational analysis of shotgun proteomics data. Curr Opin

242 Chem Biol 2005;9(1):
9. Eng JK, McCormick AL, Yates JR III. An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. J Am Soc Mass Spec 1994;5:
10. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectroscopy data. Electrophoresis 1999;20(18):
11. Tabb DL, Saraf A, Yates JR. GutenTag: High-Throughput Sequence Tagging via an Empirically Derived Fragmentation Model. Anal Chem 2003;75(23):
12. Searle BC, Dasari S, Wilmarth PA, Turner M, Reddy AP, David LL, Nagalla SR. Identification of protein modifications using MS/MS de novo sequencing and the OpenSea alignment algorithm. J Proteome Res 2005;4(2):
13. Craig R, Beavis RC. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun Mass Spectrom 2003;17(20):
14. Nielsen ML, Savitski MM, Zubarev RA. Extent of modifications in human proteome samples and their effect on dynamic range of analysis in shotgun proteomics. Mol Cell Proteomics 2006;5(12):
15. Savitski MM, Nielsen ML, Zubarev RA. ModifiComb, a new proteomic tool for mapping substoichiometric post-translational modifications, finding novel types of modifications, and fingerprinting complex protein mixtures. Mol Cell Proteomics 2006;5(5):
16. Havilio M, Wool A. Large-scale unrestricted identification of post-translation modifications using tandem mass spectrometry. Anal Chem 2007;79(4):
17. Hansen BT, Davey SW, Ham AJ, Liebler DC. P-Mod: an algorithm and software to map modifications to peptide sequences using tandem MS data. J Proteome Res 2005;4(2):
18. Chamrad DC, Korting G, Schafer H, Stephan C, Thiele H, Apweiler R, Meyer HE, Marcus K, Bluggel M. Gaining knowledge from previously unexplained spectra-application of the PTM-Explorer software to detect PTM in HUPO BPP MS/MS data. Proteomics 2006;6(18):
19. Tsur D, Tanner S, Zandi E, Bafna V, Pevzner PA. Identification of posttranslational modifications by blind search of mass spectra. Nat Biotechnol 2005;23(12):
20. Bandeira N, Tsur D, Frank A, Pevzner PA. Protein identification by spectral networks analysis. Proc Natl Acad Sci USA 2007;104(15):
21. Tanner S, Payne SH, Dasari S, Shen Z, Wilmarth PA, David LL, Loomis WF, Briggs SP, Bafna V. Accurate Annotation of Peptide Modifications through

243 Unrestrictive Database Search. J Proteome Res 2007;7(1):
22. Seo J, Jeong J, Kim YM, Hwang N, Paek E, Lee KJ. Strategy for Comprehensive Identification of Post-translational Modifications in Cellular Proteins, Including Low Abundant Modifications: Application to Glyceraldehyde-3-phosphate Dehydrogenase. J Proteome Res 2008;7(2):
23. Oppenheim AV, Schafer RW. Discrete-Time Signal Processing. Prentice-Hall; Upper Saddle River, NJ: p
24. Plavina T, Wakeshull E, Hancock WS, Hincapie MJ. Combination of Abundant Protein Depletion and Multi-Lectin Affinity Chromatography (M-LAC) for Plasma Protein Biomarker Discovery. J Proteome Res 2007;6(2):
25. Yang Z, Hancock WS. Approach to the comprehensive analysis of glycoproteins isolated from human serum using a multi-lectin affinity column. J Chromatogr A 2004;1053:
26. Gu Y, Wu SL, Meyer JL, Hancock WS, Burg LJ, Linder J, Hanlon DW, Karger BL. Proteomic analysis of high-grade dysplastic cervical cells obtained from ThinPrep slides using laser capture microdissection and mass spectrometry. J Proteome Res 2007;6(11):
27. Elias JE, Gygi SP. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007;4(3):
28. Kall L, Storey JD, MacCoss MJ, Noble WS. Posterior Error Probabilities and False Discovery Rates: Two Sides of the Same Coin. J Proteome Res 2008;7(1):
29. Rauch A, Bellew M, Eng J, Fitzgibbon M, Holzman T, Hussey P, Igra M, Maclean B, Lin CW, Detter A, Fang R, Faca V, Gafken P, Zhang H, Whitaker J, States D, Hanash S, Paulovich A, McIntosh MW. Computational Proteomics Analysis System (CPAS): an extensible, open-source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J Proteome Res 2006;5(1):
30. Glazer AN, Delange RJ, Sigman DS. Chemical modification of proteins: Selected methods and analytical procedures. Work TS, Work E, editors. North Holland; Amsterdam: p Ch
31. Pentelute BL, Kent SB. Selective desulfurization of cysteine in the presence of Cys(Acm) in polypeptides obtained by native chemical ligation. Org Lett 2007;9(4):
32. Cohen SL, Price C, Vlasak J. Beta-elimination and peptide bond hydrolysis: two distinct mechanisms of human IgG1 hinge fragmentation upon storage. J Am Chem Soc 2007;129(22):

244 33. Rejtar T, Baumgartner C, Kullolli M, Karger BL. Beta-elimination of disulfide bridges: a common sample preparation induced protein modification. Proceedings of the 56th ASMS Conference; Denver, CO
34. Hochstrasser H, Tomiuk J, Walter U, Behnke S, Spiegel J, Krüger R, Becker G, Riess O, Berg D. Functional relevance of ceruloplasmin mutations in Parkinson's disease. FASEB J 2005;19(13):
35. Graham A, Hayes K, Weidinger S, Newton CR, Markham AF, Kalsheker NA. Characterisation of the alpha-1-antitrypsin M3 gene, a normal variant. Hum Genet 1990;85(3):
36. Shen Y, Tolic N, Hixson KK, Purvine SO, Pasa-Tolic L, Qian WJ, Adkins JN, Moore RJ, Smith RD. Proteome-Wide Identification of Proteins and Their Modifications with Decreased Ambiguities and Improved False Discovery Rates Using Unique Sequence Tags. Anal Chem 2008;80(6):

Figure 1. Workflow diagram of the SeMoP protocol.


246 Figure 2. Flow diagram of the algorithm for unrestricted search for peptide modifications. The terms shifted and non-shifted refer to constant mass shifts and zero mass shifts between experimental and predicted fragment ions.
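The final acceptance test of the flow in Figure 2 (Step C) can be sketched in Python as follows: a candidate match is kept only when the plot of fragment mass shifts has a peak at the mass difference between the experimental precursor and the predicted peptide. The function name, the dictionary representation of the plot, and the `min_peak_height` and `tolerance` parameters are hypothetical; the actual thresholds are defined in Supporting Information A and are not reproduced here.

```python
def is_significant_match(shift_plot, precursor_mass, peptide_mass,
                         min_peak_height=3, tolerance=1):
    """Accept a candidate match if the fragment-shift plot supports the precursor shift.

    shift_plot maps an integer mass shift (Da) to the height of the plot at
    that shift. min_peak_height and tolerance are illustrative assumptions.
    """
    precursor_shift = round(precursor_mass - peptide_mass)
    for shift, height in shift_plot.items():
        # A peak counts only if it is tall enough and sits at the precursor shift.
        if height >= min_peak_height and abs(shift - precursor_shift) <= tolerance:
            return True
    return False


# Hypothetical plot with peaks at 0 and -48 Da and one noise count at +23 Da.
example_plot = {0: 8, -48: 8, 23: 1}
print(is_significant_match(example_plot, 961.49, 1009.49))   # precursor shift of -48 Da
print(is_significant_match(example_plot, 1032.49, 1009.49))  # precursor shift of +23 Da
```

In this toy input the first candidate is accepted because the precursor shift coincides with a strong peak, while the second is rejected because only a single noise count sits at the precursor shift.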


More information

Biological Mass spectrometry in Protein Chemistry

Biological Mass spectrometry in Protein Chemistry Biological Mass spectrometry in Protein Chemistry Tuula Nyman Institute of Biotechnology tuula.nyman@helsinki.fi MASS SPECTROMETRY is an analytical technique that identifies the chemical composition of

More information

Comparison of mass spectrometers performances

Comparison of mass spectrometers performances Comparison of mass spectrometers performances Instrument Mass Mass Sensitivity resolution accuracy Quadrupole 1 x 10 3 0.1 Da* 0.5-1.0 pmol DE-MALDI 2 x 10 4 20 ppm 1-10 fmol peptide 1-5 pmol protein Ion

More information

NIH Public Access Author Manuscript J Proteome Res. Author manuscript; available in PMC 2014 July 05.

NIH Public Access Author Manuscript J Proteome Res. Author manuscript; available in PMC 2014 July 05. NIH Public Access Author Manuscript Published in final edited form as: J Proteome Res. 2013 July 5; 12(7): 3071 3086. doi:10.1021/pr3011588. Evaluation and Optimization of Mass Spectrometric Settings during

More information

In-Gel Tryptic Digestion Kit

In-Gel Tryptic Digestion Kit INSTRUCTIONS In-Gel Tryptic Digestion Kit 3747 N. Meridian Road P.O. Box 117 Rockford, IL 61105 89871 1468.2 Number Description 89871 In-Gel Tryptic Digestion Kit, sufficient reagents for approximately

More information

1. Sample Introduction to MS Systems:

1. Sample Introduction to MS Systems: MS Overview: 9.10.08 1. Sample Introduction to MS Systems:...2 1.1. Chromatography Interfaces:...3 1.2. Electron impact: Used mainly in Protein MS hard ionization source...4 1.3. Electrospray Ioniztion:

More information

New Instruments and Services

New Instruments and Services New Instruments and Services Liwen Zhang Mass Spectrometry and Proteomics Facility The Ohio State University Summer Workshop 2016 Thermo Orbitrap Fusion http://planetorbitrap.com/orbitrap fusion Thermo

More information

Mass Spectrometry and Proteomics - Lecture 4 - Matthias Trost Newcastle University

Mass Spectrometry and Proteomics - Lecture 4 - Matthias Trost Newcastle University Mass Spectrometry and Proteomics - Lecture 4 - Matthias Trost Newcastle University matthias.trost@ncl.ac.uk previously Peptide fragmentation Hybrid instruments 117 The Building Blocks of Life DNA RNA Proteins

More information

Learning Objectives. Overview of topics to be discussed 10/25/2013 HIGH RESOLUTION MASS SPECTROMETRY (HRMS) IN DISCOVERY PROTEOMICS

Learning Objectives. Overview of topics to be discussed 10/25/2013 HIGH RESOLUTION MASS SPECTROMETRY (HRMS) IN DISCOVERY PROTEOMICS HIGH RESOLUTION MASS SPECTROMETRY (HRMS) IN DISCOVERY PROTEOMICS A clinical proteomics perspective Michael L. Merchant, PhD School of Medicine, University of Louisville Louisville, KY Learning Objectives

More information

Introduction to LC/MS/MS

Introduction to LC/MS/MS Trends in 2006 Introduction to LC/MS/MS By Crystal Holt, LC/MS Product Specialist, Varian Inc. Toxicology laboratories Increased use of LC/MS Excellent LD Cheaper (still expensive) Much more robust Solves

More information

New Instruments and Services

New Instruments and Services New Instruments and Services http://planetorbitrap.com/orbitrap fusion Combining the best of quadrupole, Orbitrap, and ion trap mass analysis in a revolutionary Tribrid architecture, the Orbitrap Fusion

More information

The distribution of log 2 ratio (H/L) for quantified peptides. cleavage sites in each bin of log 2 ratio of quantified. peptides

The distribution of log 2 ratio (H/L) for quantified peptides. cleavage sites in each bin of log 2 ratio of quantified. peptides Journal: Nature Methods Article Title: Corresponding Author: Protein digestion priority is independent of their abundances Mingliang Ye and Hanfa Zou Supplementary Figure 1 Supplementary Figure 2 The distribution

More information

Shotgun Proteomics MS/MS. Protein Mixture. proteolysis. Peptide Mixture. Time. Abundance. Abundance. m/z. Abundance. m/z 2. Abundance.

Shotgun Proteomics MS/MS. Protein Mixture. proteolysis. Peptide Mixture. Time. Abundance. Abundance. m/z. Abundance. m/z 2. Abundance. Abundance Abundance Abundance Abundance Abundance Shotgun Proteomics Protein Mixture 1 2 3 MS/MS proteolysis m/z 2 3 Time µlc m/z MS 1 m/z Peptide Mixture m/z Block Diagram of a Mass Spectrometer Sample

More information

MASS SPECTROMETRY BASED METABOLOMICS. Pavel Aronov. ABRF2010 Metabolomics Research Group March 21, 2010

MASS SPECTROMETRY BASED METABOLOMICS. Pavel Aronov. ABRF2010 Metabolomics Research Group March 21, 2010 MASS SPECTROMETRY BASED METABOLOMICS Pavel Aronov ABRF2010 Metabolomics Research Group March 21, 2010 Types of Experiments in Metabolomics targeted non targeted Number of analyzed metabolites is limited

More information

Improve Protein Analysis with the New, Mass Spectrometry- Compatible ProteasMAX Surfactant

Improve Protein Analysis with the New, Mass Spectrometry- Compatible ProteasMAX Surfactant Improve Protein Analysis with the New, Mass Spectrometry- Compatible Surfactant ABSTRACT Incomplete solubilization and digestion and poor peptide recovery are frequent limitations in protein sample preparation

More information

O O H. Robert S. Plumb and Paul D. Rainville Waters Corporation, Milford, MA, U.S. INTRODUCTION EXPERIMENTAL. LC /MS conditions

O O H. Robert S. Plumb and Paul D. Rainville Waters Corporation, Milford, MA, U.S. INTRODUCTION EXPERIMENTAL. LC /MS conditions Simplifying Qual/Quan Analysis in Discovery DMPK using UPLC and Xevo TQ MS Robert S. Plumb and Paul D. Rainville Waters Corporation, Milford, MA, U.S. INTRODUCTION The determination of the drug metabolism

More information

Proteomics of body liquids as a source for potential methods for medical diagnostics Prof. Dr. Evgeny Nikolaev

Proteomics of body liquids as a source for potential methods for medical diagnostics Prof. Dr. Evgeny Nikolaev Proteomics of body liquids as a source for potential methods for medical diagnostics Prof. Dr. Evgeny Nikolaev Institute for Biochemical Physics, Rus. Acad. Sci., Moscow, Russia. Institute for Energy Problems

More information

Mass Spectrometry and Proteomics. Professor Xudong Yao Bioanalytical Chemistry Spring 2007

Mass Spectrometry and Proteomics. Professor Xudong Yao Bioanalytical Chemistry Spring 2007 Mass Spectrometry and Proteomics Professor Xudong Yao Bioanalytical Chemistry Spring 2007 Proteomics and -omics Roles of mass spectrometry Comparative proteomics Chemical proteomics Protein, Proteome and

More information

Fundamentals of Soft Ionization and MS Instrumentation

Fundamentals of Soft Ionization and MS Instrumentation Fundamentals of Soft Ionization and MS Instrumentation Ana Varela Coelho varela@itqb.unl.pt Mass Spectrometry Lab Analytical Services Unit Index Mass spectrometers and its components Ionization methods:

More information

(III) MALDI instrumentation

(III) MALDI instrumentation Dr. Sanjeeva Srivastava (I) Basics of MALDI-TF (II) Sample preparation In-gel digestion Zip-tip sample clean-up Matrix and sample plating (III) MALDI instrumentation 2 1 (I) Basics of MALDI-TF Analyte

More information

Characterization of Disulfide Linkages in Proteins by 193 nm Ultraviolet Photodissociation (UVPD) Mass Spectrometry. Supporting Information

Characterization of Disulfide Linkages in Proteins by 193 nm Ultraviolet Photodissociation (UVPD) Mass Spectrometry. Supporting Information Characterization of Disulfide Linkages in Proteins by 193 nm Ultraviolet Photodissociation (UVPD) Mass Spectrometry M. Montana Quick, Christopher M. Crittenden, Jake A. Rosenberg, and Jennifer S. Brodbelt

More information

Double charge of 33kD peak A1 A2 B1 B2 M2+ M/z. ABRF Proteomics Research Group - Qualitative Proteomics Study Identifier Number 14146

Double charge of 33kD peak A1 A2 B1 B2 M2+ M/z. ABRF Proteomics Research Group - Qualitative Proteomics Study Identifier Number 14146 Abstract The 2008 ABRF Proteomics Research Group Study offers participants the chance to participate in an anonymous study to identify qualitative differences between two protein preparations. We used

More information

Protein Analysis using Electrospray Ionization Mass Spectroscopy *

Protein Analysis using Electrospray Ionization Mass Spectroscopy * OpenStax-CNX module: m38341 1 Protein Analysis using Electrospray Ionization Mass Spectroscopy * Wilhelm Kienast Andrew R. Barron This work is produced by OpenStax-CNX and licensed under the Creative Commons

More information

Trypsin Mass Spectrometry Grade

Trypsin Mass Spectrometry Grade 058PR-03 G-Biosciences 1-800-628-7730 1-314-991-6034 technical@gbiosciences.com A Geno Technology, Inc. (USA) brand name Trypsin Mass Spectrometry Grade A Chemically Modified, TPCK treated, Affinity Purified

More information

FOURIER TRANSFORM MASS SPECTROMETRY

FOURIER TRANSFORM MASS SPECTROMETRY FOURIER TRANSFORM MASS SPECTROMETRY https://goo.gl/vx3ogw FT-ICR Theory Ion Cyclotron Motion Inward directed Lorentz force causes ions to move in circular orbits about the magnetic field axis Alan G. Marshall,

More information

Advances in Hybrid Mass Spectrometry

Advances in Hybrid Mass Spectrometry The world leader in serving science Advances in Hybrid Mass Spectrometry ESAC 2008 Claire Dauly Field Marketing Specialist, Proteomics New hybrids instruments LTQ Orbitrap XL with ETD MALDI LTQ Orbitrap

More information

Chapter 10apter 9. Chapter 10. Summary

Chapter 10apter 9. Chapter 10. Summary Chapter 10apter 9 Chapter 10 The field of proteomics has developed rapidly in recent years. The essence of proteomics is to characterize the behavior of a group of proteins, the system rather than the

More information

Multiplex Protein Quantitation using itraq Reagents in a Gel-Based Workflow

Multiplex Protein Quantitation using itraq Reagents in a Gel-Based Workflow Multiplex Protein Quantitation using itraq Reagents in a Gel-Based Workflow Purpose Described herein is a workflow that combines the isobaric tagging reagents, itraq Reagents, with the separation power

More information

LC/MS/MS SOLUTIONS FOR LIPIDOMICS. Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS

LC/MS/MS SOLUTIONS FOR LIPIDOMICS. Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS LC/MS/MS SOLUTIONS FOR LIPIDOMICS Biomarker and Omics Solutions FOR DISCOVERY AND TARGETED LIPIDOMICS Lipids play a key role in many biological processes, such as the formation of cell membranes and signaling

More information

Mass Spectrometry Course Árpád Somogyi Chemistry and Biochemistry MassSpectrometry Facility) University of Debrecen, April 12-23, 2010

Mass Spectrometry Course Árpád Somogyi Chemistry and Biochemistry MassSpectrometry Facility) University of Debrecen, April 12-23, 2010 Mass Spectrometry Course Árpád Somogyi Chemistry and Biochemistry MassSpectrometry Facility) University of Debrecen, April 12-23, 2010 Introduction, Ionization Methods Mass Analyzers, Ion Activation Methods

More information

One Gene, Many Proteins. Applications of Mass Spectrometry to Proteomics. Why Proteomics? Raghothama Chaerkady, Ph.D.

One Gene, Many Proteins. Applications of Mass Spectrometry to Proteomics. Why Proteomics? Raghothama Chaerkady, Ph.D. Applications of Mass Spectrometry to Proteomics Raghothama Chaerkady, Ph.D. McKusick-Nathans Institute of Genetic Medicine and the Department of Biological Chemistry Why Proteomics? One Gene, Many Proteins

More information

More structural information with MS n

More structural information with MS n PRODUCT SPECIFICATIONS The LTQ XL linear ion trap mass spectrometer More structural information with MS n The LTQ XL linear ion trap mass spectrometer delivers more structural information faster and with

More information

SMART Digest Kit Facilitating perfect digestion

SMART Digest Kit Facilitating perfect digestion Questions Answers SMART Digest Kit Facilitating perfect digestion The modern biopharmaceutical and protein research laboratory is tasked with providing high quality analytical results, often in high-throughput,

More information

130327SCH4U_biochem April 09, 2013

130327SCH4U_biochem April 09, 2013 Option B: B1.1 ENERGY Human Biochemistry If more energy is taken in from food than is used up, weight gain will follow. Similarly if more energy is used than we supply our body with, weight loss will occur.

More information

Edgar Naegele. Abstract

Edgar Naegele. Abstract Simultaneous determination of metabolic stability and identification of buspirone metabolites using multiple column fast LC/TOF mass spectrometry Application ote Edgar aegele Abstract A recent trend in

More information

Bioanalytical Quantitation of Biotherapeutics Using Intact Protein vs. Proteolytic Peptides by LC-HR/AM on a Q Exactive MS

Bioanalytical Quantitation of Biotherapeutics Using Intact Protein vs. Proteolytic Peptides by LC-HR/AM on a Q Exactive MS Bioanalytical Quantitation of Biotherapeutics Using Intact Protein vs. Proteolytic Peptides by LC-HR/AM on a Q Exactive MS Jenny Chen, Hongxia Wang, Zhiqi Hao, Patrick Bennett, and Greg Kilby Thermo Fisher

More information

Supporting information

Supporting information Supporting information Figure legends Supplementary Table 1. Specific product ions obtained from fragmentation of lithium adducts in the positive ion mode comparing the different positional isomers of

More information

Characterization of an Unknown Compound Using the LTQ Orbitrap

Characterization of an Unknown Compound Using the LTQ Orbitrap Characterization of an Unknown Compound Using the LTQ rbitrap Donald Daley, Russell Scammell, Argenta Discovery Limited, 8/9 Spire Green Centre, Flex Meadow, Harlow, Essex, CM19 5TR, UK bjectives unknown

More information

Neosolaniol. [Methods listed in the Feed Analysis Standards]

Neosolaniol. [Methods listed in the Feed Analysis Standards] Neosolaniol [Methods listed in the Feed Analysis Standards] 1 Simultaneous analysis of mycotoxins by liquid chromatography/ tandem mass spectrometry [Feed Analysis Standards, Chapter 5, Section 1 9.1 ]

More information

Matrix Assisted Laser Desorption Ionization Time-of-flight Mass Spectrometry

Matrix Assisted Laser Desorption Ionization Time-of-flight Mass Spectrometry Matrix Assisted Laser Desorption Ionization Time-of-flight Mass Spectrometry Time-of-Flight Mass Spectrometry. Basic principles An attractive feature of the time-of-flight (TOF) mass spectrometer is its

More information

Ultra Performance Liquid Chromatography Coupled to Orthogonal Quadrupole TOF MS(MS) for Metabolite Identification

Ultra Performance Liquid Chromatography Coupled to Orthogonal Quadrupole TOF MS(MS) for Metabolite Identification 22 SEPARATION SCIENCE REDEFINED MAY 2005 Ultra Performance Liquid Chromatography Coupled to Orthogonal Quadrupole TOF MS(MS) for Metabolite Identification In the drug discovery process the detection and

More information

Application of a new capillary HPLC- ICP-MS interface to the identification of selenium-containing proteins in selenized yeast

Application of a new capillary HPLC- ICP-MS interface to the identification of selenium-containing proteins in selenized yeast Application of a new capillary HPLC- ICP-MS interface to the identification of selenium-containing proteins in selenized yeast Application note Food supplements Authors Juliusz Bianga and Joanna Szpunar

More information

Phosphorylation of proteins Steve Barnes Feb 19th, 2002 in some cases, proteins are found in a stable, hyperphosphorylated state, e.g.

Phosphorylation of proteins Steve Barnes Feb 19th, 2002 in some cases, proteins are found in a stable, hyperphosphorylated state, e.g. Phosphorylation of proteins Steve Barnes Feb 19th, 2002 in some cases, proteins are found in a stable, hyperphosphorylated state, e.g., casein more interestingly, in most other cases, it is a transient

More information

Analysis of Testosterone, Androstenedione, and Dehydroepiandrosterone Sulfate in Serum for Clinical Research

Analysis of Testosterone, Androstenedione, and Dehydroepiandrosterone Sulfate in Serum for Clinical Research Analysis of Testosterone, Androstenedione, and Dehydroepiandrosterone Sulfate in Serum for Clinical Research Dominic Foley, Michelle Wills, and Lisa Calton Waters Corporation, Wilmslow, UK APPLICATION

More information

Glycerolipid Analysis. LC/MS/MS Analytical Services

Glycerolipid Analysis. LC/MS/MS Analytical Services Glycerolipid Analysis LC/MS/MS Analytical Services Molecular Characterization and Quantitation of Glycerophospholipids in Commercial Lecithins by High Performance Liquid Chromatography with Mass Spectrometric

More information

AbsoluteIDQ p150 Kit. Targeted Metabolite Identifi cation and Quantifi cation. Bringing our targeted metabolomics expertise to your lab.

AbsoluteIDQ p150 Kit. Targeted Metabolite Identifi cation and Quantifi cation. Bringing our targeted metabolomics expertise to your lab. AbsoluteIDQ p150 Kit Targeted Metabolite Identifi cation and Quantifi cation Bringing our targeted metabolomics expertise to your lab. The Biocrates AbsoluteIDQ p150 mass spectrometry Assay Preparation

More information

Time (min) Supplementary Figure 1: Gas decomposition products of irradiated DMC.

Time (min) Supplementary Figure 1: Gas decomposition products of irradiated DMC. 200000 C 2 CH 3 CH 3 DMC 180000 160000 140000 Intensity 120000 100000 80000 60000 40000 C 2 H 6 CH 3 CH 2 CH 3 CH 3 CCH 3 EMC DEC 20000 C 3 H 8 HCCH 3 5 10 15 20 25 Time (min) Supplementary Figure 1: Gas

More information

Methods in Mass Spectrometry. Dr. Noam Tal Laboratory of Mass Spectrometry School of Chemistry, Tel Aviv University

Methods in Mass Spectrometry. Dr. Noam Tal Laboratory of Mass Spectrometry School of Chemistry, Tel Aviv University Methods in Mass Spectrometry Dr. Noam Tal Laboratory of Mass Spectrometry School of Chemistry, Tel Aviv University Sample Engineering Chemistry Biology Life Science Medicine Industry IDF / Police Sample

More information

LOCALISATION, IDENTIFICATION AND SEPARATION OF MOLECULES. Gilles Frache Materials Characterization Day October 14 th 2016

LOCALISATION, IDENTIFICATION AND SEPARATION OF MOLECULES. Gilles Frache Materials Characterization Day October 14 th 2016 LOCALISATION, IDENTIFICATION AND SEPARATION OF MOLECULES Gilles Frache Materials Characterization Day October 14 th 2016 1 MOLECULAR ANALYSES Which focus? LOCALIZATION of molecules by Mass Spectrometry

More information

An Alternative Approach: Top-Down Bioanalysis of Intact Large Molecules Can this be part of the future? Lecture 8, Page 27

An Alternative Approach: Top-Down Bioanalysis of Intact Large Molecules Can this be part of the future? Lecture 8, Page 27 An Alternative Approach: Top-Down Bioanalysis of Intact Large Molecules Can this be part of the future? Lecture 8, Page 27 Top-down HRAM Bioanalysis of Native Proteins/Molecules Relative Abundance 100

More information

Development of a Human Cell-Free Expression System to Generate Stable-Isotope-Labeled Protein Standards for Quantitative Mass Spectrometry

Development of a Human Cell-Free Expression System to Generate Stable-Isotope-Labeled Protein Standards for Quantitative Mass Spectrometry Development of a Human Cell-Free Expression System to Generate Stable-Isotope-Labeled Protein Standards for Quantitative Mass Spectrometry Ryan D. omgarden 1, Derek aerenwald 2, Eric Hommema 1, Scott Peterman

More information

Don t miss a thing on your peptide mapping journey How to get full coverage peptide maps using high resolution accurate mass spectrometry

Don t miss a thing on your peptide mapping journey How to get full coverage peptide maps using high resolution accurate mass spectrometry Don t miss a thing on your peptide mapping journey How to get full coverage peptide maps using high resolution accurate mass spectrometry Kai Scheffler, PhD BioPharma Support Expert,LSMS Europe The world

More information

Sequence Identification And Spatial Distribution of Rat Brain Tryptic Peptides Using MALDI Mass Spectrometric Imaging

Sequence Identification And Spatial Distribution of Rat Brain Tryptic Peptides Using MALDI Mass Spectrometric Imaging Sequence Identification And Spatial Distribution of Rat Brain Tryptic Peptides Using MALDI Mass Spectrometric Imaging AB SCIEX MALDI TOF/TOF* Systems Patrick Pribil AB SCIEX, Canada MALDI mass spectrometric

More information

A Definitive Lipidomics Workflow for Human Plasma Utilizing Off-line Enrichment and Class Specific Separation of Phospholipids

A Definitive Lipidomics Workflow for Human Plasma Utilizing Off-line Enrichment and Class Specific Separation of Phospholipids A Definitive Lipidomics Workflow for Human Plasma Utilizing Off-line Enrichment and Class Specific Separation of Phospholipids Jeremy Netto, 1 Stephen Wong, 1 Federico Torta, 2 Pradeep Narayanaswamy, 2

More information

LECTURE-15. itraq Clinical Applications HANDOUT. Isobaric Tagging for Relative and Absolute quantitation (itraq) is a quantitative MS

LECTURE-15. itraq Clinical Applications HANDOUT. Isobaric Tagging for Relative and Absolute quantitation (itraq) is a quantitative MS LECTURE-15 itraq Clinical Applications HANDOUT PREAMBLE Isobaric Tagging for Relative and Absolute quantitation (itraq) is a quantitative MS based method for quantifying proteins subject to various different

More information

Application Note # LCMS-89 High quantification efficiency in plasma targeted proteomics with a full-capability discovery Q-TOF platform

Application Note # LCMS-89 High quantification efficiency in plasma targeted proteomics with a full-capability discovery Q-TOF platform Application Note # LCMS-89 High quantification efficiency in plasma targeted proteomics with a full-capability discovery Q-TOF platform Abstract Targeted proteomics for biomarker verification/validation

More information

Quadrupole and Ion Trap Mass Analysers and an introduction to Resolution

Quadrupole and Ion Trap Mass Analysers and an introduction to Resolution Quadrupole and Ion Trap Mass Analysers and an introduction to Resolution A simple definition of a Mass Spectrometer A Mass Spectrometer is an analytical instrument that can separate charged molecules according

More information

Analysis of N-Linked Glycans from Coagulation Factor IX, Recombinant and Plasma Derived, Using HILIC UPLC/FLR/QTof MS

Analysis of N-Linked Glycans from Coagulation Factor IX, Recombinant and Plasma Derived, Using HILIC UPLC/FLR/QTof MS Analysis of N-Linked Glycans from Coagulation Factor IX, Recombinant and Plasma Derived, Using HILIC UPLC/FLR/QTof MS Ying Qing Yu Waters Corporation, Milford, MA, U.S. A P P L I C AT ION B E N E F I T

More information

Choosing the metabolomics platform

Choosing the metabolomics platform Choosing the metabolomics platform Stephen Barnes, PhD Department of Pharmacology & Toxicology University of Alabama at Birmingham sbarnes@uab.edu Challenges Unlike DNA, RNA and proteins, the metabolome

More information

The Detection of Allergens in Food Products with LC-MS

The Detection of Allergens in Food Products with LC-MS The Detection of Allergens in Food Products with LC-MS Something for the future? Jacqueline van der Wielen Scope of Organisation Dutch Food and Consumer Product Safety Authority: Law enforcement Control

More information

MS/MS to Targeted Proteomics (MRM)

MS/MS to Targeted Proteomics (MRM) MS/MS to Targeted Proteomics (MRM) How it worked on the Human Lens Proteome Jayson Falkner PhD jay@singleorganism.com Genes Show Limited Value in Predicting Diseases With only a few exceptions, what the

More information

AccuMAP Low ph Protein Digestion Kits

AccuMAP Low ph Protein Digestion Kits TECHNICAL MANUAL AccuMAP Low ph Protein Digestion Kits Instruc ons for Use of Products VA1040 and VA1050 5/17 TM504 AccuMAP Low ph Protein Digestion Kits All technical literature is available at: www.promega.com/protocols/

More information

Quantitation of Protein Phosphorylation Using Multiple Reaction Monitoring

Quantitation of Protein Phosphorylation Using Multiple Reaction Monitoring Quantitation of Protein Phosphorylation Using Multiple Reaction Monitoring Application Note Authors Ning Tang, Christine A. Miller and Keith Waddell Agilent Technologies, Inc. Santa Clara, CA USA This

More information

Babu Antharavally, Ryan Bomgarden, and John Rogers Thermo Fisher Scientific, Rockford, IL

Babu Antharavally, Ryan Bomgarden, and John Rogers Thermo Fisher Scientific, Rockford, IL A Versatile High-Recovery Method for Removing Detergents from Low-Concentration Protein or Peptide Samples for Mass Spectrometry Sample Preparation and Analysis Babu Antharavally, Ryan Bomgarden, and John

More information

MALDI Imaging Drug Imaging Detlev Suckau Head of R&D MALDI Bruker Daltonik GmbH. December 19,

MALDI Imaging Drug Imaging Detlev Suckau Head of R&D MALDI Bruker Daltonik GmbH. December 19, MALDI Imaging Drug Imaging Detlev Suckau Head of R&D MALDI Bruker Daltonik GmbH December 19, 2014 1 The principle of MALDI imaging Spatially resolved mass spectra are recorded Each mass signal represents

More information

Applying a Novel Glycan Tagging Reagent, RapiFluor-MS, and an Integrated UPLC-FLR/QTof MS System for Low Abundant N-Glycan Analysis

Applying a Novel Glycan Tagging Reagent, RapiFluor-MS, and an Integrated UPLC-FLR/QTof MS System for Low Abundant N-Glycan Analysis Applying a Novel Glycan Tagging Reagent, RapiFluor-MS, and an Integrated UPLC-FLR/QTof MS System for Low Abundant N-Glycan Analysis Ying Qing Yu Waters Corporation, Milford, MA, USA APPLICATION BENEFITS

More information

Automated Sample Preparation/Concentration of Biological Samples Prior to Analysis via MALDI-TOF Mass Spectroscopy Application Note 222

Automated Sample Preparation/Concentration of Biological Samples Prior to Analysis via MALDI-TOF Mass Spectroscopy Application Note 222 Automated Sample Preparation/Concentration of Biological Samples Prior to Analysis via MALDI-TOF Mass Spectroscopy Application Note 222 Joan Stevens, Ph.D.; Luke Roenneburg; Tim Hegeman; Kevin Fawcett

More information

LECTURE 3. Ionization Techniques for Mass Spectrometry

LECTURE 3. Ionization Techniques for Mass Spectrometry LECTURE 3 Ionization Techniques for Mass Spectrometry Jack Henion, Ph.D. Emeritus Professor, Analytical Toxicology Cornell University Ithaca, NY 14850 Lecture 3, Page 1 Contents Electron ionization (EI)

More information

Quantification with Proteome Discoverer. Bernard Delanghe

Quantification with Proteome Discoverer. Bernard Delanghe Quantification with Proteome Discoverer Bernard Delanghe Overview: Which approach to use? Proteome Discoverer Quantification Method What When to use Metabolic labeling SILAC Cell culture systems Small

More information

LC/MS Method for Comprehensive Analysis of Plasma Lipids

LC/MS Method for Comprehensive Analysis of Plasma Lipids Application Note omics LC/MS Method for Comprehensive Analysis of Plasma s Authors Tomas Cajka and Oliver Fiehn West Coast Metabolomics Center, University of California Davis, 451 Health Sciences Drive,

More information

Ozonolysis of phospholipid double bonds during electrospray. ionization: a new tool for structure determination

Ozonolysis of phospholipid double bonds during electrospray. ionization: a new tool for structure determination Ozonolysis of phospholipid double bonds during electrospray ionization: a new tool for structure determination Michael C. Thomas, Todd W. Mitchell, Stephen J. Blanksby Departments of Chemistry and Biomedical

More information

Ionization Methods. Neutral species Charged species. Removal/addition of electron(s) Removal/addition of proton(s)
