Strategies for Characterizing Proteins in Proteomics Research

Strategies for Characterizing Proteins in Proteomics Research Joseph A. Loo Department of Biological Chemistry David Geffen School of Medicine Department of Chemistry and Biochemistry University of California Los Angeles, CA USA

Approaches for Protein Identification What is this protein? Molecular weight Isoelectric point (charge) Amino acid composition Other physical/chemical characteristics Partial or complete amino acid sequence Edman (N-terminal sequence) - if N-term. not blocked C-terminal sequence - not commonly performed Mass spectrometry-measured information

Electrospray Ionization (ESI) ESI 22+ 21+ 20+ 19+ highly charge droplets 18+ 17+ 16+ MS 15+ 14+ 500 700 900 1100 mass/charge (m/z( m/z) Multiple charging More charges for larger molecules MW range > 150 kda Liquid introduction of analyte Interface with liquid separation methods, e.g. liquid chromatography Tandem mass spectrometry (MS/MS) for protein sequencing

ESI-MS of Large Proteins 46512 distribution of multiply charged molecules (M+15H) 15+ 3102 (M+14H) 15+ 14+ 3323 46048 (M+16H) 16+ (M+13H) 13+ 3543 46000 47000 mass 2750 3000 3250 3500 3750 4000 m/z ESI-MS (Q-TOF) ph 7.5

Matrix-assisted assisted Laser Desorption/Ionization (MALDI) Time-of of-flight (TOF) Analyzer MALDI high voltage v 3 m 3 v 2 m 2 v 1 m 1 detector laser sample drift region m 1 m 2 m 3

MALDI Mass Spectrometry of Large Proteins 100 MALDI-MS MS of rat MVP 97430 (M+H) + % Intensity 36446 (M+2H) 2+ 48658 98563 58309 30811 50608 70405 90202 110000 129797 m/z

Approaches for Protein Sequencing and Identification Top Down MS/MS MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNIAQGDVHLKGLMGP GVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEVFSKLRGSRLVVAVSETIPVSQRLSLE EAEFDPHVWFDVKLWSYSVKAVYESLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQS LPAERRVLVTAHDAFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESSI PHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNIDTIVAALAR Enzymatic or chemical degradation MS/MS Bottom Up

Protein Identification by Mass Spectrometry 2-D Gel Electrophoresis MW x 10 3 150-75- 40-25- 18-10- Excise separated protein spots 6.5 6.0 5.5 5.0 4.5 pi 1547 In-gel trypsin digest Peptide mass fingerprint by MALDI-TOF or LC-ESI-MS. Additional sequence information can be obtained by MS/MS. 717 1089 1272 1401 2384 1857 1700 2791 Recover tryptic peptides 500 1500 2500 3000 m/z Protein identification by searching proteomic or genomic databases

Quadrupole - Time-of of-flight (QTOF) MALDI ESI Q0 Q1 Q2 select precursor ion of interest tandem mass spectrometry MS/MS fragment precursor ion detect resulting product ions

De Novo Protein Sequencing of Heat Resistant Proteins Found in Sacred Lotus Lotus, Nelumbo nucifera, has the distinction of having produced the longest living seed, 1300 yr Seed longevity is attributable to its Fruit coat s impermeability to O 2 and water Rapid adaptability to stress Genome is not known de novo protein sequencing Jane Shen-Miller, Kerry Wooding, Yongming Xie

SDS-PAGE Lotus protein (>400 yr) boiled embryo extracts 97 kda 64 kda 51 kda 39 kda 28 kda 19 kda 14 kda Heat hardy proteins are soluble in >100 C enable draught protection Lotus Fe-SOD (oxidative repair) retains 65% activity after 1 hr boiling mass spectrometry and de novo sequencing to identify heat stable proteins

Isotope Labeling as an Aid to De Novo Sequencing Trypsin Digestion in H 2 18 O (Arg, Lys) R 1 R 2 R 3 R 4...NH-CH CH-CO-NH-CH-CO-NH-CH-CO-NH-CH-COOHCOOH Trypsin / H 16 2 O / H 18 2 O R 1 R 2 NH 2 -CH-CO-NH-CH-CO- 16 OH R 1 R 2 NH 2 -CH-CO-NH-CH-CO- 18 OH 16 O or 18 O Aids interpretation of MS/MS spectra Distinguish b-ions (from N- terminus) and y-ions (from C- terminus): 2 Da shift

MALDI-QqTOF Tryptic Peptide Map trypsin digestion performed in H 2 16 O or H 2 18 O 1576.7 1302.5 Boiled Embryo Extract 45 kda Band H 16 2 O 1252.6 1039.5 1098.0 1333.6 1412.6 1633.7 1000 1100 1200 1300 1400 1500 1600 1700 1800 m/z 1254.6 1304.5 1578.7 H 2 18 18 O 1414.6 1000 1100 1200 1300 1400 1500 1600 1700 1800 m/z

Enzyme and Chemical Specificity for Proteolysis Arg/Lys - XX Lys - XX Arg - XX Asp/Glu - XX XX - Asp Met - XX Trypsin Lys-C Arg-C V-8 8 protease Asp-N CNBr

New Chemistries for Proteomics: Chemical Modification of Cysteine (Cys-specific specific cleavage) N H 2 Br 2-bromoethyl amine Lindley, H. Nature,, 1956, 178,, 647. OH - HS NH O Cysteine H 2 N S N H O Digest with trypsin to cleave after Lys, Arg, Cys S-aminoethyl cysteine Thevis and Loo J. Proteome Res.,, 2003

Chemical Modification of Lysine Block Lys-cleavage H 2 N O NH + C H 3 O O CH 3 Lysine Acetic anhydride MeOH H 3 C O H N NH O Digest with trypsin to cleave after Arg, Cys Digest with Lys-C to cleave after Cys Lysine, acetylated (+ 42 Da)

Prior to Digestion: Chemical Modification of Cysteine Cys-specific specific cleavage Ionization enhancement of Cys-peptides N H 2 Br OH - HBr N H 2 S R 2-bromoethyl amine S-aminoethyl cysteine Lindley, H. Nature, 1956, 178,, 647. After Digestion: O CH 3 HN NH 2 O-methylisourea H H + N R 2 N N 2 C S H2 NH 2-bromoethyl amine Hale, J.E. et al. Anal. Biochem. 2000, 287,, 110. H H 2 C S + CH 3 OH R homoarginine derivative (improve MALDI response)

Human Serum Albumin In-gel derivatization and trypsin digestion 100 1062.1411 MALDI-MS MS % Intensity 1468.4890 2117.9183 1637.5044 1942.7391 2240.9442 1078.1616 1544.4962 1312.3552 2513.0894 1141.9080 2382.0698 2809.1502 1510.5257 1796.5198 2573.0795 2763.2002 2923.3063 1000 1420 1840 2260 2680 3100 m/z

Human Serum Albumin In-gel derivatization and trypsin digestion (cleavage after Cys, Arg) DAHKSEVAHR FKDLGEENFK ALVLIAFAQY LQQCPFEDHV KLVNEVTEFA KTCVADESAE NCDKSLHTLF GDKLCTVATL RETYGEMADCR CAKQEPERNE CFLQHKDDNP NLPRLVRPEV LVRPEV DVMCTAFHDN EETFLKKYLY EIARRHPYFY APELLFFAKR YKAAFTECCQ AADKAACLLP KLDELRDEGK ASSAKQRLKC ASLQKFGERA FKAWAVARLS QRFPKAEFAE VSKLVTDLTK VHTECCHGDL CHGDL LECADDRADL AKYICENQDS ISSKLKECCE KPLLEKSHCI I AEVENDEMPA DLPSLAADFV ESKDVCKNYA KNYA EAKDVFLGMF LYEYARRHPD YSVVLLLRLA LA KTYETTLEKC CAAADPHECY Y AKVFDEFKPL VEEPQNLIKQ NCELFEQLGE YKFQNALLVR YTKKVPQVST PTLVEVSRNL GKVGSKCCKH PEAKRMPCAE AE DYLSVVLNQL CVLHEKTPVS C DRVTKCCTES LVNRRPCFSA LEVDETYVPK EFNAETFTFH ADICTLSEKE RQIKKQTALV ELVKHKPKAT KEQLKAVMDD FAAFVEKCCK ADDKETCFAE EGKKLVAASQ AALGL Without homoarginine derivatization In addition, with homoarginine derivatization

α-lactalbumin In-gel derivatization and trypsin digestion Homoarginine derivatization EQLTKCEVFR ELKDLKGYGG VSLPEWVCTT FHTSGYDTQA IVQNNDSTEY GLFQINNKIW CKDDQNPHSS NICNISC NISCDKF LDDDLTDDIM CVKKILDKVG INYWLAHKAL CSEKLDQWLC EKL 86% sequence coverage D K F L D D D L T D D I M C 14 (homoarg; 1785.9) (M+H) + MALDI-MS/MS, MS/MS, QqTOF y 3 y 4 y 7 y 8 y 1 y 2 b 2 b 3 y 5 y 6 y 9 y13 y 10 y 12

Why Bother with Intact Protein Masses? The protein s molecular mass defines the native covalent state of a gene s product, including: Post-translational modifications: Glycosylation, Phosphorylation, Oxidation, Deamidation, Lipoylation, Acetylation, Formylation... Post-transcriptional modifications Alternative splicing, Introns/Exons, Frame-shifts, Sequencing Errors Simple confirmation of suspected id s Spotlight the presence of post-translational modifications The fragmentation pattern from proteins can generate sufficient information for identification from sequence databases, particularly when combined with accurate mass measurements of both the intact molecule and its product ions.

Combining Gel Electrophoreisis with Mass Spectrometry Building better tools Profiling proteins faster in an automated fashion

PAGE-MALDI MALDI-MS: MS: The Virtual 2D Gel Method Potential for automated, high-speed proteomic analysis Rachel Loo

Virtual 2-D 2 D Gel Electrophoresis (Analytical Chemistry 73, 4063 (2001)) Identify proteins based on MW and pi Create a database of intact protein masses and provide a direct link to all data produced now and in the future via classical 2D gels Identify proteome-wide posttranslational modifications

In-Source Decay (ISD) for Protein Sequencing Peptides and large proteins can be fragmented by ISD Fragmentation occurs in the MALDI ion source not generally well controlled Reflectron TOF not necessary (linear TOF sufficient to measure product ions cut R P F E E D P I T Complete sequence information not present, but extensive stretches of sequence from the N- and/or C-termini observed P E P P E P T P E P T I...TIDE... A G M E N T P E P T I D P E P T I D E

Protein ISD from Isoelectric Focusing Gel Counts 2000 9120 ptsh 2+ mete 84601 3+ 2000 S Y W AG N 20000 40000 60000 80000 100000 m/z 24 SYWAGNSTREELLAVGRELRARHWDQQKQAGIDLLPVGDFAWYD 67 Counts N-terminal c-ion series S T R E E I/LI/L G R V E I/L R A R H W K/Q D K/Q K/Q K/Q AGIDI/L LP VG D muli F A W Y D 3000 4000 5000 6000 7000 Ogorzalek Loo, RR, Cavalcoli, JD, VanBogelen, RA, Mitchell, C, Loo, JA, Moldover, B., and Andrews, PC, Anal. Chem. 73, 4063 (2001). m/z

Top-down sequencing of proteins by ESI-MS/MS 100 MW 18171 866 19+ 1070 HIV Integrase Catalytic Core (F185K) 827 % 23+ 15+ 1299 1515 500 0 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 m/z

Integrase: MS/MS (M+19H) 19+ 100 100 b 43 6+ 1046 y 123 13+ % % 762 0 763 764 765 766 767 m/z 764 b 43 5 917 y 123 14 y 2 y 3 y 4 y 5 y 6 y 7 complementary ion pair b 43 6+ -Ile -Pro 44-505 618 y 123 13+ 200 0 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 m/z

MS/MS b 43 6+ (m/z 764) 100 746 875 b 37-42 5+ MS/MS/MS or MS 3 can be performed to obtain further sequence information b 39-42 6+ b 43 6+ b 34-41 4+ % 1093 b 34-40 3+ 1310 500 0 600 700 800 900 1000 1100 1200 1300 1400 1500 m/z

ESI-MS/MS of GroES (97 amino acids) MS/MS yields direct sequence information for the C-terminal 15 amino acid residues. 833 b 12 85 GroES (Chaperonin-10, MW 10435) b n 12+ 12+, n = 82-96 MS 2 (M+12H) 12+ 845 b 12 94 921 Difficult to predict fragmentation sites for intact proteins y 5 y 6 502 615 813 941 y y 11 12 y 13 1146 1277 13 y 1390 14 500 700 900 1100 1300 1500 m/z

Peptide fragmentation in nozzle-skimmer region of ESI interface Often called nozzleskimmer dissociation Fragmentation can be controlled by adjustment of nozzle-skimmer voltage Low energy CAD peptide fragmentation in atmosphere / vacuum interface Useful for obtaining sequence information for relatively pure samples without the need for a tandem mass spectrometer

Mass Spectrometry as a Tool for Protein Crystallography (see Cohen & Chait, Annu. Rev. Biophys. Biomol. Struct. 2001) Construct Verification Domain Elucidation Define folding domains Improve chances to generate high quality diffracting crystals Limited proteolysis cleaves flexible chains, leaving a more compact folding domain

ESI-MS of Intact Protein: ~ 59 kda 3 phosphorylation sites? MW 59335 63+ MW 59255 80 Da ESI-MS 100 65+ 940 950 % 0 800 900 1000 1100 1200 m/z

Limited Proteolysis for Domain Elucidation ESI-MS Trypsinolysis (15 min) Where are the cleavage sites? 17883 19809 19889 20116 80 Da 14451 14000 16000 18000 20000 22000 mass

MW 19809 MS/MS of Intact Proteins Complex spectra Confirm primary sequence Confirm termini, posttranslational modifications MW 19889 (M+18H) 18+ (M+18H) 18+ y-ions b-ions 900 1100 1300 m/z

MS/MS of (M+18H) 18+ y-ions similar, but shifted by 80 Da same C-termini, C but phosphorylated MW 19809 residues 1-1691 169 (M+18H) 18+ 1210.3 MW 19889 residues 1-1691 169 1 phospho-site site (M+18H) 18+ y 14+ n 144 147 150 1216.0 y 15+ n y 13+ n 1100 1150 1200 1250 1300 1350 m/z

Based on homology modeling and limited proteolysis... ESI-MS Trypsinolysis (15 min) 17883 (1-150) 150) 19809 (1-169) 169) 19889 (1-169) 169)-P 20116 (1-171) 171)-P 14451 (385-508) 508)-P 14000 16000 18000 20000 22000 mass

Conclusions Bottom-up and top-down approaches for protein characterization are complementary Bottom-up methods are robust and lead to efficient processes for identifying proteins Top-down methods are not mature, both experimentally and computationally. Protein fragmentation is not predictable (yet). However, information on protein processing and modifications may be flagged by the top-down approach.