PROTEOMICS
August 27-31, 2007
Peter D'Eustachio - MSB 328
e-mail: deustp01@med.nyu.edu

GOALS OF THIS SEGMENT OF THE COURSE
- Recognize the structures of the 20 amino acids; understand their properties as weak acids and bases and as stereoisomers; understand the chemical properties of the classes of side chains and of the peptide bond.
- Understand the use of electrophoresis as an analytical and diagnostic tool.
- Understand the use of sequence comparisons to define protein families and to define conserved domains within proteins.
- Understand protein phosphorylation and its functional consequences.
- Understand the roles of the peptide bond and of hydrogen bonding in determining the major secondary structural features of proteins; know the distinguishing features of α-helices and β-pleated sheets.
- Understand the concept of a folding pathway, the role of chaperone proteins in facilitating protein folding in vivo, and the biological consequences of mis-folding.
- Understand the concept of a transition state and the function of enzymes as catalysts.
- Understand the properties of serine proteases that make them efficient and specific catalysts.
- Know how to use the Michaelis-Menten equation to describe enzyme function quantitatively; understand the meanings of K_M and V_max.
- Understand the difference between competitive and non-competitive inhibitors; know how to use enzyme kinetic data to distinguish the two forms of inhibition.
- Understand allostery and cooperativity, and their roles in regulating the activity of multisubunit enzymes.

I. THE AMINO ACID SUBUNITS
Twenty α-amino acids are the monomeric subunits of proteins. Two features, their properties as weak acids and bases, and their stereochemistry, are common to all 20. The third feature, the chemical properties of their side chains, allows them to be distinguished from one another and classified.
This third feature is also key in understanding the roles of individual amino acids in determining the structures and functions of whole proteins. The peptide bond joins individual amino acid residues to form polypeptide chains, and its properties determine several of the key structural features shared by all proteins. proteomics - 1
II. PROTEIN SIZE AND SHAPE The analytical ultracentrifuge provided the first generally useful means for measuring sizes of proteins. While it has been almost completely replaced by other methods, it has contributed a way of describing overall protein shape, and a terminology - the Svedberg unit - that remain important. SDS polyacrylamide gel electrophoresis has become a key tool for characterizing and purifying proteins. Under appropriate conditions, mixtures of protein molecules migrating through a porous medium are separated according to their sizes, as shown in the figure above. Following electrophoresis, proteins can be visualized with nonspecific stains, or individual proteins or biochemical activities can be detected (western blotting and ligand binding assays, respectively). III. THE PRIMARY STRUCTURE OF PROTEINS The primary structure of a protein is its covalent structure. The most important feature of its covalent structure, its amino acid sequence, is determined either by Edman degradation, in which amino acids are cleaved one at a time from the N-terminus, or is deduced from the nucleotide sequence of the DNA that encodes the protein. Analysis of the amino acid sequences of proteins reveals three sorts of useful functional information: Correlation of sequence alteration with altered functions. A classic example is the point mutation in the β-globin chain of hemoglobin. Identification of protein families. Proteins are grouped into families based on shared amino acid sequence motifs. Two kinds of family groupings are possible: the same protein from different species; different proteins from a single species. Both family groupings are typically described and rationalized in evolutionary terms, and provide useful clues about likely functions of newly discovered or newly characterized human proteins. Identification of functional domains. Many large proteins contain multiple functionally distinct domains. 
Immunoglobulins are a classic example of proteins exhibiting domain structure. Comparison of the variable regions of many different immunoglobulin heavy and light chains shows individual functional variants and a constant structural framework.

A. ISOZYMES AND ISOFORMS
Isozymes are alternative forms of an enzyme that catalyze the same reaction but that have different amino acid sequences. Different isozymes are typically expressed in different tissues, may differ in efficiency, and may be subject to distinct regulatory factors, providing three important mechanisms for tissue-specific regulation of enzyme activity. Also, blood tests for isozymes that should normally be confined to a particular tissue provide a clinically important assay for detecting tissue damage and estimating its severity (e.g., detection of the CK2 form of creatine kinase in the blood following myocardial infarction).

Many kinds of proteins show homology at the level of amino acid sequence, and prove to be structurally and functionally related. Such family members are called isoforms. Glucose transporters are an example:

GTR4   260 lkdekrklerer--plsllqllgsr-thrqpliiavvlqlsqqlsginavfyystsifetagv 319
GTR1   244 MKeEsrqmmrEk--kVtiLeLFRsp-aYRQPILIAvVLQLSQQLSGINAVFYYSTSIFekAGV 303
GTR3   242 MKdEsarmsqEk--qVtvLeLFRvs-SYRQPIiIsiVLQLSQQLSGINAVFYYSTgIFkdAGV 301
SLC2A2 276 MrkEreeassEq--kVSiiqLFtns-SYRQPILvAlmLhvaQQfSGINgiFYYSTSIFqtAGi 335
SLC2A5 250 irqedeaekaag--fisvlklfrmr-slrwqllsiivlmggqqlsgvnaiyyyadqiylsagv 309
hypo   199 lqggeapklgpgrprysfldlfrardnmrgrttvglglvlfqqltgqpnvlcyastifssvgf 261

In this alignment of portions of the sequences of five known transporter proteins and one hypothetical protein discovered in the course of human genomic sequencing, amino acids are designated by their one-letter abbreviations. To highlight family relationships, where three or more sequences have the same residue at a given position, its letter is shown in upper case boldface type. Residues strongly related to one another in terms of the physical and chemical properties of their side chains are shown in lower case boldface type, and weakly related ones are shown in lower case italic type.

B. COVALENT MODIFICATION OF POLYPEPTIDES
These modifications convert hormone and enzyme precursors to their active forms, play a major role in targeting newly synthesized proteins to their correct compartments in the cell, and can be used to regulate protein function.

MODIFICATION           SITE              REVERSIBLE?  FUNCTION      EXAMPLES
peptide bond cleavage  leader?           NO           targeting     secreted, membrane proteins
                       internal dibasic  NO           maturation    peptide hormones
                                                      activation    proteases
glycosylation          asn-x-thr/ser     NO           targeting     lysosomal enzymes
                                                      stability(?)  membrane proteins
phosphorylation        ser, thr          YES          regulation    enzymes
                       tyr               YES          regulation    signaling cascades

IV. THE SECONDARY STRUCTURE OF PROTEINS
The secondary structure of a protein is the three-dimensional conformation of its main polypeptide chain, due to hydrogen bonding. Its main features can be understood in terms of the features of the peptide bond itself - its flatness and fixed bond angles, the rotational freedom about the α-carbon atoms, and the hydrogen bonding potential of the carbonyl oxygen and -NH groups of each amino acid residue. Three motifs dominate the secondary structures of proteins - α-helices, antiparallel β-pleated sheets, and β-bends. The slides used to illustrate the key points of the discussion of secondary structure in lecture are all available on the MGB web site.

V. THE TERTIARY STRUCTURE OF PROTEINS
The orientation of all of the atoms in a protein - its full three dimensional structure - is called its tertiary structure. It is determined experimentally, by X-ray crystallography.

Myoglobin. Key features are the extensive α-helix, the concentration of hydrophobic residues on the interior of the molecule and hydrophilic ones on its surface, and the large hydrophobic pocket that holds the heme prosthetic group. Although not clearly shown in the diagram, the structure is compact. If side chains of amino acid residues were shown, there would be very little empty space in the interior of the molecule.

Immunoglobulins. The diagram shows the structure of an immunoglobulin light chain variable region. This region of the protein folds into a single compact domain, with a prominent antiparallel β-pleated sheet structure. The loops of polypeptide chain that make up the antigen binding site of the light chain (in red, at the top of the domain) correspond to the hypervariable stretches of amino acid sequence, while the β-pleated sheet backbone corresponds to the sequences that show little variation. A pair of cysteine residues forms a disulfide bond that helps to hold the two β-pleated sheets in the domain together.

VI. THE QUATERNARY STRUCTURE OF PROTEINS
Quaternary structure refers to the assembly of individual folded protein chains into larger complexes.

Hemoglobin. Hemoglobin is a tetramer composed of two identical alpha globin proteins and two identical beta globin proteins. Each monomer folds into a three-dimensional conformation very similar to that of the myoglobin monomer, and each contains a heme prosthetic group capable of reversibly binding O2. Interactions among the subunits, which you can explore further using the computer module, make the binding of O2, protons, and CO2 cooperative and interdependent.

Immunoglobulins. A ribbon diagram of the whole molecule shows the tight interactions of heavy and light chains, including the interactions of their variable domains to form antigen-binding sites. Note the repeating domain structure.
As the body of structural data has grown, empirical approaches have become useful. That is, structural features of newly discovered proteins can often be predicted based on the known structures of related proteins or protein domains. A particularly useful example of this strategy is the use of hydrophobicity plots to predict membrane-spanning domains.
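A hydrophobicity plot of the kind mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the course: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window, and a cutoff of 1.6, none of which are specified in these notes, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue (assumed scale; the notes name none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along a one-letter sequence."""
    residues = sequence.upper()
    if window < 1 or window > len(residues):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in residues]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_membrane_spans(sequence, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds threshold.

    The window length and threshold are illustrative assumptions.
    """
    profile = hydropathy_profile(sequence, window)
    return [start for start, mean in enumerate(profile) if mean > threshold]
```

Plotting the output of `hydropathy_profile` against window position gives the hydrophobicity plot itself; sustained peaks mark stretches hydrophobic enough to be candidate membrane-spanning segments, which then need experimental confirmation.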
VII. PROTEIN FOLDING IN VIVO While the native folded conformation of a typical protein is energetically optimal, and most proteins can get to this conformation spontaneously, mechanisms exist in vivo to increase the rate of folding and to improve its efficiency. These chaperone mechanisms are also important because some proteins are capable of spontaneously folding into more than one stable conformation, and chaperones are used in the body to direct such proteins preferentially into a correct conformation. The alternative fates open to such a protein are diagrammed in the cartoon. Steps on the pathway that a newly synthesized polypeptide (a) spontaneously follows towards correct folding are shown in red (b, c), and ones assisted by a chaperone (g) are shown in green. Once a protein monomer is correctly folded, it may go on to form multimers (d, e, f) of various sorts (blue). Alternatively, the protein may spontaneously form disordered aggregates (h), or ordered but dysfunctional aggregates (amyloid i). Key points are: Chaperones provide a sheltered environment in which aggregation-prone intermediates are protected. Chaperone activity requires energy. Different chaperones are specialized to help the folding of different classes of cellular proteins. Ordered protein aggregates (i) can be extremely stable once they form, and appear to serve as nuclei to recruit additional misfolded monomers. This chemical feature may explain the infectious mechanism of prion diseases. VIII. PROTEINS AS CATALYSTS OF CHEMICAL REACTIONS A major role of proteins that concerns us in this course is their function as enzymes, enabling chemical reactions in cells to proceed rapidly and specifically under chemically mild conditions. 
The remainder of this segment of the course will focus on enzymes, starting with a brief consideration of relevant reaction mechanisms, followed by the development of a quantitative model useful for understanding and manipulating reactions in cells and in clinical settings, and finishing with a qualitative review of key aspects of enzyme regulation.

A. TRANSITION STATES IN CHEMICAL REACTIONS
Transition states are high-energy, unstable structures intermediate between reactants and products. Enzymes provide favorable binding surfaces for reactant molecules that are near a transition state. The binding surfaces of enzymes also provide chemical groups that facilitate the reaction. These groups may be the enzyme's own amino acids, bound prosthetic groups, or other bound reactant(s). By binding molecules that are primed to react, and providing them with an environment in which completion of the reaction is favored, enzymes can enormously increase both the specificity and the rate of a chemical reaction.

B. THE SERINE PROTEASE REACTION MECHANISM
Spontaneous hydrolysis of peptide bonds is energetically favorable, but occurs slowly if at all under physiological conditions. There are four classes of proteases (enzymes that catalyze peptide bond hydrolysis) distinguished by the structures of their active sites: serine, zinc, thiol, and aspartyl proteases. Serine proteases are a large family of enzymes that carry out many physiologically important reactions, including protein digestion in the lumen of the small intestine (trypsin, chymotrypsin), and reactions that are key steps of the blood clotting cascade (thrombin). Chymotrypsin serves as a useful illustration of the means by which enzymes function as efficient catalysts and achieve specificity.

1. The active site of a serine protease. Three residues of chymotrypsin, ser-195, his-57, and asp-102, are key elements of the enzyme's active site, forming a so-called catalytic triad. These residues, and their three-dimensional relationship to one another, are conserved in all known serine proteases. The serine -OH group attacks the carbonyl carbon atom of the peptide bond that is to be cleaved, forming a covalent tetrahedral intermediate. The resulting negatively charged carbonyl oxygen atom is stabilized by hydrogen bonding to nearby peptide NH groups in the oxyanion hole of the enzyme. His-57 accepts a proton from ser-195, then donates it to the peptide nitrogen, causing cleavage of the substrate C-N bond and release of the carboxy terminal part of the substrate from the enzyme.
The amino terminal part of the substrate remains in place, esterified to the serine -OH group. A water molecule then enters the active site and reacts with this complex, releasing the amino terminal part of the substrate and restoring the enzyme to its original state.

2. The specificity pocket of a serine protease. A large pocket lined with hydrophobic amino acid residues adjoins the catalytic triad in the substrate binding cleft of chymotrypsin. Bulky aromatic side chains (e.g., phe) bind strongly to this pocket and chymotrypsin thus shows greatest activity against peptide bonds on the carboxyl side of such residues. Another serine protease, trypsin, preferentially cleaves peptide bonds on the carboxyl side of arg and lys residues. Its specificity pocket is likewise large, but with an asp residue at its bottom, favoring the binding of large, positively charged amino acid residues.

IX. MICHAELIS-MENTEN KINETICS
This piece of algebra, now almost a hundred years old, provides a remarkably accurate way of describing diverse reactions in cells, and of understanding how to manipulate those reactions in both basic research and clinical settings. Here, we will briefly show how the equation is developed as a way of understanding what it can be used for.

A. THE MICHAELIS-MENTEN EQUATION
Many enzyme-catalyzed reactions show hyperbolic kinetics. By considering only reactions with single substrates and products, postulating that product binding to enzyme is relatively weak, postulating the existence of a steady-state level of enzyme-substrate complex, and considering only the initial velocity of the reaction (when [P] ≈ 0), Michaelis and Menten derived an equation that provides a remarkably general and useful description of enzyme-catalyzed reactions.
Here, we also assume that the total concentration of substrate, [S_T], is much greater than the total concentration of enzyme, [E_T], so the concentration of free substrate, [S], is essentially equal to [S_T]. Enzyme and substrate molecules interact to form an enzyme-substrate complex, which reacts to regenerate the enzyme molecule and yield a molecule of product:

$$E + S \underset{k_2}{\overset{k_1}{\rightleftharpoons}} ES \overset{k_3}{\longrightarrow} E + P$$

The rate of formation of product, V, equals k_3[ES]. Some algebraic rearrangements allow us to express [ES] in terms of more easily measurable quantities, and yield the Michaelis-Menten equation:

$$V = \frac{V_{max}\,[S]}{[S] + K_M}$$

Many single substrate reactions are more complex, involving pathways such as

$$E + S \rightleftharpoons ES \rightleftharpoons ES' \rightleftharpoons EP \rightleftharpoons E + P$$

where ES' is an intermediate compound such as a covalent enzyme-substrate complex, and EP is an enzyme-product complex. Nevertheless, application of steady state assumptions to the intermediates ES, ES', and EP results in an equation identical to the simple Michaelis-Menten equation, although the constants k_3 and K_M are more complex (they include more reaction rate constants than for the simple system). In addition, many enzyme-catalyzed reactions involve two or more substrates. The kinetic analysis of such reactions can often be simplified by maintaining one or more substrates at saturating levels and measuring V as a function of the concentration of the remaining substrate. In these cases, a plot of V vs. [S] can result in a typical hyperbolic curve.

B. THE LINEWEAVER-BURK PLOT
Rearranging the Michaelis-Menten equation to express 1/V as a function of 1/[S] gives a linear relationship that is useful for data analysis since it allows K_M and V_max values to be determined graphically.

$$\frac{1}{V} = \frac{K_M}{V_{max}} \cdot \frac{1}{[S]} + \frac{1}{V_{max}}$$

[Figure: Lineweaver-Burk plot, 1/V (vertical axis) against 1/[S] (horizontal axis).] The slope of the 1/V vs. 1/[S] plot for an enzymatic reaction equals K_M/V_max, the Y intercept of the plot equals 1/V_max, and the X intercept equals -1/K_M.

X. SPECIFIC INHIBITION OF ENZYMATIC REACTIONS
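The graphical Lineweaver-Burk analysis of section IX.B can also be done numerically. The sketch below is an illustration only: the function names are hypothetical, the fit is an ordinary unweighted least-squares line through the points (1/[S], 1/V), and it assumes all substrate concentrations and velocities are positive.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity V = Vmax [S] / ([S] + K_M)."""
    return vmax * s / (s + km)


def lineweaver_burk_fit(substrate, velocity):
    """Estimate (K_M, Vmax) from paired [S] and V measurements.

    Fits a straight line to 1/V versus 1/[S]; the slope is K_M/Vmax
    and the Y intercept is 1/Vmax.
    """
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [1.0 / s for s in substrate]   # 1/[S]
    ys = [1.0 / v for v in velocity]    # 1/V
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax
```

Fitting one data set measured without inhibitor and one measured with a constant amount of inhibitor, then comparing the two (K_M, V_max) pairs, is the numerical counterpart of comparing the two lines on a Lineweaver-Burk plot. Because taking reciprocals magnifies the error in measurements at low [S], an unweighted fit of this kind is best treated as a rough estimate.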
Competitive inhibitors. Substrate analogs bind to enzyme active sites, blocking access by productive substrate molecules. A competitive inhibitor does not affect the V_max of a reaction, but increases its apparent K_M.

Non-competitive inhibitors. Inhibitor molecules binding to specific sites on an enzyme distinct from the active site can induce conformational changes that make the enzyme ineffective as a catalyst, even though substrate continues to bind at the active site. A non-competitive inhibitor reduces the apparent V_max of a reaction, but does not affect its K_M.

The use of Lineweaver-Burk plots to distinguish forms of inhibition.

[Figure: two Lineweaver-Burk plots, 1/V against 1/[S], each comparing the reaction with inhibitor and with no inhibitor.]

Competitive inhibitor. An enzymatic reaction in the presence of a constant amount of a competitive inhibitor (solid line). The enzyme's apparent K_M increases but its V_max is unchanged relative to the reaction proceeding with no inhibitor.

Non-competitive inhibitor. An enzymatic reaction in the presence of a constant amount of a non-competitive inhibitor (solid line). The enzyme's apparent K_M is unchanged but its V_max falls relative to the reaction proceeding with no inhibitor.

XI. TWO CLINICALLY IMPORTANT PROTEASE INHIBITORS
Serpins (serine protease inhibitors)
At a biological level, serpins illustrate the use of enzyme inhibition as a normal regulatory mechanism. At a chemical level, their mode of action illustrates the suicide-substrate variant on competitive inhibition. A serpin has a compact globular shape, with an exposed loop of peptide chain on its surface that acts as a bait: the loop fits tightly in the substrate binding groove of a specific serine protease, and a peptide bond in the loop is cleaved. The cleavage products remain tightly bound to the serine protease, however, and water cannot enter to enable hydrolysis of the acyl enzyme intermediate.

Inhibitors of aspartyl proteases
Aspartyl proteases contain two globular domains.
The enzyme's active site is a deep groove formed by the interaction of the two domains, containing two aspartate residues, one from each domain. Human aspartyl proteases include pepsin, the major proteolytic enzyme in gastric juice, and renin, a kidney enzyme whose function is to activate angiotensinogen by specific proteolytic cleavage and thereby regulate blood pressure. HIV-1 protease is also an aspartyl protease, and mediates cleavages of HIV proteins needed for virus particle assembly. The HIV-1 protease is a dimer of two identical domains, each corresponding to one of the domains of the related human proteins. The perfect symmetry of the HIV-1 protease structure has been exploited to design low molecular weight inhibitors that can bind it strongly and specifically while sparing homologous human proteins.

XII. ALLOSTERY
For allosteric enzymes, reaction rates often vary with substrate concentration in a sigmoidal way, rather than the hyperbolic one of Michaelis-Menten enzymes, and these sigmoidal plots are modified in characteristic ways in the presence of activators and inhibitors. These phenomena are illustrated by the oxygen-binding properties of tetrameric hemoglobin, which can be compared to the non-cooperative binding of O2 by monomeric myoglobin. Key physiological phenomena include:
- O2 binding to hemoglobin (but not myoglobin) is cooperative. This allows efficient transfer of O2 from hemoglobin to myoglobin in tissues.
- High levels of H+ promote O2 release from hemoglobin, while high levels of O2 promote H+ release.
- 2,3-bisphosphoglycerate (2,3BPG) lowers the affinity of hemoglobin for O2.

The molecular explanation for these phenomena is that the quaternary structure of hemoglobin is altered by O2 binding. Specifically, the heme iron is pulled into the plane of the heme molecule, moving the liganded histidine residue and setting off a cascade of conformational changes throughout the globin monomer. These in turn set off conformational changes in the other monomers, affecting their affinities for oxygen and H+. The presence of one molecule of 2,3BPG in the central cavity of the deoxyhemoglobin tetramer interferes with these conformational changes. Oxygenation of the hemoglobin tetramer causes 2,3BPG to be released from its binding site in the hemoglobin tetramer.

Some useful terminology:
The subunits of an allosteric protein are said to switch between R(elaxed) and T(ense) forms, in which they show high affinity / enzymatic activity and low affinity / enzymatic activity, respectively.
Compounds that favor the conversion of an allosteric protein to its R form are allosteric activators; those that favor conversion to the T form are allosteric inhibitors.
Homotropic effects are ones due to interactions between identical ligands (e.g., the binding of one O2 increases the affinity for subsequent O2 molecules); heterotropic effects are ones due to interactions between different ligands (e.g., the binding of H+ decreases the affinity for O2).

XIII. MATERIALS FOR FURTHER STUDY ON YOUR OWN
The article by CM Dobson ("Getting out of shape," Nature 418:729-730, 2002), which reviews the role of protein misfolding in several common and serious human diseases.