Prediction of In Vivo Modification Sites of Proteins from Their Primary Structures


J. Biochem. 104, (1988)

Kenta Nakai and Minoru Kanehisa¹

Institute for Chemical Research, Kyoto University, Uji, Kyoto 611

Received for publication, June 13, 1988

In order to make better use of the information contained in rapidly expanding amino acid sequence data, a new method to predict various modification sites of proteins from their primary structures is presented. It is also applicable to the prediction of other functional sites in proteins. Here we show examples for N-glycosylation and serine/threonine phosphorylation sites. The method is essentially an elaboration of consensus sequence pattern matching based on stepwise discriminant analysis. The amino acids occurring near a potential modification site are represented by six numerical values which reflect various properties of amino acids. Longer-range effects around these sites are also considered. The stepwise procedure enabled us to select effective features for discrimination automatically. A computer program implementing our method first identifies potential modification sites by a sequence pattern, NX(S/T) for N-glycosylation or (S/T) for phosphorylation, and then decides by discriminant analysis whether a potential site is likely to be a true modification site. The prediction accuracy in the second step of discrimination was about 60% for glycosylation sites and about 80% for phosphorylation sites.

The rapid growth of the quantity of DNA sequence data is one of the most remarkable events in biosciences today. The size of the GenBank database is doubling almost every year, and this is likely to be accelerated further as the sequencing of large genomes proceeds. Although various physiological phenomena in living systems are ultimately encoded in genomic DNA, sequencing of genes does not necessarily mean that these phenomena are clarified. In this respect, what we can learn from sequence data is still very limited.
One of the established methods for obtaining insights into higher structural and functional properties of proteins from sequence data is to search for homologous sequences in the databases. In many cases, however, strong homologies are absent and the significance of weak homologies is difficult to assess. It is also customary to perform secondary structure predictions from the amino acid sequence, but their accuracy is too low to be useful in practice. Despite these limited success rates, computational methods are frequently used and there seems to be a great demand for new sequence analysis methods that complement existing ones. In this paper we propose a new approach toward the prediction of functional sites of a protein from its primary structure.

Suppose we have to locate, in a newly determined sequence, a DNA binding site, a calcium binding site, and an in vivo modification site, by sequence analysis. The ordinary way to do this is to use pattern matching against the so-called consensus sequence. For instance, when there is an N-X-(S or T) pattern in the sequence (here X denotes any amino acid), the site is called a potential glycosylation site (1). The consensus sequence of Tufty and Kretsinger is also well known with regard to calcium binding sites (2). However, the consensus is basically a short, weak sequence pattern, and when the whole amino acid sequence database is searched, there are usually many additional sites matching only by chance.

Our approach is similar to the consensus matching method in the sense that it is also based on the identification of common features among related sites. However, it contains three new elements which will overcome the problems inherent in the simple pattern matching approach. They are: (i) preparation of data set, (ii) numerical representation of amino acid sequences, and (iii) objective selection of features.

¹ To whom correspondence should be addressed.
Abbreviation: PKA, cAMP-dependent protein kinase.
First, the data set is prepared with practical applications in mind. Anyone wishing to locate an N-glycosylation site will no doubt use the NX(S/T) consensus first. The question is therefore: is it possible to discriminate true glycosylation sites from false analogues merely having the NX(S/T) pattern? The consensus sequence is derived only from a set of true sites; there is no control group of false sites. In contrast, our approach uses data sets of both true and false sites. This is the most critically different aspect of our approach. By comparing the patterns of true and false sites, it is possible to establish a practical criterion for discrimination.

Second, common features in the sequence may be too weak to be represented by a limited number of characters. This is another drawback of consensus pattern matching. Because the amino acids are represented by characters, a position can only match or not match, although it is possible to allow several residues to match at one position. In reality, however, each amino acid has many characteristics which cannot be represented by binary digits. There have been a number of reports on so-called amino acid indices which reflect various aspects of amino acid residues, as summarized in our previous analysis (3). We introduce the numerical representation of amino acids selected from our database of over 200 published amino acid indices. The numerical representation is also convenient for incorporating environmental factors, namely, long-range effects of residues surrounding the consensus. Thus, we should be able to combine weak signals within and outside of the consensus to improve the accuracy of prediction.

Third, the derivation of a consensus sequence is usually based on visual inspection and tends to be subjective. In contrast, we use an objective approach, the stepwise variable selection method by discriminant analysis, which enables us to select features automatically. Recently, another automatic approach for feature detection was developed and applied to discrimination of protein secondary structural segments (4). These automatic procedures are to be compared with our previous method by discriminant analysis (5), where the variables were selected manually. In this paper, we apply the stepwise variable selection method to the prediction of in vivo modification sites of proteins, especially the N-glycosylation sites (1) and serine/threonine phosphorylation sites (6).

MATERIALS AND METHODS

Collection of Data: In order to distinguish true modification sites from false analogues, it is necessary to collect both groups of sequence data. False analogues, in our definition, contain sequence patterns similar to true sites: the NX(S/T) consensus pattern in N-glycosylation sites and the (S/T) residue in Ser/Thr phosphorylation sites. False analogues form a control group in the sense that they are assumed to contain different, or possibly negative, patterns other than the consensus specified above. In contrast, true modification sites are expected to contain positive patterns around the consensus. The sequences around truly modified sites were mainly collected from the NBRF-PIR database (7) Release 13.0 (June 1987).
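As an illustration of how candidate sites might be collected, the following minimal Python sketch scans a sequence for the two patterns named above and splits the candidates into true sites and false analogues. The function names, the toy sequence, and the set of annotated positions are hypothetical, and any candidate that is not annotated as modified is simply treated as a false analogue. It is a sketch under these assumptions, not the program used in this work.

```python
import re

# Patterns as stated in the text; X stands for any amino acid.
# The lookahead lets overlapping candidates (e.g. "NNTS") each be reported.
GLYCOSYLATION = re.compile(r"(?=N[A-Z][ST])")
PHOSPHORYLATION = re.compile(r"[ST]")


def potential_sites(sequence, pattern):
    """Return the 1-based positions of residues that could be modified."""
    return [match.start() + 1 for match in pattern.finditer(sequence)]


def split_true_false(sequence, pattern, modified_positions):
    """Split candidate sites into annotated (true) and unannotated (false)."""
    candidates = potential_sites(sequence, pattern)
    true_sites = [p for p in candidates if p in modified_positions]
    false_sites = [p for p in candidates if p not in modified_positions]
    return true_sites, false_sites


# Hypothetical toy input: a made-up sequence and made-up annotations.
toy_sequence = "MNGSANNTSLKRASTV"
toy_annotated = {2, 7}
true_sites, false_sites = split_true_false(toy_sequence, GLYCOSYLATION, toy_annotated)
```

Each candidate position returned here would then be described by numerical variables and passed to the discriminant function.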
While the positions of the modified sites are reported in the database, there is no description of the sites which are not modified. Therefore, in the sequences reported to have real modification sites, other potential sites are assumed to be non-modified. Strictly speaking, this assumption may not be correct, but when the data set is sufficiently large, errors are expected to be negligible.

In the case of N-glycosylation, the data set for false analogues was relatively small, as the non-modified sites were selected from glycoproteins, namely, proteins that have at least one modified NX(S/T) pattern. Thus, we also collected data from another source. In 1970 Hunt and Dayhoff (8) studied glycosylated and non-glycosylated sites in their sequence database. We used their compilation of non-glycosylated sites. Because all of these sites existed in non-glycosylated proteins, there was no overlap with the sites collected from the NBRF database.

As for the Ser/Thr phosphorylation sites, another level of complication exists. The information concerning the type of enzyme, i.e., protein kinase, that causes phosphorylation is not usually reported in the NBRF database. On the other hand, there are many protein kinases identified in vivo and each kinase seems to have its own substrate specificity (9). Engstrom et al. (10) compiled a list of the in vitro phosphorylation sites of known kinases. We use their substrate data for the cAMP-dependent protein kinase (PKA), which is the most extensive.

Stepwise Discriminant Analysis: We give a minimum description of the stepwise discriminant analysis here. For further details, see appropriate textbooks (Ref. 11, for example). Discriminant analysis makes it possible to allocate an individual to one of the pre-defined groups on the basis of measurements or values of given variables.
It uses the so-called discriminant function for allocation, which is a linear combination of variables with coefficients optimized to best discriminate the given groups of data, i.e., the training set. In our analysis, discrimination is usually performed between two groups, the sequence data of modified and non-modified sites. However, when we analyze the Ser/Thr phosphorylation sites, we also perform a discrimination among three groups by further dividing modified sites into two groups.

Starting from the repertoire of numerous variables, we wish to select a minimum set of variables that gives a sufficiently good discrimination. Generally speaking, it is desirable to combine many variables in order to obtain better discrimination. However, because the number of possible combinations in the repertoire is enormous, it is impractical to test all of them. Moreover, when a new variable is added for discrimination, it may not actually contribute significantly, although the accuracy will not decrease. In fact, even if a variable works quite well by itself, it may not work as well in combination with other variables. Since there is no way to tell beforehand whether the added variable is going to contribute effectively, a procedure called the stepwise discriminant analysis is applied. This procedure has another implication: since we prepare the repertoire as a comprehensive set of variables representing various aspects of sequence data, it plays, in effect, the role of feature detection. The stepwise discriminant analysis examines all candidates and enters them into or removes them from the discriminant function one by one, according to a pre-established criterion, until a stopping condition is satisfied. In practice, we used the program package FACOM ANALYST to perform these analyses (12).

Estimation of Prediction Efficiency: Once the discriminant function is determined from the training set, it can be applied to an individual outside the training set.
This we call prediction. The efficiency of prediction is, in general, worse than the efficiency of self-discrimination, which is the discrimination of individuals within the training set. One way to estimate the prediction efficiency is to divide the available data into the training set and the test set. However, because the data set is rather small in our case, it may not be enough to make a reliable estimation. Thus we use an alternative method, the U-method of jack-knifing, defined as follows (13): extract an individual from the sample data and calculate the discriminant function based on the remainder, which is then used to predict the extracted one. By repeating this procedure for each individual in the data set, the average degree of prediction is calculated. This is the estimated prediction efficiency.

Variables for Discrimination: The sample data collected as described above are aligned at the possible modification site, the Asn residue for N-glycosylation and the Ser or Thr residue for phosphorylation, without any insertion or deletion. As shown in Fig. 1, we consider a segment of 31 residues containing 15 residues on each side of the modification site. The relative position within this segment is specified by the numbering shown: 0 for the modification site, positive numbers toward the carboxyl terminus, and negative numbers toward the amino terminus. From the aligned sequence data set we will quantify the importance of respective positions in the 31-residue segment. For this purpose we introduce numerical representations of the amino acid sequence.

The 20 naturally occurring amino acids have various physicochemical and biochemical properties which are supposed to play important roles in the manifestation of biological functions in proteins. These properties are often quantified by numerical values called amino acid indices. In our previous analysis (3), we classified 222 published amino acid indices into several categories. We use the following indices, which are representative ones from different categories, in the present analysis:

α: α-propensity by Robson and Suzuki (14)
β: β-propensity by Kanehisa and Tsong (15)
Turn: β-turn propensity by Chou and Fasman (16)
Hφ1: hydrophobicity by Eisenberg (17)
Hφ2: hydrophobicity by Fauchere and Pliska (18)
Size: residue size by Chothia (19)

The α, β, and turn propensities are the preferences of amino acid residues for these secondary structure conformations. The two hydrophobicity indices represent somewhat different aspects: Hφ1 belongs to a category of hydrophobicity scales for amino acids in proteins derived from the analysis of X-ray crystal structures, while Hφ2, the partition energy of amino acids, is derived from experimental measurements of free amino acids. The last one is the physical size of an amino acid residue.

We define the variables for discriminant analysis as follows (see Fig. 1):

X(1)-X(11): the values of a numerical index at the modification site and at each of the 5 neighboring residues on either side
X(12), X(13): the averages of a numerical index over 10 residues, the 6th through 15th neighboring residues
X(14), X(15): the averages of a numerical index over the 5 immediately neighboring residues
N: the total number of residues
R: the relative position from the amino terminus

The position dependence of a numerical index is represented by X(i), where i corresponds to a residue or a segment of residues. For example, the value of α at position -2 is called α(4). Thus, because we use 6 different numerical indices with 15 variables each, there are, in total, 92 variables for consideration (6 × 15 + 2), including the total number of residues N and the relative location of the modification site R. The stepwise discriminant analysis then selects a smaller number of optimized variables with position-specific weightings. Note that the variables at position 0 are relevant only in the case of Ser/Thr phosphorylation, because position 0 is always Asn for N-glycosylation. When the possible modification site is located in the vicinity of the N or C terminus and some portion of the neighboring 15 residues in either direction does not exist, we simply insert blanks as special amino acids which have 0 values for all properties.

RESULTS

N-Glycosylation: The number of N-glycosylation sites collected from the NBRF database was 394. Since many glycoproteins had multiple glycosylation sites, the number of proteins was 177. The control group from the NBRF database had 80 elements in 54 proteins, which are potential, but false, glycosylation sites in glycoproteins. In addition, the data of Hunt and Dayhoff contained 59 sites in 32 proteins, as false sites in non-glycosylated proteins. It seems to be widely accepted that the occurrence of the NX(S/T) sequence is necessary for N-glycosylation. However, in the NBRF database, there were four exceptions.
Three NXC patterns were found in the N-glycosylation sites of protein C precursor (human and bovine) and of von Willebrand factor precursor (human). Apparently, cysteine can take the place of serine or threonine in the N-glycosylation reaction (20). An NGG pattern was found in immunoglobulin heavy chain V regions (mouse). These data were excluded from our analysis.

We prepared the following two training sets.

Training set 1: both the N-glycosylated and the non-glycosylated sites are from the NBRF database.

TABLE I. Results of discriminant analysis for N-glycosylation sites.

Fig. 1. The definition of variables around a potential modification site. The numbering of residues is: 0 for the site, negative toward the amino end, and positive toward the carboxyl end. One of the six numerical indices, α, β, turn, Hφ1, Hφ2, and size, is represented by X. The value of X at the site is denoted by X(6), and that of each of the five immediate neighbors on both sides is denoted by X(1) through X(5) and X(7) through X(11). The mean values of X are defined as illustrated.
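The variable definitions of Fig. 1 can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the toy index values are made up (they are not taken from any published amino acid index), X(12) and X(14) are assumed to lie on the amino-terminal side and X(13) and X(15) on the carboxyl-terminal side, and R is assumed to be the site position divided by the total length N. Blanks beyond a terminus contribute 0, as described in the text.

```python
FLANK = 15  # residues considered on each side of the site (31-residue segment)


def index_variables(sequence, site, index):
    """Return {i: X(i)} for i = 1..15 for one amino acid index.

    site is the 1-based position of the potential modification site;
    index maps one-letter residue codes to numerical values.
    """
    def value(offset):
        i = site - 1 + offset
        if 0 <= i < len(sequence):
            return index[sequence[i]]
        return 0.0  # blank beyond a terminus: 0 for all properties

    def mean(offsets):
        offsets = list(offsets)
        return sum(value(o) for o in offsets) / len(offsets)

    # X(1)..X(11): positions -5..+5, so X(6) is the site itself.
    x = {k: value(offset) for k, offset in enumerate(range(-5, 6), start=1)}
    x[12] = mean(range(-FLANK, -5))    # positions -15..-6 (assumed side)
    x[13] = mean(range(6, FLANK + 1))  # positions +6..+15 (assumed side)
    x[14] = mean(range(-5, 0))         # positions -5..-1 (assumed side)
    x[15] = mean(range(1, 6))          # positions +1..+5 (assumed side)
    return x


def all_variables(sequence, site, indices):
    """Collect 15 variables per index plus the global variables N and R."""
    variables = {}
    for name, index in indices.items():
        for i, v in index_variables(sequence, site, index).items():
            variables[(name, i)] = v
    variables["N"] = len(sequence)
    variables["R"] = site / len(sequence)  # assumed definition of R
    return variables


# Made-up index values, for illustration only.
TOY_INDEX = {"A": 1.0, "N": 2.0, "S": 4.0}
toy_x = index_variables("AANAS", 3, TOY_INDEX)
```

With six indices supplied to `all_variables`, the result has 6 × 15 + 2 = 92 entries, matching the size of the repertoire described in MATERIALS AND METHODS.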

TABLE II. Selected variables for the discrimination of N-glycosylation sites. Footnote a: Variable number defined in Fig. 1.

TABLE III. Results of discriminant analysis for Ser/Thr phosphorylation sites.

TABLE IV. Selected variables for the discrimination of Ser/Thr phosphorylation sites.

TABLE V. Prediction of cAMP-dependent phosphorylation sites in bovine α-crystallin A chain.

Training set 2: the non-glycosylated sites from Hunt and Dayhoff's compilation were added to training set 1.

According to the stepwise discriminant analysis, a set of optimized variables was selected first for training set 1. Here the stepwise procedure was stopped at the 18th step because the remaining variables no longer contributed effectively to discrimination. Table I(a) shows the result of self-discriminating the training set itself by the 18 variables. The rows and columns correspond to the two groups to be discriminated and to be allocated, respectively. The number of individuals allocated from either of the two groups to either of the two groups is shown both in actual numbers and in percentages. The unweighted average of diagonal percentages is regarded as a measure of discriminant accuracy. In this case, it is about 74%. As the ANALYST program excluded some individuals automatically, the total number of the allocated data is somewhat smaller than the number of initially collected data.

Next, Hunt and Dayhoff's data for non-glycosylated sites were combined with the data from the NBRF database (training set 2). As the control group became larger, we hoped to obtain a more reliable result. However, as shown in Table I(b), which is the result of self-discrimination with another optimized set of 18 variables, the accuracy was somewhat lower.

We adopted the U-method of jack-knifing in order to estimate the prediction accuracy when the discriminant function is applied to unknown data, as shown in Table I(c). Training set 1 was used here. The ability to discriminate between true and false sites which are outside the training set is around 60%, in contrast to the self-discrimination of over 70% (Table I(a)).

The 18 variables selected in the stepwise discriminant analysis using training set 1 are summarized in Table II. The selected variables are represented by circles. The ones selected in the first 3 steps are denoted by double circles. Roughly speaking, the variables selected earlier are considered to give more important contributions. We have also examined the variables selected in the analysis with training set 2 and in the jack-knifing procedure (data not shown). While the correspondence between these cases was not perfect, the relative position R, the β-turn propensity at position +2, and the α propensity and size averaged over the region are conserved relatively well, suggesting that environmental factors are better determinants than site-specific factors.

Ser/Thr Phosphorylation: The number of Ser/Thr phosphorylation sites collected from the NBRF database was 66 in 30 proteins. In contrast, the number of all other serines and threonines in these proteins amounted to 847, which formed the control group of non-phosphorylated sites. The amino acid distributions at positions around the Ser/Thr site did not show any significant bias in the control group (data not shown).
Because half of the collected phosphorylation sites (33 sites) were contained in caseins, and because the phosphorylation of caseins appeared to be somewhat different (see "DISCUSSION"), we also performed discrimination among three groups: phosphorylation sites in caseins, phosphorylation sites in other proteins, and non-phosphorylated sites.

The number of substrate sites for cAMP-dependent protein kinase (PKA) from the compilation of Engstrom et al. (10) was 21 in 17 proteins, including peptide fragments. Three of them were also in the NBRF data. The control group sites were collected from the NBRF database in these 17 proteins and amounted to 526.

With these data we prepared training sets as follows:

Training set 1: phosphorylated and non-phosphorylated sites from the NBRF database.
Training set 2: the phosphorylated sites in training set 1 are divided into casein sites and non-casein sites.
Training set 3: phosphorylated and non-phosphorylated sites in PKA substrate proteins from the compilation of Engstrom et al.

The discrimination result for training set 1 is shown in Table III(a) in the same way as in Table I(a). The stepwise procedure was stopped at the 15th step because an overflow occurred at step 20. The discrimination accuracy of the training set was about 88%.

The result of three-group discrimination with training set 2 is shown in Table III(b). Twenty steps were executed. It can be seen that the sites in caseins are easier to distinguish than the others. However, the overall accuracy, the average of the three diagonal values, did not improve.

The discrimination result of PKA substrate sites using training set 3 is shown in Table III(c). Because there were some proteins whose sequences were not determined completely, the global variables N and R were omitted in this case from the repertoire of variables to be selected.
Almost perfect discrimination was possible, although parameter fitting against a small number of data often becomes too specific to the original data. The stepwise procedure was stopped at the 20th step.

Table III(d) shows the accuracy of predicting unknown Ser/Thr phosphorylation sites estimated by the U-method of jack-knifing. Training set 1 was used. Compared with the result of self-discrimination (Table III(a)), the ability to identify true modification sites dropped about 10% because more true sites were classified as false, while the ability to identify false sites was about the same. As the overall accuracy is still high, the derived discriminant function seems to be of practical use. The prediction accuracy with training set 3 for PKA substrates was also calculated. It decreased drastically when compared with the self-discrimination accuracy; only 16 out of 21 true sites (76%) were correctly predicted. This is probably due to the too-specific nature of the variables optimized for the small data set.

The variables selected in the discrimination of training sets 2 and 3 are summarized in Tables IV(a) and IV(b), respectively. The result with training set 1 was similar to Table IV(a). The symbols used here are the same as those used in Table II. In Table IV(a) no strong tendency for selecting any amino acid property at any position can be recognized. However, the information about position +2 seems to be relatively important. It may reflect the fact that, in many cases, the phosphorylated Ser/Thr is separated from a basic amino acid (Lys or Arg) by one residue (21).

Note that the variable at position 0 is also selected. It shows that the preference for phosphorylation differs between serines and threonines. In Table IV(b) we can see that the N-terminal side of the modified site, especially from -2 to -4, is essential for discrimination. In addition, hydrophobicity and β-propensity seem to be relatively more important (see "DISCUSSION" for more details).

DISCUSSION

We have considered determinants for modification sites mostly from the point of view of protein structures as they are encoded in the amino acid sequence. For example, when a certain modification occurs post-translationally, it is unlikely to involve buried residues. Thus, the consensus sequence is a result of molecular interaction between the modifying enzyme and the modified protein. As pointed out by Wold (22), there exist at least two other types of determinants, namely, the compartmental location of the enzymes involved in processing and the temporal changes in substrate structures during biosynthesis and transport. It will be interesting to see whether these factors can also be identified from the amino acid sequence data. For example, the first factor may be related to localization signals in the amino acid sequence like signal sequences, although it will be difficult to identify all the enzymes in the compartment from sequence information alone. The second factor must also be encoded ultimately in the amino acid sequence, although its clarification would be as difficult as the problem of protein folding. Whatever the actual mechanism is, the present approach is an empirical one identifying characteristic features in the amino acid sequence around modified sites. These features have practical implications for use in prediction, as well as providing insights into the molecular mechanisms of modification reactions.
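The two accuracy figures compared throughout RESULTS, self-discrimination and the U-method (jack-knife) estimate, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the FACOM ANALYST procedure: it uses a plain two-group Fisher linear discriminant with a midpoint threshold, omits the stepwise variable selection, and the function names and toy data are hypothetical. Accuracy is taken as the unweighted average of the diagonal percentages, as in Table I.

```python
import numpy as np


def fit_discriminant(X, y):
    """Two-group linear discriminant: weights w and threshold c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance of the two groups.
    scatter = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    pooled = scatter / (len(X) - 2)
    w = np.linalg.pinv(pooled) @ (m1 - m0)
    c = w @ (m0 + m1) / 2.0  # midpoint between the projected group means
    return w, c


def allocate(X, w, c):
    """Allocate each row of X to group 1 (true) or group 0 (false)."""
    return (X @ w > c).astype(int)


def diagonal_average(y_true, y_pred):
    """Unweighted average of the per-group fractions allocated correctly."""
    return float(np.mean([np.mean(y_pred[y_true == g] == g) for g in (0, 1)]))


def self_discrimination(X, y):
    """Accuracy when the training set is allocated by its own function."""
    w, c = fit_discriminant(X, y)
    return diagonal_average(y, allocate(X, w, c))


def jackknife(X, y):
    """U-method: predict each individual from a function fitted without it."""
    predicted = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, c = fit_discriminant(X[keep], y[keep])
        predicted[i] = allocate(X[i:i + 1], w, c)[0]
    return diagonal_average(y, predicted)


# Made-up, well-separated toy data: rows are sites, columns are variables.
toy_X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
                  [2.0, 2.1], [2.2, 1.9], [1.9, 2.3], [2.3, 2.2]])
toy_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_self = self_discrimination(toy_X, toy_y)
toy_jack = jackknife(toy_X, toy_y)
```

On real data the jack-knife value is expected to be lower than the self-discrimination value, which is the gap reported between Table I(a) and Table I(c).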
Hunt and Dayhoff (8) examined the occurrences of NX(S/T) patterns and bound carbohydrates in their collection of amino acid sequences. Not more than 20 of the 101 NX(S/T) sites in their collection were N-glycosylated. While they observed that the occurrence of the NX(S/T) pattern was much less frequent than that of similar patterns, such as (S/T)XN and N(S/T), these patterns occur equally frequently in the present NBRF database. It is possible that our data set will turn out to contain such a bias in the future. We examined whether any family of closely related proteins occupied most of the data, but no serious overlaps were found.

More recently Mononen and Karjalainen (23) collected and analyzed possible N-glycosylation sites. In their collection, 139 out of 196 sites were glycosylated; namely, most potential sites were actually modified, which agrees with our result. Of course, it is premature to conclude that this reflects the natural ratio of glycosylated to non-glycosylated sites in general. However, it seems possible to say that in glycoproteins many of the possible sites are actually glycosylated. Mononen and Karjalainen could not find marked differences in sequence patterns between glycosylated and non-glycosylated sites, and they suggested that the proposed β-turn structure (24) was not a determinant as far as it was estimated by the secondary structure prediction.

Not only these earlier results but also ours suggest the difficulty of predicting N-glycosylation sites precisely. The estimated accuracy of predicting unknown data is about 60%. However, inspection of commonly selected variables still seems to reveal some interesting features. First, the variable selected at position +2 was β-turn propensity, which suggests a difference in inclination to be glycosylated between the NXT and NXS patterns. Although there is a natural difference in the occurrences of serines and threonines, the difference in the modification tendency is much stronger.
Second, the fact that the global variables are selected relatively well suggests that a wider interaction range is involved. Perhaps the structure surrounding a potential glycosylation site during the elongation process is at least as important as the recognition sequence of the enzyme catalyzing the reaction, since the modification occurs cotranslationally. Furthermore, the strong selection of variable R implies the importance of relative locations in the primary structure. Generally speaking, N-glycosylation sites appear more frequently near amino terminal regions than carboxyl terminal regions, which may be interpreted as glycosylation reactions occurring at earlier stages in peptide synthesis.

Ser/Thr phosphorylation is often used in the turning on or off of various kinds of molecular mechanisms with physiological significance. The reaction is somewhat unstable and modified sites may be different under different conditions in vitro, which makes it difficult to evaluate the collected data. In addition, there are different classes of protein kinases in vivo and each has its own substrate specificity. In this sense, phosphorylation sites constitute a set of different components, and a unified prediction of all types of Ser/Thr phosphorylation sites may be difficult. On the other hand, because of its biological importance, the amount of data available is relatively large for phosphorylation sites, which makes it suitable to treat with empirical approaches.

Despite the different classes, many protein kinases actually have a common character in their substrate specificities; they usually require basic residues near the acceptor sites. Williams (21) pointed out that phosphorylated sites were frequently separated from lysine or arginine by one amino acid. This still seems to hold to some extent. However, according to our analysis of amino acid compositions, the existence of basic residues was not so outstanding.
Furthermore, there were also quite a few examples of basic residues existing near a Ser/Thr residue which was actually not phosphorylated. Thus, it is difficult to discriminate a phosphorylation site from the neighboring basic residues only.

The results for phosphorylation sites turned out to be relatively good. The accuracy was close to 90% for self-discriminating the training set and over 80% for predicting unknown data. Because the phosphorylation sites in caseins occupy one half of our data set and because the phosphorylation of caseins appears to have somewhat different characters, they were then treated separately. However, the separation did not raise the accuracy. Casein kinases prefer acidic residues around acceptor sites, whereas many other protein kinases prefer basic residues. It is therefore surprising that a single discriminant function can deal with both basic and acidic residues effectively. Possibly, it reflects a common mechanism of phosphorylation reactions. Indeed there is another class of protein kinases, namely, tyrosine kinases, which also seem to require acidic residues and share conserved sequences (25).

The variables selected for Ser/Thr phosphorylation sites were more or less dispersed over various positions and properties (Table IV(a)). However, careful inspection may suggest the importance of positions ±2 and -4, which correspond well to the positions suggested by Williams (21). In contrast, the pattern of selected variables in the discrimination of PKA substrates was quite different (Table IV(b)). The pattern suggests the importance of the region toward the amino end, especially positions -2 to -4. This coincides with suggestions from experiments using peptide analogues where the substrate specificity of PKA was analyzed (26).

In order to demonstrate how to utilize our method, we show an example of the prediction of cAMP-dependent phosphorylation sites in bovine α-crystallin A chain. The result of running our program is shown in Table V. It first locates potential sites by simple sequence pattern matching; in the present case all serines and threonines in the sequence were treated as potential phosphorylation sites. Then the program evaluated each of them by the discriminant function derived from training set 3 of the Ser/Thr phosphorylation sites. As shown in Table V, three sites were predicted to be phosphorylated and one of them (Ser 122) was reported to be an actual phosphorylation site by Voorter et al. (27). The sequence pattern preceding the phosphorylated serine is RRYRLPS, while the proposed recognition sequence of PKA is either KRXXS or RRXS (26).

In this paper we tried to predict two types of modification sites from the amino acid sequence. However, our method is also applicable to other functional sites of proteins, once proper data sets are prepared for both true and false functional sites.
In reality, however, it requires a major effort to prepare a well-verified control group, simply because experimental data seldom exist for sites that are, for example, not modified in vivo. The selection of variables that characterize the differences between true and false sites can also be developed further. In the present paper we defined variables representing both position-specific properties and global properties covering longer ranges of sequence data, and used stepwise discriminant analysis for the objective selection of variables. There are certainly other ways to set up the repertoire of variables and to select variables with optimally discriminating features. We believe that the prediction of protein function will become a more useful and effective means of sequence analysis as the development of both databases and algorithms continues.

We thank Dr. Sho Takahashi for helping us collect the data of modification sites. This work was partly supported by the Protein Engineering Research Institute.

REFERENCES

1. Wagh, P.V. & Bahl, O.P. (1981) CRC Crit. Rev. Biochem. 10
2. Tufty, R.M. & Kretsinger, R.H. (1975) Science 187
3. Nakai, K., Kidera, A., & Kanehisa, M. (1988) Protein Eng. 2
4. Kanehisa, M. (1988) Protein Eng. 2
5. Klein, P., Kanehisa, M., & DeLisi, C. (1984) Biochim. Biophys. Acta 787
6. Edelman, A.M., Blumenthal, D.K., & Krebs, E.G. (1987) Annu. Rev. Biochem. 56
7. George, D.G., Barker, W.C., & Hunt, L.T. (1986) Nucleic Acids Res. 14
8. Hunt, L.T. & Dayhoff, M.O. (1970) Biochem. Biophys. Res. Commun. 39
9. Hunter, T. (1987) Cell 50
10. Engstrom, L., Ekman, P., Humble, E., Ragnarsson, U., & Zetterqvist, M. (1984) Methods Enzymol. 107
11. Afifi, A.A. & Azen, S.P. (1979) Statistical Analysis: A Computer Oriented Approach, 2nd Ed., Academic Press, New York
12. Fujitsu Limited (1984) FACOM ANALYST Kaisetsusho (in Japanese)
13. Mardia, K.V., Kent, J.T., & Bibby, J.H. (1979) Multivariate Analysis, Academic Press, New York
14. Robson, B. & Suzuki, E. (1976) J. Mol. Biol. 107
15. Kanehisa, M.I. & Tsong, T.Y. (1980) Biopolymers 19
16. Chou, P.Y. & Fasman, G.D. (1978) Adv. Enzymol. 47
17. Eisenberg, D. (1984) Annu. Rev. Biochem. 53
18. Fauchere, J.-L. & Pliska, V. (1983) Eur. J. Med. Chem. 18
19. Chothia, C. (1975) Nature 254
20. Bause, E. & Legler, G. (1981) Biochem. J. 195
21. Williams, R.E. (1976) Science 192
22. Wold, F. (1981) Annu. Rev. Biochem. 50
23. Mononen, I. & Karjalainen, E. (1984) Biochim. Biophys. Acta 788
24. Aubert, J.-P., Biserte, G., & Loucheux-Lefebvre, M.-H. (1976) Arch. Biochem. Biophys. 175
25. Hunter, T. & Cooper, J.A. (1985) Annu. Rev. Biochem. 54
26. Krebs, E.G. & Beavo, J.A. (1979) Annu. Rev. Biochem. 48
27. Voorter, C.E.M., Molders, J.W.M., Bloemendal, H., & de Jong, W.W. (1986) Eur. J. Biochem. 160

Vol. 104, No. 5, 1988

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I Biochemistry - I Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I Hello, welcome to the course Biochemistry 1 conducted by me Dr. S Dasgupta,

More information

Bioinformation Volume 5

Bioinformation Volume 5 Do N-glycoproteins have preference for specific sequons? R Shyama Prasad Rao 1, 2, *, Bernd Wollenweber 1 1 Aarhus University, Department of Genetics and Biotechnology, Forsøgsvej 1, Slagelse 4200, Denmark;

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

Proteins are sometimes only produced in one cell type or cell compartment (brain has 15,000 expressed proteins, gut has 2,000).

Proteins are sometimes only produced in one cell type or cell compartment (brain has 15,000 expressed proteins, gut has 2,000). Lecture 2: Principles of Protein Structure: Amino Acids Why study proteins? Proteins underpin every aspect of biological activity and therefore are targets for drug design and medicinal therapy, and in

More information

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER

THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER THE USE OF MULTIVARIATE ANALYSIS IN DEVELOPMENT THEORY: A CRITIQUE OF THE APPROACH ADOPTED BY ADELMAN AND MORRIS A. C. RAYNER Introduction, 639. Factor analysis, 639. Discriminant analysis, 644. INTRODUCTION

More information

Chapter 10. Regulatory Strategy

Chapter 10. Regulatory Strategy Chapter 10 Regulatory Strategy Regulation of enzymatic activity: 1. Allosteric Control. Allosteric proteins have a regulatory site(s) and multiple functional sites Activity of proteins is regulated by

More information

Chapter 4: Information and Knowledge in the Protein Insulin

Chapter 4: Information and Knowledge in the Protein Insulin Chapter 4: Information and Knowledge in the Protein Insulin This chapter will calculate the information and molecular knowledge in a real protein. The techniques discussed in this chapter to calculate

More information

Protein Modification Overview DEFINITION The modification of selected residues in a protein and not as a component of synthesis

Protein Modification Overview DEFINITION The modification of selected residues in a protein and not as a component of synthesis Lecture Four: Protein Modification & Cleavage [Based on Chapters 2, 9, 10 & 11 Berg, Tymoczko & Stryer] (Figures in red are for the 7th Edition) (Figures in Blue are for the 8th Edition) Protein Modification

More information

Summary of Endomembrane-system

Summary of Endomembrane-system Summary of Endomembrane-system 1. Endomembrane System: The structural and functional relationship organelles including ER,Golgi complex, lysosome, endosomes, secretory vesicles. 2. Membrane-bound structures

More information

Section 1 Proteins and Proteomics

Section 1 Proteins and Proteomics Section 1 Proteins and Proteomics Learning Objectives At the end of this assignment, you should be able to: 1. Draw the chemical structure of an amino acid and small peptide. 2. Describe the difference

More information

Page 8/6: The cell. Where to start: Proteins (control a cell) (start/end products)

Page 8/6: The cell. Where to start: Proteins (control a cell) (start/end products) Page 8/6: The cell Where to start: Proteins (control a cell) (start/end products) Page 11/10: Structural hierarchy Proteins Phenotype of organism 3 Dimensional structure Function by interaction THE PROTEIN

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Spring 2016 Protein Structure February 7, 2016 Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called amino acids. Introduction to Protein Structure Amine

More information

Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A

Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A Homework Watch the Bozeman video called, Biological Molecules Objective:

More information

Tala Saleh. Ahmad Attari. Mamoun Ahram

Tala Saleh. Ahmad Attari. Mamoun Ahram 23 Tala Saleh Ahmad Attari Minna Mushtaha Mamoun Ahram In the previous lecture, we discussed the mechanisms of regulating enzymes through inhibitors. Now, we will start this lecture by discussing regulation

More information

Objective: You will be able to explain how the subcomponents of

Objective: You will be able to explain how the subcomponents of Objective: You will be able to explain how the subcomponents of nucleic acids determine the properties of that polymer. Do Now: Read the first two paragraphs from enduring understanding 4.A Essential knowledge:

More information

SEED HAEMATOLOGY. Medical statistics your support when interpreting results SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015

SEED HAEMATOLOGY. Medical statistics your support when interpreting results SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015 SYSMEX EDUCATIONAL ENHANCEMENT AND DEVELOPMENT APRIL 2015 SEED HAEMATOLOGY Medical statistics your support when interpreting results The importance of statistical investigations Modern medicine is often

More information

Properties of amino acids in proteins

Properties of amino acids in proteins Properties of amino acids in proteins one of the primary roles of DNA (but far from the only one!!!) is to code for proteins A typical bacterium builds thousands types of proteins, all from ~20 amino acids

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is worth 2 points.

This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is worth 2 points. MBB 407/511 Molecular Biology and Biochemistry First Examination - October 1, 2002 Name Social Security Number This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is

More information

Biomolecules: amino acids

Biomolecules: amino acids Biomolecules: amino acids Amino acids Amino acids are the building blocks of proteins They are also part of hormones, neurotransmitters and metabolic intermediates There are 20 different amino acids in

More information

INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1

INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1 INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1 5.1 Clinical Interviews: Background Information The clinical interview is a technique pioneered by Jean Piaget, in 1975,

More information

on Non-Consensus Protein Motifs Analytical & Formulation Sciences, Amgen. Seattle, WA

on Non-Consensus Protein Motifs Analytical & Formulation Sciences, Amgen. Seattle, WA N-Linked Glycosylation on Non-Consensus Protein Motifs Alain Balland Analytical & Formulation Sciences, Amgen. Seattle, WA CASSS - Mass Spec 2010 Marina Del Rey, CA. September 8 th, 2010 Outline 2 Consensus

More information

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6

Alternative splicing. Biosciences 741: Genomics Fall, 2013 Week 6 Alternative splicing Biosciences 741: Genomics Fall, 2013 Week 6 Function(s) of RNA splicing Splicing of introns must be completed before nuclear RNAs can be exported to the cytoplasm. This led to early

More information

Practice Problems 3. a. What is the name of the bond formed between two amino acids? Are these bonds free to rotate?

Practice Problems 3. a. What is the name of the bond formed between two amino acids? Are these bonds free to rotate? Life Sciences 1a Practice Problems 3 1. Draw the oligopeptide for Ala-Phe-Gly-Thr-Asp. You do not need to indicate the stereochemistry of the sidechains. Denote with arrows the bonds formed between the

More information

Lecture 3. Tandem MS & Protein Sequencing

Lecture 3. Tandem MS & Protein Sequencing Lecture 3 Tandem MS & Protein Sequencing Nancy Allbritton, M.D., Ph.D. Department of Physiology & Biophysics 824-9137 (office) nlallbri@uci.edu Office- Rm D349 Medical Science D Bldg. Tandem MS Steps:

More information

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests

Objectives. Quantifying the quality of hypothesis tests. Type I and II errors. Power of a test. Cautions about significance tests Objectives Quantifying the quality of hypothesis tests Type I and II errors Power of a test Cautions about significance tests Designing Experiments based on power Evaluating a testing procedure The testing

More information

Chemical Mechanism of Enzymes

Chemical Mechanism of Enzymes Chemical Mechanism of Enzymes Enzyme Engineering 5.2 Definition of the mechanism 1. The sequence from substrate(s) to product(s) : Reaction steps 2. The rates at which the complex are interconverted 3.

More information

You can t fix by analysis what you bungled by design. Fancy analysis can t fix a poorly designed study.

You can t fix by analysis what you bungled by design. Fancy analysis can t fix a poorly designed study. You can t fix by analysis what you bungled by design. Light, Singer and Willett Or, not as catchy but perhaps more accurate: Fancy analysis can t fix a poorly designed study. Producing Data The Role of

More information

Introduction to proteins and protein structure

Introduction to proteins and protein structure Introduction to proteins and protein structure The questions and answers below constitute an introduction to the fundamental principles of protein structure. They are all available at [link]. What are

More information

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies

Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies Speaker Notes: Qualitative Comparative Analysis (QCA) in Implementation Studies PART 1: OVERVIEW Slide 1: Overview Welcome to Qualitative Comparative Analysis in Implementation Studies. This narrated powerpoint

More information

6. The catalytic mechanism of arylsulfatase A and its theoretical investigation

6. The catalytic mechanism of arylsulfatase A and its theoretical investigation 6. The catalytic mechanism of arylsulfatase A and its theoretical investigation When the crystal structure of arylsulfatase A was solved, a remarkable structural analogy to another hydrolytic enzyme, the

More information

Intracellular Compartments and Protein Sorting

Intracellular Compartments and Protein Sorting Intracellular Compartments and Protein Sorting Intracellular Compartments A eukaryotic cell is elaborately subdivided into functionally distinct, membrane-enclosed compartments. Each compartment, or organelle,

More information

Previous Class. Today. Detection of enzymatic intermediates: Protein tyrosine phosphatase mechanism. Protein Kinase Catalytic Properties

Previous Class. Today. Detection of enzymatic intermediates: Protein tyrosine phosphatase mechanism. Protein Kinase Catalytic Properties Previous Class Detection of enzymatic intermediates: Protein tyrosine phosphatase mechanism Today Protein Kinase Catalytic Properties Protein Phosphorylation Phosphorylation: key protein modification

More information

Protein Investigator. Protein Investigator - 3

Protein Investigator. Protein Investigator - 3 Protein Investigator Objectives To learn more about the interactions that govern protein structure. To test hypotheses regarding protein structure and function. To design proteins with specific shapes.

More information

AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS

AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS Elena Rivneac PhD, Associate Professor Department of Biochemistry and Clinical Biochemistry State University of Medicine

More information

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) *

A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * A review of statistical methods in the analysis of data arising from observer reliability studies (Part 11) * by J. RICHARD LANDIS** and GARY G. KOCH** 4 Methods proposed for nominal and ordinal data Many

More information

Towards a New Paradigm in Scientific Notation Patterns of Periodicity among Proteinogenic Amino Acids [Abridged Version]

Towards a New Paradigm in Scientific Notation Patterns of Periodicity among Proteinogenic Amino Acids [Abridged Version] Earth/matriX: SCIENCE TODAY Towards a New Paradigm in Scientific Notation Patterns of Periodicity among Proteinogenic Amino Acids [Abridged Version] By Charles William Johnson Earth/matriX Editions P.O.

More information

LAB#23: Biochemical Evidence of Evolution Name: Period Date :

LAB#23: Biochemical Evidence of Evolution Name: Period Date : LAB#23: Biochemical Evidence of Name: Period Date : Laboratory Experience #23 Bridge Worth 80 Lab Minutes If two organisms have similar portions of DNA (genes), these organisms will probably make similar

More information

Molecular Biology. general transfer: occurs normally in cells. special transfer: occurs only in the laboratory in specific conditions.

Molecular Biology. general transfer: occurs normally in cells. special transfer: occurs only in the laboratory in specific conditions. Chapter 9: Proteins Molecular Biology replication general transfer: occurs normally in cells transcription special transfer: occurs only in the laboratory in specific conditions translation unknown transfer:

More information

CHAPTER 10: REGULATORY STRATEGIES. Traffic signals control the flow of traffic

CHAPTER 10: REGULATORY STRATEGIES. Traffic signals control the flow of traffic CHAPTER 10: REGULATORY STRATEGIES Traffic signals control the flow of traffic INTRODUCTION CHAPTER 10 The activity of enzymes must often be regulated so that they function at the proper time and place.

More information

Enzymes Part III: regulation II. Dr. Mamoun Ahram Summer, 2017

Enzymes Part III: regulation II. Dr. Mamoun Ahram Summer, 2017 Enzymes Part III: regulation II Dr. Mamoun Ahram Summer, 2017 Advantage This is a major mechanism for rapid and transient regulation of enzyme activity. A most common mechanism is enzyme phosphorylation

More information

Complexity DNA. Genome RNA. Transcriptome. Protein. Proteome. Metabolites. Metabolome

Complexity DNA. Genome RNA. Transcriptome. Protein. Proteome. Metabolites. Metabolome DNA Genome Complexity RNA Transcriptome Systems Biology Linking all the components of a cell in a quantitative and temporal manner Protein Proteome Metabolites Metabolome Where are the functional elements?

More information

Amino acids. Side chain. -Carbon atom. Carboxyl group. Amino group

Amino acids. Side chain. -Carbon atom. Carboxyl group. Amino group PROTEINS Amino acids Side chain -Carbon atom Amino group Carboxyl group Amino acids Primary structure Amino acid monomers Peptide bond Peptide bond Amino group Carboxyl group Peptide bond N-terminal (

More information

Biological Mass Spectrometry. April 30, 2014

Biological Mass Spectrometry. April 30, 2014 Biological Mass Spectrometry April 30, 2014 Mass Spectrometry Has become the method of choice for precise protein and nucleic acid mass determination in a very wide mass range peptide and nucleotide sequencing

More information

Self-association of α-chymotrypsin: Effect of amino acids

Self-association of α-chymotrypsin: Effect of amino acids J. Biosci., Vol. 13, Number 3, September 1988, pp. 215 222. Printed in India. Self-association of α-chymotrypsin: Effect of amino acids T. RAMAKRISHNA and M. W. PANDIT* Centre for Cellular and Molecular

More information

Protein Trafficking in the Secretory and Endocytic Pathways

Protein Trafficking in the Secretory and Endocytic Pathways Protein Trafficking in the Secretory and Endocytic Pathways The compartmentalization of eukaryotic cells has considerable functional advantages for the cell, but requires elaborate mechanisms to ensure

More information

Comparison of volume estimation methods for pancreatic islet cells

Comparison of volume estimation methods for pancreatic islet cells Comparison of volume estimation methods for pancreatic islet cells Jiří Dvořák a,b, Jan Švihlíkb,c, David Habart d, and Jan Kybic b a Department of Probability and Mathematical Statistics, Faculty of Mathematics

More information

Trilateral Project WM4

Trilateral Project WM4 ANNEX 2: Comments of the JPO Trilateral Project WM4 Comparative studies in new technologies Theme: Comparative study on protein 3-dimensional (3-D) structure related claims 1. Introduction As more 3-D

More information

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED)

IAASB Main Agenda (February 2007) Page Agenda Item PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) IAASB Main Agenda (February 2007) Page 2007 423 Agenda Item 6-A PROPOSED INTERNATIONAL STANDARD ON AUDITING 530 (REDRAFTED) AUDIT SAMPLING AND OTHER MEANS OF TESTING CONTENTS Paragraph Introduction Scope

More information

Ionization of amino acids

Ionization of amino acids Amino Acids 20 common amino acids there are others found naturally but much less frequently Common structure for amino acid COOH, -NH 2, H and R functional groups all attached to the a carbon Ionization

More information

Short polymer. Dehydration removes a water molecule, forming a new bond. Longer polymer (a) Dehydration reaction in the synthesis of a polymer

Short polymer. Dehydration removes a water molecule, forming a new bond. Longer polymer (a) Dehydration reaction in the synthesis of a polymer HO 1 2 3 H HO H Short polymer Dehydration removes a water molecule, forming a new bond Unlinked monomer H 2 O HO 1 2 3 4 H Longer polymer (a) Dehydration reaction in the synthesis of a polymer HO 1 2 3

More information

The Structure and Function of Macromolecules

The Structure and Function of Macromolecules The Structure and Function of Macromolecules Macromolecules are polymers Polymer long molecule consisting of many similar building blocks. Monomer the small building block molecules. Carbohydrates, proteins

More information

Review of Biochemistry

Review of Biochemistry Review of Biochemistry Chemical bond Functional Groups Amino Acid Protein Structure and Function Proteins are polymers of amino acids. Each amino acids in a protein contains a amino group, - NH 2,

More information

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA

Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA Minimizing Uncertainty in Property Casualty Loss Reserve Estimates Chris G. Gross, ACAS, MAAA The uncertain nature of property casualty loss reserves Property Casualty loss reserves are inherently uncertain.

More information

Copyright 2008 Pearson Education, Inc., publishing as Pearson Benjamin Cummings

Copyright 2008 Pearson Education, Inc., publishing as Pearson Benjamin Cummings Concept 5.4: Proteins have many structures, resulting in a wide range of functions Proteins account for more than 50% of the dry mass of most cells Protein functions include structural support, storage,

More information

CHAPTER 21: Amino Acids, Proteins, & Enzymes. General, Organic, & Biological Chemistry Janice Gorzynski Smith

CHAPTER 21: Amino Acids, Proteins, & Enzymes. General, Organic, & Biological Chemistry Janice Gorzynski Smith CHAPTER 21: Amino Acids, Proteins, & Enzymes General, Organic, & Biological Chemistry Janice Gorzynski Smith CHAPTER 21: Amino Acids, Proteins, Enzymes Learning Objectives: q The 20 common, naturally occurring

More information

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Thomas E. Rothenfluh 1, Karl Bögl 2, and Klaus-Peter Adlassnig 2 1 Department of Psychology University of Zurich, Zürichbergstraße

More information

observational studies Descriptive studies

observational studies Descriptive studies form one stage within this broader sequence, which begins with laboratory studies using animal models, thence to human testing: Phase I: The new drug or treatment is tested in a small group of people for

More information

2. Which of the following amino acids is most likely to be found on the outer surface of a properly folded protein?

2. Which of the following amino acids is most likely to be found on the outer surface of a properly folded protein? Name: WHITE Student Number: Answer the following questions on the computer scoring sheet. 1 mark each 1. Which of the following amino acids would have the highest relative mobility R f in normal thin layer

More information

Christine Vogel 1, Edward M. Marcotte 1 *

Christine Vogel 1, Edward M. Marcotte 1 * CALCULATING ABSOLUTE PROTEIN ABUNDANCE FROM MASS SPECTROMETRY BASED PROTEIN EXPRESSION DATA - SUPPLEMENTARY NOTES Christine Vogel 1, Edward M. Marcotte 1 * 1 Center for Systems and Synthetic Biology, Institute

More information

Statin inhibition of HMG-CoA reductase: a 3-dimensional view

Statin inhibition of HMG-CoA reductase: a 3-dimensional view Atherosclerosis Supplements 4 (2003) 3/8 www.elsevier.com/locate/atherosclerosis Statin inhibition of HMG-CoA reductase: a 3-dimensional view Eva Istvan * Department of Molecular Microbiology, Howard Hughes

More information

Lipids: diverse group of hydrophobic molecules

Lipids: diverse group of hydrophobic molecules Lipids: diverse group of hydrophobic molecules Lipids only macromolecules that do not form polymers li3le or no affinity for water hydrophobic consist mostly of hydrocarbons nonpolar covalent bonds fats

More information

Catalysis & specificity: Proteins at work

Catalysis & specificity: Proteins at work Catalysis & specificity: Proteins at work Introduction Having spent some time looking at the elements of structure of proteins and DNA, as well as their ability to form intermolecular interactions, it

More information

Classification of amino acids: -

Classification of amino acids: - Page 1 of 8 P roteinogenic amino acids, also known as standard, normal or primary amino acids are 20 amino acids that are incorporated in proteins and that are coded in the standard genetic code (subunit

More information

1. Describe the relationship of dietary protein and the health of major body systems.

1. Describe the relationship of dietary protein and the health of major body systems. Food Explorations Lab I: The Building Blocks STUDENT LAB INVESTIGATIONS Name: Lab Overview In this investigation, you will be constructing animal and plant proteins using beads to represent the amino acids.

More information

Peptide hydrolysis uncatalyzed half-life = ~450 years HIV protease-catalyzed half-life = ~3 seconds

Peptide hydrolysis uncatalyzed half-life = ~450 years HIV protease-catalyzed half-life = ~3 seconds Uncatalyzed half-life Peptide hydrolysis uncatalyzed half-life = ~450 years IV protease-catalyzed half-life = ~3 seconds Life Sciences 1a Lecture Slides Set 9 Fall 2006-2007 Prof. David R. Liu In the absence

More information

Context of Best Subset Regression

Context of Best Subset Regression Estimation of the Squared Cross-Validity Coefficient in the Context of Best Subset Regression Eugene Kennedy South Carolina Department of Education A monte carlo study was conducted to examine the performance

More information

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models Int J Comput Math Learning (2009) 14:51 60 DOI 10.1007/s10758-008-9142-6 COMPUTER MATH SNAPHSHOTS - COLUMN EDITOR: URI WILENSKY* Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based

More information

How to interpret results of metaanalysis

How to interpret results of metaanalysis How to interpret results of metaanalysis Tony Hak, Henk van Rhee, & Robert Suurmond Version 1.0, March 2016 Version 1.3, Updated June 2018 Meta-analysis is a systematic method for synthesizing quantitative

More information

Biochemistry Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture -02 Amino Acids II

Biochemistry Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology Kharagpur. Lecture -02 Amino Acids II Biochemistry Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology Kharagpur Lecture -02 Amino Acids II Ok, we start off with the discussion on amino acids. (Refer Slide Time: 00:48)

More information

2013 John Wiley & Sons, Inc. All rights reserved. PROTEIN SORTING. Lecture 10 BIOL 266/ Biology Department Concordia University. Dr. S.

2013 John Wiley & Sons, Inc. All rights reserved. PROTEIN SORTING. Lecture 10 BIOL 266/ Biology Department Concordia University. Dr. S. PROTEIN SORTING Lecture 10 BIOL 266/4 2014-15 Dr. S. Azam Biology Department Concordia University Introduction Membranes divide the cytoplasm of eukaryotic cells into distinct compartments. The endomembrane

More information

Bio 366: Biological Chemistry II Test #1, 100 points (7 pages)

Bio 366: Biological Chemistry II Test #1, 100 points (7 pages) Bio 366: Biological Chemistry II Test #1, 100 points (7 pages) READ THIS: Take a numbered test and sit in the seat with that number on it. Remove the numbered sticker from the desk, and stick it on the

More information

The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5

The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5 Key Concepts: The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5 Proteins include a diversity of structures, resulting in a wide range of functions Proteins Enzymatic s

More information

Binary Diagnostic Tests Two Independent Samples

Binary Diagnostic Tests Two Independent Samples Chapter 537 Binary Diagnostic Tests Two Independent Samples Introduction An important task in diagnostic medicine is to measure the accuracy of two diagnostic tests. This can be done by comparing summary

More information

Effects of Second Messengers

Effects of Second Messengers Effects of Second Messengers Inositol trisphosphate Diacylglycerol Opens Calcium Channels Binding to IP 3 -gated Channel Cooperative binding Activates Protein Kinase C is required Phosphorylation of many

More information

Moorpark College Chemistry 11 Fall Instructor: Professor Gopal. Examination # 5: Section Five May 7, Name: (print)

Moorpark College Chemistry 11 Fall Instructor: Professor Gopal. Examination # 5: Section Five May 7, Name: (print) Moorpark College Chemistry 11 Fall 2013 Instructor: Professor Gopal Examination # 5: Section Five May 7, 2013 Name: (print) Directions: Make sure your examination contains TEN total pages (including this

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information
