Prediction of In-Vivo Modification Sites of Proteins from Their Primary Structures
J. Biochem. 104, No. 5 (1988)

Kenta Nakai and Minoru Kanehisa¹

Institute for Chemical Research, Kyoto University, Uji, Kyoto 611

Received for publication, June 13, 1988

In order to make better use of the information contained in rapidly expanding amino acid sequence data, a new method to predict various modification sites of proteins from their primary structures is presented. It is also applicable to the prediction of other functional sites in proteins. Here we show examples for N-glycosylation and serine/threonine phosphorylation sites. The method is essentially an elaboration of consensus sequence pattern matching based on stepwise discriminant analysis. The amino acids occurring near a potential modification site are represented by six numerical values which reflect various properties of amino acids. Longer-range effects around these sites are also considered. The stepwise procedure enabled us to select effective features for discrimination automatically. A computer program implementing our method first identifies potential modification sites by a sequence pattern, NX(S/T) for N-glycosylation or (S/T) for phosphorylation, and then decides by discriminant analysis whether a potential site is likely to be a true modification site. The prediction accuracy in the second step of discrimination was about 60% for glycosylation sites and about 80% for phosphorylation sites.

The rapid growth of the quantity of DNA sequence data is one of the most remarkable events in biosciences today. The size of the GenBank database is doubling almost every year, and this growth is likely to accelerate further as the sequencing of large genomes proceeds. Although various physiological phenomena in living systems are ultimately encoded in genomic DNA, sequencing genes does not necessarily clarify their function. In this respect, what we can learn from sequence data is still very limited.
¹ To whom correspondence should be addressed. Abbreviation: PKA, cAMP-dependent protein kinase.

One of the established methods for obtaining insights into higher structural and functional properties of proteins from sequence data is to search for homologous sequences in the databases. In many cases, however, strong homologies are absent and the significance of weak homologies is difficult to assess. It is also customary to perform secondary structure predictions from the amino acid sequence, but their accuracies are too low to be useful in practice. Despite these limited success rates, computational methods are frequently used, and there seems to be a great demand for new sequence analysis methods that complement existing ones. In this paper we propose a new approach toward the prediction of the functional sites of a protein from its primary structure. Suppose we have to locate, in a newly determined sequence, a DNA binding site, a calcium binding site, and an in-vivo modification site by sequence analysis. The ordinary way to do this is to use pattern matching against the so-called consensus sequence. For instance, when there is an N-X-(S or T) pattern in the sequence (here X denotes any amino acid), the site is called a potential glycosylation site (1). The consensus sequence of Tufty and Kretsinger is also well known with regard to calcium binding sites (2). However, the consensus is basically a short, weak sequence pattern, and when the whole amino acid sequence database is searched, there are usually many additional sites matching only by chance. Our approach is similar to the consensus matching method in the sense that it is also based on the identification of common features among related sites. However, it contains three new elements intended to overcome the problems inherent in the simple pattern matching approach. They are: (i) preparation of the data set, (ii) numerical representation of amino acid sequences, and (iii) objective selection of features.
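The consensus matching step mentioned above can be sketched in a few lines of Python. This is only an illustration, not the authors' program: the function names and the toy sequence are hypothetical, and X is taken to be any residue, as stated in the text.

```python
import re

# A lookahead is used so that overlapping candidates (e.g. "NNSS") are all reported.
NGLYC_PATTERN = re.compile(r"(?=N[A-Z][ST])")

def potential_nglyc_sites(sequence):
    """Return 1-based positions of Asn residues that start an N-X-(S/T) pattern."""
    return [m.start() + 1 for m in NGLYC_PATTERN.finditer(sequence.upper())]

def potential_phos_sites(sequence):
    """Return 1-based positions of every Ser or Thr residue."""
    return [i + 1 for i, aa in enumerate(sequence.upper()) if aa in "ST"]

toy = "MKNGSAANATPNNSS"  # hypothetical sequence, not taken from the paper
print(potential_nglyc_sites(toy))
print(potential_phos_sites(toy))
```

Every position returned here is only a potential site; the point of the approach is that a second, statistical step is still needed to separate true sites from such chance matches.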
First, the data set is prepared with practical applications in mind. Anyone who wishes to locate an N-glycosylation site will no doubt use the NX(S/T) consensus first. The question is therefore: is it possible to discriminate true glycosylation sites from false analogues that merely have the NX(S/T) pattern? The consensus sequence is derived only from a set of true sites; there is no control group of false sites. In contrast, our approach uses data sets of both true and false sites. This is the most critical difference in our approach. By comparing the patterns of true and false sites, it is possible to establish a practical criterion for discrimination. Second, common features in the sequence may be too weak to be represented by a limited number of characters. This is another drawback of consensus pattern matching. Because the amino acids are represented by characters, they can only be matched or not matched, although it is possible to allow multiple matches. In reality, however, each amino acid has many characteristics which cannot be represented by binary digits. There have been a number of reports on so-called amino acid indices which reflect various aspects of amino acid residues, as summarized in our previous analysis (3). We introduce the numerical representation of
amino acids selected from our database of over 200 published amino acid indices. The numerical representation is also convenient for incorporating environmental factors, namely, long-range effects of residues surrounding the consensus. Thus, we should be able to combine weak signals within and outside of the consensus to improve the accuracy of prediction. Third, the derivation of a consensus sequence is usually based on visual inspection and tends to be subjective. In contrast, we use an objective approach, the stepwise variable selection method by discriminant analysis, which enables us to select features automatically. Recently, another automatic approach for feature detection was developed and applied to the discrimination of protein secondary structural segments (4). These automatic procedures are to be compared with our previous method by discriminant analysis (5), where the variables were selected manually. In this paper, we apply the stepwise variable selection method to the prediction of in-vivo modification sites of proteins, especially the N-glycosylation sites (1) and serine/threonine phosphorylation sites (6).

MATERIALS AND METHODS

Collection of Data. In order to distinguish true modification sites from false analogues, it is necessary to collect both groups of sequence data. False analogues, in our definition, contain sequence patterns similar to true sites: the NX(S/T) consensus pattern for N-glycosylation sites and the (S/T) residue for Ser/Thr phosphorylation sites. False analogues form a control group in the sense that they are assumed to contain different, or possibly negative, patterns other than the consensus specified above. In contrast, true modification sites are expected to contain positive patterns around the consensus. The sequences around truly modified sites were mainly collected from the NBRF-PIR database (7) Release 13.0 (June 1987).
While the positions of the modified sites are reported in the database, there is no description of the sites which are not modified. Therefore, in the sequences reported to have real modification sites, other potential sites are assumed to be non-modified. Strictly speaking, this assumption may not be correct, but when the data set is sufficiently large, the resulting errors are expected to be negligible. In the case of N-glycosylation, the data set for false analogues was relatively small because the non-modified sites were selected from glycoproteins, namely, proteins that have at least one modified NX(S/T) pattern. Thus, we also collected data from another source. In 1970 Hunt and Dayhoff (8) studied glycosylated and non-glycosylated sites in their sequence database. We used their compilation of non-glycosylated sites. Because all of these sites existed in non-glycosylated proteins, there was no overlap with the sites collected from the NBRF database. As for the Ser/Thr phosphorylation sites, another level of complication exists. The information concerning the type of enzyme, i.e., protein kinase, that causes phosphorylation is not usually reported in the NBRF database. On the other hand, there are many protein kinases identified in vivo, and each kinase seems to have its own substrate specificity (9). Engstrom et al. (10) compiled a list of the in-vitro phosphorylation sites of known kinases. We use their substrate data for the cAMP-dependent protein kinase (PKA), which are the most extensive.

Stepwise Discriminant Analysis. We give a minimum description of the stepwise discriminant analysis here. For further details, see appropriate textbooks (Ref. 11, for example). Discriminant analysis makes it possible to allocate an individual to one of the pre-defined groups on the basis of measurements or values of given variables.
It uses the so-called discriminant function for allocation, which is a linear combination of variables with coefficients optimized to best discriminate the given groups of data, i.e., the training set. In our analysis, discrimination is usually performed between two groups, the sequence data of modified and non-modified sites. However, when we analyze the Ser/Thr phosphorylation sites, we also perform a discrimination among three groups by further dividing modified sites into two groups. Starting from the repertoire of numerous variables, we wish to select a minimum set of variables that gives a sufficiently good discrimination. Generally speaking, it is desirable to combine many variables in order to obtain better discrimination. However, because the number of possible combinations in the repertoire is enormous, it is impractical to test all of them. Moreover, when a new variable is added for discrimination, it may not actually contribute significantly, although the accuracy on the training set will not decrease. In fact, even if a variable works quite well by itself, it may not work as well in combination with other variables. Since there is no way to tell beforehand whether the added variable is going to contribute effectively, a procedure called stepwise discriminant analysis is applied. This procedure has another implication: since we prepare the repertoire as a comprehensive set of variables representing various aspects of sequence data, it plays, in effect, the role of feature detection. The stepwise discriminant analysis examines all candidates and enters them into or removes them from the discriminant function one by one, according to a pre-established criterion, until a stopping condition is satisfied. In practice, we used the program package FACOM/ANALYST to perform these analyses (12).

Estimation of Prediction Efficiency. Once the discriminant function is determined from the training set, it can be applied to an individual outside the training set.
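As a rough illustration of the procedure just described, the sketch below fits a two-group linear discriminant function and adds variables one at a time. It rests on several assumptions that are not in the paper: the entry criterion is the gain in Mahalanobis distance between the group means, the threshold `min_gain` is arbitrary, variables are only entered and never removed, and the data are synthetic. The criterion actually used by the FACOM/ANALYST package is not stated in the text.

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Coefficients w and cut-off c of a two-group linear discriminant function."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group covariance matrix of the two training groups.
    S = (np.cov(X1, rowvar=False) * (len(X1) - 1)
         + np.cov(X2, rowvar=False) * (len(X2) - 1)) / (len(X1) + len(X2) - 2)
    S = np.atleast_2d(S)  # np.cov returns a scalar when only one variable is used
    w = np.linalg.solve(S, m1 - m2)
    c = float(w @ (m1 + m2)) / 2.0  # midpoint between the projected group means
    return w, c

def separation(X1, X2):
    """Mahalanobis distance D^2 between the two group means."""
    w, _ = fisher_discriminant(X1, X2)
    return float(w @ (X1.mean(axis=0) - X2.mean(axis=0)))

def forward_stepwise(X1, X2, min_gain=0.5, max_steps=20):
    """Enter one variable per step until no candidate improves D^2 by min_gain."""
    selected, best = [], 0.0
    candidates = list(range(X1.shape[1]))
    while candidates and len(selected) < max_steps:
        scores = [(separation(X1[:, selected + [j]], X2[:, selected + [j]]), j)
                  for j in candidates]
        score, j = max(scores)
        if score - best < min_gain:  # assumed stopping condition
            break
        selected.append(j)
        candidates.remove(j)
        best = score
    return selected

def allocate(x, w, c):
    """Return 1 if individual x is allocated to group 1, otherwise 2."""
    return 1 if float(w @ x) > c else 2

# Synthetic demonstration data (not from the paper): only column 2 is informative.
rng = np.random.default_rng(0)
true_sites = rng.normal(size=(50, 6))
true_sites[:, 2] += 3.0
false_sites = rng.normal(size=(50, 6))

chosen = forward_stepwise(true_sites, false_sites)
w, c = fisher_discriminant(true_sites[:, chosen], false_sites[:, chosen])
print("selected columns:", chosen)
print("allocation of first true site:", allocate(true_sites[0, chosen], w, c))
```

The function `allocate` corresponds to applying the fitted discriminant function to a single individual, whether it lies inside or outside the training set.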
This we call prediction. The efficiency of prediction is, in general, worse than the efficiency of self-discrimination, which is the discrimination of individuals within the training set. One way to estimate the prediction efficiency is to divide the available data into a training set and a test set. However, because the data set is rather small in our case, this may not give a reliable estimate. Thus we use an alternative method, the U-method of jack-knifing, defined as follows (13): extract an individual from the sample data and calculate the discriminant function based on the remainder, which is then used to predict the extracted one. By repeating this procedure for each individual in the data set, the average degree of prediction is calculated. This is the estimated prediction efficiency.

Variables for Discrimination. The sample data collected as described above are aligned at the possible modification site, the Asn residue for N-glycosylation and the Ser or Thr residue for phosphorylation, without any insertion or deletion. As shown in Fig. 1, we consider a segment of 31 residues containing 15 residues each on both sides of the
modification site. The relative position within this segment is specified by the numbering shown: 0 for the modification site, positive numbers toward the carboxyl terminus, and negative numbers toward the amino terminus. From the aligned sequence data set we will quantify the importance of respective positions in the 31-residue segment. For this purpose we introduce numerical representations of the amino acid sequence. The 20 naturally occurring amino acids have various physicochemical and biochemical properties which are supposed to play important roles in the manifestation of biological functions in proteins. These properties are often quantified by numerical values called amino acid indices. In our previous analysis (3), we classified 222 published amino acid indices into several categories. We use the following indices, which are representative ones from different categories, in the present analysis:

  α: α-propensity by Robson and Suzuki (14)
  β: β-propensity by Kanehisa and Tsong (15)
  Turn: β-turn propensity by Chou and Fasman (16)
  Hφ1: hydrophobicity by Eisenberg (17)
  Hφ2: hydrophobicity by Fauchere and Pliska (18)
  Size: residue size by Chothia (19)

The α, β, and turn propensities are the preferences of amino acid residues to be in these secondary structure conformations. The two hydrophobicity indices represent somewhat different aspects: Hφ1 belongs to a category of hydrophobicity scales for amino acids in proteins derived from the analysis of X-ray crystal structures, while Hφ2, the partition energy of amino acids, is derived from experimental measurements of free amino acids. The last one is the physical size of an amino acid residue. We define the variables for discriminant analysis as follows (see Fig.
1):

  X(1)-X(11): the values of numerical indices at the modification site and at each of the 5 neighboring residues on either side
  X(12), X(13): the averages of numerical indices over 10 residues, the 6th through 15th neighboring residues
  X(14), X(15): the averages of numerical indices over the 5 immediately neighboring residues
  N: the total number of residues
  R: the relative position from the amino terminus

The position dependence of a numerical index is represented by X(i), where i corresponds to a residue or a segment of residues. For example, the value of α at position -2 is called α(4). Thus, because we use 6 different numerical indices, each with 15 variables, there are in total 92 variables for consideration, including the total number of residues N and the relative location of the modification site R. The stepwise discriminant analysis then selects a smaller number of optimized variables with position-specific weightings. Note that the variables at position 0 are relevant only in the case of Ser/Thr phosphorylation, because position 0 is always Asn at a potential N-glycosylation site. When the possible modification site is located in the vicinity of the N or C terminus and some portion of the neighboring 15 residues in either direction does not exist, we simply insert blanks as special amino acids which have 0 values for all properties.

RESULTS

N-Glycosylation. The number of N-glycosylation sites collected from the NBRF database was 394. Since many glycoproteins had multiple glycosylation sites, the number of proteins was 177. The control group from the NBRF database had 80 elements in 54 proteins, which are potential, but false, glycosylation sites in glycoproteins. In addition, the data of Hunt and Dayhoff contained 59 sites in 32 proteins, as false sites in non-glycosylated proteins. It seems to be widely accepted that the occurrence of the NX(S/T) sequence is necessary for N-glycosylation. However, in the NBRF database, there were four exceptions.
Three NXC patterns were found in the N-glycosylation sites of protein C precursor (human and bovine) and of von Willebrand factor precursor (human). Apparently, cysteine can take the place of serine or threonine in the N-glycosylation reaction (20). An NGG pattern was found in immunoglobulin heavy chain V regions (mouse). These data were excluded from our analysis. We prepared the following two training sets.

Training set 1: both the N-glycosylated and non-glycosylated sites are from the NBRF database.

TABLE I. Results of discriminant analysis for N-glycosylation sites.

Fig. 1. The definition of variables around a potential modification site. The numbering of residues is: 0 for the site, negative toward the amino end, and positive toward the carboxyl end. One of the six numerical indices, α, β, turn, Hφ1, Hφ2, and size, is represented by X. The value of X at the site is denoted by X(6), and that of each of the five immediate neighbors on both sides is denoted by X(1) through X(5) and X(7) through X(11). The mean values of X are defined as illustrated.
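The variable layout of Fig. 1 can be sketched as follows. Several details are assumptions made for illustration only: X(12) and X(14) are taken to be the amino-terminal averages and X(13) and X(15) the carboxyl-terminal ones, blanks count as zeros in the averages, R is computed as the site position divided by N, and `placeholder` is a made-up index, not one of the six published indices.

```python
FLANK = 15    # residues considered on each side of the site
N_LOCAL = 5   # immediate neighbours that get position-specific variables

def window(sequence, site):
    """31-residue segment centred on the 0-based index `site`; '-' marks blanks."""
    return [sequence[i] if 0 <= i < len(sequence) else "-"
            for i in range(site - FLANK, site + FLANK + 1)]

def index_variables(sequence, site, index):
    """X(1)..X(15) for one amino acid index (a dict: residue -> value)."""
    vals = [index.get(aa, 0.0) for aa in window(sequence, site)]  # blanks -> 0
    c = FLANK                                        # position 0 in the window
    x = vals[c - N_LOCAL:c + N_LOCAL + 1]            # X(1)..X(11): positions -5..+5
    x.append(sum(vals[:c - N_LOCAL]) / 10)           # X(12): positions -15..-6 (assumed side)
    x.append(sum(vals[c + N_LOCAL + 1:]) / 10)       # X(13): positions +6..+15 (assumed side)
    x.append(sum(vals[c - N_LOCAL:c]) / 5)           # X(14): positions -5..-1 (assumed side)
    x.append(sum(vals[c + 1:c + N_LOCAL + 1]) / 5)   # X(15): positions +1..+5 (assumed side)
    return x

def all_variables(sequence, site, indices):
    """Concatenate X(1)..X(15) for every index, then append N and R."""
    feats = []
    for index in indices:
        feats.extend(index_variables(sequence, site, index))
    n = len(sequence)
    feats.append(n)               # N: total number of residues
    feats.append((site + 1) / n)  # R: assumed to be (1-based position) / N
    return feats

# Placeholder index with arbitrary values; real analyses would use published indices.
placeholder = {aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}
toy = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical sequence
features = all_variables(toy, toy.index("N"), [placeholder] * 6)
print(len(features))
```

With six indices this gives 6 x 15 position-dependent variables plus N and R, which matches the total of 92 variables stated in the text.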
TABLE II. Selected variables for the discrimination of N-glycosylation sites. ᵃ Variable number defined in Fig. 1.

TABLE III. Results of discriminant analysis for Ser/Thr phosphorylation sites.

TABLE IV. Selected variables for the discrimination of Ser/Thr phosphorylation sites.

Training set 2: the non-glycosylated sites from Hunt and Dayhoff's compilation were added to training set 1.

According to the stepwise discriminant analysis, a set of optimized variables was selected first for training set 1. Here the stepwise procedure was stopped at the 18th step because the remaining variables no longer contributed effectively to discrimination. Table I(a) shows the result of self-discriminating the training set itself by the 18 variables. The rows and columns correspond to the two groups to be discriminated and to be allocated, respectively. The number of individuals allocated from either of the two groups to either of the two groups is shown both in actual numbers and in percentages. The unweighted average of diagonal percentages is regarded as a measure of discriminant accuracy. In this case, it is about 74%. As the ANALYST program excluded some individuals automatically, the total number of the allocated data is somewhat smaller than the number of initially collected data. Next, Hunt and Dayhoff's data for non-glycosylated sites were combined with the data from the NBRF database (training set 2). As the control group became larger, we hoped to obtain a more reliable result. However, as shown in Table I(b), which is the result of self-discrimination with another optimized set of 18 variables, the accuracy was somewhat lower. We adopted the U-method of jack-knifing in order to
estimate the prediction accuracy when the discriminant function is applied to unknown data, as shown in Table I(c). Training set 1 was used here. The ability to discriminate between true and false sites which are outside the training set is around 60%, in contrast to the self-discrimination accuracy of over 70% (Table I(a)). The 18 variables selected in the stepwise discriminant analysis using training set 1 are summarized in Table II. The selected variables are represented by circles. The ones selected in the first 3 steps are denoted by double circles. Roughly speaking, the variables selected earlier are considered to give more important contributions. We have also examined the variables selected in the analysis with training set 2 and in the jack-knifing procedure (data not shown). While the correspondence between these cases was not perfect, the relative position R, the β-turn propensity at position +2, and the α propensity and size averaged over the region are conserved relatively well, suggesting that environmental factors are better determinants than site-specific factors.

TABLE V. Prediction of cAMP-dependent phosphorylation sites in bovine α-crystallin A chain.

Ser/Thr Phosphorylation. The number of Ser/Thr phosphorylation sites collected from the NBRF database was 66 in 30 proteins. In contrast, the number of all other serines and threonines in these proteins amounted to 847, which formed the control group of non-phosphorylated sites. The amino acid distributions at positions around the Ser/Thr site did not show any significant bias in the control group (data not shown).
Because half of the collected phosphorylation sites (33 sites) were contained in caseins, and because the phosphorylation of caseins appeared to be somewhat different (see "DISCUSSION"), we also performed discrimination among three groups: phosphorylation sites in caseins, phosphorylation sites in other proteins, and non-phosphorylated sites. The number of substrate sites for cAMP-dependent protein kinase (PKA) from the compilation of Engstrom et al. (10) was 21 in 17 proteins, including peptide fragments. Three of them were also in the NBRF data. The control group sites were collected from the NBRF database in these 17 proteins and amounted to 526. With these data we prepared training sets as follows:

Training set 1: phosphorylated and non-phosphorylated sites from the NBRF database.
Training set 2: the phosphorylated sites in training set 1 are divided into casein sites and non-casein sites.
Training set 3: phosphorylated and non-phosphorylated sites in PKA substrate proteins from the compilation of Engstrom et al.

The discrimination result for training set 1 is shown in Table III(a) in the same way as in Table I(a). The stepwise procedure was stopped at the 15th step because an overflow occurred at step 20. The discrimination accuracy of the training set was about 88%. The result of three-group discrimination with training set 2 is shown in Table III(b). Twenty steps were executed. It can be seen that the sites in caseins are easier to distinguish than the other ones. However, the overall accuracy, the average of the three diagonal values, did not improve. The discrimination result of PKA substrate sites using training set 3 is shown in Table III(c). Because there were some proteins whose sequences were not determined completely, the global variables N and R were omitted in this case from the repertoire of variables to be selected.
Almost perfect discrimination was possible, although parameter fitting against a small number of data often becomes too specific to the original data. The stepwise procedure was stopped at the 20th step. Table III(d) shows the accuracy of predicting unknown Ser/Thr phosphorylation sites estimated by the U-method of jack-knifing. Training set 1 was used. Compared with the result of self-discrimination (Table III(a)), the ability to identify true modification sites dropped about 10% because more true sites were classified as false, while the ability to identify false sites was about the same. As the overall accuracy is still high, the derived discriminant function seems to be of practical use. The prediction accuracy with training set 3 for PKA substrates was also calculated. It decreased drastically when compared with the self-discrimination accuracy; only 16 out of 21 true sites (76%) were correctly predicted. This is probably due to the too-specific nature of the variables optimized for the small data set. The variables selected in the discrimination of training sets 2 and 3 are summarized in Tables IV(a) and IV(b), respectively. The result with training set 1 was similar to Table IV(a). The symbols used here are the same as those used in Table II. In Table IV(a) no strong tendency for selecting any amino acid property at any position can be recognized. But it seems that the information about position +2 is relatively important. It may reflect the fact that, in many cases, the phosphorylated Ser/Thr is separated from a basic amino acid (Lys or Arg) by one residue (21).
Note that the variable at position 0 is also selected. It shows that the preference for phosphorylation differs between serines and threonines. In Table IV(b) we can see that the N-terminal side of the modified site, especially from -2 to -4, is essential for discrimination. In addition, hydrophobicity and β-propensity seem to be relatively more important (see "DISCUSSION" for more details).

DISCUSSION

We have considered determinants for modification sites mostly from the point of view of protein structures as they are encoded in the amino acid sequence. For example, when a certain modification occurs post-translationally, it is unlikely to involve buried residues. Thus, the consensus sequence is a result of molecular interaction between the modifying enzyme and the modified protein. As pointed out by Wold (22), there exist at least two other types of determinants, namely, the compartment location of the enzymes involved in processing and the temporal changes in substrate structures during biosynthesis and transport. It will be interesting to see whether these factors can also be identified from the amino acid sequence data. For example, the first factor may be related to localization signals in the amino acid sequence, such as signal sequences, although it will be difficult to identify all the enzymes in the compartment from sequence information alone. The second factor must also be encoded ultimately in the amino acid sequence, although its clarification would be as difficult as the problem of protein folding. Whatever the actual mechanism is, the present approach is an empirical one identifying characteristic features in the amino acid sequence around modified sites. These features have practical implications for use in prediction, as well as providing insights into the molecular mechanisms of modification reactions.
Hunt and Dayhoff (8) examined the occurrences of NX(S/T) patterns and bound carbohydrates in their collection of amino acid sequences. Not more than 20 of the 101 NX(S/T) sites in their collection were N-glycosylated. While they observed that the occurrence of the NX(S/T) pattern was much less frequent than that of similar patterns, such as (S/T)XN and N(S/T), the NX(S/T) pattern occurs as frequently as these similar patterns in the present NBRF database. It is possible that our data set will turn out to contain such a bias in the future. We examined whether any family of closely related proteins occupied most of the data, but no serious overlaps were found. More recently Mononen and Karjalainen (23) collected and analyzed possible N-glycosylation sites. In their collection, 139 out of 196 sites were glycosylated; namely, most potential sites were actually modified, which agrees with our result. Of course, it is premature to conclude that this reflects the natural ratio of glycosylated to non-glycosylated sites in general. However, it seems possible to say that in glycoproteins many of the possible sites are actually glycosylated. Mononen and Karjalainen could not find marked differences in sequence patterns between glycosylated and non-glycosylated sites, and they suggested that the proposed β-turn structure (24) was not a determinant, as far as it was estimated by secondary structure prediction. Not only these earlier results but also ours suggest the difficulty of predicting N-glycosylation sites precisely. The estimated accuracy of predicting unknown data is about 60%. However, inspection of commonly selected variables still seems to reveal some interesting features. First, the variable selected at position +2 was β-turn propensity, which suggests a difference in the inclination to be glycosylated between the NXT and NXS patterns. Although there is a natural difference in the occurrences of serines and threonines, the difference in the modification tendency is much stronger.
Second, the fact that the global variables are selected relatively well suggests that a wider interaction range is involved. Perhaps the structure surrounding a potential glycosylation site during the elongation process is at least as important as the recognition sequence of the enzyme catalyzing the reaction, since the modification occurs cotranslationally. Furthermore, the strong selection of variable R implies the importance of relative locations in the primary structure. Generally speaking, N-glycosylation sites appear more frequently near amino terminal regions than carboxyl terminal regions, which may be interpreted as glycosylation reactions occurring at earlier stages in peptide synthesis.

Ser/Thr phosphorylation is often used in the turning on or off of various kinds of molecular mechanisms with physiological significance. The reaction is somewhat unstable and modified sites may be different under different conditions in vitro, which makes it difficult to evaluate the collected data. In addition, there are different classes of protein kinases in vivo and each has its own substrate specificity. In this sense, phosphorylation sites constitute a set of different components and a unified prediction of all types of Ser/Thr phosphorylation sites may be difficult. On the other hand, because of its biological importance, the amount of data available is relatively large for phosphorylation sites, which makes them suitable for empirical approaches. Despite belonging to different classes, many protein kinases actually have a common character in their substrate specificities; they usually require basic residues near the acceptor sites. Williams (21) pointed out that phosphorylated sites were frequently separated from lysine or arginine by one amino acid. This still seems to hold to some extent. However, according to our analysis of amino acid compositions, the existence of basic residues was not so outstanding.
Furthermore, there were also quite a few examples of basic residues existing near a Ser/Thr residue which was actually not phosphorylated. Thus, it is difficult to identify a phosphorylation site from the neighboring basic residues alone. The results for phosphorylation sites turned out to be relatively good. The accuracy was close to 90% for self-discriminating the training set and over 80% for predicting unknown data. Because the phosphorylation sites in caseins occupy one half of our data set and because the phosphorylation of caseins appears to have a somewhat different character, they were then treated separately. However, the separation did not raise the accuracy. Casein kinases prefer acidic residues around acceptor sites, while many other protein kinases prefer basic residues. It is therefore surprising that a single discriminant function can deal with both basic and acidic residues effectively. Possibly, it reflects a common mechanism of phosphorylation reactions. Indeed there is another
class of protein kinases, namely, tyrosine kinases, which also seem to require acidic residues and share conserved sequences (25). The variables selected for Ser/Thr phosphorylation sites were more or less dispersed over various positions and properties (Table IV(a)). However, careful inspection may suggest the importance of positions ±2 and -4, which correspond well to the positions suggested by Williams (21). In contrast, the pattern of selected variables in the discrimination of PKA substrates was quite different (Table IV(b)). The pattern suggests the importance of the region toward the amino end, especially positions -2 to -4. This coincides with suggestions from experiments using peptide analogues in which the substrate specificity of PKA was analyzed (26).

In order to demonstrate how to utilize our method, we show an example of the prediction of cAMP-dependent phosphorylation sites in bovine α-crystallin A chain. The result of running our program is shown in Table V. It first locates potential sites by simple sequence pattern matching; in the present case all serines and threonines in the sequence were treated as potential phosphorylation sites. Then the program evaluated their potency by the discriminant function derived from training set 3 of the Ser/Thr phosphorylation sites. As shown in Table V, three sites were predicted to be phosphorylated and one of them (Ser 122) was reported to be an actual phosphorylation site by Voorter et al. (27). The sequence pattern preceding the phosphorylated serine is RRYRLPS, while the proposed recognition sequence of PKA is either KRXXS or RRXS (26).

In this paper we tried to predict two types of modification sites from the amino acid sequence. However, our method is also applicable to other functional sites of proteins, once proper data sets are prepared for both true and false functional sites.
In reality, however, it requires a major effort to prepare a well-verified control group, simply because experimental data seldom exist about sites which are, for example, not modified in vivo. The selection of variables that characterize different features in true and false sites can also be developed further. In the present paper we defined variables representing both position-specific properties and global properties covering longer ranges of sequence data, and used stepwise discriminant analysis for the objective selection of variables. There are certainly other ways to set up the repertoire of variables and to select variables with optimally discriminating features. We believe that the prediction of protein function will become a more useful and effective means of sequence analysis as the development of both databases and algorithms continues.

We thank Dr. Sho Takahashi for helping us collect the data of modification sites. This work was partly supported by the Protein Engineering Research Institute.

REFERENCES
1. Wagh, P.V. & Bahl, O.P. (1981) CRC Crit. Rev. Biochem. 10
2. Tufty, R.M. & Kretsinger, R.H. (1975) Science 187
3. Nakai, K., Kidera, A., & Kanehisa, M. (1988) Protein Eng. 2
4. Kanehisa, M. (1988) Protein Eng. 2
5. Klein, P., Kanehisa, M., & DeLisi, C. (1984) Biochim. Biophys. Acta 787
6. Edelman, A.M., Blumenthal, D.K., & Krebs, E.G. (1987) Annu. Rev. Biochem. 56
7. George, D.G., Barker, W.C., & Hunt, L.T. (1986) Nucleic Acids Res. 14
8. Hunt, L.T. & Dayhoff, M.O. (1970) Biochem. Biophys. Res. Commun. 39
9. Hunter, T. (1987) Cell 50
10. Engstrom, L., Ekman, P., Humble, E., Ragnarsson, U., & Zetterqvist, M. (1984) Methods Enzymol. 107
11. Afifi, A.A. & Azen, S.P. (1979) Statistical Analysis: A Computer Oriented Approach, 2nd Ed., Academic Press, New York
12. Fujitsu Limited (1984) FACOM ANALYST Kaisetsusho (in Japanese)
13. Mardia, K.V., Kent, J.T., & Bibby, J.H. (1979) Multivariate Analysis, Academic Press, New York
14. Robson, B. & Suzuki, E. (1976) J. Mol. Biol. 107
15. Kanehisa, M.I. & Tsong, T.Y. (1980) Biopolymers 19
16. Chou, P.Y. & Fasman, G.D. (1978) Adv. Enzymol. 47
17. Eisenberg, D. (1984) Annu. Rev. Biochem. 53
18. Fauchere, J.-L. & Pliska, V. (1983) Eur. J. Med. Chem. 18
19. Chothia, C. (1975) Nature 254
20. Bause, E. & Legler, G. (1981) Biochem. J. 195
21. Williams, R.E. (1976) Science 192
22. Wold, F. (1981) Annu. Rev. Biochem. 50
23. Mononen, I. & Karjalainen, E. (1984) Biochim. Biophys. Acta 788
24. Aubert, J.-P., Biserte, G., & Loucheux-Lefebvre, M.-H. (1976) Arch. Biochem. Biophys. 175
25. Hunter, T. & Cooper, J.A. (1985) Annu. Rev. Biochem. 54
26. Krebs, E.G. & Beavo, J.A. (1979) Annu. Rev. Biochem. 48
27. Voorter, C.E.M., Molders, J.W.M., Bloemendal, H., & de Jong, W.W. (1986) Eur. J. Biochem. 160
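As an illustration of the two-step procedure described in the discussion (locating all Ser/Thr residues, then scoring each candidate), the following is a minimal Python sketch and not the authors' program. It assumes that the PKA recognition sequences KRXXS and RRXS can be read as regular expressions with X standing for any residue. The function names and the weights passed to `discriminant_score` are hypothetical; the coefficients of the published discriminant function are not reproduced here.

```python
import re

# PKA recognition sequences quoted in the text (26), read as regular
# expressions with X = any residue (an assumption of this sketch).
PKA_MOTIFS = {"KRXXS": re.compile(r"KR..S"), "RRXS": re.compile(r"RR.S")}


def potential_sites(sequence, residues="ST"):
    """Step 1: 1-based positions of every Ser/Thr in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa in residues]


def motifs_ending_at(sequence, position):
    """Names of the PKA motifs whose final S falls on `position` (1-based)."""
    hits = []
    for name, pattern in PKA_MOTIFS.items():
        start = position - len(name)  # motif width equals the name length
        if start >= 0 and pattern.fullmatch(sequence[start:position]):
            hits.append(name)
    return hits


def discriminant_score(features, weights, constant=0.0):
    """Step 2 (placeholder): linear score, constant + sum of weight * value.

    `features` and `weights` map hypothetical variable names to numbers;
    they are illustrative and are not the variables selected in the paper.
    """
    return constant + sum(w * features.get(name, 0.0) for name, w in weights.items())


# Residues preceding and including the phosphorylated serine (Ser 122).
fragment = "RRYRLPS"
print(potential_sites(fragment))       # [7]
print(motifs_ending_at(fragment, 7))   # []
print(motifs_ending_at("AKRLLSA", 6))  # ['KRXXS']
```

Under this literal reading, RRYRLPS matches neither KRXXS nor RRXS exactly, which is one way to see why a graded score over neighboring positions may pick up sites that strict consensus matching would miss.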