Strain in Protein Structures as Viewed Through Nonrotameric Side Chains: I. Their Position and Interaction


PROTEINS: Structure, Function, and Genetics 37:30–43 (1999)

Jaap Heringa 1,2 * and Patrick Argos 2
1 Division of Mathematical Biology, National Institute for Medical Research, London, United Kingdom
2 European Molecular Biology Laboratory, Heidelberg, Germany

ABSTRACT
We studied the relative spatial positioning of nonrotameric side chains with atypical and strained dihedral angles in well-refined protein tertiary structures. The analysis was confined to buried protein cores, where side-chain positioning is less error prone. More than half of the proteins with two or more nonrotameric residues displayed clusters of two or more (and up to five) nonrotameric residues. The clusters exhibited lower average crystallographic temperature factors compared with isolated nonrotameric residues. Nonrotameric clusters showed significantly tighter packing than corresponding rotameric clusters and had distinct residue compositions that did not correlate with amino acid characteristics such as size, hydrophobicity, turn preference, and the like. Such nonrotameric residue biases suggest that spatially concentrated strain in protein folds is minimized by lowered vibrational energy. Furthermore, nonrotameric residues avoided helices and strands and mostly preferred coil regions. If they were in the helical conformation, then they preferred to be within N-terminal segments. Proteins 1999;37:30–43. © 1999 Wiley-Liss, Inc.

Key words: rotamer; side-chain clusters; protein folding; protein tertiary structure; protein stability

INTRODUCTION
The rotamer concept of side-chain conformation relies on a representation of the allowed conformational space by a limited number of torsion angle combinations.
Ramakrishnan and Ramachandran 1 found for each amino acid type that some regions in the (, ) torsion angle space are excluded because of unfavorable short-range interactions with the nearest peptide groups. Clustering around certain χ values for most amino acid side chains represented a rotameric state or a local minimum in potential energy. 2–6 Ponder and Richards 7 loosened the latter criterion and suggested 67 rotamers for 17 amino acid types for which there were sufficient observations. For example, the mean values for χ2 of aspartate and asparagine and for χ3 of glutamate and glutamine deviate considerably from the expected rotameric value of 90° for planar Cγ and Cδ atoms, respectively. The Ponder and Richards rotamer approach was confirmed later by studies on conformational entropy. 8,9 Their libraries have been used widely by researchers in tertiary structure prediction, especially in homology modeling procedures. The side chains are positioned on a given main-chain trace, either confined to groups 10–19 or all simultaneously, 20–26 and their placement generally is directed by rotameric side-chain configurations and optimization techniques, because an exhaustive search over all rotameric combinations quickly becomes computationally prohibitive in larger proteins. 27

Schrauber et al. 28 found that the fraction of side chains deviating substantially from the rotamer libraries could not be attributed merely to insufficient crystallographic resolution and that many such outliers result from structural constraints at the level of secondary and tertiary structure. Schrauber et al. also expanded the Ponder and Richards rotamer libraries 8,9 based on increased structural data and declared a residue rotameric if all of its side-chain dihedral angles were within 20° of those expected from the library. This deviation was found to be physically significant, because a greater variance in either the χ1 angle or the χ2 angle implied an increase in potential energy of up to 2 kcal/mol.
Moreover, Lee et al. 9 found that the harmonic approximation of the energy well corresponding to a rotamer is valid within 20° of the equilibrium angle.

In this study, we investigate the possible structural role of nonrotameric side chains by examining their relative spatial positioning within the protein's tertiary structure and their interactions. As a criterion for declaring a side chain nonrotameric, a deviation of ≥20° in either of its first two χ angles from its nearest associated rotamer was adopted. Care was taken to confine the study to buried residues [with a solvent accessible surface area (ASA) of ≤5 Å²], because solvent-exposed residues often are configured unreliably in crystal structures. In densely packed core regions, the chance that a residue flips from one rotameric state to another is virtually nil, 29 in contrast to solvent-exposed loop regions, in which a crystallographically defined side-chain conformation may actually reflect an average between two rotameric states. It is significant for protein stability whether the strained nonrotamers are distributed uniformly over the tertiary structure or clustered such that one nonrotamer forces another in the vicinity. In this study, we found that nonrotameric residues indeed showed a preference to interact with one another, forming clusters of up to five residues in well-refined protein structures. Their amino acid compositions were distinctly different from those of rotameric residues and isolated (i.e., nonclustered) nonrotameric residues. Attempts were made to relate the clusters to other structural characteristics and to elucidate their possible role in folding and structural stability. Such results are important for protein engineering experiments and side-chain prediction methods based on rotameric libraries.

Abbreviations: rot, rotameric; nonrot, nonrotameric; ASA, solvent accessible surface area.
*Correspondence to: Jaap Heringa, Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom. E-mail: jhering@nimr.mrc.ac.uk
Received 23 April 1999; Accepted 26 April 1999

MATERIALS AND METHODS
Databases
With the routine OBSTRUCT of Heringa et al., 30 five different sets of proteins were compiled, each of which contained X-ray elicited, full-atom tertiary structures from the Protein Data Bank (PDB) depository 31 with resolutions ≤1.5 Å, ≤1.8 Å, ≤2.0 Å, ≤2.5 Å, and ≤3.0 Å, respectively. For each set, OBSTRUCT compiled the largest possible set of structures that displayed pairwise sequence identities of ≤35% after alignment of all possible pairs. For control purposes, a set of 174 nuclear magnetic resonance (NMR)-elicited structures also was obtained using OBSTRUCT. 30 Because NMR structures in general contain fewer amino acids than structures solved by X-ray crystallography, we insisted that each structure comprise at least 80 residues with pairwise identities ≤25%. To minimize error, we considered only core residues showing an ASA ≤5 Å² summed over all atoms in the main and side chains. All residue-solvent ASAs were calculated with the program DSSP. 32 Some statistics for the resulting sets are presented in Table I. The protein sets with minimal structural resolution of 1.8 Å and 2.0 Å were used to obtain the basic results of this work. The crystallographic R-factors for the latter two sets were 0.22 and 0.23, respectively. The actual protein lists can be obtained from the authors.

TABLE I. Characteristics of Protein Sets Used in This Analysis

Protein set (minimal          No. of      Average no. of included core
structural resolution in Å)   proteins    residues per protein a
1.5                           85          33.8    29.8
1.8                           234         33.8    31.1
2.0                           365         36.6    35.4
2.5                           398         45.7    37.1
3.0                           475         44.5    37.3
NMR b                         174         12.4     8.3

a Numbers are given for core residues with a total solvent accessible surface of 5 Å² or less over all main- and side-chain atoms.
b Set of nuclear magnetic resonance (NMR)-elicited structures for which no crystallographic resolution is defined.

Statistical Significance
In this work, statistical results are given with error estimates. For univariate data following a normal distribution, a normal standard error was calculated. For grouped data, preferences were expressed as $P_i = (f_i/f)/(F_i/F)$, where $f_i$ and $F_i$ are the observed frequencies of entity $i$ for the considered case (e.g., the number of Asp residues in helix over all proteins and the number in all secondary structures, respectively), $f = \sum_i f_i$, and $F = \sum_i F_i$ (summation is over all residue types). The error estimate for $P_i$ is calculated as follows: 33,34

$$\sigma(P_i) = \frac{1}{F_i/F}\cdot\frac{\sqrt{f_i\,(1 - f_i/f)}}{f}$$

With this approximation of the standard error, it is 67.5% certain that $P_i$ falls between $P_i - \sigma(P_i)$ and $P_i + \sigma(P_i)$. 33

Rotamericity
The χ angles of each included side chain were determined as in Schrauber et al. 28 The χ1 and χ2 angles for each residue were compared with the rotamer libraries of Ponder and Richards 7 and those of Schrauber et al. 28 Although both libraries were obtained by using a clustering approach, corresponding angles are nonidentical.
Whenever one of the two angles differed by ≥20° from the nearest rotamer given in any of the two libraries for the residue type considered, the residue was declared nonrotameric; i.e., if |χr1 − χ1| ≥ 20° or |χr2 − χ2| ≥ 20°, where χ1 and χ2 are the side-chain dihedral angles of the residue, and χr1 and χr2, respectively, are the corresponding dihedral angles of the nearest associated rotamer for the particular residue type. For amino acids that are symmetric around their χ2 angle (i.e., Asp, Phe, and Tyr), care also was taken to test this angle rotated by 180°. The libraries of Ponder and Richards 7 and those of Schrauber et al. 28 were used simultaneously to ensure a more stringent declaration of nonrotameric residues given the binary criterion used (i.e., rotameric or nonrotameric). Residue types Gly and Ala, without defined rotameric angles, and type Pro, which shows restrained side-chain dihedral angles [χ1 ≈ +35° (Cγ-endo) or χ1 ≈ −35° (Cγ-exo)], were excluded from the analysis. With each side chain classified as rotameric or nonrotameric by the protocol described above, the rotamericity of a protein structure is defined as the percentage of residues within the structure found rotameric.

Determination of Nonrotameric and Rotameric Clusters
Side-chain pairs were considered to interact when two atoms, one from each of the side chains, were separated by ≤5 Å. Given the near 4-Å carbon van der Waals diameter, the extra angstrom allowed consideration of crystallographic error, hydrogen atoms, or weak packing without permitting an intervening atom.
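The nonrotameric test and the 5 Å contact criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function names are hypothetical, the rotamer values passed in are placeholders rather than entries from either published library, and the sketch assumes that a residue counts as rotameric when some library rotamer lies within the cut-off in both χ1 and χ2 (one reading of "nearest associated rotamer"). It handles a single library and residues with both angles defined.

```python
import math

# Residue types the text names as symmetric around chi2.
SYMMETRIC_CHI2 = {"ASP", "PHE", "TYR"}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_nonrotameric(residue_type, chi1, chi2, rotamers, cutoff=20.0):
    """Return True if no rotamer in `rotamers` is within `cutoff` degrees
    of the residue in both chi1 and chi2.

    `rotamers` is a list of (chi1, chi2) pairs for this residue type.
    Assumption: "nearest rotamer" is taken to be the one that minimises
    the larger of the two angular deviations.
    """
    chi2_candidates = [chi2]
    if residue_type in SYMMETRIC_CHI2:
        # Also test chi2 rotated by 180 degrees for symmetric side chains.
        chi2_candidates.append(chi2 + 180.0)
    for r1, r2 in rotamers:
        for c2 in chi2_candidates:
            if (angular_difference(chi1, r1) < cutoff
                    and angular_difference(c2, r2) < cutoff):
                return False
    return True


def side_chains_interact(atoms_a, atoms_b, cutoff=5.0):
    """True if any atom pair, one atom from each side chain, is within
    `cutoff` angstroms. Atoms are (x, y, z) coordinate tuples."""
    return any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)


# Placeholder library with a single hypothetical rotamer.
library = [(-60.0, 180.0)]
print(is_nonrotameric("LEU", -65.0, 150.0, library))  # chi2 is 30 degrees off
print(is_nonrotameric("PHE", -65.0, 5.0, library))    # rescued by the 180-degree test
```

Testing against two libraries, as the text describes, would require a rule for combining the two verdicts, which is left out here.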

From the lists of interactions for each core residue, clusters of rotameric and nonrotameric residues were generated. A residue was assigned to a cluster if it interacted with at least one other member in the cluster. Hence, the minimum number of residues in a cluster was two. The algorithm used for cluster delineation was a depth-first search procedure that determined all unconnected nonrotameric and rotameric clusters.

Interaction Preference of Nonrotameric Side Chains
To assess whether nonrotameric residues tend to interact more than rotameric ones, we performed the following control: For any nonrotameric side chain, the random likelihood that at least one of its neighbors also is nonrotameric is given by $P_{nonrotcls} = 1 - (P_{rot})^N$, where $P_{rot}$ is the probability of a neighboring rotameric residue (i.e., the rotamericity expressed as a fraction), and $N$ is the average number of neighboring side chains. A nonrotameric contact preference can then be calculated as $CntPrf = P_{obs}/P_{nonrotcls}$, where $P_{obs}$ is the number of observed nonrotameric contacts divided by the number of nonrotameric residues (i.e., the observed average number of nonrotameric neighbors for a nonrotameric residue). The nonrotameric contact preference (CntPrf) was calculated over complete protein sets (see Table I), from which the average rotameric fraction and average number of surrounding residues were compiled. In addition, because each protein displays an individual rotamericity and average number of neighbors for each side chain, the contact preference also was calculated over individual proteins with at least two nonrotameric residues (which thus were able to afford a nonrotameric cluster). The above-described control on the nonrotameric interaction preference does not take any compositional biases into account, which might well give rise to differences in interaction behavior.
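The cluster delineation and the random-expectation control just described can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, residues are represented by arbitrary string labels, and the contact list is assumed to have been computed beforehand with the 5 Å criterion.

```python
def find_clusters(residues, contacts):
    """Group residues into connected clusters by depth-first search.

    `residues` is the set of residues of one class (e.g. all nonrotameric
    core residues); `contacts` is an iterable of (a, b) interacting pairs.
    Pairs involving a residue outside `residues` are ignored. Only
    clusters with at least two members are returned.
    """
    neighbours = {r: set() for r in residues}
    for a, b in contacts:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in residues:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= 2:
            clusters.append(sorted(component))
    return clusters


def random_nonrot_neighbour_probability(p_rot, n_neighbours):
    """P_nonrotcls = 1 - P_rot ** N."""
    return 1.0 - p_rot ** n_neighbours


def contact_preference(n_nonrot_contacts, n_nonrot_residues, p_rot, n_neighbours):
    """CntPrf = P_obs / P_nonrotcls, with P_obs the observed average number
    of nonrotameric neighbours per nonrotameric residue."""
    p_obs = n_nonrot_contacts / n_nonrot_residues
    return p_obs / random_nonrot_neighbour_probability(p_rot, n_neighbours)


# Toy example: A, B and C form one cluster; D and E stay isolated.
print(find_clusters(["A", "B", "C", "D", "E"], [("A", "B"), ("B", "C")]))
```

A preference above 1.0 from `contact_preference` would indicate more nonrotameric contacts than the random expectation, but, as noted, this expectation ignores the residue composition of the two classes.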
Therefore, a second control experiment was carried out: For each protein, we attempted to find a set of rotameric side chains with numbers of residues as well as amino acid types equal to those of the nonrotameric residue set. If, for particular residue types, the rotameric abundance was greater than the nonrotameric abundance, then all possible rotameric groups were assembled such that each group possessed the composition of the nonrotameric residue group. Then, for the nonrotameric group and the rotameric control group(s), the numbers of unique pairwise side-chain interactions were determined by using the contact criteria described above. If more than one rotameric control group could be found in a protein, then the mean interaction value over the rotameric groups was used. The ratio obtained in this manner between the nonrotameric group interaction and the mean rotameric group interaction was taken as the preference value for the association of nonrotameric side chains in a given protein. A value >1.0 indicates that the nonrotameric side chains associate more than expected, whereas a value <1.0 implies avoidance. Only proteins with at least one rotameric interacting residue pair and one nonrotameric interacting residue pair were considered.

Secondary Structure Determination
Secondary structures, as assigned by DSSP, 32 were classified as follows: G (3₁₀-helix), H (α-helix), and I (π-helix) were taken as helical (H); B (isolated β-bridge) and E (extended conformation) were taken as strand (E); S (bend) and T (reverse turn) were taken as turn (T); and C (remainder) was taken as coil (C). For the rotameric and nonrotameric residues as well as clusters of at least two, three, and four residues, the number of side chains in each of the four secondary structures was determined over all proteins, and the fractions for each of the four states were calculated.
The preferences of the nonrotameric and various cluster groups were compiled by calculating the ratio of the fraction of side chains in a particular secondary structure for the side-chain group considered to the corresponding fraction for the rotameric residues. For each of the four categorized secondary structures (H, E, T, and C), N-terminal, middle, and C-terminal fragments were defined. The lengths of the N-terminal and C-terminal fragments were set at 3, 2, 1, and 3 residues for H, E, T, and C, respectively. The length of the middle fragment was the number of residues remaining in the secondary structure segment, which could be zero, because only elements with minimum lengths of 6, 4, 2, and 6 residues for H, E, T, and C, respectively, were sampled.

Crystallographic Temperature Factor Statistics
Crystallographic temperature factors (B-factors) for single atoms were compared over side-chain groups. The B-factors measure the energetic or dynamic state of atoms by expressing the vibration amplitude about an equilibrium atomic position. 35,36 Although temperature factors measure this vibrational energy in principle, the parameter is influenced by crystal features, like static or dynamic disorder of atomic positions, which often are indistinguishable from genuine vibration. Furthermore, because the factors are determined by various refinement methods, they scale differently in each crystallographic structure determination. However, Carugo and Argos 36 recently compared a set of protein structures elicited through unbiased refinements, which show true values for differences between thermal factors of bonded atoms, 37 with corresponding PDB structures. They found that the standard deviations of thermal factors in the PDB structures were smaller than those in the structures generated through unbiased refinements.
Therefore, Carugo and Argos 36 argued that trends in thermal factors from published structures might well be underestimated, and they showed further that systematic errors in thermal factors relating to refinement restraints become minimal under normalization to zero mean and unit variance; i.e., by comparing Z-scores of the thermal factors. Following Carugo and Argos, 36 a Z-score was calculated for side-chain atom groups in each protein as $Z = (AvBf - AvBF_{Rot})/\sigma_{Rot}$, where $AvBf$ is the average B-factor of the group of side-chain atoms considered, $AvBF_{Rot}$ is that of the side-chain atoms from all buried residues (ASA ≤5 Å²), and $\sigma_{Rot}$ is the standard deviation of the rotameric side-chain B-factors. Ratios of Z-scores for compared groups were averaged over all proteins to yield the values reported.

Intracluster Contacts
To determine whether the nonrotameric side-chain clusters possess an internal packing different from that of rotameric clusters, we compared each nonrotameric cluster with all possible rotameric clusters within the same structure that contained numbers of amino acids identical to those in the observed nonrotameric cluster. All rotameric control clusters were assembled by using the same clustering criterion used for nonrotameric clusters. The intracluster contacts were determined following the method of Heringa and Argos 38,41 (i.e., for all side-chain atoms of a cluster residue, the number of unique atoms from other cluster side chains within a radius of 5 Å was determined and then normalized with respect to the maximum expected total number of surrounding atoms for the residue type considered, 38 thus eliminating any bias due to cluster composition). The average fraction of intracluster side-chain contacts was then taken over all cluster residues for each nonrotameric cluster as well as its associated rotameric control clusters, and the ratio of these fractions was determined for all nonrotameric clusters. Finally, the average of these latter ratios allowed comparison of the packing (contact) density found in nonrotameric and rotameric side-chain clusters.

Fig. 1. Plot of χ2 versus χ1 for all relevant side chains from protein structures at 1.8 Å resolution or better.

RESULTS
Nonrotamericity and χ Angles
Because our criterion for nonrotamericity involves a minimum deviation of 20° in either the χ1 angle or the χ2 angle relative to the corresponding angle in the nearest library rotamer, it is essential to show the independence of the χ1 and χ2 distributions.
Figure 1 plots the (χ1, χ2) points for all relevant side chains with solvent ASA values ≤5 Å² in protein structures with resolution ≤1.8 Å; clearly, no relation is apparent. The correlation coefficient over all (χ1, χ2) pairs is 0.07. Furthermore, the fraction of residues with Δχ1 > Δχ2 is 49%. However, the same fraction for nonrotameric residues only is 21%, indicating that more dramatic deviations from the rotamer states are indeed less allowable for χ1 angles than for χ2 angles. It could be argued that, if the placement of a side chain is determined equally by its χ1 and χ2 angles, then both should be taken into account when assessing rotamericity and strain by, for example, taking their sum. However, plotting (Δχ1 + Δχ2) versus the maximum of Δχ1 and Δχ2 for all side chains (data not shown) gives a correlation coefficient of 0.97, so the two measures are nearly equivalent. The nonrotameric residues showed a mean angular displacement [(Δχ1 + Δχ2)/2] that, on average, was 19° greater than that for rotameric amino acids, showing the potential for significantly increased strain in nonrotameric residues.

Fig. 2. Plot of the overall rotamericity versus cut-off values for solvent accessibility areas (ASA) for all residues in protein structures at 1.8 Å resolution or better.

Fig. 3. Plots of the average rotamericity values versus the threshold values in crystallographic resolution for five protein sets (see Table I). Plots are presented for residues with ASA ≤5 Å² (solid line), ASA ≤10 Å² (dotted line), and no restraints on ASA values (dashed line), i.e., all residues are sampled. ASA refers to the residue solvent accessible surface area for all of the main-chain and side-chain atoms.

Solvent Accessibility and Structural Resolution
There is no correlation (0.02) between the maximum of the Δχ1 and Δχ2 values and the ASA of buried rotameric or nonrotameric residues; thus, the summed threshold of 5 Å² over all of the residue atoms used in this study to assign buried residues is sufficiently small to avoid any biases. However, as shown in Figure 2 for the protein set with a crystallographic resolution of 1.8 Å, the degree of rotamericity (fraction of side chains found rotameric) decreases as the threshold surface area increases, using a 20° angular deviation cut-off value. Clearly, degrees of freedom for side-chain dihedral angles quickly increase with solvent exposure, which supports our tight selection of almost completely buried residues. Figure 3 shows the rotamericity values of five protein sets under varying crystallographic resolution limits (Table I) for residues with maximal ASA values of 5 Å² and 10 Å² and without limit. The figure shows that rotamericities are affected dramatically at crystallographic resolutions poorer than 1.8 Å, which is in agreement with the findings of Schrauber et al. 28 and indicates that the 1.8-Å data set is optimal for the statistics gathered here. Although the rotamericities become clearly less affected at resolutions of 1.8 Å or better, they do not become stable, as reported by Schrauber et al.
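The dependence of rotamericity on the ASA cut-off (Figure 2) amounts to recomputing one percentage over progressively larger residue selections. A minimal Python sketch follows; the function name and the toy records are hypothetical, and the input is assumed to be a list of (ASA, is_rotameric) pairs produced by an earlier classification step.

```python
def rotamericity(residues, asa_cutoff=None):
    """Percentage of rotameric residues among those with ASA <= asa_cutoff.

    `residues` is an iterable of (asa, is_rotameric) pairs, with ASA in
    square angstroms. With asa_cutoff=None, all residues are sampled.
    """
    selected = [is_rot for asa, is_rot in residues
                if asa_cutoff is None or asa <= asa_cutoff]
    if not selected:
        raise ValueError("no residues satisfy the ASA cut-off")
    # 100 * Nrot / (Nrot + Nnonrot)
    return 100.0 * sum(selected) / len(selected)


# Invented toy records, for illustration only (not data from the paper).
toy = [(0.0, True), (3.0, True), (4.0, False),
       (8.0, True), (20.0, False), (50.0, False)]
for cutoff in (5.0, 10.0, None):
    print(cutoff, round(rotamericity(toy, cutoff), 1))
```

Sweeping the cut-off over a range of values and plotting the result would give a curve of the kind shown in Figure 2.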

TABLE II. Amino Acid Composition of All Rotameric, All Nonrotameric Isolated and Nonrotameric Clustered Groups, and Compositional Preferences of Nonrotameric Groups

Amino                            Nnonrot in   Nonrot          Nonrot cluster      Nonrot cluster
acid                             clusters d   preference e,f  preference          preference
type   Rot% a  Nrot b  Nnonrot c                              relative to rot e,g relative to nonrot e,h
C      96.4     377      14         2         0.41 ± 0.11     0.19 ± 0.13         0.46 ± 0.32
D      69.5     157      69        19         4.84 ± 0.55     4.26 ± 0.93         0.88 ± 0.19
E      91.3     116      11         1         1.04 ± 0.31     0.30 ± 0.30         0.29 ± 0.29
F      86.8     585      89        26         1.67 ± 0.17     1.56 ± 0.29         0.93 ± 0.17
H      71.3     107      43        18         4.42 ± 0.65     5.92 ± 1.33         1.34 ± 0.30
I      94.3    1047      63        20         0.66 ± 0.08     0.67 ± 0.14         1.02 ± 0.22
K      85.2      23       4         1         1.91 ± 0.95     1.53 ± 1.53         0.80 ± 0.80
L      95.1    1404      72        26         0.56 ± 0.06     0.65 ± 0.12         1.16 ± 0.21
M      88.8     318      40        11         1.38 ± 0.21     1.22 ± 0.36         0.88 ± 0.26
N      67.7     147      70        21         5.24 ± 0.59     5.03 ± 1.04         0.96 ± 0.20
Q      93.9     108       7         2         0.71 ± 0.27     0.65 ± 0.46         0.91 ± 0.64
R      87.0      67      10         4         1.64 ± 0.52     2.10 ± 1.04         1.28 ± 0.63
S      93.2     551      40        10         0.80 ± 0.12     0.64 ± 0.20         0.80 ± 0.25
T      97.4     522      14         4         0.30 ± 0.08     0.27 ± 0.13         0.91 ± 0.45
V      97.4    1297      35        14         0.30 ± 0.05     0.38 ± 0.10         1.28 ± 0.33
W      81.0     141      33        10         2.58 ± 0.44     2.50 ± 0.77         0.97 ± 0.30
Y      86.3     284      45        17         1.74 ± 0.25     2.11 ± 0.49         1.21 ± 0.28
Total  91.7    7251     659       206 (31.26%)

a The rotamericity (Rot%) is defined as 100.0 × Nrot/(Nrot + Nnonrot), where Nrot is the number of rotameric residues, and Nnonrot is the number of those found nonrotameric.
b The number of rotameric residues.
c The number of nonrotameric residues.
d The number of nonrotameric residues in clusters comprising two or more contacting nonrotameric residues.
e Preferences ≥1.20 are indicated in boldface.
f Compositional preferences of the general nonrotameric side-chain composition given relative to the general rotameric side-chain composition.
g Compositional preferences of clustered nonrotameric side chains given relative to the general rotameric amino acid composition.
h Compositional preferences of clustered nonrotameric side chains given relative to the general nonrotameric amino acid composition.

Compositional Preferences
Table II shows the numbers of all rotameric, nonrotameric, and nonrotameric clustered residues for the 1.8-Å resolution, ASA ≤5 Å² protein set. Asp, Phe, His, Lys, Met, Asn, Arg, Trp, and Tyr have rotamericities <90%, comprising a mixture of polar, charged, and apolar residue types. More than 30% of the nonrotameric residues are in clusters. Table II also lists the amino acid preferences and associated error estimates of the nonrotameric residues relative to the rotameric composition (preference as the ratio of two compositions) as well as the preferences of the clustered, nonrotameric residues relative to the rotameric and nonrotameric residue compositions. Clustered, nonrotameric residues show a residue composition similar to that of the nonrotameric residues in general. Residue types preferred in nonrotameric clusters, compared with nonrotameric residues in general, were His, Arg, Val, Tyr, and (to a lesser extent) Leu, again of varying natures. The residue types that showed consistent preferences in all three categories were His, Arg, and Tyr. The preferences listed in Table II showed no correlation values >0.6 with any of the typical amino acid characteristics in the database of Nakai et al., 39 such as bulkiness, hydrophobicity, size, polarity, charge, and the like. It must be stressed that the compositional preferences were derived from critically small data samples and, particularly for Lys residues, must be viewed with caution. Of the aforementioned residues with rotamericities <90%, all except Lys, Met, and Arg comprise sp³–sp² bond χ2 dihedral angles, which are easier to vary stereochemically than the angles for the other bond types, although it must be stressed that, in densely packed protein cores, this freedom should be constrained to a large extent by steric conflicts.
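As a worked check, the preferences and error estimates in Table II can be recomputed from the residue counts in the same table. The Python sketch below uses the preference $P_i = (f_i/f)/(F_i/F)$ with a binomial-type standard error, $\sigma(P_i) = \sqrt{f_i(1 - f_i/f)}/f$ divided by $F_i/F$. That error form is a reconstruction and should be read as an assumption; it is adopted here because it reproduces the tabulated values. The function name is hypothetical.

```python
import math


def preference_with_error(f_i, f, F_i, F):
    """Return (P_i, sigma(P_i)) for one residue type.

    f_i, f : count of type i and total count in the group of interest
    F_i, F : count of type i and total count in the reference group
    """
    reference_fraction = F_i / F
    p = (f_i / f) / reference_fraction
    # Binomial-type error on f_i/f, scaled by the reference fraction
    # (assumed form; see the lead-in above).
    sigma = math.sqrt(f_i * (1.0 - f_i / f)) / f / reference_fraction
    return p, sigma


# Asp, all nonrotameric residues relative to rotameric residues (Table II):
# 69 of 659 nonrotameric residues, 157 of 7251 rotameric residues.
p, s = preference_with_error(69, 659, 157, 7251)
print(round(p, 2), round(s, 2))  # 4.84 0.55

# Asp, clustered nonrotameric residues relative to rotameric residues:
# 19 of 206 clustered nonrotameric residues.
p, s = preference_with_error(19, 206, 157, 7251)
print(round(p, 2), round(s, 2))  # 4.26 0.93
```

Both results match the Asp row of Table II, which also illustrates how quickly the error grows for sparsely populated types such as Lys.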
To determine whether the grouping into rotameric and nonrotameric was caused primarily by the χ2-angle distributions of the sp³-sp² residues, we checked, for the 13 residue types with defined χ1 and χ2 angles, what percentage of residues of each type showed Δχ2 > Δχ1, i.e., more strained χ2 angles than χ1 angles. For the sp³-sp² residue types Asp, Phe, His, Asn, Trp, and Tyr, the percentage was 81.1 on average, whereas the remaining seven residue types of the sp³-sp³ kind on average showed 73.7% with Δχ2 > Δχ1. It is clear that the sp³-sp² bond angle effect is minimal, with the associated residue types showing on average only 7% more side chains with predominant strain in their χ2 angles than those adopting the sp³ hybridized bond type. To test further any biases arising from inclusion of the χ2 angle, we calculated nonrotameric amino acid compositional preferences using only the χ1 angle of each residue in determining whether the residue was rotameric or nonrotameric. A cut-off value of 20° was used for the angular difference between a residue's χ1 angle and its nearest associated library value.

TABLE III. Nonrotameric Residue Preferences and Associated Ranks for Residues Declared Nonrotameric Using the Default Criterion and Taking Only the χ1 Angle Into Account

| Residue type | Nonrot preferenceᶜ, default criterion (χ1 and χ2, 20°)ᵃ | Nonrot preference rank | Nonrot preference rank excluding C, S, T, Vᵈ | Nonrot preferenceᶜ, χ1-only criterion (20°)ᵇ | Nonrot preference rank | Nonrot preference rank excluding C, S, T, Vᵈ |
|---|---|---|---|---|---|---|
| C | 0.41 | 15 | – | **1.22** | 8 | – |
| D | **4.84** | 2 | 2 | **1.33** | 5 | 4 |
| E | 1.04 | 10 | 10 | 0.80 | 14 | 10 |
| F | **1.67** | 7 | 7 | 0.99 | 10 | 8 |
| H | **4.42** | 3 | 3 | **1.59** | 4 | 3 |
| I | 0.66 | 13 | 12 | 0.41 | 17 | 13 |
| K | **1.91** | 5 | 5 | **1.28** | 7 | 6 |
| L | 0.56 | 14 | 13 | 0.49 | 16 | 12 |
| M | **1.38** | 9 | 9 | 1.12 | 9 | 7 |
| N | **5.24** | 1 | 1 | **2.04** | 3 | 2 |
| Q | 0.71 | 12 | 11 | 0.86 | 13 | 9 |
| R | **1.64** | 8 | 8 | **1.31** | 6 | 5 |
| S | 0.80 | 11 | – | **2.39** | 1 | – |
| T | 0.30 | 16 | – | 0.87 | 12 | – |
| V | 0.30 | 16 | – | 0.89 | 11 | – |
| W | **2.58** | 4 | 4 | 0.75 | 15 | 11 |
| Y | **1.74** | 6 | 6 | **2.32** | 2 | 1 |

ᵃ The default criterion for rotameric residues including the χ1 and χ2 angle (see Materials and Methods).
ᵇ Criterion for rotameric residues based on the χ1 angle only.
ᶜ Preferences ≥1.20 are indicated in bold.
ᵈ Cys, Ser, Thr, and Val (C, S, T, V, respectively) residue types have only a single side-chain dihedral angle (χ1) defined.

In Table III, the nonrotameric compositional preferences are given for the default criterion and the above χ1-only criterion. It is clear that the nonrotameric compositional biases are similar under both criteria. Only the four residue types Cys, Ser, Thr, and Val, with a single angle defined, and the sp³-sp² bonded type Tyr showed increased rankings when the χ1-only criterion was applied. The increased ranking for Cys, Ser, Thr, and Val is expected, because the other residues have a smaller chance of becoming nonrotameric (it is not achievable through their χ2 angles any longer), whereas the result for Tyr is interesting. Also, the two sp³-sp²-type residues Asn and Asp, with greatest rotational freedom in their χ2 angles, remained preferred nonrotamers under the χ1-only criterion; when Cys, Ser, Thr, and Val were excluded from the analysis, Asn and Asp lowered their ranks by only 1 and 2 positions, respectively (Table III).
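The agreement between the two rankings in Table III (excluding Cys, Ser, Thr, and Val) can be checked with a rank correlation. The sketch below assumes a Spearman coefficient computed with the standard no-ties formula; the text does not name the rank-correlation measure used, so this choice is an assumption, and the names are illustrative.

```python
# Ranks copied from Table III (columns "rank excluding C, S, T, V"):
# residue -> (rank under default criterion, rank under chi1-only criterion)
RANKS = {
    "D": (2, 4), "E": (10, 10), "F": (7, 8), "H": (3, 3), "I": (12, 13),
    "K": (5, 6), "L": (13, 12), "M": (9, 7), "N": (1, 2), "Q": (11, 9),
    "R": (8, 5), "W": (4, 11), "Y": (6, 1),
}


def spearman_from_ranks(rank_pairs):
    """Spearman rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid when there are no ties."""
    pairs = list(rank_pairs)
    n = len(pairs)
    sum_d2 = sum((a - b) ** 2 for a, b in pairs)
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


rho = spearman_from_ranks(RANKS.values())
print(round(rho, 2))
```

Under this assumption the 13 rank pairs give a coefficient of about 0.73, in line with the overall rank correlation reported in the text.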
Moreover, the overall rank correlation (excluding Cys, Ser, Thr, and Val) showed a value of 0.73, whereas the linear correlation of the raw nonrotameric compositional preferences remained high at 0.60. The notion that Asn residues show little clustering for the χ2 values was challenged recently by Lovell et al.,⁴⁰ who observed clear clustering behavior of the Asn χ2 angles over a large database of 1,490 Asn residues selected on low B-factors from 240 proteins.

Interaction of Nonrotameric Side Chains

To test the tendency of nonrotameric residues to form clusters, the random likelihood for nonrotameric cluster formation was calculated as a control and compared with the observed fraction of nonrotameric side chains in clusters. The probability for any nonrotameric side chain that at least one of its neighbors also is nonrotameric, as explained in Materials and Methods, is given by $1 - \mathrm{rotcty}^{N}$, where rotcty is the fraction of rotameric residues, and $N$ is the average number of neighboring side chains around each side chain (irrespective of whether it is rotameric or not). Table II shows that the average rotcty value in the 1.8-Å resolution, ASA ≤5 Å² protein set was 0.9167, whereas $N$ was 3.26 (i.e., a total of 25,768 contacts over 7,910 residues). Substituting these numbers in the above formula gives $1 - (0.9167)^{3.26} = 0.247$, such that, if nonrotameric residues were randomly distributed, then 24.7% of those would be expected to occur in clusters. The observed percentage of 31.3 for nonrotameric clustered side chains results in a preference value of 1.27 for their clustering tendency compared with random nonrotameric association. It should be mentioned that these statistics also include proteins with zero or only a single nonrotameric residue, which are unable to afford a nonrotameric cluster and, thus, might bias the statistics. We therefore repeated the calculation over the 114 proteins containing at least two nonrotameric residues.
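The random-association control above reduces to a few lines of arithmetic. The following sketch evaluates it from the counts quoted in the text; the function names are illustrative and not taken from the original work.

```python
def expected_clustered_fraction(rotcty, n_neighbors):
    """Chance that at least one of N neighbors is nonrotameric: 1 - rotcty**N."""
    return 1.0 - rotcty ** n_neighbors


def clustering_preference(observed_fraction, rotcty, n_neighbors):
    """Observed clustered fraction divided by the randomly expected fraction."""
    return observed_fraction / expected_clustered_fraction(rotcty, n_neighbors)


# Values quoted in the text for the 1.8-A resolution, ASA <= 5 A^2 set.
rotcty = 7251 / 7910        # fraction of rotameric residues (0.9167)
n_neighbors = 25768 / 7910  # average number of neighboring side chains (3.26)
observed = 206 / 659        # nonrotameric residues found in clusters (31.3%)

expected = expected_clustered_fraction(rotcty, n_neighbors)
preference = clustering_preference(observed, rotcty, n_neighbors)
print(round(expected, 3), round(preference, 2))
```

With these inputs the expected fraction comes out near 0.247 and the preference near 1.27, as stated above.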
The observed percentage of nonrotameric side chains in clusters over those 114 structures amounted to 33.23% (206 of 620 residues in clusters). Because the value of rotcty was 0.9027 (5,752 rotameric residues out of 6,372 total), and $N$ was 3.32 (21,120 contacts over 6,372 residues), the expected fraction of clustered nonrotamers was 28.77%, such that the nonrotameric association preference in proteins with at least two nonrotamers showed a value of 1.16. The statistics given here were averages over the protein set used and, thus, assume an even distribution of nonrotameric residues as well as of the number of side chains around any given side chain, both of which are unlikely in the nonredundant set of protein structures used here. Therefore, the statistics described above also were calculated for individual proteins, and the results are given in Table IV, grouped according to the number of nonrotameric residues. The association preference for the general case (including proteins with at least two nonrotameric residues) is the lowest and was only 14% higher than expected; indeed, the 17 proteins containing two nonrotameric residues contributed no side-chain cluster (Table IV). However, the larger groups all showed nonrotameric association preferences significantly beyond the randomly expected value of 1.0 (Table IV). The correlation between the group sizes and cluster preferences was weakly positive, with a coefficient of 0.41, which is consistent with the nonrotameric cluster tendency.

TABLE IV. Average Nonrotameric Cluster Preferences in Proteins Grouped According to Number of Nonrotameric Residues

| Nnonrot | Number of proteins | Average nonrotameric cluster preference |
|---|---|---|
| ≥2 | 114 | 1.14 ± 0.13 |
| ≥3 | 97 | 1.34 ± 0.15 |
| ≥4 | 74 | 1.20 ± 0.13 |
| ≥5 | 57 | 1.29 ± 0.15 |
| ≥6 | 41 | 1.36 ± 0.15 |
| ≥7 | 31 | 1.21 ± 0.15 |
| ≥8 | 22 | 1.30 ± 0.14 |
| ≥9 | 18 | 1.31 ± 0.17 |
| ≥10 | 11 | 1.24 ± 0.18 |

A second type of control that included the amino acid composition of clusters was performed for the individual proteins in the 1.8-Å resolution, ASA ≤5 Å² set (see Materials and Methods). In assessing the tendency of nonrotameric side chains to form contacting clusters compared with that of rotameric residues, it is essential that the rotameric control groups display the same amino acid composition to avoid any bias in clustering tendencies of individual residues.³⁸,⁴¹ With the 1.8-Å protein set and an ASA cut-off value of 5 Å², only 12 proteins afforded both nonrotameric and rotameric control clusters. The tendency for nonrotameric residues to cluster on average was 2.15-fold (±0.51) greater than that for rotameric residues.
Because the sample of only 12 proteins was critically small, it also was determined whether proteins for which no rotameric control groups could be assembled displayed different association behavior. The average nonrotameric contact values for these proteins were slightly greater than those of the small control group (data not shown), which, thus, could be considered representative for the complete protein set. Because both of the control experiments described above clearly indicate that nonrotameric side chains have a distinct tendency to cluster, we tried to relate the nonrotameric clusters to other structural features that might give insight into their significance for protein structure. In the remainder of this article, the structural stability and compactness of the clusters as well as their associated secondary structures are addressed.

Cluster Statistics

In the 1.8-Å data set, 221 out of the 234 total proteins contained at least one residue that was 5 Å² or less solvent exposed. Contacts among rotameric residues were observed in 217 of 221 proteins (97.7%), whereas 62 of 114 structures with two or more nonrotameric side chains showed interresidue contacts (53.4%), i.e., more than half of the tertiary structures in which a nonrotameric cluster could be formed did contain such clusters. To assess the significance of the latter fraction, we calculated the probability for the occurrence of a nonrotameric cluster in the average protein, which is $1 - [P(\text{no nonrot contact})]^{N}$, where $P(\text{no nonrot contact})$ denotes the probability of no contact between random nonrotameric residues, and $N$ is the average number of nonrotameric residues. The random chance for nonrotameric association (over 114 protein structures with two or more nonrotamers) was 0.2877, such that $P(\text{no nonrot contact})$ was 0.7123, whereas $N$ was 3.32 (see above). This led to a random probability of $1 - (0.7123)^{3.32} = 0.673$ for the occurrence of a cluster in the average protein.
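The per-protein occurrence probability can be evaluated in the same way. This is a minimal sketch using the rounded inputs quoted in the text (0.2877 and 3.32); the function name is illustrative, and small differences from the reported 0.673 are to be expected from rounding of the inputs.

```python
def cluster_occurrence_probability(p_association, n):
    """1 - P(no nonrot contact)**N, with P(no nonrot contact) = 1 - p_association."""
    p_no_contact = 1.0 - p_association
    return 1.0 - p_no_contact ** n


p_random = cluster_occurrence_probability(0.2877, 3.32)
observed_fraction = 0.534  # fraction of proteins with a nonrotameric cluster (from the text)
print(round(p_random, 2), round(observed_fraction / p_random, 2))
```

With these rounded inputs the random probability evaluates to roughly 0.67 to 0.68, so the observed fraction of 0.534 lies about 20% below it.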
The observed fraction of 0.534 is 20% lower, so that the distribution of the nonrotameric clusters over the protein structures is slightly biased to structures that comprise nonrotameric clusters, consistent with the observation that nonrotameric residues can form larger clusters with more than two residues. The numbers of nonrotameric clusters observed with 2, 3, 4, and 5 members, respectively, were 61, 21, 4, and 1 in the 1.8-Å, ASA ≤5 Å² data set. Thus, clusters with three or more side chains comprised 30% of all nonrotameric clusters, whereas their constituents comprised 41% of all nonrotameric side chains in clusters.

Stability

To assess whether nonrotameric residues differ in stability, we compared the normalized crystallographic temperature factors of the nonrotameric side-chain atoms with those for rotameric side chains. In principle, temperature factors measure the flexibility or thermal vibrational state of individual protein atoms. Although they contain a random component, a relation between the thermal factor and vibrational energy is discernible³⁶ (see Materials and Methods). Table V lists the number of standard deviations (Z score) by which the mean of normalized, nonrotameric side-chain, atomic temperature factors was greater than the corresponding mean for rotameric side-chain atoms used to elicit the standard deviation. Included are all nonrotameric residues as well as those in clusters containing a minimum of two, three, or four residues. The data are given for the 1.8-Å resolution protein set and for residues buried under the 5-Å² and 10-Å² thresholds to check for consistency.

TABLE V. Average Normalized Crystallographic Temperature Factor Z-Scores of All Nonrotameric Residues, Those in Isolation, and Various Nonrotameric Cluster Size Groups

| Ratioᵃ | Nᵇ (ASA ≤5 Å²) | Z-scoreᶜ (ASA ≤5 Å²) | Nᵇ (ASA ≤10 Å²) | Z-score (ASA ≤10 Å²) |
|---|---|---|---|---|
| Nonrot | 152 | 0.51 ± 0.19 | 169 | 0.47 ± 0.12 |
| Isol Nonrot | 93 | 0.70 ± 0.15 | 92 | 0.79 ± 0.19 |
| Cls ≥2 | 59 | 0.21 ± 0.13 | 77 | 0.09 ± 0.10 |
| Cls ≥3 | 20 | 0.27 ± 0.18 | 34 | 0.10 ± 0.14 |
| Cls ≥4 | 5 | 0.41 ± 0.30 | 8 | 0.11 ± 0.31 |

ᵃ ASA, accessible surface area; Nonrot, nonrotameric residues; Isol Nonrot, isolated nonrotameric residues not in clusters; Cls ≥2, the nonrotamers in all clusters; Cls ≥3 and Cls ≥4, nonrotameric side chains in clusters of three or more and four or more residues, respectively.
ᵇ N, the number of protein structures for which a ratio is defined.
ᶜ The mean Z-scores are given together with standard error numbers calculated over Z-scores from individual proteins. For all groups, each individual protein Z-score was compiled as the average over the side-chain thermal factor Z-scores, determined by using the mean and standard deviation of buried rotameric side chains (see Materials and Methods).

Table V shows that nonrotameric residues have higher temperature factors than rotameric residues. On average, the nonrotameric temperature factors are 10% (ASA ≤5 Å²) and 13% (ASA ≤10 Å²) higher than those for rotameric side-chain atoms, which suggests that nonrotameric side-chain atoms have a higher degree of vibrational motion and energy and, thus, would be destabilizing. This notion is supported by the recent finding by Lovell et al.⁴⁰ that χ2 angles of Asn residues give rise to clearly definable conformers (vide supra) when sampled in the absence of Asn residues with high thermal factors. However, Table V shows that, when they are clustered, nonrotamers display lower thermal factors than isolated nonrotameric side chains and, particularly in the ASA ≤10 Å² set, the factors were only slightly higher than those in rotameric side groups. Moreover, isolated nonrotameric side chains showed distinctly higher thermal factors than those in clusters, because their average thermal factors were separated by three standard errors (Table V).
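The Z-score procedure described in footnote c of Table V can be sketched as follows. The temperature-factor values below are hypothetical, the function names are illustrative, and the use of the sample standard deviation is an assumption; the normalization actually applied is described in Materials and Methods and is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def protein_z_score(nonrot_b_factors, rot_b_factors):
    """Average Z-score of nonrotameric side chains in one protein, relative to the
    mean and standard deviation of that protein's buried rotameric side chains."""
    mu = mean(rot_b_factors)
    sigma = stdev(rot_b_factors)  # sample standard deviation (an assumption)
    return mean([(b - mu) / sigma for b in nonrot_b_factors])


def mean_and_standard_error(z_scores):
    """Mean over proteins and its standard error, as reported in Table V."""
    return mean(z_scores), stdev(z_scores) / sqrt(len(z_scores))


# Hypothetical normalized side-chain temperature factors for two proteins.
z_a = protein_z_score([14.0, 16.0], [10.0, 12.0, 14.0])
z_b = protein_z_score([13.0], [10.0, 12.0, 14.0])
print(z_a, z_b, mean_and_standard_error([z_a, z_b]))
```

Averaging per protein first, and only then over proteins, keeps each structure's own temperature-factor scale from dominating the pooled statistic.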
In the 1.8-Å resolution protein set with ASA ≤5 Å², clusters with a minimum of four nonrotameric side chains were associated with elevated thermal factors, albeit they still were lower than those for all nonrotameric residues. This may have been due to the small sample of only five clusters. These results suggest that clustering of nonrotameric residues could be a mode to alleviate their structural strain. Thermal factor Z scores like those presented here over large sets of protein structures were shown recently to be statistically significant,³⁶ because they alleviate the systematic error component of the factors (see Materials and Methods). Although the error numbers in Table V are relatively high, the consistency over the two data sets is striking.

Intracluster Contacts

Comparing the internal contacts of nonrotameric clusters with rotameric control clusters, as described in Materials and Methods, we found that side chains in nonrotameric clusters contribute over one-fourth more (0.29) of their contact surface to other cluster side chains than rotameric side chains observed in the 1.8-Å resolution and ASA ≤5 Å² data set. Using an ASA threshold of 10 Å² yielded essentially the same result (0.26 more). Optimizing van der Waals contacts in clusters also may be a way to overcome the increased strain of nonrotameric residues.

Secondary Structure Preferences and Stability

Tables VI and VII show the fractions and preferences of nonrotameric residues in particular secondary structures. Nonrotameric residues in general prefer to be in turns or coil, albeit the observed fraction is 25% (Table VI). Clustered nonrotameric residues display an overall preference only for coil structures (Table VII). The small preference value for nonrotameric residues in reverse turns might be due to the associated low sampling number. It is striking that helices and strands are not preferred, despite the completely buried residue sample.

TABLE VI. Numbers and Fractions for Secondary Structures of Nonrotameric Residues in General and in Clusters

| Group | H (number) | E (number) | T (number) | C (number) | H (%) | E (%) | T (%) | C (%) |
|---|---|---|---|---|---|---|---|---|
| Rot | 2,641 | 3,213 | 254 | 1,139 | 36 | 44 | 4 | 16 |
| Nonrot | 205 | 289 | 34 | 131 | 31 | 44 | 5 | 20 |
| Cls ≥2 | 64 | 96 | 6 | 40 | 31 | 47 | 3 | 19 |
| Cls ≥3 | 25 | 43 | 0 | 16 | 30 | 51 | 0 | 19 |
| Cls ≥4 | 7 | 8 | 0 | 6 | 33 | 38 | 0 | 29 |

The secondary structure designations H, E, T, and C, respectively, refer to helix, strand, reverse turn, and coil. Nonrot, Rot, Cls ≥2, Cls ≥3, and Cls ≥4 are defined in Table V.

TABLE VII. Preferences for Secondary Structures of Nonrotameric Residues in General and in Clusters

| Groupᵃ | H | E | T | C |
|---|---|---|---|---|
| Nonrot | 0.85 ± 0.05 | 0.99 ± 0.04 | 1.47 ± 0.25 | 1.27 ± 0.10 |
| Cls ≥2 | 0.85 ± 0.09 | 1.05 ± 0.08 | 0.83 ± 0.33 | 1.24 ± 0.18 |
| Cls ≥3 | 0.82 ± 0.14 | 1.16 ± 0.12 | n.a. | 1.21 ± 0.27 |
| Cls ≥4 | 0.92 ± 0.30 | 0.86 ± 0.24 | n.a. | 1.82 ± 0.63 |

ᵃ Group denotes preferences of group side chains compared with the rotameric reference group. n.a., not applicable. For other definitions, see Tables V and VI.

Table VIII shows the preferences of all nonrotameric residues and clustered nonrotameric residues within regions of secondary structure elements. The average lengths of the secondary structure types sampled for the regions were similar for the rotameric, all nonrotameric, and clustered nonrotameric residues. Statistics on larger cluster size groups are not presented due to data paucity. Clearly, nonrotameric residues mostly prefer N-terminal sections of helix and coil (Table VIIIa), although, overall, the preferences remain high for all coil sections (Table VIIIb), in accordance with Table VI. Clustered nonrotameric residues show biases similar to those of nonrotameric residues in general. It is interesting that the bias for N-terminal helix segments that was noted before for side-chain clusters in general³⁸ persists for nonrotameric residues. It would be interesting to look for other structural features that could be correlated with the observed secondary structure biases. For example, Wan and Milner-White⁴²,⁴³ found that Asp, Asn, Ser, and Thr side chains in helix N-termini frequently are involved in two-hydrogen bond formation. In an earlier study, Doig et al.⁴⁴ observed specific hydrogen bonding patterns in helix N-termini for those four residues. Unfortunately, scrutinizing nonrotameric composition effects was not possible for most secondary structure regions due to the sparse data for nonrotameric residues when grouped according to secondary structure region and residue type. However, it is noteworthy with respect to the abovementioned side chain-backbone hydrogen bonding, which may well be correlated with the helix N-terminal nonrotameric bias, that Asn is relatively underrepresented as a nonrotamer in helix N-termini, whereas Ser becomes relatively frequent.

TABLE VIII. Preferences of Nonrotameric Residues Within Segments of Secondary Structure Elements Relative to Rotameric Residues

| Secondary structureᵃ | All nonrot, N-terminal | All nonrot, Middle | All nonrot, C-terminal | Clustered nonrotᵇ, N-terminal | Clustered nonrotᵇ, Middle | Clustered nonrotᵇ, C-terminal |
|---|---|---|---|---|---|---|
| (a) Preferences normalized per secondary structure | | | | | | |
| H | 1.73 ± 0.22 | 0.83 ± 0.06 | 1.00 ± 0.16 | 1.85 ± 0.39 | 0.78 ± 0.10 | 1.09 ± 0.30 |
| E | 1.07 ± 0.10 | 0.94 ± 0.07 | 1.02 ± 0.10 | 0.93 ± 0.17 | 1.03 ± 0.12 | 1.02 ± 0.16 |
| T | 0.69 ± 0.19 | 1.39 ± 0.37 | 1.10 ± 0.24 | 1.23 ± 0.50 | 1.49 ± 0.86 | 0.45 ± 0.41 |
| C | 1.37 ± 0.23 | 0.88 ± 0.28 | 0.76 ± 0.17 | 1.52 ± 0.33 | 0.70 ± 0.37 | 0.74 ± 0.25 |
| (b) Overall preferencesᶜ | | | | | | |
| H | 1.47 ± 0.20 | 0.70 ± 0.06 | 0.84 ± 0.14 | 1.46 ± 0.35 | 0.62 ± 0.10 | 0.87 ± 0.25 |
| E | 1.09 ± 0.11 | 0.96 ± 0.08 | 1.03 ± 0.11 | 0.98 ± 0.19 | 1.09 ± 0.15 | 1.07 ± 0.19 |
| T | 1.11 ± 0.37 | 2.25 ± 0.70 | 1.78 ± 0.49 | 1.11 ± 0.63 | 1.34 ± 0.94 | 0.41 ± 0.41 |
| C | 2.30 ± 0.52 | 1.47 ± 0.52 | 1.27 ± 0.35 | 3.61 ± 1.11 | 1.65 ± 0.95 | 1.75 ± 0.70 |

ᵃ Secondary structure designations H, E, T, and C refer, respectively, to helix, strand, reverse turn, and coil. Designation of residues within the N-terminal, middle, and C-terminal fragments of individual secondary structural elements is explained in Materials and Methods.
ᵇ Preferences for reverse turn (T) have not been included, because the numbers of sampled nonrotameric residues in clusters were only 3, 2, and 1 for N-terminal, middle, and C-terminal, respectively.
ᶜ Note that the overall preferences for N-terminal, middle, and C-terminal segments of each secondary structure do not necessarily add up to the preferences given in Tables VI and VII, because sampling frequencies can differ due to minimum lengths imposed on the secondary structures when sampling their segments (see Materials and Methods).

TABLE IX. Average Z-Scores of Crystallographic Temperature Factors of Atoms in Secondary Structure Elements for the 1.8-Å Resolution, Accessible Surface Area ≤5 Å² Protein Data Set

| Secondary structure | Rotameric side chains: N | Rotameric side chains: average Z-score | Nonrotameric side chains: N | Nonrotameric side chains: average Z-score | Clustered nonrotameric side chains: N | Clustered nonrotameric side chains: average Z-score |
|---|---|---|---|---|---|---|
| H | 195 | 0.07 ± 0.03 | 90 | 0.28 ± 0.10 | 35 | 0.29 ± 0.17 |
| E | 190 | 0.08 ± 0.03 | 109 | 0.29 ± 0.13 | 41 | 0.19 ± 0.15 |
| T | 111 | 0.08 ± 0.08 | 30 | 0.11 ± 0.17 | 6 | 0.62 ± 0.08 |
| C | 184 | 0.11 ± 0.04 | 71 | 0.20 ± 0.12 | 24 | 0.14 ± 0.14 |

N, the number of proteins in which one or more side chains in the associated secondary structure were observed. The Z-scores were derived from the mean and standard deviation of normalized temperature factors for each individual protein, calculated from side-chain atoms of buried rotameric residues, irrespective of secondary structure (see Materials and Methods). Standard errors were compiled over the thus obtained Z-scores from each individual protein.
The statistics in Table VIII for reverse turn (T) segments show high error estimates and should be viewed with caution, because they were derived from critically low sampling numbers: The numbers of nonrotamers (and those in clusters) in N-terminal, middle, and C-terminal turn segments, respectively, were 9 (3), 10 (2), and 13 (1). Table IX shows average Z scores of temperature factors of side-chain atoms in the four secondary structure elements for rotameric, nonrotameric, and clustered nonrotameric residues. Statistics on larger cluster groups are not given due to scant data. Although the trends are generally weak, it is noticeable that the ranking of the temperature factors over the secondary structures does not correspond between rotameric and nonrotameric residues; for example, the average B factors are the lowest for rotameric side chains in strands, whereas they are the highest for nonrotamers. Furthermore, the Z scores observed for nonrotameric residues show a trend with their secondary structure preferences (Table VI); the two most preferred secondary structures, turn and coil, show the lowest relative increase in average temperature factor compared with rotameric constituents. The clustered nonrotameric residues show the same trend: The lowest values are observed for coil and dramatically so for turn. The cluster values are low compared with nonrotameric residues in general, except for helix. Temperature-factor Z scores for secondary structural fragments are given in Table X only for helix and strand due to data sparseness.

TABLE X. Average Z-Scores of Crystallographic Temperature Factors for Secondary Structure Fragments in the 1.8-Å Resolution, Accessible Surface Area ≤5 Å² Set

| Secondary structural fragment | Rotameric side chains: N | Rotameric side chains: average Z-score | Nonrotameric side chains: N | Nonrotameric side chains: average Z-score | Clustered nonrotameric side chains: N | Clustered nonrotameric side chains: average Z-score |
|---|---|---|---|---|---|---|
| H N-terminal | 122 | 0.00 ± 0.06 | 36 | 0.05 ± 0.13 | 13 | 0.20 ± 0.21 |
| H middle | 166 | 0.05 ± 0.04 | 59 | 0.37 ± 0.14 | 22 | 0.25 ± 0.23 |
| H C-terminal | 122 | 0.26 ± 0.06 | 25 | 0.57 ± 0.29 | 10 | 0.89 ± 0.41 |
| E N-terminal | 158 | 0.13 ± 0.04 | 50 | 0.22 ± 0.13 | 17 | 0.33 ± 0.24 |
| E middle | 156 | 0.12 ± 0.03 | 57 | 0.36 ± 0.17 | 21 | 0.33 ± 0.27 |
| E C-terminal | 150 | 0.10 ± 0.04 | 53 | 0.15 ± 0.22 | 20 | 0.03 ± 0.17 |

Average Z-scores were calculated as for Table XI. N, the number of proteins in which one or more side chains in the associated secondary structure fragment were observed.

Fig. 4. Illustration of a five-residue, nonrotameric cluster in the oligopeptide binding protein from Salmonella typhimurium (2olb chain A). Cluster constituent side chains are Leu297A, Arg299A, Ile302A, Trp382A, and Val388A. The side-chain atoms are depicted by colored spheres. The main-chain stretch running from the residue sequentially preceding the N-terminal cluster constituent by two residues to that positioned two residues beyond the C-terminal cluster constituent is colored in blue (residues 295–390). The figure was drawn by using RasMol⁴⁶ with standard amino acid-type coloring.

There is a clear pattern in the temperature factors for the three helical fragments: The N-terminal helical nonrotameric side chains showed the lowest B factors, whereas those in C-terminal helical positions yielded the greatest values. This is especially salient for clustered nonrotamers, because B factors for C-terminal helical nonrotameric side chains, on average, are 28% greater than those for the helical N-termini, a dramatic difference.
Thus, the high preference of nonrotameric residues for helix N-terminal fragments (Table VIII) is associated with distinctly low temperature factors, also relative to the listed thermal factors for rotameric side chains. The average Z scores for strand fragments are more balanced for nonrotamers, in line with the compositional preferences (Table VIII). Although the trend is weak, Table X shows the highest strand thermal value for nonrotameric residues in general in the strand middle section, which is the strand region most avoided by nonrotameric residues (Table VIII). Clustered side chains in strand regions show no such opposite trend between thermal Z scores and residue preferences.

Cluster Example

Figure 4 depicts the largest strained nonrotameric cluster in the 1.8-Å, ASA ≤5 Å² protein set, which consists of five side chains and occurs in the Salmonella typhimurium oligopeptide-binding protein (2olb). The cluster connects two helices with a long intervening loop lying over the surface of the remaining structure: Arg299A and Ile302A (at positions 1 and 4), from a six-residue helix preceded by a coiled, N-capping residue Leu297A (position 1 of two coil residues), interact with Trp382A (position 14 of 18 helical residues) C-capped by Val388A (position 2 of two coiled