Abstract. SMITH, BRANDYE M. The Application of Single-Pass Attenuated Total Reflectance


Abstract

SMITH, BRANDYE M. The Application of Single-Pass Attenuated Total Reflectance Fourier-Transform Infrared Spectroscopy for Protein Analysis (under the direction of Stefan Franzen)

The application of single-pass attenuated total reflection Fourier-transform infrared (ATR-FT-IR) spectroscopy using an internal reflectance element was investigated for the following: 1) protein secondary structure analysis, 2) protein secondary structure prediction, 3) amino acid classification, 4) peptide conformational studies, and 5) the investigation of protein binding interactions. The goals of this research were to validate the single-pass ATR-FT-IR method and to explore its applications. Several applications are discussed following the validation. Since this research is the first reported application of single-pass ATR-FT-IR for protein analysis, the method was validated using transmission FT-IR and multi-pass ATR-FT-IR as referee methods. The single-pass ATR-FT-IR technique was advantageous because the single-pass geometry permitted rapid secondary structure analysis on small volumes of protein in solution without the use of demountable thin-pathlength sample cells. Moreover, the fact that backgrounds were small allowed the simultaneous observation of the polyamide bands of the protein without having to perform subtraction. A comparison of replicate protein spectra indicated that the single-pass ATR-FT-IR method yielded more reproducible data than transmission FT-IR. The observed trends for the amide I-III and amide A bands obtained by single-pass ATR-FT-IR agreed with those in the literature for conventional transmission FT-IR.

Principal component regression (PCR) was applied to a spectral library of proteins in solution acquired by single-pass ATR-FT-IR spectroscopy to predict the secondary structure content, principally the α-helix and β-sheet content, of proteins within the library. Quantitation of protein secondary structure content was performed as a proof of principle that single-pass ATR-FT-IR is an appropriate method for protein secondary structure analysis. An inside model space bootstrap and a genetic algorithm (GA) were successfully used to improve prediction results. Three spectral libraries are presented: one better suited for β-sheet content prediction, one for α-helix content prediction, and a third for simultaneous β-sheet and α-helix content prediction. The validation results using these three methods yielded an average absolute error of 1.6% for α-helix content prediction and an average absolute error of 1.7% for β-sheet content prediction.

Single-pass ATR-FT-IR and normal mode analysis were employed to study amino acids and their role in the spectra of peptides. An amino acid identification model based upon Mahalanobis distance yielded 100% correct classification for all 20 amino acids used in this study.

Conformational analysis of peptides that are functionally relevant yields critical information on interactions in native proteins that affect folding, targeting, and processing. Such studies are also crucial for the analysis of bioconjugates used in drug delivery. Comparisons of peptides in free solution and those associated with proteins were made via single-pass ATR-FT-IR spectroscopy. It was established that the single-pass ATR method has a sufficient signal-to-noise ratio to obtain difference spectra for peptides in solution and peptides conjugated or bound to proteins. Studies showed that protein-associated peptides differ in conformation from those in free solution.

To investigate protein-binding interactions, a Ge internal reflectance element was modified for the specific binding of histidine-tagged biomolecules. The modification of the surface was monitored step by step via single-pass ATR-FT-IR. Initially, 7-octenyltrimethoxysilane (7-OTMS) was attached to the Ge crystal via surface hydroxyl groups. The vinyl moiety of 7-OTMS was oxidized to a carboxylic acid, which was functionalized by 1,1'-carbonyldiimidazole (CDI) to produce a labile imidazole ester. The labile imidazole that resulted from the CDI coupling was then displaced by nitrilotriacetic acid (NTA)-amine. Nickel sulfate was added to the system and coordinated with the three carbonyl groups on the NTA, leaving Ni able to coordinate with two adjacent histidine residues. Successful binding of his-tagged dehaloperoxidase (DHP) and his-tagged biotin was observed. The surface modification method presented had minimal non-specific binding, the Ni-NTA surface was reusable if stored properly, and complete removal of the organic surface was achievable.
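The Mahalanobis-distance classification mentioned above can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to two principal component scores and assigns an unknown sample to the class whose mean is nearest in Mahalanobis distance. The class names, the score values, and the function names are hypothetical and chosen only for illustration; this is not the classify.m or mahald_2.m code listed in the appendices.

```python
import math

def mean_and_cov(points):
    """Return the mean and the 2x2 sample covariance of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of the 2-D point x from a class (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # must be non-zero for the inverse to exist
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

def classify(x, classes):
    """Assign x to the class with the smallest Mahalanobis distance."""
    stats = {name: mean_and_cov(pts) for name, pts in classes.items()}
    return min(stats, key=lambda name: mahalanobis(x, *stats[name]))

# Hypothetical principal component scores (PC1, PC2) for two classes.
example_classes = {
    "class_a": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "class_b": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)],
}
label = classify((0.8, 0.4), example_classes)
```

Unlike a plain Euclidean distance, the Mahalanobis distance scales each direction by the spread of the class, so a sample is judged relative to the scatter of the replicate spectra in that class.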

The Application of Single-Pass Attenuated Total Reflectance Fourier-Transform Infrared Spectroscopy for Protein Analysis

Brandye M. Smith

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Physical and Mathematical Sciences
Department of Chemistry
North Carolina State University
Raleigh, NC
/07/2002

APPROVED BY:
Chair of Advisory Committee - Chair of Advisory Committee

Biography

I was born in Wilson, NC on June 9, 1973 to Michael and Patricia Smith. My academic career began at East Carolina University in August I received a B.S. in chemistry and a BS.B. in biochemistry in December While an undergraduate, I did research under the direction of Dr. Bill Church (neurochemistry) for two semesters, Dr. Bob Morrison (theoretical chemistry) for two semesters, and Dr. Paul Gemperline (analytical chemistry) for one summer. I also completed a summer internship at Burroughs Wellcome, a pharmaceutical company, prior to graduation. I began the graduate program at East Carolina University in August 1995 and joined Dr. P. Gemperline's group, which specialized in analytical chemistry and chemometrics. While working on my Master's degree, I was employed as a teaching assistant, a research assistant, and an instructor. In addition to my employment at East Carolina University, I also worked part-time at Metrics, Inc., a contract pharmaceutical laboratory, followed by a full-time position at H&A Scientific Inc., a software development company. I graduated with a Master's degree in December 1997 with a concentration in chemometrics. My thesis was titled Application of the Bootstrap and the Genetic Algorithm to Pattern Recognition. Afterwards, I continued to work full-time at H&A Scientific Inc. and taught a general chemistry lab at East Carolina University until I began the Ph.D. program at North Carolina State University in August I joined Stefan Franzen's research group, which specializes in biophysical chemistry, in December

Table of Contents

LIST OF TABLES
LIST OF FIGURES

1. Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in Solution
    Introduction
    Secondary Structure of Proteins
    Current Methods for Determining Protein Secondary Structure
    Introduction to FT-IR Spectroscopy
    Determination of Protein Secondary Structure via FT-IR Spectroscopy
    versus D2O as a Solvent
    Attenuated Total Reflection (ATR) FT-IR Spectroscopy Theory
    Multi-Pass ATR-FT-IR
    Single-Pass ATR-FT-IR
    Experimental Details
    Results and Discussion
    Conclusions
    References
2. Single-Pass Attenuated Total Reflection Fourier-Transform Infrared Spectroscopy for the Prediction of Protein Secondary Structure
    Introduction
    Assignment of Protein Secondary Structure
    Experimental Details
    Principal Component Analysis
    Principal Component Analysis: Mathematical Details
    Wavelength Selection
    Principal Component Regression
    Partial Least Squares
    The Bootstrap

    2.2.5 The Genetic Algorithm
    Results and Discussion
    Conclusions
    References
3. Investigation of Amino Acids and Their Role in the Single-Pass ATR-FT-IR Spectra of Peptides and Proteins
    Introduction
    Pattern Recognition
    Mahalanobis Distance
    Experimental Details
    Single-Pass ATR-FT-IR Analysis
    Normal Mode Analysis
    Amino Acid Classification Routine
    Results and Discussion
    Experimental and Theoretical Amino Acid Spectra
    Classification Results
    Conclusions
    References
4. Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Conformational Analysis of Peptides
    Introduction
    Procaspase
    BSA Bioconjugates
    Experimental Details
    Results
    Procaspase
    BSA-Peptide Bioconjugates
    Conclusions
    References
5. Surface Modification of a Germanium Attenuated Total Reflectance Element
    Introduction

    5.2 Experimental Details
    Results and Discussion
    Conclusions
    References
Appendices
    Matlab code used to classify the amino acids, classify.m
    Matlab code used to perform the inside model space bootstrap, ims_boot.m
    Matlab code to perform the genetic algorithm, ga_v3.m
    Matlab code used to calculate the probability densities and distances of samples within the classify program, pr_d.m
    Matlab code used to calculate the Mahalanobis distance, mahald_2.m
    Matlab code used to perform principal component analysis, pca_svd.m
    Matlab code used to perform the crossover procedure within the genetic algorithm, x_over_v3.m
    Matlab code to calculate the percent classification within the classify program, prctc.m
    Matlab code to perform principal component regression within the genetic algorithm, pcr_fast2.m
    Plots of experimental amino acid single-pass ATR-FT-IR spectra and theoretical amino acid spectra
    Eigenvector mapping results

List of Tables

Chapter 1
Table 1.1 Table downloaded from the Protein Data Bank website in order to illustrate the number of protein structures available in the library.
Table 1.2 Table downloaded from the Protein Data Bank website in order to illustrate the number of protein structures available in the library and the corresponding method used for secondary structure determination.
Table 1.3 The nine characteristic absorption bands for proteins.
Table 1.4 A list of all the proteins used in this research.
Table 1.5 A comparison of multi-Gaussian fits for six sets of lysozyme spectra.

Chapter 2
Table 2.1 Proteins used in this study as well as the number of single-pass ATR-FT-IR spectra included in the spectral library for each protein.
Table 2.2 Illustration of the variance associated with each eigenvalue.
Table 2.3 Initial prediction results using both PCR and PLS.
Table 2.4 The two separate PCR methods that yielded the best validation results for α-helix prediction and β-sheet prediction, respectively.
Table 2.5 Training set prediction results for the third spectral library and the third set of regression parameters.
Table 2.6 Test set prediction results for the third spectral library and the third set of regression parameters.
Table 2.7 Validation set prediction results for the third spectral library and the third set of regression parameters.

Chapter 3
Table 3.1 Example indicating how a sample is assigned to a given class based upon its probability density.
Table 3.2 Amino acids and peptides used in this study as well as the number of single-pass ATR-FT-IR spectra included in the amino acid spectral library.
Table 3.3 Example of a confusion matrix.

Chapter 4
Table 4.1 Listing of all the peptide sequences used in the BSA-peptide bioconjugates study.
Table 4.2 Secondary structure results for BSA, constructs (#3, #5, #7, #10, and #11), and peptides (#3, #5, #7, #10, and #11) using the α-helix and β-sheet prediction methods detailed in Chapter

Appendix
Table 6.1 Eigenvector mapping results for alanine.
Table 6.2 Eigenvector mapping results for arginine.
Table 6.3 Eigenvector mapping results for asparagine.
Table 6.4 Eigenvector mapping for aspartic acid.
Table 6.5 Eigenvector mapping results for cysteine.
Table 6.6 Eigenvector mapping results for glutamine.
Table 6.7 Eigenvector mapping results for glutamic acid.
Table 6.8 Eigenvector mapping results for glycine.
Table 6.9 Eigenvector mapping results for histidine.
Table 6.10 Eigenvector mapping results for isoleucine.
Table 6.11 Eigenvector mapping results for leucine.
Table Eigenvector mapping results for lysine.
Table Eigenvector mapping results for methionine.
Table Eigenvector mapping results for phenylalanine.
Table Eigenvector mapping results for proline.
Table Eigenvector mapping results for serine.
Table Eigenvector mapping results for threonine.
Table Eigenvector mapping results for tryptophan.
Table Eigenvector mapping results for tyrosine.
Table Eigenvector mapping results for valine.
Table Eigenvector mapping results for charged aspartic acid.
Table Eigenvector mapping results for charged glutamic acid.
Table Eigenvector mapping results for charged lysine.

List of Figures

Chapter 1
Figure 1.1 The partial double-bond character of a peptide bond is illustrated.
Figure 1.2 Illustration of the phi (φ) and psi (ψ) dihedral angles in a protein.
Figure 1.3 A Ramachandran plot for cytochrome C where the upper left portion represents β-structure and the lower left portion represents the α-helical structure found within the protein.
Figure 1.4 Illustration of an α-helix, a β-sheet, and the extended conformation.
Figure 1.5 (a) An illustration of the hydrogen bonding that stabilizes an α-helix. (b) An illustration of a parallel β-sheet and the hydrogen bonding that stabilizes the structure.
Figure 1.6 Representation of Hooke's law where two atoms are connected by a spring.
Figure 1.7 Representation of the harmonic oscillator approximation.
Figure 1.8 Representation of the Morse oscillator (red) compared to the harmonic oscillator (blue).
Figure 1.9 An illustration of the mid-infrared spectrum of a protein where the amide I-III, A, and B bands are displayed.
Figure 1.10 Transmission FT-IR spectrum of H2O using a 12.5-micron spacer in between two CaCl2 windows.
Figure 1.11 An illustration of a transmission FT-IR sample cell. The cell used in this research consisted of two CaF2 windows and either a 12.5 or 6-micron spacer enclosed with a copper casing.
Figure 1.12 FT-IR spectrum of H2O (blue), D2O (red), and 50:50 H2O:D2O (green) illustrating the limitations that exist for either solvent since the strong D-O-D, H-O-D, and H-O-H bending modes obscure the spectral regions from cm-1, cm-1, and cm-1, respectively.
Figure 1.13 The transmission FT-IR spectrum of hemoglobin acquired using a micron spacer, CaF2 windows, and D2O as a background.
Figure 1.14 (a) Transmission FT-IR spectrum of hemoglobin in D2O solution revealing only amide I. (b) Transmission FT-IR spectrum of hemoglobin in solution revealing only amide II. Thus when using transmission mode FT-IR with a spacer greater than 6 microns, amide I and II can only be observed if the protein is run in both D2O and.
Figure 1.15 Illustration of how polarized light travels through an ATR crystal and is totally internally reflected when θ is greater than the critical angle, θcritical.
Figure 1.16 Red represents the multi-pass ATR-FT-IR spectrum of 3 mM myoglobin whereas blue represents the multi-pass ATR-FT-IR spectrum of.
Figure 1.17 The geometry of the single-pass attenuated total reflection cell is shown. The protein sample is placed in a depression below the Ge crystal.
Figure 1.18 An example of spectral enhancement upon protein gel formation is displayed. The spectra shown are of the protein pepsin from a liquid sample (blue), to an intermediate state (green), to a gel state (red).
Figure 1.19 Displayed are the spectra of the protein chymotrypsinogen from protein in solution (blue), to an intermediate state (green), to a gel state (red). These spectra signify the ability to observe any protein denaturation that occurs as a function of gel formation. The amide I position of the hydrated state spectrum is different from that of the intermediate and gel states. Such a significant shift suggests protein denaturation.
Figure 1.20 Displayed are the spectra of the protein chymotrypsin from protein in solution (blue), to an intermediate state (green), to a gel state (red). These spectra signify the ability to observe any protein denaturation that occurs as a function of gel formation. The amide I position of the gel state spectrum is not different from that of the fully hydrated and intermediate states, suggesting that no protein denaturation has occurred.
Figure 1.21 The mid-frequency ATR-FT-IR spectra of six samples of lysozyme are displayed where the blue dotted spectra represent the first lot of lysozyme and the red solid spectra represent the second lot of lysozyme. These spectra were used for a reproducibility study and represent the data given in Table
Figure 1.22 Spectra of the protein hemoglobin in where the blue spectrum is hemoglobin in the gel state and the red spectrum is hemoglobin in a hydrated state but with subtracted from the spectrum. This represents the argument presented that no subtraction is needed for protein spectra that are in the gel state as acquired via single-pass ATR-FT-IR.
Figure 1.23 Mid-infrared spectra of hemoglobin in where the green spectrum was acquired by the conventional transmission FT-IR method, the blue spectrum was acquired by multi-pass ATR-FT-IR using a subtraction algorithm, and the red spectrum was acquired by the single-pass ATR-FT-IR method without using a subtraction algorithm. This demonstrates that the single-pass ATR-FT-IR method yields enhanced spectral information relative to transmission FT-IR and multi-pass ATR-FT-IR.
Figure 1.24 An enlargement of the amide I region of four protein spectra is shown where myoglobin is represented by the orange spectrum, cytochrome C is represented by the red spectrum, ribonuclease A is represented by the green spectrum, and chymotrypsin is represented by the blue spectrum.
Figure 1.25 An enlargement of the amide II region of four protein spectra is shown where myoglobin is represented by the orange spectrum, cytochrome C is represented by the red spectrum, ribonuclease A is represented by the green spectrum, and chymotrypsin is represented by the blue spectrum.
Figure 1.26 An enlargement of the amide III region of four protein spectra is shown where myoglobin is represented by the orange spectrum, cytochrome C is represented by the red spectrum, ribonuclease A is represented by the green spectrum, and chymotrypsin is represented by the blue spectrum.
Figure 1.27 An enlargement of the amide A and B region of four protein spectra is shown where myoglobin is represented by the orange spectrum, cytochrome C is represented by the red spectrum, ribonuclease A is represented by the green spectrum, and chymotrypsin is represented by the blue spectrum.

Chapter 2
Figure 2.1 The structure of lysozyme downloaded from the Protein Data Bank (PDB).
Figure 2.2 The amide I band in the mid-infrared spectrum of the protein lysozyme. This figure represents the contributions from various secondary structures.
Figure 2.3 Schematic of digitized spectral data.
Figure 2.4 Representation of the singular value decomposition function for a (25 x 250) data matrix.
Figure 2.5 The first plot is overly trained, whereas the second plot is ideally trained.
Figure 2.6 Schematic of one point representing the data of an object.
Figure 2.7 Clustering of the 25 objects in the given example.
Figure 2.8 Representation of the first principal component.
Figure 2.9 Description of a principal component score.
Figure 2.10 Representation of how the second principal component is found.
Figure 2.11 Representation of two parent binary strings undergoing crossover at two positions to produce two offspring.
Figure 2.12 Plot of the single-pass ATR-FT-IR spectra of the proteins concanavalin A (solid) and cytochrome C (dashed) at the GA optimized wavenumbers.
Figure 2.13 Plot of the single-pass ATR-FT-IR spectra of the proteins pepsin (solid) and lactoglobulin (dashed) at the GA optimized wavenumbers.

Figure 2.14 Plot of the single-pass ATR-FT-IR spectra of the proteins hemoglobin (solid) and papain (dashed) at the GA optimized wavenumbers.

Chapter 3
Figure 3.1 Projection of a point in a higher dimensional space onto a 2-dimensional sub-space.
Figure 3.2 Scatter plot of principal component 1 versus principal component 2 for the amino acid alanine where each circle in the plot represents a single-pass ATR-FT-IR spectrum projected into a two-dimensional subspace. The red ellipse represents the 95% confidence interval and the blue ellipse represents the 99% confidence interval.
Figure 3.3 The black dashed single-pass ATR-FT-IR spectrum represents the dipeptide alanine-aspartic acid and the red solid spectrum represents the first principal component in the ALA-ASP, ALA, and ASP data set.
Figure 3.4 The black dashed single-pass ATR-FT-IR spectrum represents alanine and the blue solid spectrum represents the second principal component in the ALA-ASP, ALA, and ASP data set.
Figure 3.5 The black dashed single-pass ATR-FT-IR spectrum represents aspartic acid and the magenta solid spectrum represents the third principal component in the ALA-ASP, ALA, and ASP data set.
Figure 3.6 The black dashed single-pass ATR-FT-IR spectrum represents the tripeptide phenylalanine-glycine-glycine and the red solid spectrum represents the first principal component in the PHE-GLY-GLY, PHE, and GLY data set.
Figure 3.7 The black dashed single-pass ATR-FT-IR spectrum represents phenylalanine and the blue solid spectrum represents the second principal component in the PHE-GLY-GLY, PHE, and GLY data set.
Figure 3.8 The black dashed single-pass ATR-FT-IR spectrum represents glycine and the magenta solid spectrum represents the third principal component in the PHE-GLY-GLY, PHE, and GLY data set.

Chapter 4
Figure 4.1 An illustration of procaspase-3, the pro-less variant, the pro-peptide, and the pro-peptide/pro-less variant mixture used in this study.
Figure 4.2 A schematic of the BSA-peptide bioconjugates preparation where 3-maleimidobenzoic acid N-hydroxysuccinimide ester (MBS) is added to BSA and is followed by the addition of a cysteine-terminated peptide.
Figure 4.3 The blue solid single-pass ATR-FT-IR spectrum represents procaspase-3, the red dashed spectrum represents the pro-less variant, and the green dotted spectrum represents the pro-peptide.

Figure 4.4 The addition of the pro-less variant and the pro-peptide spectra is given as the dotted orange spectrum whereas the magenta dashed spectrum is of the mixture (pro-less variant and pro-peptide). The solid blue spectrum is procaspase
Figure 4.5 The dotted red spectrum represents the difference between procaspase-3 and the pro-less variant. This spectrum represents the bound pro-peptide whereas the solid orange spectrum is of the pro-peptide in free solution.
Figure 4.6 The magenta structure is human serum albumin (HSA) obtained from the Protein Data Bank (PDB ID 1BM0) and the blue structure is a homology model of bovine serum albumin (BSA).
Figure 4.7 The green dotted single-pass ATR-FT-IR spectrum represents construct 3, the red solid spectrum represents BSA, the blue dashed spectrum represents peptide 3, and the purple dashed spectrum represents the difference between construct 3 and BSA.
Figure 4.8 The green dotted single-pass ATR-FT-IR spectrum represents construct 5, the red solid spectrum represents BSA, the blue dashed spectrum represents peptide 5, and the purple dashed spectrum represents the difference between construct 5 and BSA.
Figure 4.9 The green dotted single-pass ATR-FT-IR spectrum represents construct 10, the red solid spectrum represents BSA, the blue dashed spectrum represents peptide 10, and the purple dashed spectrum represents the difference between construct 10 and BSA.
Figure 4.10 The green dotted single-pass ATR-FT-IR spectrum represents construct 11, the red solid spectrum represents BSA, the blue dashed spectrum represents peptide 11, and the purple dashed spectrum represents the difference between construct 11 and BSA.
Figure 4.11 The green dotted single-pass ATR-FT-IR spectrum represents construct 7, the red solid spectrum represents BSA, the blue dashed spectrum represents peptide 7, and the purple dashed spectrum represents the difference between construct 7 and BSA.

Chapter 5
Figure 5.1 A schematic of the Ge ATR element used in this experiment. a) Representation of the Ge ATR element as it appears in the single-pass ATR-FT-IR microscope. The red lines indicate the infrared beam and the black arrow represents the evanescent wave that penetrates the sample. The blue corresponds to the Teflon holder designed to keep the sample in contact with the Ge crystal while minimizing evaporation. b) Representation of the stainless steel holder that slides into the microscope where the white signifies the Ge crystal.
Figure 5.2 a) Scheme for the attachment of 7-octenyltrimethoxysilane to the Ge surface. b) Scheme illustrating the oxidation of the double bond to produce a carboxylic acid. c) Representation of CDI coupling to the carboxylic acid group.

Figure 5.3 a) Red represents the single-pass ATR-FT-IR spectrum of 7-octenyltrimethoxysilane on the Ge surface, whereas blue represents a reference single-pass ATR-FT-IR spectrum of bulk 7-octenyltrimethoxysilane. b) Green represents the oxidation of 7-octenyltrimethoxysilane on the Ge surface, whereas red represents the spectrum of 7-octenyltrimethoxysilane on the Ge surface as shown in Figure 5.3a. c) Purple represents the spectrum of carbonyldiimidazole (CDI) on the surface, whereas blue represents a reference spectrum of bulk CDI. The green spectrum represents the oxidized 7-octenyltrimethoxysilane as shown in Figure 5.3b.
Figure 5.4 a) Scheme for the displacement of imidazole with NTA-amine. b) Illustration of the coordination of Ni to NTA and to two adjacent histidine residues.
Figure 5.5 Red represents the spectrum of NTA-amine on the surface, the blue represents a reference spectrum of bulk NTA-amine, and the orange spectrum represents Ni-NTA. The purple spectrum represents the CDI layer as shown in Figure 5.3c.
Figure 5.6 The red represents the spectrum of his-tagged DHP attached to the Ni-NTA surface, the blue represents a spectrum of his-tagged DHP after a 20-mM imidazole wash to remove non-specifically bound DHP, and the green represents a DHP. The DHP spectrum (green) was run to determine if non-specific binding of DHP was taking place. In the green spectrum there is a signal in the amide I region, but the line shape is not consistent with that of a protein. The signals observed in this control were due to the absorbance of buffer, which is particularly evident in the region from 900 to 1200 cm-1.
Figure 5.7 a) The red spectrum represents his-tagged DHP attached to the Ni-NTA surface and the blue spectrum represents his-tagged DHP attached to the Ni-NTA surface after washing the surface with 20-mM imidazole. The function of the 20-mM imidazole rinse was to remove any his-tagged DHP non-specifically bound to the Ni-NTA surface. b) Both spectra from a are included along with a reference spectrum of his-tagged DHP (purple) and of histidine (green). Both of the reference spectra were scaled.
Figure 5.8 a) The red represents the spectrum of his-tagged biotin attached to the Ni-NTA surface, the blue represents a reference spectrum of his-tagged biotin, and the green represents a reference spectrum of histidine. Both of the reference spectra were scaled. b) The red dashed spectrum represents immobilized his-tagged biotin and the purple solid spectrum represents avidin that is bound to the his-tagged biotin.

Appendix
Figure 6.1 Red represents the theoretical alanine spectrum, blue represents the theoretical alanine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of alanine.
Figure 6.2 Red represents the theoretical arginine spectrum, blue represents the theoretical arginine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of arginine.

Figure 6.3 Red represents the theoretical asparagine spectrum, blue represents the theoretical asparagine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of asparagine.
Figure 6.4 Red represents the theoretical aspartic acid spectrum, blue represents the theoretical aspartic acid in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of aspartic acid.
Figure 6.5 Red represents the theoretical charged aspartic acid spectrum, blue represents the theoretical charged aspartic acid in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of aspartic acid.
Figure 6.6 Red represents the theoretical cysteine spectrum, blue represents the theoretical cysteine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of cysteine.
Figure 6.7 Red represents the theoretical glutamine spectrum, blue represents the theoretical glutamine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of glutamine.
Figure 6.8 Red represents the theoretical glutamic acid spectrum, blue represents the theoretical glutamic acid in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of glutamic acid.
Figure 6.9 Red represents the theoretical charged glutamic acid spectrum, blue represents the theoretical charged glutamic acid in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of glutamic acid.
Figure 6.10 Red represents the theoretical glycine spectrum, blue represents the theoretical glycine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of glycine.
Figure 6.11 Red represents the theoretical histidine spectrum, blue represents the theoretical histidine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of histidine.
Figure 6.12 Red represents the theoretical isoleucine spectrum, blue represents the theoretical isoleucine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of isoleucine.
Figure 6.13 Red represents the theoretical leucine spectrum, blue represents the theoretical leucine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of leucine.
Figure 6.14 Red represents the theoretical lysine spectrum, blue represents the theoretical lysine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of lysine.
Figure 6.15 Red represents the theoretical charged lysine spectrum, blue represents the theoretical charged lysine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of lysine.

Figure 6.16 Red represents the theoretical methionine spectrum, blue represents the theoretical methionine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of methionine.
Figure 6.17 Red represents the theoretical phenylalanine spectrum, blue represents the theoretical phenylalanine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of phenylalanine.
Figure 6.18 Red represents the theoretical proline spectrum, blue represents the theoretical proline in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of proline.
Figure 6.19 Red represents the theoretical serine spectrum, blue represents the theoretical serine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of serine.
Figure 6.20 Red represents the theoretical threonine spectrum, blue represents the theoretical threonine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of threonine.
Figure 6.21 Red represents the theoretical tryptophan spectrum, blue represents the tryptophan in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of tryptophan.
Figure 6.22 Red represents the theoretical tyrosine spectrum, blue represents the tyrosine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of tyrosine.
Figure 6.23 Red represents the theoretical valine spectrum, blue represents the valine in spectrum, and black represents the experimental single-pass ATR-FT-IR spectrum of valine.

Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in Solution

1.1 Introduction

The interpretation of genetic data depends upon the ability to establish the structure and function of a vast number of proteins known only by their primary sequence. Large-scale screening of proteins to identify structural motifs relies on the development of techniques that rapidly provide information on protein secondary structure. Such techniques can be combined with primary sequence information and homology modeling to obtain three-dimensional structures. (1, 2) During the course of the last thirty years, Fourier-transform infrared spectroscopy (FT-IR) has been developed as a procedure for protein secondary structure determination. (3-12) Proteins have nine characteristic amide absorption bands, labeled amide A, B, and I-VII, in the mid-infrared that can be interpreted in terms of structure. At present, however, secondary structure determination is largely based on the examination of amide I alone, as acquired by transmission FT-IR or multi-pass attenuated total reflectance (ATR)-FT-IR. (3, 5, 8-11, 13, 14) Transmission FT-IR and multi-pass ATR-FT-IR have disadvantages associated with solvent interference and sample cell assembly. This study introduces the use of single-pass ATR-FT-IR to overcome such disadvantages as well as to maximize spectral information and to improve the efficiency of protein analysis.

1.1.1 Secondary Structure of Proteins

Protein secondary structure refers to the symmetrical and repeating organization of amino acid residues in a polypeptide chain. (15, 16) The peptide bond is rather rigid due to

its partial double-bond character; however, rotation is allowed about the N-Cα and Cα-C bonds. (15)

Figure 1.1. The partial double-bond character of a peptide bond is illustrated.

The two angles described by these rotations are referred to as dihedral angles. The phi (φ) dihedral angle describes the rotation about the N-Cα bond and the psi (ψ) dihedral angle describes the rotation about the Cα-C bond.

Figure 1.2. Illustration of the phi (φ) and psi (ψ) dihedral angles in a protein.

The secondary structure of a protein is assigned by the values of the φ and ψ angles. Due to the rigidity of the peptide bond, the possible types of secondary structure are largely limited to α-helical and β-sheet motifs. Additional types of secondary structure include the extended conformation and β-turns that connect the ends of two adjacent β-strands; all other structure is assigned as random coil. (15) Construction of a Ramachandran plot, where the φ angles are along the x-axis and the ψ angles are along the y-axis, indicates the amount of α-helix and β-sheet within a protein. For example, a right-handed α-helix yields a φ angle of -60° and a ψ angle from -45° to -50°, whereas a left-handed α-helix has a φ angle from 45° to 60° and a ψ angle from 15° to 105°. (15, 17) A β-sheet has a φ angle from -45° to -180° and a ψ angle from 15° to 180°. (15, 16)

Figure 1.3. A Ramachandran plot for cytochrome C where the upper left portion represents β-structure and the lower left portion represents the α-helical structure found within the protein.

Examples of the α-helix, the β-sheet, and the extended conformation are given in Figure 1.4.

Figure 1.4. Illustration of an α-helix, a β-sheet, and the extended conformation.

The α-helix is a tightly wound structure with 3.6 amino acid residues per turn in which the amino acid side chains project to the exterior of the structure (refer to Figure 1.5a). In a β-sheet, the polypeptide forms a zigzag conformation in which successive side chains point to opposite sides of the zigzagging backbone (refer to Figure 1.5b). β-sheets are further classified as antiparallel or parallel depending upon the direction of the adjacent peptide chain. In a parallel β-sheet, the peptide chains have the same amino-to-carboxyl direction, whereas in an antiparallel β-sheet, the peptide chains have opposite amino-to-carboxyl directions. (16)

Figure 1.5. (a) An illustration of the hydrogen bonding that stabilizes an α-helix. (b) An illustration of a parallel β-sheet and the hydrogen bonding that stabilizes the structure.

1.1.2 Current Methods for Determining Protein Secondary Structure

X-ray diffraction, nuclear magnetic resonance (NMR) spectroscopy, circular dichroism (CD), Raman spectroscopy, and FT-IR spectroscopy are the most common techniques for protein secondary structure determination. X-ray diffraction and NMR are ideal procedures in that they supply site-specific structural information at atomic resolution. (17, 18) When the atomic coordinates of a protein have been determined, the structure is placed in the Protein Data Bank (PDB), a public-domain archive operated by the Research Collaboratory for Structural Bioinformatics and sponsored by the National Science Foundation, the Department of Energy, and units of the National Institutes of Health.

Table 1.1. Table downloaded from the Protein Data Bank website in order to illustrate the number of protein structures available in the library.
Year | Deposited structures for the year | Total available structures

As of mid 2002, the atomic coordinates for 18,339 structures of proteins and protein complexes were available in the PDB. As seen in Table 1.1, not all of the structures

deposited into the PDB are unique. The overwhelming majority of atomic coordinates in the PDB are obtained by, in order from the most used to the least used method, X-ray diffraction, NMR, and theoretical modeling (as shown in Table 1.2).

Table 1.2. Table downloaded from the Protein Data Bank website in order to illustrate the number of protein structures available in the library and the corresponding method used for secondary structure determination.
Method | Proteins, Peptides, & Viruses | Protein/Nucleic Acid Complexes
X-ray Diffraction | 13, |
NMR | 2, |
Theoretical Modeling | |
Total | 16, |

Although extraordinarily useful, X-ray diffraction and NMR have the disadvantages of sample restrictions and lengthy experiment times. X-ray diffraction requires a protein sample that can be crystallized, and NMR is limited to protein samples with molecular weights of less than 30 kilodaltons (kDa). (17) Both X-ray crystallography and NMR involve extensive acquisition times as well as complex and time-consuming data analysis; thus, these methods have limited applications. (18)

Circular dichroism (CD), Raman spectroscopy, and FT-IR spectroscopy are highly sensitive techniques that are routinely used to rapidly yield insight into protein secondary structure. (17, 18) CD measures the difference in the extent to which molecules absorb left and right circularly polarized light. For example, the α-helical motif yields negative bands at 222 and 208 nm and a positive band at 192 nm in the CD spectrum. For β-sheet proteins, negative bands occur at 216 nm and 175 nm, whereas a positive band occurs in the region of nm. (17) There are two types of CD, electronic circular dichroism (ECD) and vibrational circular dichroism (VCD). ECD utilizes the n-π* and π-π* transitions in the far-UV to yield a CD spectrum. (18) VCD, in contrast, measures the difference in absorbance between left circularly polarized and right circularly polarized infrared light (ΔA = A_L - A_R). (18) A disadvantage of both ECD and VCD is that the signal-to-noise ratio is poor relative to FT-IR spectroscopy. (18)

FT-IR and Raman spectroscopy are complementary techniques. Weak bands in the infrared may yield a stronger signal via Raman spectroscopy and vice versa. For instance, the amide III band obtained via FT-IR is much weaker than the band reported by Raman spectroscopy. Although complementary, Raman spectroscopy has the disadvantages of being more sensitive to fluorescence and of possibly destroying the sample with the powerful excitation beam. (17)

1.1.3 Introduction to FT-IR Spectroscopy

The focus of this study is the use of FT-IR for protein secondary structure elucidation. Absorption of infrared radiation produces a change in the vibrational energy level of a molecule. Rotational satellite bands are observed only for molecules in the gas state, since rotational motion is limited in the liquid state and is completely absent in the solid state. Pure rotational absorption bands are observed in the far-infrared ( cm-1) whereas vibrational bands are observed in the mid-infrared (10-4,000 cm-1). (19, 20) There are three basic types of vibration that are referred to as stretching modes, bending modes, and torsions. The bending modes are further classified as scissoring and rocking modes, and torsions are further classified as wagging and twisting modes. (19) Equation 1.1, where µ is the permanent dipole and Q is the displacement from the equilibrium bond distance, defines a vibrational transition dipole moment.

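\[ M_{01} = \left(\frac{\partial \mu}{\partial Q}\right) \langle \chi_0 | Q | \chi_1 \rangle \tag{1.1} \]

When a vibrational transition dipole fluctuates, it produces a field that can interact with infrared radiation, but only when the frequency of the infrared radiation is equal to the frequency of vibration. (19) The vibrational transition dipole from the ground state to the first excited state, M01, follows Equation 1.1, where χ0 is the ground state vibrational wavefunction and χ1 is the first excited state wavefunction.

\[ M_{01} = \left(\frac{\partial \mu}{\partial Q}\right) \int_{-\infty}^{\infty} \left(\frac{\alpha}{\pi}\right)^{1/4} e^{-\alpha Q^{2}/2} \, Q \, \left(\frac{4\alpha^{3}}{\pi}\right)^{1/4} Q \, e^{-\alpha Q^{2}/2} \, dQ \tag{1.2} \]

\[ \chi_0 = \left(\frac{\alpha}{\pi}\right)^{1/4} e^{-\alpha Q^{2}/2} \tag{1.3} \]

\[ \chi_1 = \left(\frac{4\alpha^{3}}{\pi}\right)^{1/4} Q \, e^{-\alpha Q^{2}/2} \tag{1.4} \]

\[ M_{01} = \left(\frac{\partial \mu}{\partial Q}\right) \frac{1}{\sqrt{2\alpha}} \tag{1.5} \]

Each vibrational wavefunction is a Gaussian multiplied by a Hermite polynomial, as shown for the v = 0 and v = 1 levels in Eqns. 1.3 and 1.4, respectively. Because the integral of Q^2 e^{-αQ^2} over all Q equals (1/2)(π/α^3)^{1/2}, Equation 1.2 is rewritten as Equation 1.5. The frequency of vibration for a diatomic is defined in Equation 1.6, where k is the force constant.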

Figure 1.6. Representation of Hooke's law where two atoms, of masses m1 and m2, are connected by a spring.

\[ \nu_m = \frac{1}{2\pi} \sqrt{\frac{k (m_1 + m_2)}{m_1 m_2}} \tag{1.6} \]

As seen in Equation 1.6, the frequency of vibration is independent of energy; thus a change in energy will only affect the amplitude of the vibration and in turn affect the intensity of the absorption band. (19) The idea that atoms are attached by a spring with a force constant, k, is known as Hooke's law, and this idea represents the harmonic oscillator approximation (refer to Figures 1.6 and 1.7), which is a classical approach to vibration. A molecule of N atoms has 3N-5 normal modes if it is linear and 3N-6 normal modes if it is nonlinear. (20) Thus for the diatomic given in Figure 1.6, there is one normal mode of vibration.
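Equation 1.6 and the normal-mode count can be checked numerically. The following is a minimal sketch, not part of the original work: the function names are illustrative, and the force constant and atomic masses used for carbon monoxide are approximate textbook values chosen only as an example.

```python
import math

AMU_KG = 1.66053906660e-27   # atomic mass unit in kg
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s


def harmonic_wavenumber(k_newton_per_m, m1_amu, m2_amu):
    """Evaluate Equation 1.6 and return the result in wavenumbers (cm^-1)."""
    m1 = m1_amu * AMU_KG
    m2 = m2_amu * AMU_KG
    freq_hz = (1.0 / (2.0 * math.pi)) * math.sqrt(
        k_newton_per_m * (m1 + m2) / (m1 * m2)
    )
    return freq_hz / C_CM_PER_S


def normal_mode_count(n_atoms, linear):
    """3N-5 modes for a linear molecule, 3N-6 for a nonlinear one."""
    return 3 * n_atoms - 5 if linear else 3 * n_atoms - 6


# Assumed example values for CO: k of about 1902 N/m, masses 12.000 and 15.995 amu.
co_wavenumber = harmonic_wavenumber(1902.0, 12.000, 15.995)
print(f"Harmonic CO stretch: {co_wavenumber:.0f} cm^-1")
print("Modes, diatomic:", normal_mode_count(2, linear=True))
print("Modes, water:", normal_mode_count(3, linear=False))
```

With these assumed inputs the harmonic frequency falls in the 2100-2200 cm-1 region, where a carbonyl-type triple-bond stretch would be expected, and the diatomic has a single normal mode.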

Figure 1.7. Representation of the harmonic oscillator approximation, showing the v = 0, 1, and 2 levels as energy (cm-1) versus internuclear distance (Å).

Applying the harmonic oscillator approximation, the energy of each vibrational level is given by Equation 1.7, where h is Planck's constant, v is the vibrational quantum number, and ν_m is the frequency of vibration from Equation 1.6.

\[ E = \left(v + \frac{1}{2}\right) h \nu_m \tag{1.7} \]

The selection rule is that the change in the vibrational quantum number, Δv, is equal to ±1. This entails that the vibrational levels are uniformly spaced and that a vibration results in only one absorption. The fundamental transition of a molecule arises when there is absorption of radiation that equals the energy difference between vibrational energy levels 0 and 1. (19) The harmonic oscillator approximation is not accurate since it ignores coulombic

repulsion and bond dissociation. (19, 21) The inclusion of these terms gives rise to the anharmonic oscillator as illustrated in Figure 1.8.

Figure 1.8. Representation of the Morse oscillator (red) compared to the harmonic oscillator (blue), plotted as energy (cm-1) versus internuclear distance (Å), with the levels v = 0, 1, and 2, the equilibrium distance Re, and the dissociation energy De marked.

Two major differences between the harmonic and anharmonic models are that the spacing between energy levels becomes smaller with increasing vibrational quantum number and that the selection rule no longer holds for the anharmonic oscillator (for example, Δv = ±2, ±3, etc. are possible transitions for the Morse oscillator shown in Figure 1.8). These changes result in the observation of overtones and combination bands in the infrared spectrum. (19, 21) An overtone results in an absorption band at approximately two (Δv = ±2) or three (Δv = ±3) times the frequency of the fundamental transition, and a combination band occurs when infrared radiation interacts with two vibrational modes simultaneously. (19, 21)
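The two differences between the harmonic and anharmonic models can be illustrated with a short calculation. This is a sketch under stated assumptions: it uses the standard two-term expression for Morse energy levels, and the harmonic wavenumber (2170 cm-1) and anharmonicity constant (13.3 cm-1) are illustrative values, not data from this study.

```python
def harmonic_level(v, we):
    """Harmonic oscillator level in cm^-1: we * (v + 1/2)."""
    return we * (v + 0.5)


def morse_level(v, we, wexe):
    """Morse oscillator level in cm^-1 with anharmonicity constant wexe."""
    return we * (v + 0.5) - wexe * (v + 0.5) ** 2


# Illustrative (assumed) constants in cm^-1.
WE, WEXE = 2170.0, 13.3

fundamental = morse_level(1, WE, WEXE) - morse_level(0, WE, WEXE)
first_overtone = morse_level(2, WE, WEXE) - morse_level(0, WE, WEXE)
spacings = [
    morse_level(v + 1, WE, WEXE) - morse_level(v, WE, WEXE) for v in range(4)
]

print(f"Fundamental (v=0 to 1): {fundamental:.1f} cm^-1")
print(f"First overtone (v=0 to 2): {first_overtone:.1f} cm^-1")
print(f"Twice the fundamental: {2 * fundamental:.1f} cm^-1")
print("Successive level spacings:", [round(s, 1) for s in spacings])
```

In this model the level spacings shrink as v increases, and the first overtone lies slightly below twice the fundamental, which is why overtone positions are only approximately integer multiples of the fundamental frequency.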

Fourier-transform infrared spectroscopy (FT-IR) has been developed as a procedure for the determination of protein secondary structure since the method offers improved sensitivity, signal-to-noise (S/N) ratios, and wavenumber precision relative to standard dispersive infrared methods. (3, 12, 16, 22) Dispersive methods utilize a grating to disperse a collimated beam of infrared light. The grating angle is varied with respect to the incident infrared beam, allowing energy from spectral resolution elements to reach the detector consecutively. FT-IR, a nondispersive technique, makes use of an interferometer that permits energy from all spectral resolution elements to reach the detector at once. (3, 12)

The modifications in instrument design, more specifically the use of an interferometer rather than a monochromator, explain the improvement in the infrared measurement. One such example is that dispersive instruments employ a slit whereas FT-IR instruments do not. The slit allows only a minuscule range of frequencies to reach the detector and consequently lowers the throughput of the incident beam. (3) Thus the benefit of using an FT-IR instrument is that the throughput is greater, referred to as the Jacquinot advantage, which ultimately results in better sensitivity. (3, 12) The Fellgett advantage arises because the interferometer, unlike a monochromator, measures all spectral resolution elements simultaneously, which enables faster acquisition of the infrared spectrum. Thus, more scans can be made within the same amount of time when using an interferometer. Since the S/N ratio is proportional to the square root of the number of scans, the S/N increases as the number of scans increases. (3, 12) The high wavenumber precision of FT-IR spectrometers, known as the Connes advantage, is attributed to the laser reference within the interferometer. (3) The high wavenumber precision enables signal averaging, which in turn increases the S/N ratio.
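The square-root dependence of S/N on the number of co-added scans can be demonstrated with simulated noise. The sketch below is an illustration only: it assumes uncorrelated Gaussian noise of unit standard deviation on every scan, and the function name, point count, and seed are arbitrary choices.

```python
import random
import statistics


def averaged_noise_std(n_scans, n_points=2000, seed=0):
    """Co-add n_scans of unit-variance Gaussian noise and return the
    standard deviation of the averaged trace."""
    rng = random.Random(seed)
    sums = [0.0] * n_points
    for _ in range(n_scans):
        for i in range(n_points):
            sums[i] += rng.gauss(0.0, 1.0)
    averaged = [s / n_scans for s in sums]
    return statistics.pstdev(averaged)


single = averaged_noise_std(1)
coadded = averaged_noise_std(64)
improvement = single / coadded  # expected to be close to sqrt(64) = 8

print(f"Noise, 1 scan:   {single:.3f}")
print(f"Noise, 64 scans: {coadded:.3f}")
print(f"Improvement:     {improvement:.1f}x")
```

For a fixed signal, a noise reduction of about eightfold over 64 scans corresponds to the expected eightfold gain in S/N.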

1.1.4 Determination of Protein Secondary Structure via FT-IR Spectroscopy

Proteins have nine characteristic amide absorption bands, labeled amide A, B, and I-VII, in the mid-infrared that can be interpreted in terms of structure (refer to Table 1.3 and Figure 1.9). (3, 23)

Table 1.3. The nine characteristic absorption bands for proteins (23-25)
Band | Frequency (cm-1) | Vibration
Amide A | 3300 | N-H stretching
Amide B | 3100 | N-H stretching
Amide I | | Predominantly C=O stretching
Amide II | | C-N stretching and N-H bending
Amide III | | Complex in-plane modes, more specifically C-N stretching and N-H bending
Amide IV | | Complex in-plane modes, more specifically O=C-N bending mixed with other modes
Amide V | | Out-of-plane N-H bending
Amide VI | | Out-of-plane C=O bending
Amide VII | 200 | Skeletal torsion

Figure 1.9. An illustration of the mid-infrared spectrum of a protein where the amide I-III, A, and B bands are displayed.

FT-IR has been successfully used for secondary structural analysis, largely based on the examination of amide I, because large solvent bands in both H2O and D2O obscure the other amide regions of the infrared spectrum. (3, 5, 8-11, 13, 23) The limitations imposed by H2O and D2O will be discussed in further detail in the following section.

Transmission FTIR: H2O versus D2O as a Solvent

H2O is the ideal solvent for biological samples; however, when used in transmission FT-IR experiments it causes serious error in the amide I region ( cm-1). This is due to the strong absorption of H2O (1640 cm-1), which can be 10^3 times greater than that of amide I. (3-6, 23) In fact, the absorption of H2O masks the absorption

of protein to such an extent that cell path lengths of <6 µm need to be used when analyzing a protein in solution (as shown in Figure 1.10). (6, 26)

Figure 1.10. Transmission FT-IR spectrum of H2O using a 12.5 micron spacer in between two CaF2 windows. The H-O-H bending, O-H stretching, and combination bands are labeled on axes of absorbance versus energy (cm-1).

Transmission FT-IR (see Figure 1.11) requires the placement of proteins between two calcium fluoride (or barium fluoride, zinc selenide, etc.) windows with thin path length spacers (3-25 microns).

Figure 1.11. An illustration of a transmission FT-IR sample cell, showing the blank (D2O) compartment and the sample compartment. The cell used in this research consisted of two CaF2 windows and either a 12.5 or 6-micron spacer enclosed in a copper casing.

Even with a path length of 3-6 microns, there is a potential for solvent interference. Solvent interference is evident in the spectrum given in Figure 1.10, where the H-O-H bending and O-H stretching regions block any light from reaching the detector. Some investigators have overcome the interference by means of a sample cell with a very short path length. Since spacers are not available with a thickness of less than 6 microns, such thin path lengths can only be obtained by means of calcium fluoride windows milled to a depth of 3 microns, which allow the sample to be placed in an ultra-thin pathlength cell. However, this is expensive and difficult to achieve in practice, and as a consequence there is little published work. Limitations in pathlength reduce options for transmission FT-IR experiments since only the amide II region ( cm-1) can be reliably obtained when using H2O as a solvent. Because H2O masks the amide I region, D2O has often been used. The deuterium-exchanged amide I band and amide II band are referred to as amide I' and amide II' in the spectroscopy

literature. There are three significant problems associated with the use of D2O: 1) the exchange of proteins into D2O for determination of spectra is tedious and can compromise the integrity of the sample; 2) hydrogen atoms within the protein exchange with deuterium over a wide range of time scales unless the protein is fully denatured (23, 27-29); and 3) absorption bands due to H-O-D, D-O-D, and H-O-H are all present upon the introduction of D2O (refer to Figures 1.12 and 1.13). (5)

Figure 1.12. FT-IR spectrum of H2O (blue), D2O (red), and 50:50 H2O:D2O (green), plotted as absorbance versus energy (cm-1), illustrating the limitations that exist for either solvent since the strong D-O-D, H-O-D, and H-O-H bending modes obscure the spectral regions from cm-1, cm-1, and cm-1, respectively.

Figure 1.13. The transmission FT-IR spectrum of hemoglobin acquired using a 12.5-micron spacer, CaF2 windows, and D2O as a background, plotted as absorbance versus energy (cm-1). Annotated regions: H-O-D masking amide II', D-O-D masking amide III', H-O-D masking amide A' and amide B', amide I', and D-O-D blocking the detector.

The exchange of N-H with N-D shifts the amide II band from 1550 to 1450 cm-1 (5, 29), and the presence of H-O-D in solution overlaps the amide II, A, and B regions. Thus, limitations exist for either solvent since the strong D-O-D and H-O-H bending modes obscure the spectral regions from cm-1 and cm-1, respectively. (5) In theory, however, both the amide I band and the amide II band can be resolved if spectra in both solvents are obtained (refer to Figure 1.14). (5) In spite of the potential for highly informative spectra, the problems associated with interference from solvent bands, combined with involved sample preparation, have made transmission FT-IR less attractive as a routine technique for secondary structure determination. (3-6, 28, 30)

Figure 1.14. (a) Transmission FT-IR spectrum of hemoglobin in D2O solution revealing only amide I; the H-O-D band is also marked. (b) Transmission FT-IR spectrum of hemoglobin in H2O solution revealing only amide II. Both panels are plotted as absorbance versus energy (cm-1). Thus, when using transmission mode FT-IR with a spacer greater than 6 microns, amide I and II can only be observed if the protein is run in both D2O and H2O.

Rather than D2O exchange, H2O interference can be minimized by a reduction in sample path length. Specifically, a path length reduction resulting in less than 0.3 A.U. for H2O would result in reliable protein spectra in H2O solution. For example, the spectrum of H2O in Figure 1.10 is worse than the spectrum of H2O given in Figure 1.12 because the latter was acquired by attenuated total reflectance (ATR) FT-IR, which will be discussed in the following section.

Attenuated Total Reflection (ATR) FT-IR Spectroscopy

Attenuated total reflectance (ATR)-FT-IR is the desired method to overcome solvent masking since the penetration depth (refer to Equation 1.10) of infrared light is small and dependent upon wavenumber. For example, when considering a wavenumber of 1650 cm-1, the corresponding penetration depth is approximately 0.4 µm. (31) Therefore, the effective path length for ATR-FT-IR is sufficiently short to enable the analysis of protein in H2O solution. This in turn has led to the use of algorithms, provided in the literature, for the subtraction of H2O from protein spectra. (26, 29, 30) Subtraction of H2O from the protein sample spectrum has provided a means to simultaneously observe the amide I and II bands for the first time. (6) However, there are potential difficulties with subtraction algorithms, particularly artifacts that are caused by the large size of the H2O signals relative to the protein bands. In addition, subtraction algorithms do not account for H2O-protein interactions.

Theory

Attenuated total reflection (ATR)-FT-IR spectroscopy has been used to overcome the sample preparation problems commonly associated with transmission FT-IR. (31, 32) For several decades ATR-FT-IR with a Ge or ZnSe waveguide as the internal reflection element

(IRE) in a Fourier-transform infrared spectrometer has been used to study the orientation of adsorbed molecules and to quantitate trace solutes in aqueous samples. (32-35) In addition to these applications, the use of ATR-FT-IR spectroscopy presents an alternative means of determining the amide I line shape in H2O. (26, 31, 36) The indices of refraction of Ge and ZnSe crystals are sufficiently high that the infrared light is totally internally reflected.

Figure 1.15. Illustration of how infrared light travels through an ATR crystal and is totally internally reflected when θ is greater than the critical angle, θ_critical. (26)

If the angle at which infrared light enters the internal reflective element (also known as an ATR crystal) is greater than the critical angle, then the light will be totally internally reflected. The critical angle is calculated as follows:

\[ \theta_{critical} = \sin^{-1}\left(\frac{n_2}{n_1}\right) \tag{1.8} \]

where n1 represents the refractive index of the ATR crystal and n2 is the refractive index of the sample. The refractive index of germanium is 4.0 and the refractive index of a typical protein solution at 1550 cm-1 is 1.5 (26); thus the critical angle is 22.0º. Referring to Figure 1.15, there are numerous internal reflections, denoted as N, the number of which is calculated by the following equation.

\[ N = \frac{\mathrm{length}_{\mathrm{ATR\ crystal}}}{\mathrm{thickness}_{\mathrm{ATR\ crystal}}} \cot(\theta) \tag{1.9} \]

At each internal reflection, an evanescent field penetrates the sample to a depth defined by Equation 1.10. (26)

\[ d_p = \frac{\lambda}{2 \pi n_1 \sqrt{\sin^{2}\theta - \left(n_2 / n_1\right)^{2}}} \tag{1.10} \]

Considering an incidence angle of 60.0º, the penetration depth of a typical protein at 1550 cm-1 is µm. The evanescent wave interacts with the sample and a spectrum is obtained.

Multi-Pass ATR-FT-IR

In ATR-FT-IR, a multi-pass or a single-pass internal reflection element can be employed. As the names imply, the multi-pass configuration allows for numerous internal reflections whereas the single-pass configuration allows for one internal reflection. Multi-pass ATR-FT-IR has been frequently used to study the conformation and orientation of

adsorbed proteins, commonly referred to as protein films. (26, 27, 30, 31) However, when using multi-pass ATR-FT-IR, the background absorption from either H2O or D2O can still be significant (refer to Figure 1.16). Subtraction of H2O is difficult at best if the absorbance of the H2O bands is larger than 0.5 A.U. Not only are problems encountered due to the nonlinearity of the MCT detector response, but there are also artifacts due to the large size of the H2O signals relative to the protein bands. To circumvent these problems, Harrick developed the single-pass technique a decade ago. (33)

Figure 1.16. Red represents the multi-pass ATR-FT-IR spectrum of 3 mM myoglobin whereas blue represents the multi-pass ATR-FT-IR spectrum of H2O. The spectra are plotted as absorbance versus energy (cm-1).

Transmission FT-IR and multi-pass ATR-FT-IR techniques require a cell to confine the protein solution. The cell geometry in a transmission or multi-pass experiment can change upon disassembly, leading to variations in path length, especially in the transmission

mode. It has even been stated in the literature that an external method for sample loading and unloading needs to be developed to avoid changes in path length and deviations in the angle of incidence. (27) The single-pass technique presented here attempts to provide such a method for sample deposition and recovery.

Single-Pass ATR-FT-IR

To circumvent problems associated with solvent interference and sample cell assembly, the single-pass technique developed by Harrick a decade ago (33) can be used. Single-pass ATR-FT-IR microscopy can be contrasted with current multi-pass ATR-FT-IR methods that employ transparent cells. Multi-pass ATR-FT-IR experiments are complicated by the use of a cell to confine the protein solution. In a multi-pass experiment, the cell geometry can change upon disassembly, leading to variations in the path length and the number of reflections obtained for each sample. Thus, sample-to-sample variability is much greater than in a single-pass configuration. (27) In addition, the IRE in a multi-pass configuration has a larger surface area than in a single-pass configuration. (37) Thus, smaller sample volumes are needed when using the single-pass configuration. Single-pass ATR-FT-IR has been employed for the trace analysis of solutes in solution since the technique is robust and requires little sample preparation. (33)

The focus of this study is on the use of single-pass ATR-FT-IR for protein analysis, particularly secondary structure determination. The method of single-pass ATR-FT-IR is valuable since it is a rapid technique that does not require protein exchange into D2O or sample cell assembly. The single-pass ATR-FT-IR sample geometry shown in Figure 1.17

exposes the sample to an N2 environment and allows for the easy deposition and retrieval of protein samples from the Teflon reservoir.

Figure 1.17. The geometry of the single-pass attenuated total reflection cell is shown. The protein sample is placed in a depression below the Ge crystal. Labeled components: totally internally reflected IR light, N2 environment, Ge internal reflective element, protein sample in H2O solution, and Teflon sample holder.

Since the protein is in an N2 environment, it slowly dehydrates into a concentrated gel state. Rapid scanning can continuously monitor the changes from a fully hydrated state to a concentrated gel state. Concentrating the protein to a gel state yields spectral enhancement such that protein amide bands can be observed simultaneously without performing subtraction. This study marks the first time that single-pass ATR-FT-IR has been reported for solutions of proteins. (38) Therefore, this chapter will concentrate on the validation of the method for protein secondary structure analysis.

1.2 Experimental Details

The proteins listed in Table 1.4 were prepared without further purification to a final concentration of approximately 3 mM in H2O and D2O. In the ATR-FT-IR experimental apparatus, the Ge crystal is at the focus of a Cassegrain objective in a UMA500

microscope (Digilab). The sample was injected into a cylindrical sample well that was milled in a Teflon block (refer to Figure 1.17). More specifically, µl of the sample was injected onto the Teflon block using a Wheaton automatic pipette. The protein spectra were recorded at ambient temperature with a resolution of 2 cm-1 and averaged over 64 scans on a Digilab FTS 6000 FT-IR spectrometer equipped with a liquid nitrogen-cooled MCT detector in a single-pass ATR mode and in a multi-pass mode. In both single-pass and multi-pass mode, the Ge crystal was rinsed with solvent and allowed to dry prior to loading another protein sample.
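For a Ge element such as the one described here, the relations given in Equations 1.8 to 1.10 can be evaluated numerically. The following is a rough sketch, not the calculation performed in this work: it assumes the refractive indices quoted in the theory section (4.0 for Ge, 1.5 for a typical protein solution), takes λ to be the free-space wavelength, and uses hypothetical crystal dimensions; all function names are illustrative.

```python
import math


def critical_angle_deg(n_crystal, n_sample):
    """Equation 1.8: critical angle in degrees."""
    return math.degrees(math.asin(n_sample / n_crystal))


def reflection_count(length, thickness, angle_deg):
    """Equation 1.9: (length / thickness) * cot(theta), same length units."""
    return (length / thickness) / math.tan(math.radians(angle_deg))


def penetration_depth_um(wavenumber_cm, n_crystal, n_sample, angle_deg):
    """Equation 1.10: penetration depth in micrometers.

    Assumes lambda is the free-space wavelength, 1e4 / wavenumber (um).
    """
    wavelength_um = 1.0e4 / wavenumber_cm
    s = math.sin(math.radians(angle_deg))
    root = math.sqrt(s ** 2 - (n_sample / n_crystal) ** 2)
    return wavelength_um / (2.0 * math.pi * n_crystal * root)


N_GE, N_PROTEIN = 4.0, 1.5  # values quoted in the theory section

print(f"Critical angle: {critical_angle_deg(N_GE, N_PROTEIN):.1f} deg")
print(f"d_p at 1550 cm^-1, 60 deg: "
      f"{penetration_depth_um(1550.0, N_GE, N_PROTEIN, 60.0):.2f} um")
print(f"d_p at 1650 cm^-1, 45 deg: "
      f"{penetration_depth_um(1650.0, N_GE, N_PROTEIN, 45.0):.2f} um")
# Hypothetical multi-pass crystal: 50 mm long, 2 mm thick, 45 deg incidence.
print(f"Reflections: {reflection_count(50.0, 2.0, 45.0):.0f}")
```

Under these assumptions the critical angle comes out at about 22.0º, matching the value in the text, and the sketch gives a penetration depth of roughly 0.33 µm at 1550 cm-1 and 60º. At 1650 cm-1 and an assumed 45º incidence it gives about 0.40 µm, in line with the approximate value quoted from the literature.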

Table 1.4. A list of all the proteins used in this research
Protein | Source | Company Protein Purchased From | Order Number | PDB ID
Albumin | Bovine | Sigma | A-7638 |
Albumin | Human Serum | Sigma | A | AO6
Casein | Bovine Milk | Sigma | C-6780 |
Caspase | | NCSU, Dept. of Biochem. | N/A | 1CP3
α-chymotrypsin | Bovine | ICN | | AB9
α-chymotrypsinogen A | Bovine Pancreas | ICN | | CHG
Conalbumin | Chicken Egg White | Sigma | C | OVT
Concanavalin A | Canavalia Ensiformis | Sigma | C | APN
Cytochrome C | Horse Heart Muscle | Aldrich | 10, | CCR
Dehaloperoxidase | | NCSU, Dept. of Chemistry | N/A | 1EW6
Elastase (Type I) | Porcine Pancreas | Sigma | E | EST
Glutathione Reductase (Type II) | Wheat Germ | Sigma | G | GRS
Hemoglobin | Bovine | Sigma | H | A3N
Immunoglobulin (IgG) | Human Serum | Fluka | | REI
Lysozyme | Chicken Egg White | Sigma and ICN | L and | AZF
Myoglobin | Horse Heart | Sigma | M | MBS
Myosin | | University of San Diego | N/A | 1B7T
Papain | Papaya latex | ICN | | PPD
Pepsin | Porcine Stomach Mucosa | Sigma | P | PEP
Ribonuclease A | Bovine Pancreas | ICN | | AFK
Ribonuclease B | Bovine Pancreas | Sigma | R | RBB
Subtilisin | Bacillus Species | ICN | | STI
Trypsin | Porcine Pancreas | ICN | | BPI
Trypsin Inhibitor | Chicken Egg Whites | ICN | | PTI
Trypsinogen | Bovine Pancreas | Sigma | T | TGN

Single-pass ATR-FT-IR spectra were recorded immediately after the sample was deposited onto the Teflon block. A steady stream of N2 gas over the Teflon block was used

to gently dehydrate the protein samples. Subsequent collection of spectra allowed for the observation of three states that are referred to throughout this work as the hydrated state, the intermediate state, and the gel state (refer to Figure 1.18). As the names imply, the hydrated state refers to the protein sample when it is first analyzed, whereas the intermediate and gel states refer to the protein sample as it is allowed to concentrate in the N2 environment in which it is placed.

Figure 1.18. An example of spectral enhancement upon protein gel formation is displayed. The spectra shown, plotted as absorbance versus energy (cm-1), are of the protein pepsin from a liquid sample (blue), to an intermediate state (green), to a gel state (red). The gel state is differentiated from the intermediate state in that, once the gel state is reached, no more spectral changes are evident.

Denaturation of a protein due to gel formation or to an interaction with the internal reflection element has been discussed in the literature and is a concern for all ATR-FT-IR techniques. (31) In the single-pass ATR method, acquiring protein solution spectra as the protein slowly concentrated enabled any protein denaturation to be observed and distinguished from the native state. The ability to differentiate those proteins that suffer from denaturation under the conditions of the experiment is shown in the comparison spectra given in Figures 1.19 and 1.20. Any spectra that showed evidence of protein denaturation, such as the chymotrypsinogen spectrum shown in Figure 1.19, were discarded.

Figure 1.19. Displayed are the spectra of the protein chymotrypsinogen, plotted as absorbance versus energy (cm-1), from protein in solution (blue), to an intermediate state (green), to a gel state (red). These spectra demonstrate the ability to observe any protein denaturation that occurs as a function of gel formation. The amide I position of the hydrated state spectrum differs from that of the intermediate and gel states. Such a significant shift suggests protein denaturation.
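The comparison of amide I positions described above can be sketched as a simple peak-position check. This is an illustration under stated assumptions, not the procedure used in this work: the spectra are synthetic Gaussian bands, the 1600-1700 cm-1 search window is an assumed amide I region, and the 4 cm-1 tolerance (twice the stated 2 cm-1 resolution) is a hypothetical threshold.

```python
import math


def gaussian_band(center, width, height, axis):
    """Synthetic absorbance band used only to exercise the check."""
    return [height * math.exp(-((x - center) / width) ** 2) for x in axis]


def peak_position(axis, absorbance, low, high):
    """Wavenumber of the absorbance maximum inside [low, high]."""
    in_window = [(a, x) for x, a in zip(axis, absorbance) if low <= x <= high]
    return max(in_window)[1]


def amide_i_shift(axis, first, last, low=1600.0, high=1700.0):
    """Shift of the band maximum between two states, in cm^-1."""
    return peak_position(axis, last, low, high) - peak_position(axis, first, low, high)


def shift_exceeds(shift, tolerance=4.0):
    """Hypothetical criterion: flag shifts larger than the tolerance."""
    return abs(shift) > tolerance


axis = [1500.0 + 0.5 * i for i in range(501)]         # 1500 to 1750 cm^-1
hydrated = gaussian_band(1655.0, 20.0, 0.05, axis)    # weak band in solution
gel_same = gaussian_band(1655.0, 20.0, 0.40, axis)    # enhanced, same position
gel_moved = gaussian_band(1630.0, 20.0, 0.40, axis)   # enhanced, shifted band

for label, gel in (("unchanged", gel_same), ("shifted", gel_moved)):
    shift = amide_i_shift(axis, hydrated, gel)
    print(f"{label}: shift = {shift:+.1f} cm^-1, flagged = {shift_exceeds(shift)}")
```

A band that grows in intensity without moving is not flagged, whereas a band whose maximum moves by more than the tolerance is flagged as a candidate for rejection.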

Figure. Spectra of the protein chymotrypsin (absorbance versus energy, cm⁻¹) from protein in solution (blue), to an intermediate state (green), to a gel state (red). These spectra demonstrate the ability to observe any protein denaturation that occurs as a function of gel formation. The amide I position of the gel state spectrum is not different from that of the fully hydrated and intermediate states, suggesting that no protein denaturation has occurred.

Transmission FT-IR was used as a referee method for the validation of the single-pass ATR-FT-IR technique. Since H2O is not an adequate solvent for amide I analysis via transmission FT-IR, the proteins listed in Table 1.4 were also prepared to a final concentration of 3 mM in D2O. In transmission mode, a liquid nitrogen-cooled wideband MCT detector was used. The sample cell (Figure 1.11) consisted of CaF2 windows separated by either a 6- or 12.5-micron spacer, with a partition to yield one compartment for sample and one for solvent. Approximately 5 µl of both sample and solvent were loaded into the cell. All spectral data were acquired using the software package Win-IR-Pro v2.97 manufactured by Digilab. The spectral range of cm⁻¹ was used for protein analysis. For the validation process, single-pass ATR-FT-IR protein data were compared to transmission FT-IR protein data where D2O was used as the solvent. When validating the accuracy of the single-pass ATR-FT-IR method, a D2O subtraction was used to allow for the observation of the amide III region. When performing the secondary structure analysis, protein spectra were acquired by single-pass ATR-FT-IR where H2O was used as the solvent. All data analysis was performed using the software package Igor-Pro v3.1. There was no need for subtraction when using the single-pass ATR-FT-IR technique since the protein sample dehydrated and formed a concentrated gel on the Ge IRE. The only time a subtraction algorithm was used in this study was to compare the dehydrated protein spectra to water-subtracted protein spectra, to ensure that no protein denaturation was occurring in the single-pass ATR-FT-IR configuration.

Results and Discussion

The objective of the work presented in this chapter was to validate the single-pass ATR-FT-IR technique for secondary structure determination. Statistical analysis was performed to test the accuracy and reproducibility of the single-pass ATR-FT-IR method for protein analysis. This involved the spectral comparison of proteins in D2O solution acquired by single-pass ATR-FT-IR to those acquired via transmission FT-IR. Two lots of the protein lysozyme were prepared in D2O solution and allowed to exchange overnight. All respective replicate spectra were superimposable; thus, a day-to-day study was performed over three consecutive days. It was predicted that the single-pass ATR-FT-IR method would be more reproducible since variations in sample cell path length would not be an issue when using

this technique. As seen in Table 1.5, the single-pass ATR-FT-IR technique yielded smaller standard deviations for a Gaussian fit of the spectral range from cm⁻¹, which included the amide I region. The Gaussian fitting function used was as follows:

\[ L_j(\omega) = \frac{A_j}{\sqrt{2\pi}\,\sigma_j} \exp\left[ -\frac{(\omega - \omega_{0j})^2}{2\sigma_j^2} \right] \tag{1.11} \]

where j indexes each spectral component, A_j represents the amplitude, ω_0j the frequency, and σ_j the width (standard deviation) of the jth Gaussian. A significant reduction in χ² was achieved when a three-Gaussian fit was used; thus, three frequencies are reported for each sample spectrum. The results given in Table 1.5 also demonstrate that the single-pass ATR-FT-IR technique yields amide I positions comparable to those of protein spectra acquired by transmission FT-IR in D2O solution. The results presented in Table 1.5 are representative in that several proteins were analyzed, and all yielded standard deviations smaller than those obtained from transmission FT-IR and amide I positions comparable to those acquired by transmission FT-IR (see Figure 1.21).
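As a minimal sketch of the fitting function in Equation 1.11, the Python below evaluates a sum of three area-normalized Gaussians and the sum-of-squares misfit against a spectrum. The function names, the reading of σ_j as a width, and the band parameters are illustrative assumptions; none of the numbers come from Table 1.5.

```python
import math

def gaussian_component(omega, amplitude, center, sigma):
    """One area-normalized Gaussian band (assumed form of Eq. 1.11)."""
    norm = amplitude / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-((omega - center) ** 2) / (2.0 * sigma ** 2))

def multi_gaussian(omega, components):
    """Sum of bands; components is a list of (A_j, omega_0j, sigma_j)."""
    return sum(gaussian_component(omega, a, c, s) for a, c, s in components)

def chi_square(omegas, observed, components):
    """Sum of squared residuals between observed and modeled absorbance."""
    return sum((y - multi_gaussian(w, components)) ** 2
               for w, y in zip(omegas, observed))

# Hypothetical three-band model of an amide I envelope (made-up values).
bands = [(1.0, 1633.0, 8.0), (2.0, 1652.0, 9.0), (0.5, 1680.0, 7.0)]
grid = [1580.0 + 0.5 * i for i in range(281)]   # 1580 to 1720 cm^-1
model = [multi_gaussian(w, bands) for w in grid]
```

In an actual fit, a nonlinear least-squares routine would adjust the nine parameters in `bands` to minimize `chi_square` against a measured spectrum; that optimization step is not shown here.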

Figure. The mid-frequency ATR-FT-IR spectra of six samples of lysozyme (absorbance versus energy, cm⁻¹), where the blue dotted spectra represent the first lot of lysozyme and the red solid spectra represent the second lot of lysozyme. These spectra were used for a reproducibility study and represent the data given in Table 1.5.

Table 1.5. A comparison of multi-Gaussian fits for six sets of lysozyme spectra

Single-pass ATR-FT-IR microscope
            Lot 1, Day 1    Lot 1, Day 2    Lot 1, Day 3    Standard Deviation
χ²
Gaussian
Gaussian
Gaussian

            Lot 2, Day 1    Lot 2, Day 2    Lot 2, Day 3    Standard Deviation
χ²
Gaussian
Gaussian
Gaussian

Transmission FT-IR
            Lot 1, Day 1    Lot 1, Day 2    Lot 1, Day 3    Standard Deviation
χ²
Gaussian
Gaussian
Gaussian

            Lot 2, Day 1    Lot 2, Day 2    Lot 2, Day 3    Standard Deviation
χ²
Gaussian
Gaussian
Gaussian

A second advantage of the technique is that no subtraction was needed, as illustrated in the accompanying hemoglobin figure. In this representation, two spectra of the protein hemoglobin acquired via single-pass ATR-FT-IR are compared. One spectrum is of the protein in a hydrated state where a subtraction algorithm was applied, whereas the other is a spectrum of the protein in the gel state. The comparison supports the argument that no subtraction is needed for single-pass ATR-FT-IR spectra in the gel state, since the amide I position and line shape of the subtracted spectrum are no different from those of the gel state spectrum, where no subtraction algorithm for H2O was applied. The benefit of not having to perform a subtraction is that such algorithms do not consider the effects of the protein on the solvent. (36) Thus, the single-pass ATR-FT-IR method is well suited for a holistic approach that regards the spectrum as that of the protein and associated solvent. In addition, the spectrum of the protein in the gel state is better resolved in the amide A, amide B, and C-H stretching regions than the subtracted spectrum.

Figure. Spectra of the protein hemoglobin in H2O (absorbance versus energy, cm⁻¹), where the blue spectrum is hemoglobin in the gel state and the red spectrum is hemoglobin in a hydrated state but with H2O subtracted from the spectrum. This supports the argument that no subtraction is needed for protein spectra in the gel state as acquired via single-pass ATR-FT-IR.

To further validate the accuracy of the single-pass ATR-FT-IR method, the mid-infrared spectra of a representative set of proteins in solution were acquired using transmission FT-IR, multi-pass ATR-FT-IR, and single-pass ATR-FT-IR (example spectra are given in Figure 1.23). This study showed that the amide II regions were comparable for the three techniques. Even with a 6-micron spacer, the transmission FT-IR spectra had enough interference to obscure the amide I region. The amide I regions in the multi-pass and single-pass ATR-FT-IR spectra were similar. The most noticeable difference in Figure 1.23 is that the single-pass ATR-FT-IR method allows for the simultaneous observation of amide I-III, A, and B. Even with subtraction, the multi-pass ATR-FT-IR

spectra do not have as distinctive amide A and B regions as the single-pass ATR-FT-IR spectra.

Figure. Mid-infrared spectra of hemoglobin in H2O (absorbance versus energy, cm⁻¹), where the green spectrum was acquired by the conventional transmission FT-IR method, the blue spectrum was acquired by multi-pass ATR-FT-IR using a subtraction algorithm, and the red spectrum was acquired by the single-pass ATR-FT-IR method without using a subtraction algorithm. This demonstrates that the single-pass ATR-FT-IR method yields enhanced spectral information relative to transmission FT-IR and multi-pass ATR-FT-IR.

The trend for amide I is that β-sheet structures have a maximum near 1633 cm⁻¹ with a shoulder at 1685 cm⁻¹, and α-helical structures have a maximum near 1650 cm⁻¹. (17) Structures with a mixture of β-sheet and α-helical structure exhibit linear combinations of these two basic spectral forms. Myoglobin and cytochrome C are primarily α-helical in structure, and ribonuclease A and chymotrypsin have significant amounts of β-structure. As

seen in Figure 1.24, these band shapes were in agreement with previous studies. (6, 17, 23, 25, 31)

Figure. An enlargement of the amide I region of four protein spectra (absorbance versus energy, cm⁻¹), where myoglobin is represented by the orange spectrum, cytochrome C by the red spectrum, ribonuclease A by the green spectrum, and chymotrypsin by the blue spectrum.

Surprisingly, from observation of all the amide bands in the ATR-FT-IR spectra, the amide II and amide III band shapes showed a significant dependence on secondary structure as well. Many investigators maintain that the amide II region is considerably less significant than amide I in distinguishing secondary structure. (13, 17, 36) Thus, reported trends for the amide II region are limited. Enlarged amide II bands are displayed in the amide II figure that follows. The amide II line shape appears to show clear differences between the two primarily α-helical proteins and the primarily β-sheet proteins. The reported trend is that a strong amide II

component occurs in the region of cm⁻¹ and a weak component occurs in the region of cm⁻¹. (17) The spectrum for cytochrome C followed this trend, and the spectrum of myoglobin was very similar. The strong components for these primarily α-helical proteins were at 1541 cm⁻¹ for cytochrome C and at 1536 cm⁻¹ for myoglobin. The amide II bands have strong components in the region from cm⁻¹ for ribonuclease A and chymotrypsin. The weak component predicted in the region of cm⁻¹ was present in the amide II bands of all four proteins studied. However, for the primarily β-sheet proteins, the component in the region of cm⁻¹ appeared just as strong as the component in the region of 1530 to 1540 cm⁻¹.

Figure. An enlargement of the amide II region of four protein spectra (absorbance versus energy, cm⁻¹), where myoglobin is represented by the orange spectrum, cytochrome C by the red spectrum, ribonuclease A by the green spectrum, and chymotrypsin by the blue spectrum.

Amide III bands for the same four proteins also exhibited substantial changes in line shape as a function of secondary structure (refer to Figure 1.26). The amide III band is usually observed by Raman spectroscopy since it is so weak in the infrared. (17) As with the amide II region, proteins with high β-sheet content yielded strong bands at a lower frequency than proteins with high α-helical content. This agrees with the trend that the secondary structure motifs turns, β-sheet, α-helix, and random coil yield bands in that respective order in the amide III region. (36)

Figure. An enlargement of the amide III region of four protein spectra (absorbance versus energy, cm⁻¹), where myoglobin is represented by the orange spectrum, cytochrome C by the red spectrum, ribonuclease A by the green spectrum, and chymotrypsin by the blue spectrum.

Examination of Figure 1.27 revealed trends in the amide A and B bands that correspond to protein secondary structure. The pattern for amide A was similar to that observed in amide I

(Figure 1.24), where bands for cytochrome C, myoglobin, ribonuclease A, and chymotrypsin occur in order of decreasing frequency. The lowest frequency (3277 cm⁻¹) corresponded to the β-sheet structure of chymotrypsin. The intermediate frequency of ribonuclease A (3279 cm⁻¹) reflects a mixed content (21% α-helix, 34.7% β-sheet). Myoglobin and cytochrome C had maximum frequencies of 3292 and 3293 cm⁻¹, respectively. No correlations were found in the literature for the amide B region. The only noticeable trend in this study was that the amide B bands are broader for cytochrome C and myoglobin, the primarily α-helical proteins, than for ribonuclease A and chymotrypsin, the primarily β-sheet proteins.

Figure. An enlargement of the amide A and B region of four protein spectra (absorbance versus energy, cm⁻¹), where myoglobin is represented by the orange spectrum, cytochrome C by the red spectrum, ribonuclease A by the green spectrum, and chymotrypsin by the blue spectrum.

1.3 Conclusions

Single-pass attenuated total reflection Fourier-transform infrared (ATR-FT-IR) microscopy provides a novel sample geometry that has distinct advantages over current technology for obtaining protein spectra. Statistical analysis indicates that the single-pass ATR-FT-IR method is more reproducible than transmission FT-IR spectroscopy since the pathlength in ATR is fixed (albeit wavelength dependent), while the pathlength in transmission geometry depends upon the spacer used. Moreover, experiments performed in H2O solution will be more reproducible than those that require D2O exchange, since the extent of the exchange can vary from sample to sample. The spectral range for observation of protein spectra shown here is greater than in any previous FT-IR study. This is the first report of simultaneous observation of amide A, B, I, II, and III demonstrating a correlation in each band. The increase in information content due to the observation of multiple bands requires further study for application to protein secondary structure prediction, which is discussed in the second paper in this series. The enhancement of spectral information from using this technique has already proven useful for the study of protein and peptide conformation. (38-40)

1.5 References

1. Zhang, C.T. and R. Zhang, A Graphic Approach to Evaluate Algorithms of Secondary Structure Prediction. Journal of Biomolecular Structure and Dynamics, (5): p.
2. Teichmann, S.A., et al., Fast Assignment of Protein Structures to Sequences using the Intermediate Sequence Library PDB-ISL. Bioinformatics, (2): p.

3. Susi, H. and D. Byler, Resolution-Enhanced Fourier Transform Infrared Spectroscopy of Enzymes. Methods in Enzymology, : p.
4. Byler, D. and H. Susi, Examination of the Secondary Structure of Proteins by Deconvolved FTIR Spectra. Biopolymers, : p.
5. Jencks, W., Infrared Measurements in Aqueous Media. Methods in Enzymology, (125): p.
6. Dousseau, F. and M. Pezolet, Determination of the Secondary Structure Content of Proteins in Aqueous Solutions from their Amide I and Amide II Infrared Bands. Comparison between Classical and Partial Least-Squares Methods. Biochemistry, : p.
7. Purcell, J. and H. Susi, Solvent Denaturation of Proteins as Observed by Resolution-Enhanced Fourier Transform Infrared Spectroscopy. Journal of Biochemical and Biophysical Methods, : p.
8. Miyazawa, T., T. Shimanouchi, and S. Mizushima, Characteristic Infrared Bands of Monosubstituted Amides. Journal of Chemical Physics, (2): p.
9. Miyazawa, T., Perturbation Treatment of the Characteristic Vibrations of Polypeptide Chain in Various Configurations. Journal of Chemical Physics, (6): p.
10. Krimm, S., Infrared Spectra and Chain Conformation of Proteins. Journal of Molecular Biology, : p.

11. Krimm, S. and Y. Abe, Intermolecular Interaction Effects in the Amide I Vibrations of β Polypeptides. Proceedings of the National Academy of Sciences, (10): p.
12. Griffiths, P.R., Fourier Transform Infrared Spectroscopy. Science, : p.
13. Baello, B., P. Pancoska, and T. Keiderling, Enhanced Prediction Accuracy of Protein Secondary Structure Using Hydrogen Exchange Fourier Transform Infrared Spectroscopy. Analytical Biochemistry, : p.
14. Susi, H., The Strength of Hydrogen Bonding: Infrared Spectroscopy. Methods in Enzymology, (22): p.
15. Lehninger, A.L., D.L. Nelson, and M.M. Cox, Principles of Biochemistry. 2nd ed. 1993, New York: Worth Publishers.
16. Holde, K.E.v., W.C. Johnson, and P.S. Ho, Principles of Physical Biochemistry. 1998, Upper Saddle River: Prentice Hall.
17. Pelton, J.T. and L.R. McLean, Spectroscopic Methods for Analysis of Protein Secondary Structure. Analytical Biochemistry, : p.
18. Keiderling, T., Vibrational Circular Dichroism of Peptides and Proteins, in Infrared and Raman Spectroscopy of Biological Materials, B. Yan and H. Gremlich, Editors. 2001, Marcel Dekker.
19. Skoog, D.A., F.J. Holler, and T.A. Nieman, Principles of Instrumental Analysis. 5th ed. 1998, Philadelphia: Saunders College Publishing Harcourt Brace College Publishers.

20. McHale, J.L., Molecular Spectroscopy. 1st ed. 1999, Upper Saddle River: Prentice Hall.
21. Atkins, P., Physical Chemistry. 5th ed. 1994, New York: W. H. Freeman and Company.
22. Kumosinski, T. and J. Unruh, eds. Global-Secondary-Structure Analysis of Proteins in Solution. ACS Symposium Series 576, ed. T. Kumosinski and M. Liebman. Vol. Molecular Modeling: From Virtual Tools to Real Problems.
23. Susi, H., Infrared Spectroscopy - Conformation. Methods in Enzymology, (26): p.
24. Bramanti, E., et al., Qualitative and Quantitative Analysis of the Secondary Structure of Cytochrome C Langmuir-Blodgett Films. Biopolymers, (2): p.
25. Parker, F.S., Applications of Infrared Spectroscopy in Biochemistry, Biology, and Medicine. 1971, New York: Plenum Press.
26. Chittur, K., FTIR/ATR for Protein Adsorption to Biomaterial Surfaces. Biomaterials, : p.
27. Powell, J., F. Wasacz, and R. Jakobsen, An Algorithm for the Reproducible Spectral Subtraction of Water from the FT-IR Spectra of Proteins in Dilute Solutions and Adsorbed Monolayers. Applied Spectroscopy, (3): p.
28. Nabet, A. and M. Pezolet, Two-Dimensional FT-IR Spectroscopy: A Powerful Method to Study the Secondary Structure of Proteins Using H-D Exchange. Applied Spectroscopy, (4): p.

29. Dousseau, F., M. Therrien, and M. Pezolet, On the Spectral Subtraction of Water from the FT-IR Spectra of Aqueous Solutions of Proteins. Applied Spectroscopy, (3): p.
30. Jongh, H., E. Goormaghtigh, and J. Ruysschaert, The Different Molar Absorptivities of the Secondary Structure Types in the Amide I Region: An Attenuated Total Reflection Infrared Study on Globular Proteins. Analytical Biochemistry, : p.
31. Oberg, K. and A. Fink, A New Attenuated Total Reflectance Fourier Transform Infrared Spectroscopy Method for the Study of Proteins in Solution. Analytical Biochemistry, : p.
32. Axelsen, P. and M. Citra, Orientational Order Determination By Internal Reflection Infrared Spectroscopy. Progress in Biophysics and Molecular Biology, (3): p.
33. Sommer, A.J. and M. Hardgrove, Attenuated Total Internal Reflection Microspectroscopy for the Analysis of Trace Solutes in Aqueous Solutions. Vibrational Spectroscopy, : p.
34. Han, L., et al., Chemical Sensors Based on Surface-Modified Sol-Gel-Coated Infrared Waveguides. Applied Spectroscopy, (1): p.
35. Fujii, T. and Y. Miyahara, Infrared ATR Spectroscopy of Substrates in Aqueous Solution Using Cryoenrichment and Its Application in Enzyme-Activity Assays. Applied Spectroscopy, (1): p.

36. Vedantham, G., et al., A Holistic Approach for Protein Secondary Structure Estimation from Infrared Spectra in Solutions. Analytical Biochemistry, : p.
37. Lewis, L. and A. Sommer, Attenuated Total Internal Reflection Microspectroscopy of Isolated Particles: An Alternative Approach to Current Methods. Applied Spectroscopy, (4): p.
38. Smith, B.M. and S. Franzen, Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in H2O Solution. Analytical Chemistry, (16): p.
39. Pop, C., et al., Removal of the Pro-Domain Does Not Affect the Conformation of the Procaspase-3 Dimer. Biochemistry, : p.
40. Smith, B.M., L. Oswald, and S. Franzen, Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Prediction of Protein Secondary Structure. Analytical Chemistry, (14): p.

Single-Pass Attenuated Total Reflection Fourier-Transform Infrared Spectroscopy for the Prediction of Protein Secondary Structure

2.1 Introduction

The substantial number of protein gene sequences determined from the completion of the human genome project provides a motive for the development of rapid protein secondary structure determination methods. (1) The importance of protein secondary structure prediction methods lies in the determination of protein function for the study of both biological pathways and the mechanism of disease. (1, 2) Since a mere 3% of the determined protein gene sequences have known secondary structure (1), there is a tremendous need for methods that rapidly classify proteins and that monitor protein interactions.

Protein secondary structure refers to the organization of amino acid residues in a polypeptide chain and is predominantly composed of α-helical and β-sheet motifs. (3, 4) Several experimental methods exist for protein secondary structure determination, such as circular dichroism (CD), Fourier-transform infrared (FTIR) spectroscopy, nuclear magnetic resonance (NMR), Raman spectroscopy, and X-ray diffraction. The overwhelming majority of three-dimensional coordinates currently available in the Protein Data Bank (PDB) were elucidated by either NMR or X-ray diffraction. These two techniques have not yet been applied to a large part of the proteome. (1, 5) Therefore, CD, Raman spectroscopy, and FTIR spectroscopy are routinely used to rapidly classify proteins according to secondary structure motifs. (1, 5) Using these three key methods, spectral-based correlations of proteins with known secondary structure are used to construct calibration models for secondary structure prediction of

proteins with unknown three-dimensional coordinates.

Assignment of Protein Secondary Structure

Proteins vary in secondary structure content, as illustrated by the protein lysozyme shown in Figure 2.1. Upon observation, it is evident that lysozyme is primarily α-helical, but β-sheet and random coil structural motifs are also present. As mentioned earlier, structure that cannot be assigned as either α-helix or β-structure is often referred to as random coil. Each type of secondary structure present yields an amide component at a characteristic frequency. For example, the amide I line shape is typically that of overlapping amide I contributions from the respective secondary structures. This concept is illustrated in Figure 2.2, where the area under each fit is representative of the percentage of the given secondary structure.

Figure 2.1. The structure of lysozyme downloaded from the Protein Data Bank (PDB).
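The area-based reading of the amide I band described above can be illustrated with a short sketch: each fitted component's share of the total band area is taken as the percentage of the corresponding secondary structure. The component labels and areas below are hypothetical placeholders, and the sketch assumes that all structure types absorb equally strongly, which is a simplification.

```python
def structure_percentages(component_areas):
    """Convert fitted band areas into percentages of the total amide I area."""
    total = sum(component_areas.values())
    if total <= 0:
        raise ValueError("total band area must be positive")
    return {name: 100.0 * area / total
            for name, area in component_areas.items()}

# Hypothetical fitted areas (arbitrary units), for illustration only.
areas = {"alpha-helix": 4.5, "beta-sheet": 2.0, "random coil": 3.5}
percentages = structure_percentages(areas)
```

Because the percentages are normalized to the total fitted area, they always sum to 100 regardless of the absolute absorbance scale.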

Figure 2.2. The amide I band in the mid-infrared spectrum of the protein lysozyme. This figure represents the contributions from various secondary structures. (6)

FTIR has been successfully used for secondary structural analysis, largely based on the examination of amide I, which results from C=O stretching. (6-13) Correlations between amide I and II band frequencies, principally amide I, and secondary structure are well established for proteins in both H2O and D2O solution. (6, 7, 9, 14, 15) Amide I bands occur at approximately 1650 cm⁻¹ for primarily α-helical structures, whereas the amide I band for primarily β-sheet proteins is shifted to a lower frequency of approximately 1630 cm⁻¹. (6, 7, 9, 14, 15) In addition, primarily β-sheet proteins have a weak component at cm⁻¹. Amide II bands occur at 1550 cm⁻¹ for α-helical proteins and 1530 cm⁻¹ for β-sheets (6, 9, 14, 15), but the secondary structure correlations in this region are not well understood. Few correlations exist for the amide III, IV, A, and B regions. Rather than the technique of

frequency assignment for the determination of protein secondary structure, multivariate techniques such as multiple linear regression (MLR), partial least squares (PLS), and principal component regression (PCR) have been used to yield a more quantitative assessment. (16-19)

In this study, the focus was the development of a multivariate calibration model for the prediction of protein secondary structure. The development of such a model involves training and validation phases. The development of a representative spectral library, one that includes all anticipated sources of signal variability, is crucial since the only variability that can be recognized is that which is included in the model. Once an adequate spectral library with sufficient variability is obtained, the prediction power of the model will be suitable for secondary structure determination of unknown proteins. Casein and immunoglobulin (IgG) have unidentified secondary structure content that will be predicted upon completion of the training and validation phases.

PCR and PLS have been applied to a library of single-pass ATR-FTIR protein spectra in solution to predict α-helical and β-sheet content. Due to the small number of protein spectra, a bootstrap method was applied to enlarge the data set. Once an optimum training set and test set were constructed, PCR and PLS were applied. GA optimization was performed on the models to improve the accuracy and robustness of the protein secondary structure prediction. The GA optimized the wavenumber selection and the number of principal component factors included in the model. A data set that was not involved in the construction of the multivariate regression model, the validation set, was used to validate the model. Upon finding an ideal model, the secondary structure content was determined for the

proteins casein and IgG. Thus, this study provides an accurate and rapid methodology for the prediction of α-helical and β-sheet content from single-pass ATR-FTIR protein spectra.

2.2 Experimental Details

The proteins listed in Table 2.1 were prepared without further purification to a final concentration of approximately 3 mM in H2O.

Table 2.1. Proteins used in this study as well as the number of single-pass ATR-FT-IR spectra included in the spectral library for each protein.

Protein                     Number of Spectra in Library
α-casein                    8
Caspase                     6
α-chymotrypsin              16
Chymotrypsinogen            Did not use
Conalbumin                  3
Concanavalin A              14
Cytochrome C                8
Elastase                    6
Glutathione Reductase       7
Hemoglobin                  27
Human Serum Albumin         5
Immunoglobulin (IgG)        3
Lactalbumin                 30
Lactoglobulin               30
Lysozyme                    33
Myoglobin                   33
Myosin                      4
Papain                      30
Pepsin                      35
Ribonuclease A              11
Ribonuclease B              19
Subtilisin                  8
Trypsin                     13
Trypsin Inhibitor           18
Trypsinogen                 10

In the experimental apparatus, the Ge crystal is at the focus of a Cassegrain objective in a UMA500 microscope (Digilab). A µL sample was injected onto the Teflon block using a Wheaton pipette. The protein spectra were recorded at ambient temperature and averaged over 64 scans on a Digilab FTS 6000 FTIR spectrometer equipped with a liquid nitrogen-cooled MCT detector in a UMA500 microscope. The protein spectra were recorded with a resolution of 2 cm⁻¹ in the range of cm⁻¹. Background spectra were obtained subsequently. Blowing a steady stream of N2 gas over the Teflon block gently dehydrated the protein samples. Spectra were recorded continuously after the sample was deposited onto the Teflon block until a concentrated protein gel had formed on the Ge internal reflective element (IRE). The Ge IRE was rinsed with H2O and allowed to dry prior to loading a subsequent protein sample.

Throughout this study, the protein samples are referred to as being in the hydrated state, the intermediate state, and the gel state. As the names imply, the hydrated state refers to the protein sample when it is first deposited into the Teflon reservoir. The gel state refers to the dehydrated sample, and intermediate states are observed during the 15 to 30 minutes required to form the gel state. Spectral enhancement is seen upon formation of the intermediate and gel states. Once the peak intensities no longer increased, the gel state had been achieved and numerous replicate spectra were acquired. All spectral data were acquired using the software package Win-IR-Pro v2.97 (Digilab). The spectral range of cm⁻¹ was used for protein analysis. Data analysis was performed using the software package Igor-Pro v3.12. There was no need for

subtraction when using the single-pass ATR-FTIR technique since the protein sample was allowed to dehydrate, forming a concentrated gel on the Ge IRE.

Once the protein spectra were acquired via single-pass ATR-FTIR, they were water-vapor subtracted, baseline-corrected, and consolidated into a protein library. The number of spectra for each protein in the spectral library is given in Table 2.1. Since the protein spectra were acquired continuously, any change in the spectra due to denaturation could be observed directly. Denaturation was not a consistent problem with any protein other than chymotrypsinogen. None of the spectra for chymotrypsinogen were included in the spectral library. Suitable protein spectra were transferred to the software package MATLAB v. 5.3 in order to construct principal component regression and partial least squares calibration models for the prediction of protein secondary structure.

Since the number of spectra varied, there were some instances where the protein spectra were bootstrapped. Large characteristic data sets are required for multivariate analysis, particularly for the development of multivariate regression models. Small data sets that yield sparsely populated principal component clusters in multivariate space can be expanded by the application of the bootstrap re-sampling method to yield more densely populated principal component clusters. In this study, a parametric bootstrap technique was used to enlarge small data sets for a better estimation of the protein secondary structure content.

Principal Component Analysis

Determination and quantitation of protein secondary structure has been further developed using techniques such as multivariate calibration. (17, 20, 21) Factor analysis,

principal component analysis, eigenvector projections, and singular value decomposition are techniques employed to analyze multivariate data. These methods are similar and, in statistical nomenclature, are classified collectively as principal component analysis (PCA). (22) When performing PCA, a data matrix consisting of objects and variables is used. Objects are denoted with the subscript n, where n represents the number of rows in the data matrix. Objects are simply varying or replicate samples. Variables are along each column, m, in the data matrix. Variables can be related to the concentration of constituents in a sample, such as peak height and area. Other variables such as electrochemical measurements, rate constants, refractive indices, and thermodynamic data can be utilized. (22) Spectral data are commonly digitized; thus, each wavelength or wavenumber is a variable. (23) When choosing variables, it is important that the chosen variables pertain to all objects in the data matrix; one variable can represent more than one feature. It is not advantageous to include an IR peak for a carbonyl bond if it is not relevant to all of the objects.
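When a protein contributes only a few objects (rows) to such a data matrix, the parametric bootstrap mentioned in the experimental details can add synthetic rows. The sketch below is one possible reading of that idea, assuming independent normal noise at each wavenumber; the function name and the toy absorbances are hypothetical, and the procedure actually used in this study may differ.

```python
import random
import statistics

def parametric_bootstrap(spectra, n_new, seed=0):
    """Draw n_new synthetic spectra from per-variable normal distributions.

    spectra: list of replicate spectra (rows = objects, columns = variables).
    """
    rng = random.Random(seed)
    columns = list(zip(*spectra))                       # one tuple per variable
    means = [statistics.mean(col) for col in columns]
    sdevs = [statistics.stdev(col) for col in columns]  # needs >= 2 replicates
    return [[rng.gauss(m, s) for m, s in zip(means, sdevs)]
            for _ in range(n_new)]

# Toy replicate spectra: 3 objects x 4 variables (made-up absorbances).
replicates = [[0.10, 0.52, 0.80, 0.31],
              [0.12, 0.50, 0.83, 0.30],
              [0.11, 0.51, 0.79, 0.33]]
enlarged = replicates + parametric_bootstrap(replicates, n_new=5)
```

Treating each wavenumber independently ignores the correlation between neighboring points in a real spectrum, so this sketch should be read as an illustration of the resampling idea rather than a recommended noise model.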

Figure 2.3. Schematic of digitized spectral data

The goal of PCA is to determine relationships between objects and variables in a data set. (22) The data matrix occupies a high-dimensional space that is difficult to visualize. The method of PCA reduces the data set to a lower-dimensional space. Data reduction is achieved by decomposing the data and discarding insignificant principal components. The term principal component can be thought of as a dimension or a factor. In some cases, factors represent real chemical properties. (23) An example is chromatography data of a mixture, in which a factor is present for each constituent. (24)

In the following example, the data matrix has a row for each sample and a column for each wavelength. The number of eigenvectors generated equals the smaller of n and m. (23) Each eigenvector has an eigenvalue, which is a measure of the importance of the associated eigenvector. The eigenvector associated with the largest, and most important, eigenvalue captures, in a least-squares sense, the greatest possible variance in the data. The data matrix is decomposed into a matrix of column-mode eigenvectors, U, the singular values, S, and the row-mode eigenvectors, V, by the singular value decomposition function, D = U S Vᵀ, where D represents the data matrix. This method rotates the data in multidimensional space in order to align the linear combinations of the original data with the direction of the most variance.
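The decomposition of a data matrix D into column-mode vectors, singular values, and row-mode vectors can be illustrated for the leading factor without any numerical library. The sketch below uses power iteration on DᵀD, a technique chosen here for brevity rather than the routine used in this work; the function name and the tiny data matrix are made up for illustration.

```python
import math

def leading_singular_triplet(data, iterations=200):
    """Return (u, s, v): first column-mode vector, singular value, row-mode vector."""
    n_rows, n_cols = len(data), len(data[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols          # starting guess
    for _ in range(iterations):
        # w = D v (length n_rows), then v_new = D^T w (length n_cols)
        w = [sum(row[j] * v[j] for j in range(n_cols)) for row in data]
        v_new = [sum(data[i][j] * w[i] for i in range(n_rows))
                 for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v_new))
        v = [x / norm for x in v_new]
    dv = [sum(row[j] * v[j] for j in range(n_cols)) for row in data]
    s = math.sqrt(sum(x * x for x in dv))           # leading singular value
    u = [x / s for x in dv]
    return u, s, v

# Made-up 3 x 2 data matrix (3 objects, 2 variables).
D = [[3.0, 0.0],
     [0.0, 2.0],
     [0.0, 0.0]]
u1, s1, v1 = leading_singular_triplet(D)
```

For this toy matrix the leading singular value is 3, so the corresponding eigenvalue of DᵀD is s1 squared, or 9. A full decomposition would repeat the procedure on the residual matrix after removing each factor, or simply call a library SVD routine.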

Figure 2.4. Representation of the singular value decomposition function for a (25 x 250) data matrix (24)

The singular value decomposition (SVD) function is used as an example since it is the most accurate of the principal component analysis methods. This is because the column-mode eigenvectors represent the variance between objects and the row-mode eigenvectors represent the variance between variables. Although the singular value matrix has the same dimensions as the data matrix, only its diagonal elements are nonzero. The diagonal has n elements, and the squares of these elements are the eigenvalues. The number of eigenvalues represents the total number of factors. An eigenvalue represents

the amount of variance preserved by the corresponding eigenvector. Variance can be considered as information content; thus, a small portion of the factors can account for the majority of the information contained in the data set. As seen in Table 2.2, five factors account for 99.75% of the information contained in the data set. The remaining factors can be disregarded as noise. It is noteworthy that eigenvectors are often referred to as principal components.

Table 2.2. Illustration of the variance associated with each eigenvalue (24)
Columns: Principal Component Number; Eigenvalue of Cov(X); % Variance Captured by This PC; % Variance Captured Total
[Numerical entries were not preserved in this transcription.]
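The relationship between singular values, eigenvalues, and captured variance can be illustrated with a short calculation. The following is a minimal sketch in Python, not the code used in this work; the singular values are invented for illustration, and the function names `variance_captured` and `factors_needed` are hypothetical.

```python
def variance_captured(singular_values):
    """Return per-factor and cumulative percent variance from singular values."""
    # The eigenvalues are the squares of the singular values.
    eigenvalues = [s * s for s in singular_values]
    total = sum(eigenvalues)
    percent = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for p in percent:
        running += p
        cumulative.append(running)
    return percent, cumulative


def factors_needed(cumulative, threshold):
    """Smallest number of factors whose cumulative variance reaches threshold (%)."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(cumulative)


# Hypothetical singular values, invented for illustration only.
singular_values = [10.0, 5.0, 2.0, 1.0, 0.5]
percent, cumulative = variance_captured(singular_values)
k = factors_needed(cumulative, 95.0)
```

With these invented values, the first two factors already capture more than 95% of the variance, which is the kind of reading that a table such as Table 2.2 supports.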

The number of principal components or factors, k, is the number of eigenvectors that account for the majority of the variance in a given data set. By producing a principal component model, the data matrix is projected from an n-dimensional space onto a lower, k-dimensional subspace. Thus, data reduction involves retaining chemically significant factors and disregarding the remaining factors. Choosing the correct number of factors is an important step in building a principal component model. Including too many factors is the most commonly encountered problem. If too many factors are used, the model can become over-trained (refer to Figure 2.5).

Figure 2.5. The first plot (a) is over-trained, whereas the second plot (b) is ideally trained. Both panels plot absorbance against concentration (mM).

As seen in Figure 2.5a, when a model is over-trained, its predictive power is lost. Although the calibration curve in Figure 2.5a fits the calibration data better, Figure 2.5b is a better calibration curve for predicting unknown concentrations.

Principal Component Analysis: Mathematical Details

The data matrix, D, contains individual elements denoted d_nm. The reproduced values are denoted x_nm^(k), where k represents the number of factors used in the model. The reproduced values, x_nm, are calculated as follows using the reduced principal component model.

\[
X_{(n \times m)} =
\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\
x_{2,1} & \cdots & \cdots & \vdots \\
\vdots & & & \vdots \\
x_{n,1} & \cdots & \cdots & x_{n,m}
\end{bmatrix}
= U_{(n \times k)} \, S_{(k \times k)} \, V^{T}_{(k \times m)}
\tag{2.1}
\]

The values of x_nm^(k) are calculated via Equation 2.2,

\[
x_{nm}^{(k)} = \sum_{j=1}^{k} d_{nj} \, ev_{jm}
\tag{2.2}
\]

where ev denotes the associated eigenvector and d_nj is the element for object n of the vector d_j paired with that eigenvector (see Equation 2.8). In a least-squares sense, the residual error between the reproduced data based on k principal components, x_nm, and the reproduced data based on the first principal component, x_nm^(1), is calculated.

\[
e_{nm}^{(1)} = x_{nm} - x_{nm}^{(1)}
\tag{2.3}
\]

\[
e_{nm}^{(1)} = x_{nm} - d_{n1} \, ev_{1m}
\tag{2.4}
\]

The error is minimized via linear least squares, where the derivative of the sum of squares is found and set to zero.

\[
\frac{\partial \sum_{n} \left( e_{nk}^{(1)} \right)^{2}}{\partial \, ev_{1k}}
= 2 \, ev_{1k} \sum_{n} d_{n1}^{2} - 2 \sum_{n} d_{n1} \, x_{nk}
\tag{2.5}
\]

\[
ev_{1k} \sum_{n} d_{n1}^{2} = \sum_{n} d_{n1} \, x_{nk}
\tag{2.6}
\]

Expressed in matrix notation, the following equation results:

\[
ev_{1} \, d_{1}^{T} d_{1} = X^{T} d_{1}
\tag{2.7}
\]

The term X is the complete reproduced data matrix using all principal components, which can be written as follows:

\[
X = d_{1} \, ev_{1}^{T} + d_{2} \, ev_{2}^{T} + \cdots + d_{m} \, ev_{m}^{T}
\tag{2.8}
\]

The eigenvalue, λ_1 = d_1^T d_1, can be substituted into Equation 2.7, (23)

\[
X^{T} d_{1} = \lambda_{1} \, ev_{1}
\tag{2.9}
\]

The variance-covariance matrix, Z, is determined by Equation 2.10. (24)

\[
Z = D^{T} D
\tag{2.10}
\]

Because the eigenvectors are orthonormal, Equation 2.8 gives d_1 = X ev_1, and X reproduced from all principal components equals D. Thus, Equation 2.11 is used to solve for the first principal component and the associated eigenvalue.

\[
Z \, ev_{1} = \lambda_{1} \, ev_{1}
\tag{2.11}
\]

The second principal component is found by an analogous method, where the residual error is as follows:

\[
e_{nm}^{(2)} = x_{nm} - x_{nm}^{(2)}
\tag{2.12}
\]

By Equation 2.2, x_nm^(2) = d_n1 ev_1m + d_n2 ev_2m, so that e_nm^(2) = e_nm^(1) - d_n2 ev_2m. Once the residual error is found, it is minimized by setting the derivative of the sum of squares to zero while keeping e_nm^(1) constant.

\[
\frac{\partial \sum_{n} \left( e_{nk}^{(2)} \right)^{2}}{\partial \, ev_{2k}}
= 2 \, ev_{2k} \sum_{n} d_{n2}^{2} - 2 \sum_{n} d_{n2} \, e_{nk}^{(1)}
\tag{2.13}
\]

\[
ev_{2k} \sum_{n} d_{n2}^{2} = \sum_{n} d_{n2} \, e_{nk}^{(1)}
\tag{2.14}
\]

Using the error matrix, E,

\[
E = D - d_{1} \, ev_{1}^{T}
\tag{2.15}
\]

the second principal component and the associated eigenvalue are found via the following relationship. (23)

\[
\left[ Z - \lambda_{1} \, ev_{1} \, ev_{1}^{T} \right] ev_{2} = \lambda_{2} \, ev_{2}
\tag{2.16}
\]

In Figure 2.6, the object shown in 3-dimensional space represents an entire mid-infrared spectrum.

Figure 2.6. Schematic of one point representing the data of an object (22)

In Figure 2.7, 25 objects are present that represent 25 mid-infrared spectra. Objects with similar data, such as those in Figure 2.7, tend to cluster about a mean or centroid.

Figure 2.7. Clustering of the 25 objects in the given example

PCA describes the position and the spread of these objects relative to the mean in n-dimensional space (in this illustration, 3-dimensional space). The first eigenvector accounts for the majority of the information content; thus, it passes through the greatest concentration of data points. The first eigenvector, or principal component, is the best linear fit through the data cluster. (22) The coefficients of the line are referred to as loadings. (22) There is a loading for each variable m; thus each is denoted as

PC_1m. All m loadings constitute a row vector referred to as the first principal component (PC), or first eigenvector. Figure 2.8 represents how the first PC is found.

Figure 2.8. Representation of the first principal component (22)

When the objects are projected onto PC 1, the principal component scores result. When an object is projected onto the line, its score is simply its displacement from the mean along that line. (22)
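The projection that produces the scores can be written out in a few lines. The sketch below is illustrative only and assumes a loading vector that is already known; the data values and the helper names `column_means` and `scores_on_loading` are hypothetical.

```python
def column_means(data):
    """Mean of each variable (column) across all objects (rows)."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def scores_on_loading(data, loading):
    """Project mean-centered objects onto a loading vector of unit length."""
    means = column_means(data)
    norm = sum(v * v for v in loading) ** 0.5
    unit = [v / norm for v in loading]  # normalize so scores are true displacements
    return [
        sum((x - mu) * u for x, mu, u in zip(row, means, unit))
        for row in data
    ]


# Hypothetical data: three objects measured on three variables.
data = [[1.0, 2.0, 3.0],
        [2.0, 4.0, 6.0],
        [3.0, 6.0, 9.0]]
loading = [1.0, 2.0, 3.0]  # assumed direction of the first principal component
scores = scores_on_loading(data, loading)
```

In this example the middle object coincides with the mean and therefore receives a score of zero, while the other two objects lie at equal and opposite displacements along the line.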

Figure 2.9. Description of a principal component score

Subsequent principal components are found by removing the first PC from the data and calculating a new mean. Once again, the best linear fit through the data objects is obtained. Because the first principal component has been removed, the second principal component is orthogonal to the first, and likewise for the remaining principal components. (22)
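The sequential extraction of principal components, in which the first component is found from Z ev_1 = λ_1 ev_1 (Equation 2.11) and then removed before the second is sought (Equation 2.16), can be sketched numerically. The example below uses power iteration, which is a substitution chosen here for simplicity and is not claimed to be the method used in this work. The data matrix is invented, no mean-centering is applied (following Equation 2.10 as written), and all function names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def covariance_like(data):
    """Z = D^T D (Equation 2.10) for a data matrix with one row per object."""
    m = len(data[0])
    return [[sum(row[i] * row[j] for row in data) for j in range(m)]
            for i in range(m)]


def dominant_eigenpair(z, iterations=500):
    """Power iteration: repeatedly apply Z and renormalize the vector."""
    vec = [1.0 / (i + 1) for i in range(len(z))]  # arbitrary nonzero start
    for _ in range(iterations):
        nxt = mat_vec(z, vec)
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue for a unit-length eigenvector.
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(z, vec)))
    return eigenvalue, vec


def deflate(z, eigenvalue, vec):
    """Z - lambda * ev ev^T, the bracketed term in Equation 2.16."""
    size = len(z)
    return [[z[i][j] - eigenvalue * vec[i] * vec[j] for j in range(size)]
            for i in range(size)]


# Hypothetical data matrix: four objects, three variables.
data = [[2.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
        [0.0, 2.0, 1.0],
        [1.0, 1.0, 2.0]]
z = covariance_like(data)
lam1, ev1 = dominant_eigenpair(z)
lam2, ev2 = dominant_eigenpair(deflate(z, lam1, ev1))
dot12 = sum(a * b for a, b in zip(ev1, ev2))  # close to zero: orthogonal components
```

Because the first component is subtracted before the second search, the second eigenvector comes out orthogonal to the first, which is the behavior described above.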

Figure 2.10. Representation of how the second principal component is found (22)

Wavelength Selection

The mathematical model that defines multivariate spectral data follows the Beer-Lambert law, defined by Equation 2.17.

\[
\begin{aligned}
A_{1} &= k_{1,1} c_{1} + k_{1,2} c_{2} + \cdots + k_{1,n} c_{n} \\
&\;\;\vdots \\
A_{m} &= k_{m,1} c_{1} + k_{m,2} c_{2} + \cdots + k_{m,n} c_{n}
\end{aligned}
\tag{2.17}
\]

A represents the response (e.g., absorbance) at each wavelength, k represents the molar absorption coefficients, c represents concentration, n represents the number of components, and m represents the number of wavelengths. In order to accurately describe complex protein spectra, wavelengths at which there is minimal overlap, and thus maximal selectivity, are chosen. Were it possible, selection of n fully selective wavelengths would produce n molar absorption coefficient vectors that are mutually orthogonal. Wavelength selection in multivariate analysis is an attempt to improve the precision and accuracy of models by maximizing such measures as selectivity and/or sensitivity. Using all of the wavelengths, m, in a spectrum provides the best precision, provided that random measurement errors are identically and normally distributed throughout the spectral range. Absorption spectra seldom follow this pattern, since regions of high and low signal-to-noise ratios and regions of nonlinear response may be found. By eliminating regions such as these, wavelength selection can increase the precision or the accuracy of the model. In order to construct a representative model, there must often be a compromise between improving the accuracy and avoiding a significant loss in precision. Wavelength selection is a difficult task that involves the simultaneous optimization of both the precision and the accuracy of the multivariate model. For wavelength selection, the number of wavelengths, m, to be used in the multivariate model should lie in the range n < m < M, where n indicates the number of

components in the model and M represents the total number of wavelengths in the spectrum. Finding the optimum number of wavelengths, m, as well as the optimum subset of wavelengths from every possible subset of m wavelengths produces a global solution but can be overwhelming in terms of computational load. If the number of possible combinations of wavelengths were limited, the computational load would be lessened at the risk of obtaining a local solution. Stepwise elimination (SE), simulated annealing (SA), and genetic algorithms (GAs) are techniques for wavelength selection. (25) SE is a deterministic local search heuristic, whereas SA and GAs are probabilistic global search heuristics. In comparative studies, genetic algorithms have been reported to be superior to simulated annealing and stepwise elimination for wavelength selection in multivariate calibration analysis. (25)

Principal Component Regression

Principal component regression (PCR) is a multivariate technique used here to predict protein secondary structure from the line shapes of particular regions within the protein mid-infrared spectrum, particularly the amide I and II regions. Since the spectroscopy-structure correlation is based upon line shapes, band intensity, which reflects protein concentration, is not of particular interest. In the current application, PCR models were developed to establish a correlation between the spectral data and known protein secondary structure content, particularly α-helical and β-sheet content. There is a training phase, a testing phase, and a validation phase. PCR begins with decomposing the training set via the singular value decomposition (SVD) function as seen in Equation 2.1. The regression vector, b, is then computed by the following equation,

\[
b = V_{(m \times k)} \, S^{-1}_{(k \times k)} \, U^{T}_{(k \times n)} \, c_{std \, (n \times c)}
\tag{2.18}
\]

where c_std is the matrix of known α-helix and β-sheet content. Using the regression vector, the predicted α-helix and β-sheet content are computed for the training and test sets.

\[
c_{pred} = A_{std \, (n \times m)} \, b_{(m \times c)}
\tag{2.19}
\]

Once the model has been optimized, the α-helix and β-sheet content of the validation set is calculated to test the robustness of the model. If an adequate validation is achieved, the model is taken to be a global model. Thus, the model can be used to predict the content of unknowns via Equation 2.20. (24)

\[
c_{pred} = A_{unk} \, b
\tag{2.20}
\]

The MATLAB code used to perform PCR is available in the Appendix.

Partial Least Squares

The partial least squares (PLS) method is similar to principal component regression. The primary difference between the two is that PLS involves the construction of two separate calibration models, one for each column in the secondary structure content matrix. The spectral data, A_std, are decomposed into a matrix of scores, T, and loadings, P.

\[
A_{std \, (n \times m)} = T_{(n \times k)} \, P^{T}_{(k \times m)} + E
\tag{2.21}
\]

In addition, the secondary structure content matrix is decomposed into a matrix of scores, U, and loadings, Q.

\[
C_{std \, (n \times c)} = U_{(n \times k)} \, Q^{T}_{(k \times c)} + F
\tag{2.22}
\]

The matrices E and F correspond to the residuals; the goal of PLS is to model the components such that the elements of E and F are close to zero. The scores for A_std and C_std are related via Equation 2.23, and Equation 2.24 calculates a matrix of regression vectors, B.

\[
U = T \, W
\tag{2.23}
\]

\[
B = P \, (P^{T} P)^{-1} \, W \, Q^{T}
\tag{2.24}
\]

Since this research deals with the prediction of α-helix and β-sheet content, the matrix B should consist of two regression vectors. Equation 2.25 then uses the matrix of regression vectors to predict the secondary structure content of unknown proteins. (24)

\[
C_{unk} = A_{unk} \, B
\tag{2.25}
\]

The Bootstrap

The particular bootstrap method used was developed by Gemperline and Smith (26) and re-samples from the inside model space. The term "inside model space," first adopted by Van Der Voet, et al., (27) refers to the multidimensional space defined by a principal component model that employs the k largest principal components. The complementary space defined by the residuals of the k-factor principal component model is referred to as the outside model space. First developed by Efron in 1979, the bootstrap is a method for obtaining estimates of statistical parameters and of the uncertainty in these parameters (28, 29) based upon re-sampling from an empirical distribution. (30-36) In this study, the bootstrap re-sampling procedure is applied to improve the robustness of the PCR model.
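The general idea of re-sampling from an empirical distribution can be illustrated in its classical form. The sketch below estimates the standard error of a mean by drawing re-samples with replacement; it is a generic illustration of the classical bootstrap and not the inside model space variant used in this study. The measurements are invented, and the function name `bootstrap_standard_error` is hypothetical.

```python
import random


def bootstrap_standard_error(sample, n_resamples=1000, seed=0):
    """Estimate the mean and its standard error by re-sampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Each re-sample has the same size as the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    grand_mean = sum(means) / n_resamples
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_resamples - 1)
    return grand_mean, variance ** 0.5


# Hypothetical replicate measurements, invented for illustration only.
measurements = [24.1, 25.3, 23.8, 26.0, 24.7, 25.1]
mean_estimate, standard_error = bootstrap_standard_error(measurements)
```

The spread of the re-sampled means serves as the uncertainty estimate, which is the sense in which the bootstrap provides both a parameter estimate and its uncertainty.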

The inside model space bootstrapping method involved re-sampling the column-mode eigenvectors, U. The following steps were performed to implement this method. First, the original data set was decomposed via the truncated singular value decomposition function as given in Equation 2.1. The complete matrix of column-mode eigenvectors, U, was repeated sequentially until N, the number of bootstrap samples, was reached. The order of the entries in each column of U was then randomized independently of all other columns to produce U*. Finally, a new bootstrapped matrix of spectra, A*, was generated from Equation 2.26.

\[
A^{*}_{(N \times m)} = U^{*}_{(N \times k)} \, S_{(k \times k)} \, V^{T}_{(k \times m)}
\tag{2.26}
\]

This process was repeated J times (J was equal to either 5 or 10 in this study), and the resulting spectra were averaged together. In order to compensate for the effect of averaging, the averaged data were multiplied by the square root of J. The bootstrap method used does not introduce any new sample variability, since the bootstrapped data have the same sample distribution as the original data set. Thus, the spectral line shapes are preserved in the bootstrapped data set, and the only significant difference is in the peak intensities. The MATLAB code used to perform the bootstrap is available in the Appendix.

The Genetic Algorithm

In this study, the genetic algorithm (GA) was used to optimize both wavenumber selection and the number of principal component factors to be included in the multivariate regression model. The GA is a popular optimization technique that employs a probabilistic, non-local search heuristic inspired by Darwin's theory of natural selection. (37, 38) The GA manipulates binary strings known as chromosomes, which contain genes that encode

experimental parameters. An initial population of random binary strings is produced, giving an n x m matrix, where n is the number of individuals in the initial population and m is the length of each chromosome. The multivariate regression models specified by these chromosomes are constructed, tested, and ranked according to the desired figure of merit. The best individuals have the greatest probability of surviving in the GA. The chromosomes of the best individuals are recombined to produce offspring chromosomes with potentially better genetic material (refer to Figure 2.11).

Figure 2.11. Representation of two parent binary strings undergoing cross-over at two positions to produce two offspring (figure labels: cross-over point, new generation)

Mutations are allowed to occur in the population at a very low rate. Mutations generally result in worse models; occasionally, however, a mutation produces a change that results in a better model, which is then incorporated into the evolutionary process of producing a new population. A number of different groups have reported the use of genetic algorithms as a tool for wavelength selection (25, 37-42) and for determining the number of principal component factors to be used (43) in multivariate calibration.
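The cross-over and mutation operations can be sketched on two short binary strings. The example below follows the two-position cross-over depicted in Figure 2.11 and a simple bit-flip mutation; it is an illustrative sketch rather than the routine used in this work. The chromosome length, the seed, and the function names `two_point_crossover` and `mutate` are assumptions made for the example.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points to form two offspring."""
    length = len(parent_a)
    # Cut points lie strictly inside the string, so every swap is partial.
    i, j = sorted(rng.sample(range(1, length), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with the given probability."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(42)   # fixed seed so the sketch is reproducible
parent_a = [1] * 12       # hypothetical parents chosen to make the swap visible
parent_b = [0] * 12
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, 0.05, rng)
```

Choosing one parent as all ones and the other as all zeros makes the exchanged segment easy to see in the offspring; real chromosomes would of course be random strings.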

In the GA optimization procedure, one chromosome contained sufficient information to completely specify the parameters needed for calibration. Each chromosome contained two types of genes. The first gene represented the wavenumbers to be used in the principal component regression model. The wavenumber selection gene was a 1 x m vector of randomly generated numbers between 0 and 1, where m represents the maximum number of wavenumbers to be included in the PCR model. The random numbers were rounded to either the ceiling or the floor (1 or 0). A bit position equal to 1 signified including the corresponding wavenumber, whereas a 0 signified omitting it. Using this procedure, one would expect an average of 50% of the wavenumber bit positions to be coded as 1. One also wants to avoid generating a chromosome that codes for too few wavenumbers in the PCR model. Thus, features were added to offset the probability of having 50% of the wavenumbers selected and to require that a minimum of 1/5 of the wavenumbers be included in the PCR model. The second gene contained eighteen bits and encoded the number of principal components, from one to eighteen, to be used for building the principal component model. Only one bit was allowed to be true, and the number of principal components was given by the position of the true bit. A maximum of eighteen factors was employed to prevent over-training of the principal component model. The training and test sets were used for monitoring the GA optimization. The figure of merit was the average of the standard error of calibration (SEC) and the standard error of prediction (SEP). The best 50% of the chromosomes were selected as parents to produce a

new population. The remaining chromosomes were discarded and replaced with an equal number of randomly generated chromosomes. An offspring population of chromosomes was produced by recombining chromosomes from the parents at a given number of random crossover points and by introducing random mutations at a rate of 5%. The crossover sites were randomly generated from the bit positions involved in wavenumber selection. After selection of the random crossover points, the resulting pieces of chromosomes were randomly shuffled and recombined to generate the offspring population. Point mutations were introduced in 5% of the offspring chromosomes in each generation. The offspring chromosomes were then translated into their respective wavenumber regions and numbers of principal component factors, and the resulting principal component regression models were computed. The training set was re-evaluated using the new models, and the offspring were ranked to find the individuals producing the best SEC and SEP. The fittest fifty percent of the offspring were used for the next generation, in addition to randomly generated chromosomes, as described earlier. The process of producing new generations of chromosomes was repeated until the specified number of generations, in this case 20, had been completed. When the GA optimization routine was completed, the last generation of chromosomes was returned to the user. The MATLAB code used to perform the GA is available in the Appendix.
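The chromosome layout and the selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATLAB routine referred to in the text. The error function `placeholder_error` is an invented stand-in for the average of SEC and SEP, since computing those would require the full PCR model. Cross-over and mutation of the survivors are omitted for brevity, and the unspecified features that offset the 50% selection probability are not modeled. The population size, chromosome length, and function names are hypothetical.

```python
import random

N_FACTOR_BITS = 18  # one bit per candidate number of principal components
M = 40              # hypothetical number of candidate wavenumbers


def random_chromosome(m, rng, min_fraction=0.2):
    """Wavenumber mask of m bits followed by a one-hot 18-bit factor gene."""
    while True:
        mask = [round(rng.random()) for _ in range(m)]
        if sum(mask) >= min_fraction * m:  # require at least 1/5 of the wavenumbers
            break
    factor_gene = [0] * N_FACTOR_BITS
    factor_gene[rng.randrange(N_FACTOR_BITS)] = 1  # exactly one true bit
    return mask + factor_gene


def decode(chromosome, m):
    """Translate a chromosome into selected wavenumber indices and a factor count."""
    selected = [i for i, bit in enumerate(chromosome[:m]) if bit == 1]
    n_factors = chromosome[m:].index(1) + 1  # position of the true bit
    return selected, n_factors


def placeholder_error(chromosome):
    """Invented stand-in for the average of SEC and SEP (lower is better)."""
    selected, n_factors = decode(chromosome, M)
    return abs(len(selected) - 10) + abs(n_factors - 5)


def next_generation(population, figure_of_merit, m, rng):
    """Keep the best half and replace the rest with new random chromosomes."""
    ranked = sorted(population, key=figure_of_merit)
    survivors = ranked[: len(ranked) // 2]
    newcomers = [random_chromosome(m, rng)
                 for _ in range(len(ranked) - len(survivors))]
    return survivors + newcomers


rng = random.Random(1)  # fixed seed so the sketch is reproducible
population = [random_chromosome(M, rng) for _ in range(20)]
initial_best = min(placeholder_error(ch) for ch in population)
for _ in range(20):     # 20 generations, as in the text
    population = next_generation(population, placeholder_error, M, rng)
best = min(population, key=placeholder_error)
selected, n_factors = decode(best, M)
```

Because the best half of each population is carried forward unchanged, the best figure of merit can never get worse from one generation to the next in this sketch.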

2.3 Results and Discussion

The purpose of this study was the development of a multivariate regression model for predicting α-helix and β-sheet content from single-pass ATR-FTIR data. The development of an accurate model focused on four issues. The first issue was whether to use PCR or PLS as the regression method. The second issue was the development of representative training, test, and validation sets. Each of the two spectral libraries constructed in this study, denoted spectral library 1 (for β-sheet prediction) and spectral library 2 (for α-helical prediction), consisted of a training set, a test set, an independent validation set, and an additional data set of unknowns. The third issue was which wavenumber regions to include in the multivariate regression model. After two spectral libraries and two sets of regression coefficients were found, the fourth issue was to develop a third library and a third set of regression coefficients for the simultaneous prediction of α-helix and β-sheet content from single-pass ATR-FTIR data.

Initially, the proteins caspase, chymotrypsin, concanavalin, cytochrome C, elastase, glutathione reductase, lactalbumin, lysozyme, myoglobin, myosin, papain, ribonuclease A, subtilisin, trypsin inhibitor, and trypsin were included in the training and test sets. The validation set consisted of the proteins lactoglobulin and pepsin (as listed in Table 2.3). Both PCR and PLS were applied to the training, test, and validation sets in the region from 600 to 4200 cm⁻¹, with the exception of the CO₂ region. Both methods yielded poor validation results; however, PCR was slightly the better of the two regression methods (refer to Table 2.3). In addition, PLS was a much more time-consuming technique, in that its computation time was roughly an order of

magnitude greater than that of the PCR technique. Thus, it was concluded that PCR rather than PLS should be used in this study.

Table 2.3. Initial prediction results using both PCR and PLS
PCR columns: Actual α-helix content; Predicted α-helix content using PCR; Residuals; Actual β-sheet content; Predicted β-sheet content using PCR; Residuals
PLS columns: Actual α-helix content; Predicted α-helix content using PLS; Actual β-sheet content; Predicted β-sheet content using PLS
Validation set rows (both methods): Lactoglobulin; Pepsin
[Numerical entries were not preserved in this transcription.]

In spectral library 1, the proteins caspase, chymotrypsin, concanavalin, cytochrome C, elastase, glutathione reductase, lactalbumin, lysozyme, myoglobin, myosin, papain, ribonuclease A, subtilisin, trypsin inhibitor, trypsin, and trypsinogen were included in both the training and test sets. The validation set within spectral library 1 contained the proteins hemoglobin, lactoglobulin, pepsin, and ribonuclease B. The construction of the training and test sets involved bootstrapping the original protein spectra 10 times to make a total of 100 spectra. The bootstrapped data set was then split into odd-numbered bootstrapped spectra (training set) and even-numbered spectra (test set). This enabled an equal number of representative protein spectra to be included in the training set, since the number of original protein spectra varied (Table 2.1). The validation set was also bootstrapped to make a total of 100 spectra; however, the odd-numbered bootstrapped spectra were kept while the even-numbered spectra were discarded. It is common to use bootstrapped data as test and

validation sets but not for the training set. The rationale for the use of bootstrapped data in the training set was that mid-infrared spectra of proteins in solution are only moderately reproducible in the amide A, amide B, and C-H stretching regions due to the varying levels of hydration observed by the single-pass ATR-FTIR method. Thus, this type of variability must be introduced into the calibration model for adequate prediction. This was achieved by performing the bootstrap, since the resulting bootstrapped data set had a greater population density of the varying levels of hydration in the amide A and B regions. The GA was applied to optimize the number of factors as well as the wavenumbers in the region from 600 to 4200 cm⁻¹, with the exception of the CO₂ region. The resulting PCR parameters yielded reasonable training and test set predictions; however, the validation results needed improvement. Further optimization of the calibration model was performed in which only specific regions were used. Surprisingly, the best regions were [range not preserved] cm⁻¹ and [range not preserved] cm⁻¹. These wavenumber regions included the amide I, amide II, amide A, and amide B bands and the C-H stretching region. In the literature, primarily the amide I and II regions are used in multivariate models to predict secondary structure content. (17, 18, 44, 45) The GA was applied to optimize the PCR parameters in the wavenumber regions [range not preserved] cm⁻¹ and [range not preserved] cm⁻¹. The validation results from the GA-optimized PCR parameters are given in Table 2.4, and a plot of a sample validation spectrum at the 580 GA-optimized wavenumbers is given in Figure 2.12. The above model gave good β-sheet predictions, but the α-helix content predictions needed improvement. In order to enhance the PCR model for α-helix prediction, the training, test, and validation sets were modified and the inclusion of

different amide regions was investigated. In spectral library 2, the proteins caspase, chymotrypsin, concanavalin, elastase, glutathione reductase, lactalbumin, lysozyme, myoglobin, myosin, papain, ribonuclease A, subtilisin, trypsin inhibitor, and trypsin were included in both the training and test sets. The validation set within spectral library 2 contained the proteins cytochrome C, lactoglobulin, pepsin, and ribonuclease B. The training, test, and validation sets were bootstrapped as indicated earlier. It was determined that including the amide IV region, 720 to 760 cm⁻¹, in addition to the amide I, amide II, amide A, amide B, and C-H stretching regions gave good validation results for α-helix content prediction. GA optimization within the amide I, II, IV, A, and B regions, along with the C-H stretching region, resulted in a model using 120 wavenumbers and 18 factors. These results were the best for α-helix content prediction and are given in Table 2.4. A plot of two sample validation spectra at the 120 GA-optimized wavenumbers is given in Figure 2.13.

Table 2.4. The two separate PCR methods that yielded the best validation results for α-helix prediction and β-sheet prediction, respectively.

α-helix Prediction
Library: Spectral Library 2
Regions: Amide I, Amide II, Amide IV, Amide A, Amide B, and C-H stretching region
Factors: 18
Wavenumbers: 120
Columns: Validation Set; Actual α-helix content; Predicted α-helix content using PCR
Rows: Cytochrome C; Lactoglobulin; Pepsin; Ribonuclease B

β-sheet Prediction
Library: Spectral Library 1
Regions: Amide I, Amide II, Amide A, Amide B, and C-H stretching region
Factors: 18
Wavenumbers: 580
Columns: Validation Set; Actual β-sheet content; Predicted β-sheet content using PCR
Rows: Hemoglobin; Lactoglobulin; Pepsin; Ribonuclease B

[Numerical entries were not preserved in this transcription.]

Figure 2.12. Plot of the single-pass ATR-FTIR spectra of the proteins concanavalin A (solid) and cytochrome C (dashed) at the GA-optimized wavenumbers. Axes: absorbance versus wavenumber.

Figure 2.13. Plot of the single-pass ATR-FTIR spectra of the proteins pepsin (solid) and lactoglobulin (dashed) at the GA-optimized wavenumbers. Axes: energy versus wavenumber.

Spectral library 1, along with the use of the amide I, amide II, amide A, amide B, and C-H stretching regions, yielded the best β-sheet content predictions. The average absolute error of calibration for this model was 2.0% for β-sheet content, and the average absolute error of prediction for the test set was 2.3% for β-sheet content. The average absolute error for the validation results was also 2.3% for β-sheet content. Spectral library 2, along with the use of the amide I, II, and IV regions, yielded the best α-helix content predictions. The average absolute error of calibration for this model was 1.9% for α-helix content, and the average absolute error of prediction for the test set was 1.9% for α-helix content. For both spectral libraries, the best validation results did not coincide with the best training and test set predictions. The average absolute error for the validation results in this case was 1.7% for α-helix content. Two proteins with unknown secondary structure content, casein and

immunoglobulin (IgG), were bootstrapped to produce a total of 50 spectra. The secondary structure content of casein and IgG was predicted using the first model for β-sheet content prediction and the second model for α-helix content prediction. The α-helix content of these proteins was determined to be 24.2 and 8.4%, whereas the β-sheet content was determined to be -2.7 and 29.3%, respectively. Two spectral libraries were thus constructed for secondary structure prediction from single-pass ATR-FTIR protein spectra: spectral library 1 for β-sheet prediction and spectral library 2 for α-helical prediction. However, a single solution for both α-helix and β-sheet content prediction from single-pass ATR-FTIR data was desired. Further refinement of the spectral data resulted in a third spectral library, which consisted of the following proteins in the training and test sets: caspase, conalbumin, concanavalin, human serum albumin, hemoglobin, lactalbumin, lysozyme, myoglobin, myosin, papain, pepsin, ribonuclease B, subtilisin, and trypsin inhibitor. The validation set consisted of spectra for the proteins chymotrypsin, cytochrome C, and ribonuclease A. The spectra in the training, test, and validation sets were bootstrapped, and the regression parameters for the training and test sets were GA-optimized. The best set of regression parameters was 18 factors and 143 wavenumbers within the amide I-III, A, and B and the C-H stretching regions (refer to Figure 2.14).

Figure 2.14. Plot of the single-pass ATR-FTIR spectra of the proteins hemoglobin (solid) and papain (dashed) at the GA-optimized wavenumbers. Axes: absorbance versus wavenumber.

The third trial resulted in the best overall predictions. The detailed results are given in Tables 2.5 through 2.7.

Table 2.5. Training set prediction results for the third spectral library and the third set of regression parameters.
Columns: True Alpha-Helix Content; Predicted Alpha-Helix Content; Residuals; True Beta-Sheet Content; Predicted Beta-Sheet Content; Residuals
Training set rows: Caspase; Conalbumin; Concanavalin; Human Serum Albumin; Hemoglobin; Lactalbumin; Lysozyme; Myoglobin; Myosin; Papain; Pepsin; Ribonuclease B; Subtilisin; Trypsin Inhibitor; Trypsinogen; AVERAGE ERROR
[Numerical entries were not preserved in this transcription.]

Table 2.6. Test set prediction results for the third spectral library and the third set of regression parameters.
Columns: True Alpha-Helix Content; Predicted Alpha-Helix Content; Residuals; True Beta-Sheet Content; Predicted Beta-Sheet Content; Residuals
Test set rows: Caspase; Conalbumin; Concanavalin; Human Serum Albumin; Hemoglobin; Lactalbumin; Lysozyme; Myoglobin; Myosin; Papain; Pepsin; Ribonuclease B; Subtilisin; Trypsin Inhibitor; Trypsinogen; AVERAGE ERROR
[Numerical entries were not preserved in this transcription.]

Table 2.7. Validation set prediction results for the third spectral library and the third set of regression parameters.
Columns: True Alpha-Helix Content; Predicted Alpha-Helix Content; Residuals; True Beta-Sheet Content; Predicted Beta-Sheet Content; Residuals
Validation set rows: Chymotrypsin; Cytochrome C; Ribonuclease A; AVERAGE ERROR
[Numerical entries were not preserved in this transcription.]

The secondary structure content of casein and IgG was predicted using the third model for α-helix and β-sheet content prediction. The α-helix content of these proteins was determined to be 44.5 and 13.9%, whereas the β-sheet content was determined to be 31.8 and 44.0%, respectively. These predictions differ substantially from those given by the first and second PCR models. To further validate the results of the third model, the α-helix and β-sheet content was calculated for bovine serum albumin. The α-helix content was determined to be 47.3% and the β-sheet content was determined to be -4.4%. Bovine serum albumin is homologous to human serum albumin. A calculation performed using the homology modeling toolbox within the program Insight II found bovine serum albumin to be 72.2% identical to human serum albumin. This would result in an α-helix content of 48.9% and a β-sheet content of 0%. The absolute residuals are 1.6% and 4.4% for the α-helix content and β-sheet content, respectively. Due to this result as well as the validation results, the third model was established as the most accurate.

2.4 Conclusions

This study establishes the proof of principle that the spectra of proteins in solution acquired by single-pass ATR-FTIR can provide estimates of α-helical and β-sheet content comparable to those from studies that use other FTIR methods. Multivariate models have been applied primarily to the amide I and II regions in both transmission FTIR and multi-pass ATR-FTIR protein spectra. (17, 18, 44, 45) Spectral enhancement of these and other amide bands occurs when using the single-pass ATR-FTIR method, since the protein sample is continuously monitored by FTIR in various hydration states. (46) The single-pass ATR-FTIR method permits the inclusion of more amide regions in the calibration model but raises the issue of protein denaturation. The resulting α-helical and β-sheet predictions, which are comparable to those from methods given in the literature for transmission FTIR and multi-pass ATR-FTIR spectra, indicate that protein denaturation does not occur on the time scale required for secondary structure determination by single-pass ATR-FTIR. This study produced better α-helical and β-sheet predictions than those in previous studies. (6, 16-19, 44) The improved predictions are likely due to the inclusion of a greater number of spectral regions, which can be acquired using the single-pass ATR-FTIR technique. (46) The multivariate analysis of the three spectral libraries, combined with the single-pass ATR-FTIR technique, suggests future work based upon existing techniques that sort proteins into classes. (47) The methods developed in this study could then be used to determine secondary structure with greater accuracy within each class. The fact that single-pass ATR-FTIR has the potential to be automated suggests that spectral libraries sorted by class may become an important tool for proteomic analysis.

2.5 References

1. Pelton, J.T. and L.R. McLean, Spectroscopic Methods for Analysis of Protein Secondary Structure. Analytical Biochemistry.
2. Maggio, E.T. and K. Ramnarayan, Drug Discovery Today.
3. Lehninger, A.L., D.L. Nelson, and M.M. Cox, Principles of Biochemistry. 2nd ed. 1993, New York: Worth Publishers.
4. Holde, K.E.v., W.C. Johnson, and P.S. Ho, Principles of Physical Biochemistry. 1998, Upper Saddle River: Prentice Hall.
5. Keiderling, T., Vibrational Circular Dichroism of Peptides and Proteins, in Infrared and Raman Spectroscopy of Biological Materials, B. Yan and H. Gremlich, Editors. 2001, Marcel Dekker.
6. Susi, H. and D. Byler, Resolution-Enhanced Fourier Transform Infrared Spectroscopy of Enzymes. Methods in Enzymology.
7. Susi, H., Infrared Spectroscopy - Conformation. Methods in Enzymology, (26).
8. Jencks, W., Infrared Measurements in Aqueous Media. Methods in Enzymology, (125).
9. Krimm, S., Infrared Spectra and Chain Conformation of Proteins. Journal of Molecular Biology.
10. Krimm, S. and Y. Abe, Intermolecular Interaction Effects in the Amide I Vibrations of β Polypeptides. Proceedings of the National Academy of Sciences, (10).
11. Baello, B., P. Pancoska, and T. Keiderling, Enhanced Prediction Accuracy of Protein Secondary Structure Using Hydrogen Exchange Fourier Transform Infrared Spectroscopy. Analytical Biochemistry.
12. Miyazawa, T., T. Shimanouchi, and S. Mizushima, Characteristic Infrared Bands of Monosubstituted Amides. Journal of Chemical Physics, (2).
13. Miyazawa, T., Perturbation Treatment of the Characteristic Vibrations of Polypeptide Chain in Various Configurations. Journal of Chemical Physics, (6).
14. Susi, H., D. Byler, and J. Purcell, Estimation of β-structure Content of Proteins by Means of Deconvolved FTIR Spectra. Journal of Biochemical and Biophysical Methods.
15. Parker, F.S., Applications of Infrared Spectroscopy in Biochemistry, Biology, and Medicine. 1971, New York: Plenum Press.
16. Lee, D.C., et al., Determination of Protein Secondary Structure Using Factor Analysis of Infrared Spectra. Biochemistry.
17. Dousseau, F. and M. Pezolet, Determination of the Secondary Structure Content of Proteins in Aqueous Solutions from their Amide I and Amide II Infrared Bands. Comparison between Classical and Partial Least-Squares Methods. Biochemistry.
18. Baumruk, V., P. Pancoska, and T. Keiderling, Predictions of Secondary Structure using Statistical Analyses of Electronic and Vibrational Circular Dichroism and Fourier Transform Infrared Spectra of Proteins in. Journal of Molecular Biology.
19. Vedantham, G., et al., A Holistic Approach for Protein Secondary Structure Estimation from Infrared Spectra in Solutions. Analytical Biochemistry.
20. Chittur, K., FTIR/ATR for Protein Adsorption to Biomaterial Surfaces. Biomaterials.
21. Pribic, R., Principal Component Analysis of Fourier Transform Infrared and/or Circular Dichroism Spectra of Proteins Applied in a Calibration of Protein Secondary Structure. Analytical Biochemistry.
22. Kowalski, B.R., Chemometrics: Mathematics and Statistics in Chemistry. 1983, Dordrecht: D. Reidel Publishing Company.
23. Malinowski, E.R., Factor Analysis in Chemistry. 1991, New York: John Wiley and Sons, Inc.
24. Gemperline, P., Chemometrics Short Course.
25. Lucasius, C.B., M.L.M. Beckers, and G. Kateman, Genetic algorithms in wavelength selection: a comparative study. Analytica Chimica Acta.
26. Smith, B. and P. Gemperline, Bootstrap methods for assessing the performance of Near-infrared pattern classification techniques. Journal of Chemometrics.
27. Voet, H.V.D., M.J. Coenegracht, and J.B. Hemel, Analytica Chimica Acta.
28. Efron, B., Bootstrap Methods: Another Look at the Jackknife. Ann. Stat.
29. Efron, B. and R.J. Tibshirani, An Introduction to the Bootstrap. 1993, New York: Chapman & Hall.
30. Park, D. and T. Willemain, Computational Statistics and Data Analysis.
31. Meinrath, G., Computer-Intensive Methods for Uncertainty Estimation in Complex Situations. Chemometrics and Intelligent Laboratory Systems, (2).
32. Milan, L. and J. Whittaker, Application of the Parametric Bootstrap to Models that Incorporate a Singular Value Decomposition. Journal of the Royal Statistical Society (Applied Statistics), (1).
33. Hinkley, D.V., Journal of the Royal Statistical Society, Series B (Methodological).
34. Furusjo, E. and L. Danielsson, Uncertainty in Rate Constants Estimated from Spectral Data with Baseline Drift. Journal of Chemometrics.
35. Efron, B., Journal of the Royal Statistical Society, Series B (Methodological).
36. Conlin, A.K., E.B. Martin, and A.J. Morris, Confidence Limits for Contribution Plots. Journal of Chemometrics.
37. Leardi, R., R. Boggia, and M. Terile, Genetic Algorithms as a Strategy for Feature Selection. Journal of Chemometrics.
38. Cong, P. and T. Li, Numeric Genetic Algorithm Part I. Theory. Analytica Chimica Acta.
39. Leardi, R., Journal of Chemometrics.
40. Bangalore, A.S., et al., Analytical Chemistry.
41. Arcos, M.J., et al., Analytica Chimica Acta.
42. Jouan-Rimbaud, D., et al., Analytical Chemistry.
43. Depczynski, U., V.J. Frost, and K. Molt, Genetic Algorithms applied to the selection of factors in principal component regression. Analytica Chimica Acta.
44. Wi, S., P. Pancoska, and T.A. Keiderling, Predictions of Protein Secondary Structures using Factor Analysis on Fourier Transform Infrared Spectra: Effect of Fourier Self-Deconvolution of the Amide I and Amide II Bands. Biospectroscopy.
45. Pribic, R., I.H.M. van Stokkum, D. Chapman, P.I. Haris, and M. Bloemendal, Protein Secondary Structure from Fourier Transform Infrared and/or Circular Dichroism. Analytical Biochemistry.
46. Smith, B.M. and S. Franzen, Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in H2O Solution. Analytical Chemistry, (16).
47. Sreerama, N. and R.W. Woody, Protein Secondary Structure from Circular-Dichroism Spectroscopy - Combining Variable Selection Principal and Cluster-Analysis with Neural-Network, Ridge-Regression, and Self-Consistent Methods. Journal of Molecular Biology, (4).

Investigation of Amino Acids and Their Role in the Single-Pass ATR-FT-IR Spectra of Peptides and Proteins

3.1 Introduction

An understanding of amino acids furthers the comprehension of protein structure and function. The study of amino acids via FT-IR spectroscopy, particularly in the region from 1400 to 1800 cm⁻¹, is well established in the literature. (1, 2) Amino acid side chains, in addition to the amide I and amide II bands, absorb in this region. Thus, attention has been focused on the effects of amino acid side chain absorption on the amide I and amide II lineshapes. (1, 2) If amino acid side chains obscure this region, then estimates of secondary structure based solely on amide I and amide II integrated peak areas are inaccurate. It is reported in the literature that 20% to 30% of the integrated amide I peak area is due to amino acid side chain absorption. (1, 2) This is not a significant concern for the protein secondary structure prediction model presented in Chapter 2, since that model is based on several spectral regions rather than on amide I alone. However, it raises an interesting question as to the effect of amino acid side chain absorption on the FT-IR spectra of both peptides and proteins, which is the primary focus of the research presented in this chapter.

Since the single-pass ATR-FT-IR method presented in this research yields more spectral information than multi-pass ATR-FT-IR and transmission FT-IR, amino acid side chain absorption in other regions of the mid-infrared spectrum can be established. The acquisition of single-pass ATR-FT-IR spectra for the 20 common amino acids (given in Table 3.2), coupled with simulation of theoretical amino acid spectra via normal mode analysis, can establish amino acid absorption trends in other regions of the mid-infrared spectrum. With more spectral information, an amino acid recognition library can be constructed. The aim is to use the experimental and theoretical mid-infrared spectra to establish trends characteristic of each individual amino acid. Frequencies unique to an individual amino acid can serve as a means of amino acid recognition. The ultimate goal is to use this information to develop a classification model, also known as a pattern recognition model, which correctly differentiates each amino acid. Once such an amino acid classification model is developed, the eventual aim is to determine the constituents of peptides.

3.1.1 Pattern Recognition

Pattern recognition is a mathematical technique that predicts a property of an object based upon indirectly measured properties. (3, 4) This is an old concept that has many classic uses such as fingerprint identification, stock market analysis, weather forecasting, and speech recognition. Pattern recognition was first applied to analytical chemistry in the late 1960s as a method for classifying samples. (3) Pattern recognition begins with the construction of a training set; in this study the training set is a collection of single-pass ATR-FT-IR amino acid spectra. Multivariate statistical analysis is performed on the training set to develop patterns that represent each class, or amino acid, in the training set. For pattern recognition to work properly, there must be a unique pattern for each class within the training set. (3) A set of unknown samples, referred to as the test set, is classified by comparison with the patterns defined by the training set. (5)

The training data set is an n x m matrix, where n is the number of samples and m is the number of measurements made for each sample. Thus, a data matrix has n vectors, known as pattern vectors, in m-dimensional space. For illustration, the coordinates of a pattern vector in m-dimensional space can be expressed in a new coordinate system defined by a set of orthonormal basis vectors in a lower dimensional subspace by using linear methods. Linear methods produce projections where the axes used to plot the data are linear combinations of the original measurement variables. (4) As seen in Figure 3.1, the projection of the pattern vector a onto the plane defined by x and y is given as a′. To find the coordinates of any vector in a normalized basis, one simply forms the inner product as shown in Equations 3.1 and 3.2. The new vector a′ therefore has the coordinates given by Equations 3.1 and 3.2 in the two-dimensional plane defined by x and y.

Figure 3.1. Projection of a point in a higher dimensional space onto a 2-dimensional subspace. (6)

$$a'_1 = a^T x \qquad (3.1)$$

$$a'_2 = a^T y \qquad (3.2)$$

This is illustrated in Figure 3.2 for a test set of the amino acid alanine, where each circle in the plot represents a single-pass ATR-FTIR spectrum projected into a two-dimensional subspace.
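The inner-product projection of Equations 3.1 and 3.2 can be sketched in a few lines of Python. This is a minimal illustration only: the pattern vector and the orthonormal basis below are hypothetical values chosen for the example, and the helper names `dot` and `project` are not taken from the thesis code.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project(a, basis):
    """Coordinates of pattern vector a in the subspace spanned by an
    orthonormal basis: one inner product per basis vector."""
    return [dot(a, b) for b in basis]

# Hypothetical 3-dimensional pattern vector and an orthonormal basis
# (x, y) for a 2-dimensional plane; these are not thesis data.
r = 1.0 / math.sqrt(2.0)
x = [r, r, 0.0]
y = [-r, r, 0.0]
a = [3.0, 4.0, 5.0]

a_prime = project(a, [x, y])  # [a^T x, a^T y]
print(a_prime)
```

The same function applies unchanged to an m-dimensional spectrum and k basis vectors, provided the basis vectors are orthonormal.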

Figure 3.2. Scatter plot of principal component 1 versus principal component 2 for the amino acid alanine, where each circle in the plot represents a single-pass ATR-FTIR spectrum projected into a two-dimensional subspace. The red ellipse represents the 95% confidence interval and the blue ellipse represents the 99% confidence interval.

The spectra of alanine tend to cluster about a mean, known as the centroid. Confidence intervals have been drawn in Figure 3.2, where the outer ellipse (blue) will enclose 99% of the measurements and the inner ellipse (red) will enclose 95% of the measurements. An unknown sample is classified based upon its probability density as either belonging to a class (a member) or not belonging to a class (a non-member or an outlier). A non-member may belong to a class, but not to the class in question, whereas an outlier does not necessarily belong to any class. (7) As seen in Figure 3.2, one sample falls between the 95% and the 99% confidence ellipses while another falls outside the 99% confidence ellipse. These two samples are examples of either a non-member or an outlier.
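The membership test pictured in Figure 3.2 can be sketched as follows. The sketch rests on two assumptions that the text does not confirm: that the two principal component scores are uncorrelated, so the ellipses are axis-aligned, and that the ellipse boundaries correspond to chi-square critical values with 2 degrees of freedom. The function names, centroid, variances and sample points are hypothetical.

```python
import math

def chi2_critical_2dof(p):
    """Critical value of the chi-square distribution with 2 degrees of
    freedom. For 2 degrees of freedom the CDF is 1 - exp(-x/2), so it
    can be inverted in closed form."""
    return -2.0 * math.log(1.0 - p)

def ellipse_region(score, centroid, variances):
    """Locate a 2-D score relative to the 95% and 99% ellipses, using the
    variance-scaled squared distance from the centroid (assumed model)."""
    d2 = sum((s - c) ** 2 / v for s, c, v in zip(score, centroid, variances))
    if d2 <= chi2_critical_2dof(0.95):
        return "inside 95%"
    if d2 <= chi2_critical_2dof(0.99):
        return "between 95% and 99%"
    return "outside 99%"

# Hypothetical class model: centroid at the origin, unit variance per axis.
centroid = (0.0, 0.0)
variances = (1.0, 1.0)
for point in [(1.0, 1.0), (2.5, 0.5), (3.0, 1.5)]:
    print(point, ellipse_region(point, centroid, variances))
```

A sample reported as "between 95% and 99%" or "outside 99%" would be a candidate non-member or outlier in the sense described above.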

3.1.2 Mahalanobis Distance

When performing the Mahalanobis distance classification method, it is assumed that the population for each class follows a multivariate normal distribution. When a class model is produced, the result is its centroid, a single point, x̄_j, in multidimensional space and its variance-covariance matrix, Σ_j, which describes the scatter and orientation of the training set. The generalized squared distance between the ith sample and the centroid for a given class is defined in Equation 3.3, where Σ is the variance-covariance matrix of the training set.

$$d_i^2 = (x_i - \mu)^T \Sigma^{-1} (x_i - \mu) \qquad (3.3)$$

The variance-covariance matrix is a p x p matrix, where p represents the number of dimensions (measured variables), and it describes the variance associated with each measured variable. The variance, σ_ii, and the covariance, σ_ij, are defined by the following equations:

$$\sigma_{i,i} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{k,i} - \bar{x}_i)^2 \qquad (3.4)$$

$$\sigma_{i,j} = \frac{1}{n-1} \sum_{k=1}^{n} (x_{k,i} - \bar{x}_i)(x_{k,j} - \bar{x}_j) \qquad (3.5)$$

The true centroid, µ, and the true variance-covariance matrix, Σ, of a class population are unknown and must be estimated via the mean vector, x̄, and the variance-covariance matrix, S, from a sample of size n, respectively. (6) Substituting these estimates into Equation 3.3, the Mahalanobis distance for that sample can be calculated by Equation 3.6. (5)

$$D_i^2 = (x_i - \bar{x})^T S^{-1} (x_i - \bar{x}) \qquad (3.6)$$

In this study, the mean-corrected training set of single-pass ATR-FT-IR amino acid spectra is decomposed into a matrix of scores and a matrix of eigenvectors. The matrix of scores is trimmed to a number of factors, k, which adequately describes the variance of the training amino acid data set. A vector of k scores, u_i, gives the location of the ith sample in the principal component subspace. Substituting u_i for sample x_i, the variance-covariance matrix is defined by Equation 3.7.

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (u_i - \bar{u})(u_i - \bar{u})^T \qquad (3.7)$$

The mean of the principal component scores for each factor is equal to zero since the data were mean-corrected, which simplifies the diagonal elements (Equation 3.8). The principal component scores are orthogonal; thus the variance-covariance matrix is a diagonal matrix (Equation 3.9). Taking these two factors into consideration, the variance-covariance calculation can be simplified.

$$s_{j,j} = \frac{1}{n-1} \sum_{i=1}^{n} u_{i,j}^2 \qquad (3.8)$$

$$s_{i,j} = 0 \quad (i \neq j) \qquad (3.9)$$

Therefore, the Mahalanobis distance for the ith sample in the class population is calculated via Equations 3.10 and 3.11. (5)

$$D_i^2 = u_i^T S^{-1} u_i \qquad (3.10)$$

$$D_i^2 = \sum_{j=1}^{k} \frac{u_{i,j}^2}{s_{j,j}} = (n-1) \sum_{j=1}^{k} \frac{u_{i,j}^2}{\sum_{l=1}^{n} u_{l,j}^2} \qquad (3.11)$$

The Mahalanobis distance class model is an ellipsoidal cluster with the population mean at the centroid. The distance, D_i², has a χ² distribution with k degrees of freedom. The probability, pr, of a sample in the class population belonging to that class is estimated using Hotelling's T²-test. This test is used to calculate the confidence level for accepting the null hypothesis, which states that a sample observation is a probable value for the centroid (u_i = µ), versus the alternate hypothesis (u_i ≠ µ).

$$pr = 1 - P\left[ \frac{(n-k)}{k(n-1)} D^2 > F_{k,\,n-k}(\alpha) \right] \qquad (3.12)$$

In Equation 3.12, α is the probability level for critical values of F, k is the number of factors, and n is the number of training samples. The probability of class membership for a given sample is estimated by using its distance and the appropriate degrees of freedom, n-k-1, to numerically integrate the F-distribution. (5, 7) A MATLAB subroutine, f_dist.m, located in the MATLAB toolbox, is used to make this computation. A probability is estimated for each class in the training model. In this study, 20 classes are used in the training model; therefore, 20 probabilities are computed for each sample. The sample is assigned to the class that yields the highest probability (refer to Table 3.1 for an example).
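As a rough illustration of the score-space distance in Equations 3.8 to 3.11, the sketch below computes the squared Mahalanobis distance of one sample from mean-centred, mutually orthogonal scores. It is not the thesis routine mahald_2.m: the function names and the tiny score matrix are hypothetical, and the F-distribution probability step of Equation 3.12 is deliberately left out.

```python
def score_variances(scores):
    """Diagonal of S for mean-centred scores: s_jj = sum_i u_ij^2 / (n - 1)."""
    n = len(scores)
    k = len(scores[0])
    return [sum(row[j] ** 2 for row in scores) / (n - 1) for j in range(k)]

def mahalanobis_sq(u, variances):
    """D^2 = sum_j u_j^2 / s_jj, valid only because S is diagonal."""
    return sum(uj ** 2 / s for uj, s in zip(u, variances))

def mahalanobis_sq_expanded(u, scores):
    """Same distance with s_jj written out, which exposes the (n - 1) factor."""
    n = len(scores)
    return (n - 1) * sum(
        u[j] ** 2 / sum(row[j] ** 2 for row in scores) for j in range(len(u))
    )

# Hypothetical training scores (n = 4 samples, k = 2 factors): each column
# has zero mean and the two columns are orthogonal, as PCA scores would be.
training_scores = [[2.0, 1.0], [-2.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
variances = score_variances(training_scores)
d2 = mahalanobis_sq(training_scores[0], variances)
print(variances, d2)
```

A useful sanity check on any implementation is that the distances of the training samples themselves sum to k(n - 1), which follows directly from the definition of s_jj.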

Table 3.1. Example indicating how a sample is assigned to a given class based upon its probability density
Columns: True Class; Probability Density of Belonging to Class 1 (alanine); Probability Density of Belonging to Class 2 (arginine); Probability Density of Belonging to Class 3 (asparagine); Found Class

3.2 Experimental Details

3.2.1 Single-Pass ATR-FT-IR Analysis

The amino acids and peptides listed in Table 3.2 were purchased from Sigma and were prepared without further purification to a final concentration of approximately 10 mM in.

Table 3.2. Amino acids and peptides used in this study as well as the number of single-pass ATR-FT-IR spectra included in the amino acid spectral library.

Amino Acid                       Number of Spectra Used in Amino Acid Library   Abbreviation
Alanine                          4                                              ALA
Arginine                         5                                              ARG
Asparagine                       6                                              ASN
Aspartic Acid                    11                                             ASP
Cysteine                         5                                              CYS
Glutamine                        6                                              GLN
Glutamic Acid                    5                                              GLU
Glycine                          5                                              GLY
Histidine                        5                                              HIS
Isoleucine                       5                                              ILE
Leucine                          5                                              LEU
Lysine                           5                                              LYS
Methionine                       7                                              MET
Phenylalanine                    5                                              PHE
Proline                          2                                              PRO
Serine                           5                                              SER
Threonine                        8                                              THR
Tryptophan                       12                                             TRP
Tyrosine                         5                                              TYR
Valine                           10                                             VAL
Alanine-Aspartic Acid            10                                             ALAASP
Tryptophan-Leucine               20                                             TRPLEU
Alanine-Proline-Glycine          15                                             ALAPROGLY
Phenylalanine-Glycine-Glycine    10                                             PHEGLYGLY

The experimental apparatus used has been described in the Experimental Details of Chapter 2. A µL amino acid sample was injected onto the Teflon block using a Wheaton pipette. The spectra were recorded at ambient temperature, were averaged over 64 scans, and were recorded with a resolution of 2 cm⁻¹ in the range of cm⁻¹. Background spectra were obtained subsequently. Amino acid spectra were recorded continuously after the sample was deposited onto the Teflon block until a concentrated gel had formed on the Ge internal reflective element (IRE). The Ge IRE was rinsed with and allowed to dry prior to loading a subsequent sample.

3.2.2 Normal Mode Analysis

DMol3 was used to perform density functional theory (DFT) calculations for the 20 amino acids listed in Table 3.2 and for charged lysine, aspartic acid, and glutamic acid. An SGI/Cray Origin 2400 located at the North Carolina Supercomputer Center (NCSC) was used for all calculations. The twenty-three amino acids were geometry optimized using the generalized gradient approximation (GGA) of Perdew and Wang (8) as implemented in DMol3 (Accelrys Inc.). (9, 10) Optimization was performed until a convergence criterion of less than 10⁻⁶ a.u. change in the energy and a change of less than Bohrs per iteration was satisfied. Optimization was followed by a vibrational frequency calculation done by the finite difference method. (9, 10) In addition to the frequency calculation, the corresponding intensities were determined via a method that uses the Mulliken charge to estimate the change in the dipole moment. (11) The results of the vibrational frequency calculation were then converted into Gaussian waveforms using σ = 10 cm⁻¹:

$$G(\nu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(\nu - \nu_0)^2}{2\sigma^2} \right\}$$

A second set of calculations was performed on the 23 amino acids; however, a dielectric continuum was used to simulate amino acids in water solution.
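The conversion of calculated frequencies and intensities into a continuous spectrum can be sketched as below, assuming the normalised Gaussian lineshape with σ = 10 cm⁻¹ described above. The two mode frequencies and intensities are hypothetical placeholders rather than DMol3 output, and `gaussian` and `broaden` are illustrative names.

```python
import math

def gaussian(nu, nu0, sigma=10.0):
    """Unit-area Gaussian centred at nu0 (cm^-1) with width sigma (cm^-1)."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((nu - nu0) ** 2) / (2.0 * sigma ** 2)) / norm

def broaden(frequencies, intensities, grid, sigma=10.0):
    """Sum one intensity-weighted Gaussian per calculated normal mode."""
    return [
        sum(i0 * gaussian(nu, nu0, sigma) for nu0, i0 in zip(frequencies, intensities))
        for nu in grid
    ]

# Hypothetical stick spectrum: two modes with relative intensities 1 and 2.
frequencies = [1550.0, 1650.0]
intensities = [1.0, 2.0]
grid = [1400.0 + step for step in range(501)]  # 1400 to 1900 cm^-1, 1 cm^-1 steps

spectrum = broaden(frequencies, intensities, grid)
print(grid[spectrum.index(max(spectrum))])
```

Because each Gaussian has unit area, the integrated area of the simulated spectrum equals the sum of the mode intensities, which gives a quick check that the grid is wide and fine enough.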

3.2.3 Amino Acid Classification Routine

All amino acid spectral data were acquired using the software package Win-IR-Pro v2.97 (Digilab). The spectral range of cm⁻¹ was used for analysis. The single-pass ATR-FT-IR spectra were water vapor subtracted, baseline-corrected, and consolidated into an amino acid library using the software package Igor-Pro v3.12. The amino acid spectra were bootstrapped, yielding 100 spectra for each amino acid. Half were used in the training set and the other half were used in the test set. Upon construction of the training and test sets, additional single-pass ATR-FT-IR amino acid spectra for alanine and tryptophan were acquired and used as a validation set.

The program classify.m was written to perform the pattern recognition procedure (available in the Appendix). The training and test sets were passed to the function in the form of an n x m matrix where each of the n rows contains the absorbance values of one spectrum over a given wavelength range. In order to identify the class to which each single-pass ATR-FT-IR spectrum belongs, a column vector of class labels was also passed to the program classify.m. The training sets were used to produce statistical models and the test sets were used to test the effectiveness of the statistical models. Given a probability threshold, the classifying program calculated the probability densities of samples in both the training and test sets. The probability densities for the training and test sets are calculated via a subroutine, pr_d.m, that is given in the Appendix. The probability densities were determined from the Mahalanobis distance method; the corresponding code, mahald_2.m, is available in the Appendix. Since there were 1000 samples and 20 different classes in both the training and test sets, the probability densities for the training set were returned in a 1000 x 20 matrix. The column index of the maximum probability density for a given sample identified the sample class. Since the training set and the test set classifications were known, a confusion matrix was constructed that represented the percentage of samples correctly and incorrectly classified. The columns in the confusion matrix indicate the true class and the rows in the confusion matrix indicate the found class.

Table 3.3. Example of a confusion matrix

                 true
found            Alanine   Arginine   Asparagine   Aspartic Acid
Alanine          100%      0          0            0
Arginine         0         98%        2%           0
Asparagine       0         5%         95%          0
Aspartic Acid    0         0          0            100%

3.3 Results and Discussion

3.3.1 Experimental and Theoretical Amino Acid Spectra

The acquisition of single-pass ATR-FT-IR spectra and the simulation of two sets of amino acid spectra, where one had no solvent and the other used a dielectric continuum to mimic the amino acid in solution, were successful. The high and low frequency regions of the single-pass ATR-FT-IR spectra along with the simulation spectra are presented in figures located in Section 6.10 of the Appendix. In addition to the plots, frequency tables were constructed that compare the theoretical results. The resulting eigenvectors from the normal mode analysis of the amino acids without solvent were mapped to the resulting eigenvectors of the amino acids with the dielectric continuum (refer to the tables in Section 6.11 of the Appendix). The mapping of the two sets of eigenvectors provided insight into the effect of solvent on the peak frequencies. Additionally, the high correlation of one set of eigenvectors to the other validated the method of using a dielectric continuum to mimic solvent.

3.3.2 Amino Acid Classification Results

Upon completion of the amino acid spectral library, a classification routine based upon the Mahalanobis distance method was implemented. Initially the spectral ranges from 700 to 2250 cm⁻¹ and from 2400 to 4200 cm⁻¹ were used. Wavenumbers 2251 to 2399 cm⁻¹ were excluded since CO2 absorbs in this region. Since each amino acid spectrum was distinct, classification was straightforward. When using the given spectral range and 16 factors, 100% classification was achieved for 19 of the 20 amino acids. Only 98% of the phenylalanine test samples were correctly classified; the remaining 2% were incorrectly classified as alanine. Both the amino acid single-pass ATR-FT-IR spectral library and the vibrational frequency calculations indicated that the lower frequency range had the most significant and distinguishing spectral information. Thus, the classification procedure was performed using the spectral region from 700 to 1800 cm⁻¹. When 3 to 16 factors were included in the classification model, 100% correct classification was achieved for all 20 amino acids.

The dependence of correct classification rates on wavenumber selection was explored by using the wavenumbers from the experimental and theoretical data that were exclusive to a single amino acid. For example, methionine has a peak at 953 cm⁻¹ whereas none of the other amino acids has a peak within ±4 cm⁻¹. Careful wavenumber selection such as this enhances the selectivity of the model. Forty-five wavenumbers were found to be unique and were used in the classification model. Once again, 100% correct classification was achieved for all amino acids when 3 to 16 factors were used. A validation set consisting of the amino acids alanine and tryptophan was classified using the 45 wavenumbers and 16 factors. The alanine validation spectra were correctly classified but not all of the tryptophan validation spectra were correctly classified. The classification procedure was repeated using three factors rather than 16, and 100% correct classification was achieved for both alanine and tryptophan.

3.3.3 Peptide Classification Results

The classification model, which utilized 45 wavenumbers and 3 factors, was applied to the four peptides listed in Table 3.2. Disappointingly, the peptides were not classified as any of their constituents. In fact, the calculated Mahalanobis distance from the centroid of the given peptide to that of each individual amino acid was so large that the probability of each peptide belonging to any class of amino acid was zero. Even with a minimal number of wavenumbers and factors in the classification model, the formation of a peptide bond affected the amino acid side chain frequencies so strongly that they were unrecognizable. Thus, the idea that a peptide or protein mid-infrared spectrum in the region from 1400 to 1800 cm⁻¹ is the sum of its constituent amino acid side chains was determined to be invalid.

Further analysis of the peptides indicated that in the presence of an amide bond, the amino acid side chain peaks shift such that they are no longer recognizable relative to the original amino acid spectrum. The single-pass ATR-FT-IR spectra of the dipeptide alanine-aspartic acid were combined with alanine and aspartic acid single-pass ATR-FT-IR spectra. These three sets of spectra were decomposed via the singular value decomposition function to determine if the component spectra sum to the peptide spectrum. Unfortunately, this was not the case. The first principal component from this analysis represented the dipeptide whereas the second and third eigenvectors represented alanine and aspartic acid, respectively. The results are given in Figures 3.3-3.5 and indicate that the original amino acid spectra are obscured in the dipeptide. The second principal component trace differs from that of the alanine spectrum in the regions from 1150 to 1300 cm⁻¹ and from 1450 to 1550 cm⁻¹. The third component trace differs slightly from that of the aspartic acid spectrum; more specifically, the peaks at 1350 cm⁻¹ and 1425 cm⁻¹ are shifted. Although the ability to determine the amino acid composition of peptides is ideal, it cannot be achieved by eigenvector analysis, such as the classification technique presented earlier, since the first eigenvector accounted for 94.90% of the information content of the system. The two principal components that represented the amino acids accounted for only 3.56% and 1.53% of the information content, respectively.
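The percentages of information content quoted above are not accompanied by a formula. One common convention, assumed here rather than confirmed by the text, is to report each component's squared singular value as a share of the total. The sketch below follows that convention; the singular values are hypothetical and the function name `information_content` is illustrative.

```python
def information_content(singular_values):
    """Percentage of total variance carried by each component, taken as
    s_i^2 / sum_j s_j^2 (an assumed convention, not the thesis formula)."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    return [100.0 * q / total for q in squares]

# Hypothetical singular values for a three-component decomposition.
singular_values = [12.0, 4.0, 3.0]
shares = information_content(singular_values)
print([round(share, 2) for share in shares])
```

Under this convention a dominant first component, as reported for the dipeptide data set, leaves only a few percent of the variance to the components that resemble the individual amino acids.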

[Plot: absorbance versus wavenumber]
Figure 3.3. The black dashed single-pass ATR-FT-IR spectrum represents the dipeptide alanine-aspartic acid and the red solid spectrum represents the first principal component in the ALA-ASP, ALA, and ASP data set.

[Plot: absorbance versus wavenumber]
Figure 3.4. The black dashed single-pass ATR-FT-IR spectrum represents alanine and the blue solid spectrum represents the second principal component in the ALA-ASP, ALA, and ASP data set.

[Plot: absorbance versus wavenumber]
Figure 3.5. The black dashed single-pass ATR-FT-IR spectrum represents aspartic acid and the magenta solid spectrum represents the third principal component in the ALA-ASP, ALA, and ASP data set.

The single-pass ATR-FT-IR spectrum of the phenylalanine-glycine-glycine peptide is plotted in Figure 3.6 along with the first principal component trace, which was acquired by decomposing a data set containing phenylalanine-glycine-glycine spectra, phenylalanine spectra, and glycine spectra. Although the difference between the phenylalanine-glycine-glycine spectrum and the first principal component is remarkable, the second and third principal components resemble phenylalanine and glycine, respectively (Figures 3.7 and 3.8). In this system, the first eigenvector accounted for 87.35% of the information content of the system. The two principal components that represented the amino acids accounted for only 8.95% and 3.69% of the information content, respectively. Eigenvector analysis of the dipeptide alanine-aspartic acid and the tripeptide phenylalanine-glycine-glycine confirmed that the peptide bond both obscures the amino acid side chain peak frequencies and significantly masks these peaks.

[Plot: absorbance versus wavenumber]
Figure 3.6. The black dashed single-pass ATR-FT-IR spectrum represents the tripeptide phenylalanine-glycine-glycine and the red solid spectrum represents the first principal component in the PHE-GLY-GLY, PHE, and GLY data set.

[Plot: absorbance versus wavenumber]
Figure 3.7. The black dashed single-pass ATR-FT-IR spectrum represents phenylalanine and the blue solid spectrum represents the second principal component in the PHE-GLY-GLY, PHE, and GLY data set.

[Plot: absorbance versus wavenumber]
Figure 3.8. The black dashed single-pass ATR-FT-IR spectrum represents glycine and the magenta solid spectrum represents the third principal component in the PHE-GLY-GLY, PHE, and GLY data set.

3.4 Conclusion

The single-pass ATR-FT-IR spectra of the 20 common amino acids were compared to theoretical spectra calculated from normal mode analysis that employed density functional theory. The peak frequencies of the two sets of theoretical spectra differed significantly, but eigenvector mapping showed that there was high correlation between the two data sets. Despite the high correlation between the two theoretical data sets, the normal mode analysis that employed the dielectric continuum more closely resembled the experimental spectra. The theoretical spectra, in both cases, resembled the high frequency region of the single-pass ATR-FT-IR spectrum more than the low frequency region. The theoretical peak frequencies combined with the experimental spectra aided in the selection of wavenumbers unique to individual amino acids. From this, a classification method was developed that employed the Mahalanobis distance method. Although the determination of the amino acid composition of peptides was unsuccessful, a routine for the rapid identification of amino acids was established.

3.5 References

1. Venyaminov, S.Y. and N.N. Kalnin, Quantitative IR spectrophotometry of Peptide Compounds in Water (H2O) Solutions. I. Spectral Parameters of Amino Acid Residue Absorption Bands. Biopolymers.
2. Rahmelow, K., W. Hubner, and T. Ackermann, Infrared Absorbances of Protein Side Chains. Analytical Biochemistry.
3. Massart, D.L., et al., Chemometrics: a textbook. 1988, Amsterdam: Elsevier.
4. Sharaf, M.A., D.L. Illman, and B.R. Kowalski, Chemometrics. 1986, New York: John Wiley and Sons.
5. Shah, N.K. and P.J. Gemperline, Combination of the Mahalanobis Distance and Residual Variance Pattern Recognition Techniques for Classification of Near-Infrared Spectra. Analytical Chemistry.
6. Gemperline, P., Chemometrics Short Course.
7. Gemperline, P.J., et al., Near-IR Detection of Polymorphism and Process-Related Substances. Analytical Chemistry.
8. Perdew, J.P., et al., Atoms, molecules, solids and surfaces - Applications of the generalized gradient approximation for exchange and correlation. Phys. Rev. B.
9. Delley, B., J. Chem. Phys.
10. Delley, B., From molecules to solids with the DMol(3) approach. J. Chem. Phys.
11. Franzen, S., The Effect of a Charge Relay on the Vibrational Frequencies of Carbonmonoxy Iron Porphine Adducts. J. Am. Chem. Soc.
12. Franzen, S., An electrostatic model for the frequency shifts in the carbonmonoxy stretching band of myoglobin: correlation of hydrogen bonding and the Stark tuning rate. J. Am. Chem. Soc., in Press.
13. Franzen, S., Use of Periodic Boundary Conditions to Calculate Accurate β-sheet Frequencies Using Density Functional Theory.

Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Conformational Analysis of Peptides

4.1 Introduction

Conformational analysis of peptides that are functionally relevant yields critical information on interactions in native proteins that affect folding, targeting and processing. Such studies are also crucial for the analysis of bioconjugates used in drug delivery. Comparisons of peptides in free solution and those associated with proteins have been made by a variety of methods, such as NMR, X-ray crystallography, circular dichroism (CD), fluorescence, FT-IR and molecular modeling. (1-4) Although infrequently used, the single-pass ATR-FT-IR method has wide application in the study of biopolymers. Thus, this investigation focuses on the application of single-pass ATR-FT-IR to the study of peptide-protein interactions in both a procaspase-3 system and a BSA-peptide system.

Procaspase-3

Programmed cell death, known as apoptosis, is triggered by caspase activation. (5-7) One interesting model system for study by single-pass ATR-FT-IR is procaspase-3, which is one of fourteen known caspases (7). As shown in Figure 4.1, procaspase-3, a pro-less variant, the pro-peptide, and a mixture of the pro-less variant and the pro-peptide were investigated. The pro-less variant does not include the 28-amino acid pro-peptide. This mutant was used to determine whether removal of the pro-peptide affects procaspase-3. Because structure is correlated with function, comparing the structure of the pro-peptide in free solution with that of the bound pro-peptide by single-pass ATR-FT-IR spectroscopy has functional implications.

Figure 4.1. An illustration of procaspase-3, the pro-less variant, the pro-peptide, and the pro-peptide/pro-less variant mixture used in this study. Panels: procaspase-3 (pro-peptide, large subunit, small subunit); pro-less variant (large subunit, small subunit); 28-amino acid pro-peptide; mixture of pro-peptide and pro-less variant.

BSA Bioconjugates

Five BSA-peptide bioconjugates (refer to Figure 4.2) will be investigated via single-pass ATR-FT-IR to yield insight on targeting function. Intracellular targeting of BSA-peptide bioconjugates occurs by docking of the targeting peptides with certain cellular proteins; however, the nature of such interactions is poorly understood. (8) The goal of this study is to yield insight on intracellular targeting interactions. This will be accomplished by determining the secondary structure of the five targeting peptides (given in Table 4.1) in free solution and when bound to BSA. The targeting peptides are derived from the fiber protein of the adenovirus. They are widely used in studies of DNA delivery for gene therapy. The conjugation of these peptides to BSA provides a new targeting vehicle for studies of intracellular targeting of proteins and protein conjugates with nanoparticles.

4.2 Experimental Details

Procaspase-3, the pro-less variant, and the pro-peptide were prepared in the Biochemistry Department at North Carolina State University (refer to Figure 4.1). Specific details of the sample preparation are described elsewhere. (6) The BSA-peptide bioconjugates used in this study were constructed as illustrated in Figure 4.2 using the peptide sequences given in Table 4.1. Further details of the preparation of the BSA-peptide bioconjugates are also given elsewhere. (8)

Figure 4.2. A schematic of the preparation of the BSA-peptide bioconjugates, in which 3-maleimido benzoic acid N-hydroxysuccinimide ester (MBS) is added to BSA, followed by the addition of a cysteine-terminated peptide.

Table 4.1. Peptide sequences used in the BSA-peptide bioconjugates study.

Peptide # | Peptide Sequence | Origin | Molar Mass
3 | NH2-CGGFSTSLRARKA | Adeno NLS |
5 | NH2-CKKKKKKSEDEYPYVPN | Adeno RME |
7 | NH2-CKKKKKKKKKKKKKKKK | RME, DNA binding |
10 | NH2-KKKKKKKSEDEYPYVPNFSTSLRARKA | #3 + # |
11 | NH2-GALFLGAAGSTMGAWSQPKSKRKVC | RME-NLS combo | 2552

All single-pass ATR-FT-IR spectra were recorded at ambient temperature and averaged over 64 scans on a Digilab FTS 6000 FTIR spectrometer equipped with a liquid nitrogen-cooled MCT detector in a UMA500 microscope. Further single-pass ATR-FT-IR experimental details are given in Chapter 2. All spectral data were acquired using the software package Win-IR-Pro v2.97 (Digilab), and data analysis was performed using Igor-Pro.

4.3 Results

Procaspase-3

Representative spectra of procaspase-3, the pro-less variant, and the pro-peptide are given in Figure 4.3. The spectrum of the pro-less variant differs only slightly from that of procaspase-3. In the amide I region, both spectra have peaks at 1636 and 1645 cm⁻¹. In the amide II region, both spectra have peaks at 1516 and 1542 cm⁻¹. These signatures are representative of proteins with both α-helical and β-sheet secondary structure. The similarity of the procaspase-3 and pro-less variant spectra suggests that the presence of the pro-peptide only weakly affects the secondary structure of procaspase-3.

Figure 4.3. The blue solid single-pass ATR-FT-IR spectrum represents procaspase-3, the red dashed spectrum represents the pro-less variant, and the green dotted spectrum represents the pro-peptide. (Axes: absorbance versus energy in cm⁻¹.)

The pro-peptide spectrum was added to the pro-less variant spectrum for comparison with the spectrum of a mixture of the pro-peptide and the pro-less variant (refer to Figure 4.4). As shown, the amide I position and the line shape of the amide II band are significantly different for the two spectra. A spectrum intermediate between the mixture spectrum and the summed spectrum would resemble the procaspase-3 spectrum. This suggests that the pro-peptide interacts weakly with the pro-less variant in the mixture.
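The comparison in Figure 4.4 rests on a simple additivity test: if the pro-peptide and the pro-less variant did not interact, the spectrum of the mixture should equal the sum of the two separately measured spectra. The sketch below illustrates that test with synthetic Gaussian bands rather than the measured data. The band positions, widths, and heights, and the names `gaussian_band` and `additivity_residual`, are assumptions made purely for illustration.

```python
import numpy as np


def gaussian_band(wavenumber, center, width, height):
    """Single Gaussian absorption band (synthetic stand-in for a measured band)."""
    return height * np.exp(-0.5 * ((wavenumber - center) / width) ** 2)


def additivity_residual(component_a, component_b, mixture):
    """Return the summed spectrum and the RMS deviation of the mixture from it."""
    summed = component_a + component_b
    rms = float(np.sqrt(np.mean((mixture - summed) ** 2)))
    return summed, rms


# Hypothetical wavenumber axis covering the amide I and II region (cm^-1).
wavenumber = np.linspace(1480.0, 1720.0, 481)

# Synthetic "pro-less variant" and "free pro-peptide" spectra (illustrative values only).
variant = (gaussian_band(wavenumber, 1640.0, 20.0, 0.050)
           + gaussian_band(wavenumber, 1542.0, 18.0, 0.030))
peptide_free = (gaussian_band(wavenumber, 1649.0, 15.0, 0.010)
                + gaussian_band(wavenumber, 1545.0, 15.0, 0.006))

# Case 1: no interaction, so the mixture is exactly the sum of its parts.
mixture_ideal = variant + peptide_free

# Case 2: the peptide bands shift on binding, so the mixture deviates from the sum.
peptide_shifted = (gaussian_band(wavenumber, 1636.0, 15.0, 0.010)
                   + gaussian_band(wavenumber, 1542.0, 15.0, 0.006))
mixture_shifted = variant + peptide_shifted

_, rms_ideal = additivity_residual(variant, peptide_free, mixture_ideal)
_, rms_shifted = additivity_residual(variant, peptide_free, mixture_shifted)
print(f"RMS residual, non-interacting mixture: {rms_ideal:.2e}")
print(f"RMS residual, shifted peptide bands:   {rms_shifted:.2e}")
```

In practice, whether a residual is meaningful depends on the replicate-to-replicate reproducibility of the measurement, which this sketch does not model.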

Figure 4.4. The sum of the pro-less variant and pro-peptide spectra is given as the dotted orange spectrum, whereas the magenta dashed spectrum is that of the mixture (pro-less variant and pro-peptide). The solid blue spectrum is procaspase-3. (Axes: absorbance versus energy in cm⁻¹.)

The difference between the summed spectrum and the spectrum of the mixture suggests that the pro-peptide undergoes a conformational change in the presence of the pro-less variant. This is further shown by a comparison of the bound pro-peptide with the pro-peptide in free solution (Figure 4.5), in which the spectrum of the bound pro-peptide was obtained as the difference between the procaspase-3 and pro-less variant spectra. There is an obvious difference between the two spectra in Figure 4.5. The bound pro-peptide spectrum has signatures at 1542, 1636, and 1645 cm⁻¹, whereas the pro-peptide in free solution has signatures at 1545 and 1649 cm⁻¹.
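The bound pro-peptide spectrum described above is a difference spectrum, and its signatures are read off as band maxima. A minimal sketch of both steps follows, using synthetic bands rather than thesis data. The function names, the optional `scale` factor, and the window limits are assumptions introduced for illustration; the text does not state whether any scaling was applied before subtraction.

```python
import numpy as np


def gaussian_band(wavenumber, center, width, height):
    """Single Gaussian absorption band (synthetic stand-in for a measured band)."""
    return height * np.exp(-0.5 * ((wavenumber - center) / width) ** 2)


def difference_spectrum(full, reference, scale=1.0):
    """Subtract an optionally scaled reference spectrum from the full spectrum."""
    return full - scale * reference


def band_maximum(wavenumber, absorbance, low, high):
    """Wavenumber of the largest absorbance inside the window [low, high]."""
    window = (wavenumber >= low) & (wavenumber <= high)
    return float(wavenumber[window][np.argmax(absorbance[window])])


# Hypothetical axis with 0.5 cm^-1 spacing.
wavenumber = np.linspace(1480.0, 1720.0, 481)

# Synthetic "pro-less variant" spectrum (illustrative values only).
variant = (gaussian_band(wavenumber, 1650.0, 20.0, 0.050)
           + gaussian_band(wavenumber, 1540.0, 18.0, 0.030))

# Synthetic "procaspase-3": the variant plus an assumed bound-peptide contribution.
bound_true = (gaussian_band(wavenumber, 1636.0, 10.0, 0.010)
              + gaussian_band(wavenumber, 1542.0, 10.0, 0.006))
procaspase = variant + bound_true

# Recover the bound-peptide contribution and locate its band maxima.
bound_estimate = difference_spectrum(procaspase, variant)
amide_i = band_maximum(wavenumber, bound_estimate, 1600.0, 1700.0)
amide_ii = band_maximum(wavenumber, bound_estimate, 1500.0, 1600.0)
print(f"amide I maximum:  {amide_i:.1f} cm^-1")
print(f"amide II maximum: {amide_ii:.1f} cm^-1")
```

Reading positions from simple maxima works here because the synthetic bands are well separated; overlapping bands such as the 1636 and 1645 cm⁻¹ pair reported above would generally need a finer treatment.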

Figure 4.5. The dotted red spectrum represents the difference between procaspase-3 and the pro-less variant. This spectrum represents the bound pro-peptide, whereas the solid orange spectrum is that of the pro-peptide in free solution. (Axes: absorbance, scale ×10⁻³, versus energy in cm⁻¹.)

Spectral correlations show that the bound pro-peptide has β-sheet structure and that the pro-peptide in free solution is random coil. The application of a principal component regression (PCR) model for secondary structure prediction, as described in Chapter 2, yielded the same results. The Mahalanobis distance between procaspase-3 and each of the proteins available in the spectral library (described in Chapter 3) was calculated, and the results demonstrated that procaspase-3 was most similar to concanavalin, a highly β-sheet protein. Thus, the library suited for β-sheet prediction was applied to the pro-peptide spectrum and to the difference spectrum (procaspase-3 minus pro-less variant). The pro-peptide in free solution had little α-helical and even less β-sheet structure, 11.6% and 1.3%, respectively. The bound pro-peptide had 3.0% α-helical and 7.3% β-sheet structure. The decrease in α-helical structure and the increase in β-sheet structure suggest that the bound pro-peptide has more β-sheet structure than the pro-peptide in free solution, which is more random coil. The small amount of secondary structure determined by the PCR method may arise from the fact that the spectral library was based on studies of proteins, not short peptides such as the pro-peptide. Qualitatively, it appears that there is significantly more secondary structure for the bound pro-peptide.

BSA-Peptide Bioconjugates

Homology modeling performed in InsightII indicated that HSA and BSA are highly similar in secondary structure (refer to Figure 4.6); BSA was found to be 72.2% homologous to HSA. This results in 50.1% α-helix and 0.0% β-sheet for the secondary structure of BSA. The secondary structure prediction model developed in our laboratory also yielded results very comparable to those given for HSA: 52.1% α-helix and 7.0% β-sheet.
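The prediction workflow used in this section, choosing a spectral library by Mahalanobis distance and then applying principal component regression, can be sketched as follows. This is not the model from Chapter 2. The spectra are synthetic three-band mixtures with assumed band positions, the number of components is fixed at two, the distance is computed in principal component score space using the library covariance, and all function names are hypothetical.

```python
import numpy as np


def gaussian_band(wavenumber, center, width):
    """Unit-height Gaussian band used to build synthetic spectra."""
    return np.exp(-0.5 * ((wavenumber - center) / width) ** 2)


def fit_pca(spectra, n_components):
    """Mean-center the library (rows = spectra); return mean, loadings, scores."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    loadings = vt[:n_components].T
    return mean, loadings, (spectra - mean) @ loadings


def mahalanobis_to_members(scores, query_score):
    """Mahalanobis distance from a query to each library member in score space."""
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(scores, rowvar=False)))
    diffs = scores - query_score
    squared = np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs)
    return np.sqrt(np.maximum(squared, 0.0))


def fit_pcr(spectra, content, n_components):
    """Regress structure content on the principal component scores."""
    mean, loadings, scores = fit_pca(spectra, n_components)
    content_mean = content.mean()
    coef, *_ = np.linalg.lstsq(scores, content - content_mean, rcond=None)
    return mean, loadings, coef, content_mean


def predict_pcr(model, spectrum):
    """Predict structure content for one spectrum from a fitted PCR model."""
    mean, loadings, coef, content_mean = model
    return float((spectrum - mean) @ loadings @ coef + content_mean)


# Synthetic amide I region; band centers are illustrative assumptions.
wavenumber = np.linspace(1600.0, 1700.0, 201)
helix_band = gaussian_band(wavenumber, 1655.0, 10.0)
sheet_band = gaussian_band(wavenumber, 1632.0, 10.0)
other_band = gaussian_band(wavenumber, 1645.0, 12.0)


def synth_spectrum(helix_fraction, sheet_fraction):
    """Noise-free linear mixture of the three synthetic bands."""
    rest = 1.0 - helix_fraction - sheet_fraction
    return helix_fraction * helix_band + sheet_fraction * sheet_band + rest * other_band


# Hypothetical library of eight "proteins" with known fractional content.
helix = np.array([0.70, 0.55, 0.40, 0.30, 0.10, 0.05, 0.45, 0.20])
sheet = np.array([0.05, 0.10, 0.20, 0.35, 0.50, 0.60, 0.15, 0.40])
library = np.array([synth_spectrum(h, s) for h, s in zip(helix, sheet)])

# Step 1: find the library member most similar to the query spectrum.
mean, loadings, scores = fit_pca(library, n_components=2)
query = synth_spectrum(0.08, 0.55)
distances = mahalanobis_to_members(scores, (query - mean) @ loadings)
nearest = int(np.argmin(distances))

# Step 2: predict sheet content of the query with a PCR model.
sheet_model = fit_pcr(library, sheet, n_components=2)
print(f"nearest library member: index {nearest}")
print(f"predicted sheet fraction: {predict_pcr(sheet_model, query):.3f}")
```

Because the synthetic spectra are exact linear mixtures, two components describe them completely; measured spectra would require choosing the number of components by validation.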

Figure 4.6. The magenta structure is human serum albumin (HSA) obtained from the Protein Data Bank (PDB ID 1BM0), and the blue structure is a homology model of bovine serum albumin (BSA).

Although the protein spectral library has been trained for protein analysis, 0% α-helical content was predicted for all peptides given in Table 4.1, which is consistent with the results from the AGADIR algorithm. (9) The secondary structure prediction model developed in this study predicted extremely high β-sheet structure, more than likely due to the formation of β-sheet aggregates. The secondary structure predictions for constructs #3, #5, #7, #10, and #11 given in Table 4.2 indicated that several peptides have adhered to BSA, since there was a significant drop in β-sheet structure for all five constructs.

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Spring 2016 Protein Structure February 7, 2016 Introduction to Protein Structure A protein is a linear chain of organic molecular building blocks called amino acids. Introduction to Protein Structure Amine

More information

Introduction to proteins and protein structure

Introduction to proteins and protein structure Introduction to proteins and protein structure The questions and answers below constitute an introduction to the fundamental principles of protein structure. They are all available at [link]. What are

More information

Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in H 2 O Solution

Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in H 2 O Solution Anal. Chem. 2002, 74, 4076-4080 Single-Pass Attenuated Total Reflection Fourier Transform Infrared Spectroscopy for the Analysis of Proteins in H 2 O Solution Brandye M. Smith and Stefan Franzen* Department

More information

Amino Acids. Review I: Protein Structure. Amino Acids: Structures. Amino Acids (contd.) Rajan Munshi

Amino Acids. Review I: Protein Structure. Amino Acids: Structures. Amino Acids (contd.) Rajan Munshi Review I: Protein Structure Rajan Munshi BBSI @ Pitt 2005 Department of Computational Biology University of Pittsburgh School of Medicine May 24, 2005 Amino Acids Building blocks of proteins 20 amino acids

More information

Activities for the α-helix / β-sheet Construction Kit

Activities for the α-helix / β-sheet Construction Kit Activities for the α-helix / β-sheet Construction Kit The primary sequence of a protein, composed of amino acids, determines the organization of the sequence into the secondary structure. There are two

More information

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I

Biochemistry - I. Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I Biochemistry - I Prof. S. Dasgupta Department of Chemistry Indian Institute of Technology, Kharagpur Lecture 1 Amino Acids I Hello, welcome to the course Biochemistry 1 conducted by me Dr. S Dasgupta,

More information

Copyright 2008 Pearson Education, Inc., publishing as Pearson Benjamin Cummings

Copyright 2008 Pearson Education, Inc., publishing as Pearson Benjamin Cummings Concept 5.4: Proteins have many structures, resulting in a wide range of functions Proteins account for more than 50% of the dry mass of most cells Protein functions include structural support, storage,

More information

This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is worth 2 points.

This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is worth 2 points. MBB 407/511 Molecular Biology and Biochemistry First Examination - October 1, 2002 Name Social Security Number This exam consists of two parts. Part I is multiple choice. Each of these 25 questions is

More information

Biomolecules: amino acids

Biomolecules: amino acids Biomolecules: amino acids Amino acids Amino acids are the building blocks of proteins They are also part of hormones, neurotransmitters and metabolic intermediates There are 20 different amino acids in

More information

BIO 311C Spring Lecture 15 Friday 26 Feb. 1

BIO 311C Spring Lecture 15 Friday 26 Feb. 1 BIO 311C Spring 2010 Lecture 15 Friday 26 Feb. 1 Illustration of a Polypeptide amino acids peptide bonds Review Polypeptide (chain) See textbook, Fig 5.21, p. 82 for a more clear illustration Folding and

More information

Molecular Biology. general transfer: occurs normally in cells. special transfer: occurs only in the laboratory in specific conditions.

Molecular Biology. general transfer: occurs normally in cells. special transfer: occurs only in the laboratory in specific conditions. Chapter 9: Proteins Molecular Biology replication general transfer: occurs normally in cells transcription special transfer: occurs only in the laboratory in specific conditions translation unknown transfer:

More information

Chemical Nature of the Amino Acids. Table of a-amino Acids Found in Proteins

Chemical Nature of the Amino Acids. Table of a-amino Acids Found in Proteins Chemical Nature of the Amino Acids All peptides and polypeptides are polymers of alpha-amino acids. There are 20 a- amino acids that are relevant to the make-up of mammalian proteins (see below). Several

More information

Introduction to Protein Structure Collection

Introduction to Protein Structure Collection Introduction to Protein Structure Collection Teaching Points This collection is designed to introduce students to the concepts of protein structure and biochemistry. Different activities guide students

More information

Multiple-Choice Questions Answer ALL 20 multiple-choice questions on the Scantron Card in PENCIL

Multiple-Choice Questions Answer ALL 20 multiple-choice questions on the Scantron Card in PENCIL Multiple-Choice Questions Answer ALL 20 multiple-choice questions on the Scantron Card in PENCIL For Questions 1-10 choose ONE INCORRECT answer. 1. Which ONE of the following statements concerning the

More information

So where were we? But what does the order mean? OK, so what's a protein? 4/1/11

So where were we? But what does the order mean? OK, so what's a protein? 4/1/11 So where were we? We know that DNA is responsible for heredity Chromosomes are long pieces of DNA DNA turned out to be the transforming principle We know that DNA is shaped like a long double helix, with

More information

Properties of amino acids in proteins

Properties of amino acids in proteins Properties of amino acids in proteins one of the primary roles of DNA (but far from the only one!!!) is to code for proteins A typical bacterium builds thousands types of proteins, all from ~20 amino acids

More information

Protein Secondary Structure

Protein Secondary Structure Protein Secondary Structure Reading: Berg, Tymoczko & Stryer, 6th ed., Chapter 2, pp. 37-45 Problems in textbook: chapter 2, pp. 63-64, #1,5,9 Directory of Jmol structures of proteins: http://www.biochem.arizona.edu/classes/bioc462/462a/jmol/routines/routines.html

More information

Proteins consist in whole or large part of amino acids. Simple proteins consist only of amino acids.

Proteins consist in whole or large part of amino acids. Simple proteins consist only of amino acids. Today we begin our discussion of the structure and properties of proteins. Proteins consist in whole or large part of amino acids. Simple proteins consist only of amino acids. Conjugated proteins contain

More information

PHAR3316 Pharmacy biochemistry Exam #2 Fall 2010 KEY

PHAR3316 Pharmacy biochemistry Exam #2 Fall 2010 KEY 1. How many protons is(are) lost when the amino acid Asparagine is titrated from its fully protonated state to a fully deprotonated state? A. 0 B. 1 * C. 2 D. 3 E. none Correct Answer: C (this question

More information

Review II: The Molecules of Life

Review II: The Molecules of Life Review II: The Molecules of Life Judy Wieber BBSI @ Pitt 2007 Department of Computational Biology University of Pittsburgh School of Medicine May 24, 2007 Outline Introduction Proteins Carbohydrates Lipids

More information

Amino acids. (Foundation Block) Dr. Essa Sabi

Amino acids. (Foundation Block) Dr. Essa Sabi Amino acids (Foundation Block) Dr. Essa Sabi Learning outcomes What are the amino acids? General structure. Classification of amino acids. Optical properties. Amino acid configuration. Non-standard amino

More information

Objective: You will be able to explain how the subcomponents of

Objective: You will be able to explain how the subcomponents of Objective: You will be able to explain how the subcomponents of nucleic acids determine the properties of that polymer. Do Now: Read the first two paragraphs from enduring understanding 4.A Essential knowledge:

More information

Chapter 3: Amino Acids and Peptides

Chapter 3: Amino Acids and Peptides Chapter 3: Amino Acids and Peptides BINF 6101/8101, Spring 2018 Outline 1. Overall amino acid structure 2. Amino acid stereochemistry 3. Amino acid sidechain structure & classification 4. Non-standard

More information

Lecture 4: 8/26. CHAPTER 4 Protein Three Dimensional Structure

Lecture 4: 8/26. CHAPTER 4 Protein Three Dimensional Structure Lecture 4: 8/26 CHAPTER 4 Protein Three Dimensional Structure Summary of the Lecture 3 There are 20 amino acids and only the L isomer amino acid exist in proteins Each amino acid consists of a central

More information

Chemistry 121 Winter 17

Chemistry 121 Winter 17 Chemistry 121 Winter 17 Introduction to Organic Chemistry and Biochemistry Instructor Dr. Upali Siriwardane (Ph.D. Ohio State) E-mail: upali@latech.edu Office: 311 Carson Taylor Hall ; Phone: 318-257-4941;

More information

Gentilucci, Amino Acids, Peptides, and Proteins. Peptides and proteins are polymers of amino acids linked together by amide bonds CH 3

Gentilucci, Amino Acids, Peptides, and Proteins. Peptides and proteins are polymers of amino acids linked together by amide bonds CH 3 Amino Acids Peptides and proteins are polymers of amino acids linked together by amide bonds Aliphatic Side-Chain Amino Acids - - H CH glycine alanine 3 proline valine CH CH 3 - leucine - isoleucine CH

More information

Lecture 3: 8/24. CHAPTER 3 Amino Acids

Lecture 3: 8/24. CHAPTER 3 Amino Acids Lecture 3: 8/24 CHAPTER 3 Amino Acids 1 Chapter 3 Outline 2 Amino Acid Are Biomolecules and their Atoms Can Be Visualized by Two Different Ways 1) Fischer projections: Two dimensional representation of

More information

Polarization and Circular Dichroism (Notes 17)

Polarization and Circular Dichroism (Notes 17) Polarization and Circular Dichroism - 2014 (Notes 17) Since is vector, if fix molec. orient., E-field interact (absorb) with molecule differently when change E-orientation (polarization) Transitions can

More information

Raghad Abu Jebbeh. Amani Nofal. Mamoon Ahram

Raghad Abu Jebbeh. Amani Nofal. Mamoon Ahram ... 14 Raghad Abu Jebbeh Amani Nofal Mamoon Ahram This sheet includes part of lec.13 + lec.14. Amino acid peptide protein Terminology: 1- Residue: a subunit that is a part of a large molecule. 2- Dipeptide:

More information

Proteins consist of joined amino acids They are joined by a Also called an Amide Bond

Proteins consist of joined amino acids They are joined by a Also called an Amide Bond Lecture Two: Peptide Bond & Protein Structure [Chapter 2 Berg, Tymoczko & Stryer] (Figures in Red are for the 7th Edition) (Figures in Blue are for the 8th Edition) Proteins consist of joined amino acids

More information

A Chemical Look at Proteins: Workhorses of the Cell

A Chemical Look at Proteins: Workhorses of the Cell A Chemical Look at Proteins: Workhorses of the Cell A A Life ciences 1a Lecture otes et 4 pring 2006 Prof. Daniel Kahne Life requires chemistry 2 amino acid monomer and it is proteins that make the chemistry

More information

LAB#23: Biochemical Evidence of Evolution Name: Period Date :

LAB#23: Biochemical Evidence of Evolution Name: Period Date : LAB#23: Biochemical Evidence of Name: Period Date : Laboratory Experience #23 Bridge Worth 80 Lab Minutes If two organisms have similar portions of DNA (genes), these organisms will probably make similar

More information

Judy Wieber. Department of Computational Biology. May 27, 2008

Judy Wieber. Department of Computational Biology. May 27, 2008 Review II: The Molecules of Life Judy Wieber BBSI @ Pitt 2008 Department of Computational Biology University it of Pittsburgh School of Medicine i May 27, 2008 Outline Introduction Proteins Carbohydrates

More information

Distribution of the amino acids in Nature Frequency in proteins (%)

Distribution of the amino acids in Nature Frequency in proteins (%) Distribution of the amino acids in ature Amino Acid Frequency in proteins (%) Leucine Alanine Glycine erine Valine Glutamic acid Threonine Arginine Lysine Aspartic acid Isoleucine Proline Asparagine Glutamine

More information

HOMEWORK II and Swiss-PDB Viewer Tutorial DUE 9/26/03 62 points total. The ph at which a peptide has no net charge is its isoelectric point.

HOMEWORK II and Swiss-PDB Viewer Tutorial DUE 9/26/03 62 points total. The ph at which a peptide has no net charge is its isoelectric point. BIOCHEMISTRY I HOMEWORK II and Swiss-PDB Viewer Tutorial DUE 9/26/03 62 points total 1). 8 points total T or F (2 points each; if false, briefly state why it is false) The ph at which a peptide has no

More information

Introduction to Proteomics Dr. Sanjeeva Srivastava Department of Biosciences and Bioengineering Indian Institute of Technology - Bombay

Introduction to Proteomics Dr. Sanjeeva Srivastava Department of Biosciences and Bioengineering Indian Institute of Technology - Bombay Introduction to Proteomics Dr. Sanjeeva Srivastava Department of Biosciences and Bioengineering Indian Institute of Technology - Bombay Lecture 01 Introduction to Amino Acids Welcome to the proteomic course.

More information

Food protein powders classification and discrimination by FTIR spectroscopy and principal component analysis

Food protein powders classification and discrimination by FTIR spectroscopy and principal component analysis APPLICATION NOTE AN53037 Food protein powders classification and discrimination by FTIR spectroscopy and principal component analysis Author Ron Rubinovitz, Ph.D. Thermo Fisher Scientific Key Words FTIR,

More information

Section 1 Proteins and Proteomics

Section 1 Proteins and Proteomics Section 1 Proteins and Proteomics Learning Objectives At the end of this assignment, you should be able to: 1. Draw the chemical structure of an amino acid and small peptide. 2. Describe the difference

More information

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

CHAPTER 29 HW: AMINO ACIDS + PROTEINS CAPTER 29 W: AMI ACIDS + PRTEIS For all problems, consult the table of 20 Amino Acids provided in lecture if an amino acid structure is needed; these will be given on exams. Use natural amino acids (L)

More information

The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5

The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5 Key Concepts: The Structure and Function of Large Biological Molecules Part 4: Proteins Chapter 5 Proteins include a diversity of structures, resulting in a wide range of functions Proteins Enzymatic s

More information

1. (38 pts.) 2. (25 pts.) 3. (15 pts.) 4. (12 pts.) 5. (10 pts.) Bonus (12 pts.) TOTAL (100 points)

1. (38 pts.) 2. (25 pts.) 3. (15 pts.) 4. (12 pts.) 5. (10 pts.) Bonus (12 pts.) TOTAL (100 points) Moorpark College Chemistry 11 Spring 2010 Instructor: Professor Torres Examination #5: Section Five May 4, 2010 ame: (print) ame: (sign) Directions: Make sure your examination contains TWELVE total pages

More information

Different levels of protein structure

Different levels of protein structure Dr. Sanjeeva Srivastava Proteins and its function Amino acids: building blocks Different levels of protein structure Primary, Secondary, Tertiary, Quaternary 2 Proteomics ourse PTEL 1 Derived from Greek

More information

Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A

Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A Biological systems interact, and these systems and their interactions possess complex properties. STOP at enduring understanding 4A Homework Watch the Bozeman video called, Biological Molecules Objective:

More information

Fundamentals of Organic Chemistry CHEM 109 For Students of Health Colleges

Fundamentals of Organic Chemistry CHEM 109 For Students of Health Colleges Fundamentals of Organic Chemistry CHEM 109 For Students of Health Colleges Credit hrs.: (2+1) King Saud University College of Science, Chemistry Department CHEM 109 CHAPTER 9. AMINO ACIDS, PEPTIDES AND

More information

Macromolecules of Life -3 Amino Acids & Proteins

Macromolecules of Life -3 Amino Acids & Proteins Macromolecules of Life -3 Amino Acids & Proteins Shu-Ping Lin, Ph.D. Institute of Biomedical Engineering E-mail: splin@dragon.nchu.edu.tw Website: http://web.nchu.edu.tw/pweb/users/splin/ Amino Acids Proteins

More information

Practice Problems 3. a. What is the name of the bond formed between two amino acids? Are these bonds free to rotate?

Practice Problems 3. a. What is the name of the bond formed between two amino acids? Are these bonds free to rotate? Life Sciences 1a Practice Problems 3 1. Draw the oligopeptide for Ala-Phe-Gly-Thr-Asp. You do not need to indicate the stereochemistry of the sidechains. Denote with arrows the bonds formed between the

More information

Structure of proteins

Structure of proteins Structure of proteins Presented by Dr. Mohammad Saadeh The requirements for the Pharmaceutical Biochemistry I Philadelphia University Faculty of pharmacy Structure of proteins The 20 a.a commonly found

More information

2. Ionization Sources 3. Mass Analyzers 4. Tandem Mass Spectrometry

2. Ionization Sources 3. Mass Analyzers 4. Tandem Mass Spectrometry Dr. Sanjeeva Srivastava 1. Fundamental of Mass Spectrometry Role of MS and basic concepts 2. Ionization Sources 3. Mass Analyzers 4. Tandem Mass Spectrometry 2 1 MS basic concepts Mass spectrometry - technique

More information

AP Bio. Protiens Chapter 5 1

AP Bio. Protiens Chapter 5 1 Concept.4: Proteins have many structures, resulting in a wide range of functions Proteins account for more than 0% of the dry mass of most cells Protein functions include structural support, storage, transport,

More information

Bioinformatics for molecular biology

Bioinformatics for molecular biology Bioinformatics for molecular biology Structural bioinformatics tools, predictors, and 3D modeling Structural Biology Review Dr Research Scientist Department of Microbiology, Oslo University Hospital -

More information

Moorpark College Chemistry 11 Fall Instructor: Professor Gopal. Examination # 5: Section Five May 7, Name: (print)

Moorpark College Chemistry 11 Fall Instructor: Professor Gopal. Examination # 5: Section Five May 7, Name: (print) Moorpark College Chemistry 11 Fall 2013 Instructor: Professor Gopal Examination # 5: Section Five May 7, 2013 Name: (print) Directions: Make sure your examination contains TEN total pages (including this

More information

PROTEINS. Building blocks, structure and function. Aim: You will have a clear picture of protein construction and their general properties

PROTEINS. Building blocks, structure and function. Aim: You will have a clear picture of protein construction and their general properties PROTEINS Building blocks, structure and function Aim: You will have a clear picture of protein construction and their general properties Reading materials: Compendium in Biochemistry, page 13-49. Microbiology,

More information

Methionine (Met or M)

Methionine (Met or M) Fig. 5-17 Nonpolar Fig. 5-17a Nonpolar Glycine (Gly or G) Alanine (Ala or A) Valine (Val or V) Leucine (Leu or L) Isoleucine (Ile or I) Methionine (Met or M) Phenylalanine (Phe or F) Polar Trypotphan (Trp

More information

AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS

AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS AMINO ACIDS STRUCTURE, CLASSIFICATION, PROPERTIES. PRIMARY STRUCTURE OF PROTEINS Elena Rivneac PhD, Associate Professor Department of Biochemistry and Clinical Biochemistry State University of Medicine

More information

Noninvasive Blood Glucose Analysis using Near Infrared Absorption Spectroscopy. Abstract

Noninvasive Blood Glucose Analysis using Near Infrared Absorption Spectroscopy. Abstract Progress Report No. 2-3, March 31, 1999 The Home Automation and Healthcare Consortium Noninvasive Blood Glucose Analysis using Near Infrared Absorption Spectroscopy Prof. Kamal Youcef-Toumi Principal Investigator

More information

Levels of Protein Structure:
