Proteomics of body liquids as a source for potential methods for medical diagnostics
Prof. Dr. Evgeny Nikolaev
Institute for Biochemical Physics, Rus. Acad. Sci., Moscow, Russia
Institute for Energy Problems of Chemical Physics, Rus. Acad. Sci., Moscow, Russia
High-throughput proteome analyses by tandem mass spectrometry methods
Workflow: proteins, digestion, peptides, HPLC/MS, MS/MS spectra (parent and fragment ion intensities), Mascot search against a protein DB, protein and peptide identifications.
[Figure: example MS/MS spectrum, relative abundance vs. m/z (200-1200). Scan header: S14_1 #3422, RT 52.14, NL 4.69E2, ITMS + c ESI d w Full ms2 600.81@cid35.00 [155.00-1215.00].]
Problems of methods based on MS/MS identification
- Sensitivity is lost: only MS/MS spectra are informative, and their intensity is at least ~10-fold lower than that of MS spectra
- It is not possible to detect all peptides in one run
- Extra time for fragment spectra measurements requires longer chromatography (application of UPLC is questionable for some types of MS instruments)
Relative amounts of new peptide identifications during several consecutive LC-MS runs with the same sample.
The other possibility in proteomics: the use of mass spectrometry with high mass measurement accuracy
(From Alan Marshall NHMFL)
An ion cyclotron resonance mass spectrometer can measure masses with sub-ppm accuracy.
[Figure labels: linear ion trap, FTMS, data, 7 T magnet, IR laser, electron gun.]
Other mass spectrometers with high mass measurement accuracy are now available: Orbitraps and Q-TOFs.
Bruker micrOTOF-Q II:
- Mass accuracy: 1-2 ppm (internal calibration), 5 ppm (external calibration)
- Resolution: 20 000-60 000 FWHM
- Rate of mass spectra measurements: >20 Hz
At an accuracy level of 1 ppm, the elemental composition of a peptide with mass up to 600 Da and the amino acid composition of a peptide with mass up to 500 Da can be determined almost unambiguously.
This is not enough for peptide identification!
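To make the 1 ppm figure concrete, the sketch below converts a relative tolerance into an absolute mass window. The function name `ppm_window` is illustrative and does not come from the talk.

```python
def ppm_window(mass_da, tol_ppm):
    """Return the (low, high) mass limits, in Da, for a tolerance given in ppm."""
    delta = mass_da * tol_ppm * 1e-6  # 1 ppm is one part in 10^6 of the mass
    return mass_da - delta, mass_da + delta


# A 600 Da peptide measured at 1 ppm accuracy
low, high = ppm_window(600.0, 1.0)
print(f"{low:.4f} to {high:.4f} Da")
```

At 600 Da the window is only 0.0012 Da wide, which is why the number of candidate compositions becomes so small.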
Accurate mass tag and retention time approach (Dick Smith's group, PNNL)
Besides the accurate mass, we have another tag: the LC retention time.
An accurate mass tag together with the retention time can identify a peptide practically unambiguously!
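A minimal sketch of how a measured LC-MS feature could be matched against a tag database on both mass and elution time. The class `AmtTag`, the function `match_feature`, the elution-time values, the elution-time tolerance, and the second (permuted) sequence are all hypothetical and serve only to illustrate the idea; the 5 ppm mass tolerance is the parent tolerance quoted elsewhere in these slides.

```python
from dataclasses import dataclass


@dataclass
class AmtTag:
    sequence: str
    mass: float  # theoretical monoisotopic mass, Da
    net: float   # normalized elution time (hypothetical 0..1 scale)


def match_feature(mass, net, tags, tol_ppm=5.0, tol_net=0.02):
    """Return all tags within both the mass and the elution-time tolerance."""
    hits = []
    for tag in tags:
        ppm = (mass - tag.mass) / tag.mass * 1e6
        if abs(ppm) <= tol_ppm and abs(net - tag.net) <= tol_net:
            hits.append(tag)
    return hits


# Two tags with identical mass (same amino acid composition) but different
# elution times. The second sequence and both NET values are made up.
tags = [
    AmtTag("SLTLGIEPVSPTSLR", 1568.8773, 0.55),
    AmtTag("LSTLGIEPVSPTSLR", 1568.8773, 0.30),
]

hits = match_feature(1568.8768, 0.551, tags)
print([t.sequence for t in hits])
```

Mass alone cannot separate the two entries; the elution-time filter leaves only one candidate.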
LC reproducibility (Agilent 1100)
[Figure: base peak chromatograms (FTMS + p ESI, Full ms [350.00-2000.00]) of four urine runs over RT 46.10-80.40 min: urine_1-5_0-1ul_150min (NL 1.07E6), urine2nd (NL 1.44E6), urine3thd (NL 1.83E6), urine4th (NL 5.89E5). Major peaks recur at nearly the same retention times in all runs, e.g. near 46.5-46.7, 57.2-57.5, 65.8-66.2, and 72.6-72.8 min.]
Example: Vasorin (Homo sapiens protein)
Sequence region: ...TGLYCESQTPRSLTLGIEPVSPTSLRVGLQRYVQLRSLR...
Trypsinolysis gives: TGLYCESQTPR, SLTLGIEPVSPTSLR (fragment 463-477 of Vasorin), VGLQR, YVQLR, SLR
- LC-MS/MS (e.g. with an ion trap): validation of the sequence by b and y fragment ions
- LC-FTICR identification: accurately measured mass 1568.8768 and measured retention time
- Putative mass tag from Homo sapiens: SLTLGIEPVSPTSLR, calculated mass 1568.8773
Result: validated accurate mass tag (SLTLGIEPVSPTSLR)
[Figure: FTICR mass spectrum (m/z 450-650, inset 522.5-525.0) and MS/MS spectrum (m/z 200-1800) with labeled b and y fragment ions.]
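The Vasorin example can be reproduced with a short sketch: an in silico tryptic digest of the sequence region, the monoisotopic mass of the tag peptide, and the mass error in ppm. The helper names are illustrative; the residue masses are standard monoisotopic values for unmodified residues, and the cleavage rule (after K or R, not before P, no missed cleavages) is the usual simplified trypsin rule, assumed here.

```python
import re

# Monoisotopic residue masses (Da), unmodified, for the residues in this example
RESIDUE = {
    "G": 57.021464, "S": 87.032028, "P": 97.052764, "V": 99.068414,
    "T": 101.047679, "C": 103.009185, "L": 113.084064, "I": 113.084064,
    "Q": 128.058578, "E": 129.042593, "R": 156.101111, "Y": 163.063329,
}
WATER = 18.010565


def tryptic_digest(seq):
    """Cleave after K or R, except before P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def ppm_error(measured, theoretical):
    return (measured - theoretical) / theoretical * 1e6


region = "TGLYCESQTPRSLTLGIEPVSPTSLRVGLQRYVQLRSLR"
peptides = tryptic_digest(region)
calc = monoisotopic_mass("SLTLGIEPVSPTSLR")
error = ppm_error(1568.8768, calc)
print(peptides)
print(f"calculated mass {calc:.4f} Da, error {error:.2f} ppm")
```

The digest returns the five peptides listed on the slide, and the measured mass of 1568.8768 lies well within 1 ppm of the calculated value.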
Thus, the general idea is to use MS/MS to create a database of accurate mass tags and retention times as a reference base for quantitative proteomics.
Analyses of the urine proteome
Urine is available in large quantities, which makes it an ideal analyte for noninvasive diagnostics.
The possibility of biomarker discovery is attracting a lot of attention.
1500 proteins! (from Mann's group: Adachi et al., Genome Biology 2006, Volume 7, Issue 9, Article R80)
Accurate mass tag and retention time approach
[Diagram labels: Lab, Lab, Clinic; FT MS, ESI Q-TOF, ESI TOF.]
Statistics of the collected AMT tags in the urine proteome
- 233 LC-MS (liquid chromatography coupled with mass spectrometry) runs in total (80% from men and 20% from women)
- 25 samples from each of 6 volunteers in long-term isolation experiments (over 19 weeks) have been collected so far
- Number of peptides in the database: 2758
- Number of urine proteins in the database: 840
Two kinds of sample donors: people from the street and people in special conditions.
Decision to include a person in the study group is based on:
- General blood analysis
- Examination by an internist
- Blood pressure measurement
- Current control for urogenital and other pathology, including kidney pathology, prostatitis, arterial hypertension, diabetes
- Analysis of archival information from medical records
- Control for treatment with diuretics and excessive consumption of fluids
Data recorded for each sample
1. Number
2. Name
3. Date of birth
4. Sex
5. Date of urine collection
6. Time of urine collection
7. Current smoking status (+/-)
8. Sample volume
9. Clinical parameters (other diseases)
10. Results of testing for bilirubin, urobilinogen, ketones, glucose, protein, blood, nitrite, pH, specific gravity, leukocytes
For the healthy-people subset of the database, we need urine samples from persons on a well-controlled diet and with a healthy lifestyle. In this case we can test the temporal variability and polymorphism of urine.
These are people participating in long-term isolation experiments within the framework of space research programs. April-July 2009.
Ground-based experimental facility
April-July 2009
1. Urine collection
2. Centrifugation
3. Sample concentration (Amicon Ultra Ultracel-15 3k)
4. Desalting and major protein removal
5. Carboxymethylation and trypsinolysis
6. LC-MS analyses
Search engine: Mascot
Database: IPI.Human v.3.52
Parent Tolerance: ± 5.0 PPM (Monoisotopic)
Fragment Tolerance: ± 0.50 Da (Monoisotopic)
Fixed Modifications: Carbamidomethyl (C)
Variable Modifications: Oxidation (M)
Digestion Enzyme: Trypsin
Max Missed Cleavages: 2
Instrument type: Ion-trap
What is in the DB
- Run in which the peptide was identified
- Peptide sequence
- Protein to which the sequence belongs
- Mascot score
- Modifications
- Measured mass
- Theoretical mass
- Measured charge
- RT at which the peptide began to elute from the column
- RT at which the peptide finished eluting
Retention time normalization
Normalization is the alignment of time scales across a series of experiments. Several types of normalization are possible:
- By an added calibrant, i.e. external calibration (e.g. Cytochrome C)
- By theoretically predicted RTs
- By peptides that are always present in your samples (for example, peptides of the digestion enzyme)
We have chosen the last one, as it is rather robust and doesn't require any additional sample treatment. RTs are renormalized every time a new run is added to the database.
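A minimal sketch of normalization by always-present peptides, assuming a linear relation between the time scales of two runs. The function names and all retention times below are hypothetical; the anchors stand for peptides seen in every run, such as autolysis peptides of the digestion enzyme.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def normalize_run(run_rts, anchor_rts_run, anchor_rts_ref):
    """Map the RTs of a new run onto the reference time scale."""
    a, b = fit_line(anchor_rts_run, anchor_rts_ref)
    return [a * rt + b for rt in run_rts]


# Hypothetical RTs (min) of three anchor peptides in the reference and new run
anchor_ref = [10.0, 30.0, 50.0]
anchor_run = [11.0, 31.5, 52.0]

# RTs of other peptides in the new run, mapped onto the reference scale
normalized = normalize_run([21.25, 41.75], anchor_run, anchor_ref)
print([round(rt, 2) for rt in normalized])
```

Refitting the line whenever a run is added corresponds to the renormalization step described above.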
Normalization for runs without MS/MS
HPLC is considered to be linear, so different masses should retain their elution order from run to run. We can use pivots and look for the same sequence of masses in the run without MS/MS. Our goal is to find the longest common subsequence.
[Figure: database peptides with their average NETs, sorted by RT, aligned against the RT-sorted peak list of a run without MS/MS (run 5); the elution sequence of known masses is retained. Example masses: 2330.9, 1150.3, 1024.5, 878.1, 1575.1, 758.1.]
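The longest-common-subsequence search can be sketched with the classic dynamic-programming algorithm, where two masses count as equal if they agree within a ppm tolerance. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the 5 ppm default, and the elution order of the run are hypothetical, and only the mass values are taken from the slide.

```python
def same_mass(m1, m2, tol_ppm=5.0):
    return abs(m1 - m2) / m2 * 1e6 <= tol_ppm


def longest_common_subsequence(ref, run, tol_ppm=5.0):
    """Return index pairs (i, j) of the longest order-preserving mass match."""
    n, m = len(ref), len(run)
    # dp[i][j] = LCS length of ref[i:] and run[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if same_mass(ref[i], run[j], tol_ppm):
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal set of matched pairs
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if same_mass(ref[i], run[j], tol_ppm):
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Database masses sorted by average NET (values from the slide)
ref = [2330.9, 1150.3, 1024.5, 878.1, 1575.1, 758.1]
# Peak list of a run without MS/MS, sorted by RT (hypothetical order and offsets)
run = [1150.302, 1024.498, 900.0, 1575.103, 2330.905, 758.101]

pairs = longest_common_subsequence(ref, run)
print([ref[i] for i, _ in pairs])
```

The matched pairs give (reference NET, run RT) points that can then be used for a linear fit. Peaks that match a database mass but break the elution order, such as 2330.905 here, are left out.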
Normalization of runs without MS/MS
Two runs of different urine samples performed with a 1-day interval.
- RT correlation between all masses matching within a 5 ppm tolerance: y = 0.8988x + 3.704, R² = 0.7774. An R² of only 0.7774 for the linear least-squares fit is poor.
- RT correlation of the longest common subsequence: y = 0.9838x + 0.8356, R² = 0.9996, which means we have an almost perfectly linear correlation between the 2 datasets.
[Figure: two scatter plots of t2 (min) vs. t1 (min); t1 from 20 to 70 min, t2 from 10 to 70 min.]
The total number of identified proteins in the database during its creation/filling stage. Vertical blue arrows show steps of equal protein count increase; the length of the horizontal arrows, parallel to the abscissa axis, is proportional to the time needed to identify an additional protein.
[Figure: number of identified proteins (0-700) vs. number of LC-MS runs (0-120).]
Smokers vs. non-smokers urine proteome
Current statistics of the urinary proteome database
233 LC-MS (liquid chromatography coupled with mass spectrometry) runs in total: 102 with samples from smokers, 131 with samples from non-smokers.

Using all peptides:
                          Peptides   Proteins
Non-smokers               2527       762
Smokers                   1893       627
Total                     2758       840

Overlap between smokers and non-smokers:
                          Peptides   Proteins
Non-smokers only          865        213
Shared                    1662       549
Smokers only              231        78
Found in one group only   40%        35%
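The overlap figures follow from the group totals by inclusion-exclusion, as this short sketch shows. The function name `overlap` is illustrative.

```python
def overlap(n_a, n_b, n_union):
    """Split two set sizes and their union into shared and unique parts."""
    shared = n_a + n_b - n_union  # inclusion-exclusion
    only_a = n_a - shared
    only_b = n_b - shared
    unique_fraction = (only_a + only_b) / n_union
    return shared, only_a, only_b, unique_fraction


# Non-smokers vs. smokers, numbers from the table
peptides = overlap(2527, 1893, 2758)
proteins = overlap(762, 627, 840)
print(peptides[:3], f"{peptides[3]:.1%}")
print(proteins[:3], f"{proteins[3]:.1%}")
```

This gives 1662 shared peptides (865 and 231 unique) and 549 shared proteins (213 and 78 unique); the unique fractions, 39.7% and 34.6%, round to the 40% and 35% quoted.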
Using all peptides:
                          Peptides   Proteins
Odd                       2232       445
Even                      2306       467
Non-smokers               2535       506

Overlap between Odd and Even:
                          Peptides   Proteins
Odd only                  229        49
Shared                    2003       406
Even only                 303        61
Found in one subset only  20%        21%
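Using all peptides:
                          Peptides   Proteins
Selection1                1723       365
Selection2                1588       337
Smokers                   1894       400

Overlap between Selection1 and Selection2:
                          Peptides   Proteins
Selection1 only           306        63
Shared                    1417       302
Selection2 only           171        35
Found in one subset only  25%        25%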
Differences in the numbers of proteins observed in the urine of smokers and non-smokers, by biological process in which the proteins participate: transport, homophilic cell adhesion, lipid metabolic process, inflammatory response, innate immune response, epidermis development, defense response!