
Supplementary Note

Details of the patient populations studied

TVD and NCA patients. Patients with significant coronary artery disease (defined as a reduction of more than 50% in the intraluminal diameter) of all three major coronary arteries (left anterior descending, circumflex and right coronary arteries) were recruited to the TVD (triple vessel disease) group. Their symptoms of angina had been stable for at least one month, and none had suffered a myocardial infarction in the preceding three months. Patients with chest pain and a positive exercise electrocardiogram, but with normal coronary angiograms (judged by two independent observers), were recruited to the NCA (normal coronary artery) group. Exercise testing followed the Bruce protocol, and a test was considered positive if there was at least 1 mm of horizontal or downsloping ST-segment depression 80 ms after the J point. NCA patients with hypertension, diabetes mellitus, valvular heart disease or left ventricular hypertrophy were excluded. Consecutive patients presenting at Papworth Hospital (Cambridgeshire, UK) who met the above criteria for either the TVD or the NCA group were recruited to the study. The clinical data for these patient groups are shown in Table 1. After generation of the NMR spectra, two samples were identified as substantial outliers: their 1H-NMR spectra were found to be atypical by visual inspection, and they were removed before subsequent data analysis.

Severity study. For the study of the extent of coronary heart disease (CHD), patients were recruited according to the same criteria, except that patients with more than 50% stenosis of one, two or all three coronary arteries (assessed by two independent observers) were recruited, and females were excluded. The clinical data for these patient groups are shown in Table 2. Blood samples from these patients were drawn into Diatube H tubes, and platelet-poor plasma was prepared as previously described.
Aliquots of plasma were stored at −80 °C until assayed. After generation of the NMR spectra, no samples were excluded from the data analysis.

Strengths and weaknesses of the study

It is important to note that consecutive patients arriving at Papworth Hospital were recruited to our cohort, selected only on the basis of coronary artery disease status. The cohort therefore carries the age, sex and racial-background biases of heart disease sufferers within the regional population served by this hospital. We have shown that, particularly after application of orthogonal signal correction (OSC) to remove variation orthogonal to disease status, it is possible to diagnose coronary artery disease status in the presence of this background variation. Had we used highly selected populations, matched for as many parameters as possible (such as sex, age and blood pressure), we would have accentuated the differences due to artery status by removing other major sources of variation. To find utility in the clinic, a diagnostic technique must be shown to have adequate sensitivity and specificity when applied to an unselected population. One strength of our study design, therefore, is that it does not overstate the diagnostic power of the test by studying highly selected populations.

We have also shown that metabonomic analysis can distinguish individuals with CHD of differing severity. Using the crude parameter of the number of major coronary vessels with >50% stenosis, we demonstrated that both principal components analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) could categorize CHD patients on the basis of severity. The failure to achieve complete separation of the classes is as likely to reflect the crude nature of our severity designations, based solely on coronary angiography, as any lack of power in the metabonomic analysis to discriminate between individuals. Nevertheless, none of the conventional risk factors measured in these subjects (including age, blood pressure, lipoprotein levels and clotting parameters) differed between the severity classes, even in a cross-sectional analysis, so these risk factors could not distinguish individuals within the population on the basis of CHD severity. This demonstrates the extent to which metabonomics improves on conventional risk factor analysis, and emphasizes the importance of the information content of the NMR spectrum: similar diagnostic power is unlikely to be achieved by applying pattern recognition methods to existing risk factor data sets.

The use of multiparametric analysis of high-information-density data sets has increased dramatically in recent years. NMR spectroscopy is not the only technique that can be used for such analysis: genomics, proteomics and other chemical analysis methods, such as mass spectrometry, have also been used (refs. 1,2). Application of pattern recognition technology to proteomic data sets has recently been shown to provide useful diagnostic information about prostate cancer (ref. 3). Compared with techniques such as proteomics, NMR spectroscopy offers the advantage of extremely high reproducibility (coefficients of variation below 1% for spectra of replicate samples generated on two different days). Such reproducibility is not yet routinely available for proteomics or gene expression profiling.
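The reproducibility figure above is a coefficient of variation (CV): the standard deviation of repeated measurements divided by their mean, expressed as a percentage. The sketch below computes a per-bucket CV across replicate bucketed spectra. It assumes the spectra have already been integrated onto a common set of chemical-shift buckets. The helper name `bucket_cvs` and the per-bucket convention are illustrative assumptions, because the note does not say how its CVs were computed or aggregated across the spectrum.

```python
from statistics import mean, stdev


def bucket_cvs(replicates: list[list[float]]) -> list[float]:
    """Per-bucket coefficient of variation (%) across replicate spectra.

    Hypothetical helper for illustration. `replicates` holds one bucketed
    spectrum per acquisition (e.g. the same sample run on two different
    days); all spectra must share the same bucket layout.
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicate spectra")
    n_buckets = len(replicates[0])
    if any(len(spectrum) != n_buckets for spectrum in replicates):
        raise ValueError("replicate spectra must share one bucket layout")

    cvs = []
    # zip(*replicates) yields, for each bucket, its intensity in every replicate
    for values in zip(*replicates):
        m = mean(values)
        if m == 0:
            raise ValueError("CV is undefined for a zero-mean bucket")
        # sample standard deviation relative to the mean, as a percentage
        cvs.append(100.0 * stdev(values) / abs(m))
    return cvs
```

With two made-up replicate spectra (not data from this study), `bucket_cvs([[10.0, 20.0, 40.0], [10.1, 19.9, 40.0]])` returns one CV per bucket, all below 1%. Reporting the maximum or median over buckets is one possible summary. Low-intensity buckets near the noise floor tend to inflate CVs, so in practice they may need to be excluded first.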
Nevertheless, it seems likely that even more refined diagnostic information could be gathered by integrating metabolic, protein and genetic profiles, albeit with more difficult sample preparation and at greater cost than a simple analysis of the proton NMR spectrum of serum. Further studies will be required to determine to what extent the genetic and metabolic data sets contain independent diagnostic power, or whether much of their information content overlaps. Furthermore, it is presently unclear whether the methodology used in the present study represents the optimal compromise between accuracy and simplicity. For example, other workers have described NMR spectroscopic methods for analyzing lipoprotein composition (refs. 4,5) that may also be useful for the diagnosis of cardiovascular disease, although they are unlikely to improve on the >90% sensitivity and specificity of the method reported here. Numerous parameters affect both the acquisition of the data (for example, the NMR pulse sequence, diffusion editing, the selection of resolution and simple 1D-NMR spectra, and bin widths) and the subsequent pattern analysis (for example, the application of a single round of OSC and the selection of PLS-DA for classification). Further studies will be required to compare the diagnostic sensitivity and specificity of different methodological approaches in various clinical applications. One weakness of our study design is the small number of subjects studied, although the diagnostic sensitivity and specificity of the technology should only increase as larger numbers of subjects are used to build the diagnostic model. Widespread application of this technology will, however, depend on demonstrating clinical utility in large groups of subjects in a multicenter prospective trial. Such studies are already under way.
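Sensitivity and specificity, as used in the >90% figure discussed above, follow their standard definitions. Sensitivity is TP/(TP + FN), the fraction of diseased subjects the test calls diseased. Specificity is TN/(TN + FP), the fraction of disease-free subjects the test clears. The sketch below scores a list of predicted class labels against the true labels. The function name and the choice of "TVD" as the positive class are assumptions for illustration. The note does not state here whether its figures were computed on training, cross-validated or held-out predictions, and this sketch does not address that choice.

```python
def sensitivity_specificity(y_true, y_pred, positive="TVD"):
    """Return (sensitivity, specificity) for a two-class diagnostic test.

    Hypothetical helper: `positive` names the disease class (assumed here
    to be "TVD"); every other label is treated as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # diseased, correctly called diseased
            else:
                fn += 1  # diseased, missed by the test
        elif pred == positive:
            fp += 1      # disease-free, wrongly called diseased
        else:
            tn += 1      # disease-free, correctly cleared

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

For a toy example with invented labels, ten TVD and ten NCA subjects where a single TVD subject is misclassified as NCA give a sensitivity of 0.9 and a specificity of 1.0.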
The aim of the present study, however, is to investigate the science that underpins the technology, rather than to provide clinical trial data to support its use. This study (with about 150 individuals analyzed) is small as a clinical trial, but relatively large as a scientific study. It is large enough for its purpose: the statistical analysis makes it clear that the diagnostic power we have demonstrated is very unlikely to be due to chance. Our cohort, though small, has been extensively characterized: we have data on almost 100 clinical parameters (from mean corpuscular hemoglobin concentration to plasminogen activator inhibitor-1 (PAI-1) levels). Such extensive data would never be collected in a large multicenter validation study, yet they are essential for our conclusions. Using them, we can show that the significantly improved diagnostic capability of our test compared with known risk factor analysis derives mostly from the information in the proton NMR spectra rather than from the power of the pattern recognition methodology.

Table 1. Clinical parameters for the NCA and TVD patients

Parameter | NCA | TVD
Age (years) | 57.2 ± 9.0 | 64.1 ± 7.2
Sex, male (n) | 7 | 34
Sex, female (n) | 23 | 2
Previous myocardial infarction (n) | 1 | 19
Systolic blood pressure (mmHg) | 141 ± 22 | 138 ± 23
Diastolic blood pressure (mmHg) | 78 ± 12 | 75 ± 12
Current smokers (n) | 2 | 1
Urea (mM) | 5.0 ± 1.2 | 5.6 ± 1.6
Creatinine (µM) | 93 ± 14 | 108 ± 18
Glucose (mM) | 5.2 ± 0.6 | 5.6 ± 0.9
Total cholesterol (mM) | 5.9 ± 1.1 | 6.2 ± 0.8
HDL-cholesterol (mM) | 1.1 ± 0.2 | 0.8 ± 0.2
LDL-cholesterol (mM) | 4.3 ± 1.1 | 4.5 ± 0.7
Total cholesterol:HDL-cholesterol ratio | 5.8 ± 1.8 | 8.3 ± 1.9
Triglycerides (mM) | 1.5 ± 1.2 | 2.1 ± 1.1
PAI-1 (ng ml−1) | 37.9 ± 17.4 | 49.1 ± 16.6
Total protein (g l−1) | 70.4 ± 6.3 | 69.4 ± 4.0
Albumin (g l−1) | 38.6 ± 3.2 | 37.4 ± 2.6
Globulin (%) | 45 ± 5 | 46 ± 4

Values are mean ± s.d., except for triglycerides, which are median ± interquartile range. n, number of patients; NCA, normal coronary artery; TVD, triple vessel disease; HDL, high-density lipoprotein; LDL, low-density lipoprotein; PAI-1, plasminogen activator inhibitor-1.

Table 2. Clinical parameters for subjects with stenosis of one, two or three coronary vessels

Parameter | 1-vessel disease | 2-vessel disease | 3-vessel disease
Height (m) | 1.76 ± 0.07 | 1.80 ± 0.05 | 1.78 ± 0.06
Weight (kg) | 83.5 ± 14.7 | 91.1 ± 10.0 | 86.7 ± 9.6
BMI (kg m−2) | 26.77 ± 4.01 | 28.07 ± 3.55 | 27.32 ± 2.22
Erythrocyte count | 4.64 ± 0.35 | 4.54 ± 0.55 | 4.66 ± 0.25
Hemoglobin (g dl−1) | 13.9 ± 0.82 | 13.53 ± 1.52 | 13.54 ± 0.95
Hematocrit | 0.418 ± 0.026 | 0.410 ± 0.053 | 0.409 ± 0.025
MCV (fl) | 90.2 ± 4.3 | 90.2 ± 4.3 | 87.7 ± 5.3
MCHC (g dl−1) | 30.1 ± 1.6 | 29.8 ± 1.5 | 29.1 ± 2.0
Platelets (10^9 l−1) | 210 ± 45 | 210 ± 27 | 214 ± 57
White blood cell count | 6.30 ± 1.21 | 6.74 ± 1.74 | 6.22 ± 1.50
Neutrophils (10^9 l−1) | 3.63 ± 0.89 | 4.09 ± 1.77 | 3.61 ± 1.14
Lymphocytes (10^9 l−1) | 1.88 ± 0.52 | 1.84 ± 0.55 | 1.79 ± 0.44
Monocytes (10^9 l−1) | 0.53 ± 0.14 | 0.51 ± 0.17 | 0.53 ± 0.14
Eosinophils (10^9 l−1) | 0.21 ± 0.12 | 0.19 ± 0.12 | 0.16 ± 0.10
Basophils (10^9 l−1) | 0.02 ± 0.01 | 0.02 ± 0.01 | 0.02 ± 0.01
Fibrinogen (g l−1) | 3.52 ± 0.86 | 3.76 ± 1.01 | 3.57 ± 0.84
Clotting time, PT test (s) | 13.6 ± 0.9 | 13.6 ± 1.2 | 13.7 ± 0.8
Clotting time, APTT test (s) | 29.0 ± 2.9 | 30.1 ± 4.0 | 30.2 ± 3.1
Sodium (mM) | 140 ± 2 | 139 ± 2 | 140 ± 2
Potassium (mM) | 4.1 ± 0.3 | 4.1 ± 0.2 | 4.2 ± 0.3
Urea (mM) | 6.1 ± 1.7 | 6.6 ± 1.4 | 6.1 ± 1.3
Creatinine (µM) | 104 ± 10 | 103 ± 10 | 107 ± 11
Total protein (g l−1) | 72 ± 4 | 72 ± 6 | 72 ± 3
Albumin (g l−1) | 42 ± 3 | 41 ± 4 | 42 ± 3
Immunoglobulins (g l−1) | 31 ± 4 | 30 ± 5 | 30 ± 3
Bilirubin (µM) | 9 ± 4 | 11 ± 4 | 10 ± 4
ALT (units per l) | 19 ± 6 | 23 ± 10 | 22 ± 8
ALP (units per l) | 183 ± 41 | 178 ± 39 | 173 ± 41
γGT (units per l) | 12.1 ± 7.0 | 14.0 ± 10.3 | 12.9 ± 7.5
Glucose (mM) | 5.8 ± 1.3 | 5.9 ± 1.4 | 6.1 ± 2.3
HbA1c | 5.6 ± 0.5 | 5.9 ± 1.3 | 6.3 ± 0.6
Total cholesterol (mM) | 5.3 ± 0.9 | 5.6 ± 1.4 | 5.2 ± 0.9
LDL-cholesterol (mM) | 3.3 ± 0.8 | 3.6 ± 1.3 | 3.2 ± 0.9
HDL-cholesterol (mM) | 1.01 ± 0.23 | 0.97 ± 0.17 | 1.04 ± 0.34
Triglycerides (mM) | 2.0 ± 1.1 | 2.2 ± 1.0 | 2.1 ± 0.8

For each parameter, values are mean ± s.d., except for triglycerides, which are reported as median ± interquartile range. BMI, body mass index; MCV, mean corpuscular volume; MCHC, mean corpuscular hemoglobin concentration; PT, prothrombin time; APTT, activated partial thromboplastin time; ALT, alanine aminotransferase; ALP, alkaline phosphatase; γGT, γ-glutamyl transferase; HbA1c, glycated hemoglobin.

Table 3. Summary of spectral regions influencing separation of NCA samples from TVD samples

Bucket region (δ) | Assignment | Chemical shift (δ) and multiplicity | NMR spectral intensity in TVD versus NCA
1.30 | Lipid, mainly VLDL: (CH2)n | 1.29 (m) | Increased
1.22 | Lipid: CH3CH2CH2 | 1.22 (m) | Decreased
1.26 | Lipid, mainly LDL: CH3CH2(CH2)n | 1.26 (m), 1.25 (m) | Increased
1.34 | Lipid: CH2CH2CH2CO | 1.32 (m) | Increased
3.22 | Choline: N(CH3)3+ | 3.21 (s) | Decreased
0.86 | Lipid: LDL CH3(CH2)n; VLDL CH3CH2CH2C= | 0.84 (t); 0.87 (t) | Increased
0.9 | Cholesterol C21 | 0.91 (d) | Increased
0.82 | Lipid CH3; cholesterol C26 and C27 | 0.84 (t); 0.84 (d,d) | Decreased
2.02 | Lipid: CH2C=C | 2.00 (m) | Increased
1.58 | Lipid, mainly VLDL: CH2CH2CO | 1.57 (m) | Increased
2.22 | Lipid: CH2CO | 2.23 (m) | Increased
1.98 | Lipid: CH2C=C | 1.97 (m) | Decreased

Supplementary References

1. Shaw, A. et al. Adv. Biochem. Eng. Biotechnol. 66, 83–113 (2000).
2. Jungblut, P. et al. Electrophoresis 20, 2100–2110 (1999).
3. Petricoin, E. 3rd et al. J. Natl. Cancer Inst. 94, 1576–1578 (2002).
4. Otvos, J., Jeyarajah, E., Bennett, D. & Krauss, R. Clin. Chem. 38, 1632–1638 (1992).
5. Serrai, H. et al. NMR Biomed. 11, 273–280 (1998).