
Supplementary Materials

Supplementary Methods

K-fold cross-validation procedure

We randomly divided the original dataset $S$ into $K$ subsets. In the $k$-th cross-validation, the $k$-th subset was left out as the validation set, denoted by $S_k^{Va}$, while the other $K-1$ subsets were used as the training set, denoted by $S_k^{Tr}$. We implemented the forward selection algorithm on $S_k^{Tr}$ to build a series of prediction models, $M_1^k, M_2^k, \ldots$, and applied these models to the validation set $S_k^{Va}$ to estimate the predicted AUC values. The predicted AUC values from the K-fold cross-validation were averaged at each level of model complexity, yielding a series of averaged predicted AUC values, denoted by $(\mathrm{AUC}_1, \mathrm{AUC}_2, \ldots)$. The level at which the averaged AUC value stopped increasing was used as the complexity level for the final prediction model built on the original dataset $S$.

Cross-validation procedure with bootstrap aggregating (bagging)

As an alternative to the K-fold cross-validation procedure, we also propose a cross-validation procedure with bootstrap aggregating (bagging) to further improve the method's robustness and power. Bagging was first introduced [1] to reduce an estimator's variance with little cost in bias. Its basic principle is to obtain an aggregated predictor by generating multiple predictors from bootstrap replicate samples of the data. Bagging works especially well when the prediction method is unstable (i.e., small changes in the learning set result in large changes in prediction). Recently, Petersen et al. [2] introduced a cross-validated bagging procedure by integrating bagging into the cross-validation procedure. They showed that an appropriate bias-variance trade-off for the parameter of interest can be achieved by conducting the cross-validation at the level of the bagging estimator itself (i.e., the cross-validated bagging estimator) [1].

Based on this concept, we can use cross-validation with bagging instead of K-fold cross-validation in the forward ROC method. To do this, we randomly divide the original dataset $S$ into $K$ subsets. In the $k$-th cross-validation, we draw $B$ bootstrap samples $(B_1^k, B_2^k, \ldots, B_B^k)$ from the training set $S_k^{Tr}$. For the $b$-th bootstrap sample, we implement the forward selection algorithm to build a series of prediction models, $M_1^b, M_2^b, \ldots$, and then apply these models to the validation set $S_k^{Va}$ to estimate the predicted AUC values. The predicted AUC values from the $B$ bootstrap samples are then averaged at each level of model complexity, yielding a series of bagging estimators of the predicted AUC values for the $k$-th cross-validation, denoted by $(\mathrm{AUC}_1^k, \mathrm{AUC}_2^k, \ldots)$. The $K$ results are then averaged to provide the overall cross-validated bagging AUC estimators, denoted by $(\mathrm{AUC}_1, \mathrm{AUC}_2, \ldots)$. The level with the highest average AUC value is used as the complexity level for the final prediction model built on the original dataset $S$.

Cross-validation with bagging is computationally intensive because a number of bootstrap samples are needed for each fold of cross-validation. If K-fold cross-validation requires time $t$, cross-validation with bagging needs time $B \times t$, where $B$ is the number of bootstrap samples in each cross-validation. Despite its heavy computational requirement, cross-validation with bagging has the potential to improve the method's performance and lead to a more stable and more accurate predictive genetic test.
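The two procedures described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward selection step is replaced by a hypothetical stand-in (`forward_select`) that greedily adds the predictor giving the highest training AUC for an unweighted sum score, and every function name here is illustrative. Setting `B=0` gives plain K-fold cross-validation, while `B>0` draws B bootstrap samples from each training set and averages their validation AUC values.

```python
import random

def auc(scores, labels):
    """Mann-Whitney estimate of the AUC; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:  # AUC is undefined with one class; fall back to chance level
        return 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def score(model, X):
    """Unweighted sum of the selected predictors (an assumed scoring rule)."""
    return [sum(row[j] for j in model) for row in X]

def forward_select(X, y, max_steps):
    """Hypothetical stand-in for forward selection. Returns nested models
    M_1, M_2, ... as lists of column indices; max_steps must not exceed
    the number of predictors."""
    chosen, models = [], []
    for _ in range(max_steps):
        candidates = [j for j in range(len(X[0])) if j not in chosen]
        best = max(candidates, key=lambda j: auc(score(chosen + [j], X), y))
        chosen = chosen + [best]
        models.append(chosen)
    return models

def cv_auc_curve(X, y, K=10, max_steps=3, B=0, seed=0):
    """Average validation AUC at each complexity level.
    B=0: K-fold cross-validation. B>0: cross-validation with bagging."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[k::K] for k in range(K)]
    curve = [0.0] * max_steps
    for k in range(K):
        val = folds[k]
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        if B > 0:  # bootstrap samples drawn with replacement from the training set
            samples = [[rng.choice(train) for _ in train] for _ in range(B)]
        else:
            samples = [train]
        X_val, y_val = [X[i] for i in val], [y[i] for i in val]
        for sample in samples:
            models = forward_select([X[i] for i in sample],
                                    [y[i] for i in sample], max_steps)
            for level, model in enumerate(models):
                curve[level] += auc(score(model, X_val), y_val) / (K * len(samples))
    return curve

def best_level(curve):
    """Complexity level (1-based) with the highest averaged AUC."""
    return max(range(len(curve)), key=curve.__getitem__) + 1

# Toy data: the first predictor carries signal, the other two are noise.
rng = random.Random(1)
y = [i % 2 for i in range(200)]
X = [[yi + rng.choice([0, 1]), rng.choice([0, 1, 2]), rng.choice([0, 1, 2])] for yi in y]
kfold = cv_auc_curve(X, y, K=5, max_steps=3)
bagged = cv_auc_curve(X, y, K=5, max_steps=3, B=5)
print("K-fold curve:", [round(a, 3) for a in kfold], "-> level", best_level(kfold))
print("Bagged curve:", [round(a, 3) for a in bagged], "-> level", best_level(bagged))
```

The bagged run fits the selection procedure B times per fold, which is where the $B \times t$ cost comes from. `best_level` picks the maximum of the averaged curve, matching the rule stated for the bagging procedure. A "stopped increasing" rule for plain K-fold cross-validation could differ slightly when the curve is not unimodal.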

Supplementary table 1. Summary of the simulation settings

| Simulation | Setting | Risk variants (a) | Inheritance mode (b) | Odds ratios | Interaction (c) | Number of noise loci (d) |
|---|---|---|---|---|---|---|
| Simulation scenarios I and II | Settings 1 / 2 / 3 | G (e): a, b, c; E (e): e | A, R, D | G: 1.2, 1.3, 1.5; E: 1.4 | a × e: 1.7; b × c: 1.4 | 10 / 15 / 20 |
| Supplementary simulation 2 | Setting 1 | a, b, c | A, R, D | 2, 2.5, 2.6 | No interaction | 20 |
| Supplementary simulation 2 | Setting 2 | a, b, c, e, f, g | A, R, D, A, R, D | 1.5, 1.7, 1.8, 1.6, 1.6, 1.9 | No interaction | 20 |
| Supplementary simulation 2 | Setting 3 | a, b, c, e, f, g, h, i, j, k | A, R, D, A, R, D, A, R, D, R | 1.6, 1.9, 1.8, 1.6, 1.7, 1.9, 1.6, 1.5, 1.6, 1.9 | No interaction | 20 |
| Supplementary simulation 3 | | a, b, c, e, f | A, A, A, A, R | 1.5, 4.0, 1.2, 1.2, 1.3 | a × f: 2.5; b × e: 2.4 | 0 |

a The minor allele frequencies of risk loci were generated randomly from a uniform distribution that ranged from 0.1 to 0.5.
b A, R and D represent additive, recessive and dominant modes of inheritance, respectively.
c Numbers listed in the Interaction column measure the risk to individuals who carry risk alleles of interacting loci and/or environmental factors vs. all other individuals. For the a × e interaction in simulation scenario I, we assumed that individuals who were exposed to the environmental risk level and carried the risk allele a (i.e., high-risk individuals) had a 1.7 times higher risk of disease than all other individuals. Similarly, for the b × c interaction, we assumed that individuals who carried two risk alleles at locus b and one or more risk alleles at locus c had a 1.4 times higher risk of disease than all other individuals.
d The allele frequencies of the non-causal loci were generated randomly from a uniform distribution that ranged from 0.1 to 0.9.
e G represents genetic risk factors and E represents environmental risk factors.
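To make the settings in supplementary table 1 concrete, the sketch below shows one way the genetic part of such a simulation could be generated. It rests on several assumptions that the source does not state: genotypes are drawn under Hardy-Weinberg proportions, the minor allele is the risk allele, disease status follows a multiplicative odds model with a hypothetical baseline odds of 0.1, and the environmental factor and interaction terms are left out for brevity. All function names are illustrative.

```python
import math
import random

def code_genotype(risk_alleles, mode):
    """Code a genotype (0, 1 or 2 risk alleles) under a mode of inheritance:
    A = additive, D = dominant, R = recessive."""
    if mode == "A":
        return risk_alleles
    if mode == "D":
        return int(risk_alleles >= 1)
    if mode == "R":
        return int(risk_alleles == 2)
    raise ValueError("mode must be 'A', 'D' or 'R'")

def disease_odds(genotypes, modes, odds_ratios, baseline_odds):
    """Multiplicative odds model (an assumption): each coded genotype
    multiplies the baseline odds by its odds ratio."""
    log_odds = math.log(baseline_odds)
    for g, mode, oratio in zip(genotypes, modes, odds_ratios):
        log_odds += code_genotype(g, mode) * math.log(oratio)
    return math.exp(log_odds)

def draw_frequencies(rng, n_risk, n_noise):
    """Risk-locus frequencies ~ U(0.1, 0.5); noise-locus frequencies ~ U(0.1, 0.9)."""
    risk = [rng.uniform(0.1, 0.5) for _ in range(n_risk)]
    noise = [rng.uniform(0.1, 0.9) for _ in range(n_noise)]
    return risk, noise

def draw_genotype(rng, freq):
    """Two independent allele draws (Hardy-Weinberg assumption)."""
    return int(rng.random() < freq) + int(rng.random() < freq)

def simulate_individual(rng, risk_freqs, noise_freqs, modes, odds_ratios,
                        baseline_odds=0.1):
    """Return (genotypes at risk loci followed by noise loci, disease status)."""
    risk_g = [draw_genotype(rng, f) for f in risk_freqs]
    noise_g = [draw_genotype(rng, f) for f in noise_freqs]
    odds = disease_odds(risk_g, modes, odds_ratios, baseline_odds)
    status = int(rng.random() < odds / (1.0 + odds))
    return risk_g + noise_g, status

# Genetic loci a, b, c of Setting 1 in scenarios I and II, with 10 noise loci.
rng = random.Random(42)
modes, odds_ratios = ["A", "R", "D"], [1.2, 1.3, 1.5]
risk_freqs, noise_freqs = draw_frequencies(rng, n_risk=3, n_noise=10)
people = [simulate_individual(rng, risk_freqs, noise_freqs, modes, odds_ratios)
          for _ in range(1000)]
print("simulated cases:", sum(status for _, status in people))
```

A case-control sample would then be formed by drawing individuals until the required numbers of cases and controls are reached.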

Supplementary Simulations

Supplementary simulation 1

We evaluated the effect of cross-validation with bagging on the forward ROC method using simulation scenarios I and II. The details of the simulation settings are listed in supplementary table 1. For each replicate, 50 bootstrap samples were generated for the bagging cross-validation procedure. The results are summarized in supplementary table 2.

In simulation I, the proposed forward ROC method had similar overall performance whether it used 10-fold cross-validation or cross-validation with bagging. With a small number of noise loci, 10-fold cross-validation performed slightly better than cross-validation with bagging. However, as the number of noise loci increased, cross-validation with bagging attained higher AUC means and smaller mean square errors (MSEs) than 10-fold cross-validation. For instance, with 20 noise loci, cross-validation with bagging led to an AUC mean of 0.6000, a 0.25% increase over 10-fold cross-validation. In terms of computational efficiency, 10-fold cross-validation and cross-validation with bagging required 1.5 minutes and 50 minutes, respectively, to complete one simulation replicate on a computer equipped with 4 GB of memory.

In simulation II, cross-validation with bagging performed better than 10-fold cross-validation as the missing rate increased. For instance, with 10% missing data, the forward ROC method using cross-validation with bagging had an AUC mean of 0.6102, a 0.39% increase over 10-fold cross-validation.
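The percentage gains quoted above are relative increases in the AUC mean, and the MSE values in supplementary table 2 are consistent with the usual decomposition into squared bias plus variance. The source does not state that formula, so it is treated here as an assumption. The short check below recomputes both from the reported figures; the function names are illustrative.

```python
def relative_increase_pct(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

def mse_from_bias_sd(bias, sd):
    """MSE under the assumed decomposition: bias squared plus variance."""
    return bias ** 2 + sd ** 2

# 20 noise loci: bagging CV (0.6000) vs 10-fold CV (0.5985)
print(round(relative_increase_pct(0.6000, 0.5985), 2))  # 0.25
# 10% missing data: bagging CV (0.6102) vs 10-fold CV (0.6078)
print(round(relative_increase_pct(0.6102, 0.6078), 2))  # 0.39
# 10 noise loci, 10-fold CV: bias -0.0312, SD 0.0305
print(round(mse_from_bias_sd(-0.0312, 0.0305), 4))      # 0.0019
# 20 noise loci, bagging CV: bias -0.0510, SD 0.0379
print(round(mse_from_bias_sd(-0.0510, 0.0379), 4))      # 0.004
```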

Supplementary table 2. A comparison of bagging cross-validation and 10-fold cross-validation

Scenario I

| Number of noise loci | 10-fold CV: MEAN (a) | BIAS | SD (b) | MSE (c) | Bagging CV: MEAN (a) | BIAS | SD (b) | MSE (c) |
|---|---|---|---|---|---|---|---|---|
| 10 | 0.6198 | -0.0312 | 0.0305 | 0.0019 | 0.6178 | -0.0333 | 0.0297 | 0.0020 |
| 15 | 0.6018 | -0.0492 | 0.0343 | 0.0036 | 0.6036 | -0.0474 | 0.0329 | 0.0033 |
| 20 | 0.5985 | -0.0525 | 0.0398 | 0.0043 | 0.6000 | -0.0510 | 0.0379 | 0.0040 |

Scenario II

| Missing % | 10-fold CV: MEAN (a) | BIAS | SD (b) | MSE (c) | Bagging CV: MEAN (a) | BIAS | SD (b) | MSE (c) |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.6198 | -0.0312 | 0.0305 | 0.0019 | 0.6178 | -0.0333 | 0.0297 | 0.0020 |
| 10 | 0.6078 | -0.0432 | 0.0317 | 0.0029 | 0.6102 | -0.0408 | 0.0311 | 0.0026 |
| 15 | 0.6007 | -0.0504 | 0.0327 | 0.0036 | 0.6044 | -0.0467 | 0.0323 | 0.0032 |

a AUC estimator. b Standard deviation. c Mean square error.

Supplementary simulation 2

The random forest (RF) method is a powerful tool for high-dimensional risk prediction [3]. RFs have several unique features, such as the ability to uncover interactions among genes and/or environmental factors with low marginal effects [4]. We conducted a simulation to compare the proposed forward ROC method with the RF method. In the simulation, we set the proportion of disease-susceptibility loci to 1/8, 1/4 and 1/3. The detailed settings of the simulation are described in supplementary table 1. The true AUC in all three settings was set to approximately 0.69. The RF method was performed using the R package randomForest version 4.5 with a forest size of 500 trees. We simulated 1000 replicates, each consisting of 1000 cases and 1000 controls. The simulation results are summarized in supplementary table 3. When the proportion of disease-susceptibility loci was small (i.e., 1/8) and the effect size was strong, the forward ROC method performed better than the RF method, with an increase of 2.17% in the AUC mean.
When the proportion of disease-susceptibility loci increased to 1/4, the two methods had similar performance, with AUC means of 0.6538 for forward ROC and 0.6561 for RF. With a further increase in the proportion of disease-susceptibility loci and a decrease in the effect size, RFs tended to capture more disease-susceptibility loci and attained higher classification accuracy than the forward ROC method. For instance, when the proportion of disease-susceptibility loci increased to 1/3, the RF method performed better than the forward ROC method, with an increase of 5.82% in the AUC mean.

Supplementary table 3. Summary of results from supplementary simulation 2

| Method | 1/8: Mean | 1/8: MSE | 1/4: Mean | 1/4: MSE | 1/3: Mean | 1/3: MSE |
|---|---|---|---|---|---|---|
| Forward ROC | 0.6679 | 0.0007 | 0.6538 | 0.0021 | 0.6365 | 0.0060 |
| Random Forest (RF) | 0.6537 | 0.0014 | 0.6561 | 0.0018 | 0.6736 | 0.0017 |

Column groups give the proportion of disease-susceptibility loci.

Supplementary simulation 3

Some diseases may be caused by a small set of loci with large effect sizes. For instance, five SNPs have been found to be associated with age-related macular degeneration (AMD), and together they explain approximately half of the classical sibling risk of AMD [5]. We conducted a simulation under such a disease scenario to assess the performance of the forward ROC method, CART and the allele counting method. In this simulation, we introduced five disease-susceptibility loci and two two-way interactions, without including any noise loci (for details, see supplementary table 1). The true AUC was set at 0.8. We compared the forward ROC method with the CART and allele counting methods, based on 1000 simulation replicates. The predicted AUC was calculated from the evaluation sets, and the results are summarized in supplementary table 4. The proposed forward ROC method performed better than the other two methods, with a higher AUC mean and a lower MSE. The forward ROC method attained a predicted AUC of 0.7892, which was close to the true AUC of 0.8. The CART and allele counting methods attained AUC values of 0.7633 and 0.7366, respectively. The standard deviations of the AUCs for all three methods were relatively small: 0.0114, 0.0132 and 0.0118 for the forward ROC method, CART and the allele counting method, respectively.

Supplementary table 4. Summary of results from supplementary simulation 3

| Method | MEAN (a) | BIAS | SD (b) | MSE (c) |
|---|---|---|---|---|
| Forward ROC | 0.7892 | -0.0122 | 0.0114 | 0.0003 |
| CART | 0.7633 | -0.0381 | 0.0132 | 0.0016 |
| Allele counting | 0.7366 | -0.0647 | 0.0118 | 0.0043 |

a AUC estimator. b Standard deviation. c Mean square error.

Wellcome Trust RA GWAS Dataset

RA cases from the Wellcome Trust study were recruited by the Arthritis Research Campaign Epidemiology Unit and met the standard clinical criteria for RA. One-half of the controls were chosen from the 1958 British Birth Cohort and the other half came from the UK Blood Service controls. All individuals were genotyped using the Affymetrix 500K chip. We excluded samples of low quality (e.g., low DNA quality). The final samples for analysis comprised 2938 RA cases and 1860 controls. After removing low-quality single nucleotide polymorphisms (SNPs) (e.g., those with low allele frequencies), 460,547 SNPs remained for analysis.

Supplementary table 5. Summary of the 35 RA-associated loci reported from different studies

| SNP | Gene | Chromosome | Source | Reference |
|---|---|---|---|---|
| rs11162922 | IFI44 | 1 | 500K | [6] |
| rs6684865 | MMEL1 | 1 | 500K | [6] |
| rs3890745 | MMEL1-TNFRSF14 | 1 | 500K | [7] |
| rs2240340 | PADI4 | 1 | Imputed | [8] |
| rs2476601 | PTPN22 | 1 | Imputed | [9] |
| rs6679677 | RSBN1 | 1 | 500K | [6] |
| rs1061622 | TNFRSF1B | 1 | Imputed | [8] |
| rs3087243 | CTLA4 | 2 | 500K | [8] |
| rs3738919 | ITGAV | 2 | Imputed | [10] |
| rs7574865 | STAT4 | 2 | Imputed | [11] |
| rs3816587 | ANAPC4 | 4 | 500K | [6] |
| rs6822844 | IL2-IL21 | 4 | Imputed | [12] |
| rs3817964 | HLA-DRB1 | 6 | Imputed | [13] |
| rs660895 | HLA-DRB1 | 6 | Imputed | [13] |
| rs6910071 | HLA-DRB1 | 6 | Imputed | [13] |
| rs6457617 | MHC | 6 | 500K | [6] |
| rs2442728 | MHC | 6 | Imputed | [14] |
| rs4678 | MHC:VARS2L | 6 | Imputed | [14] |
| rs6920220 | OLIG3-TNFAIP3 | 6 | 500K | [6] |
| rs10499194 | OLIG3-TNFAIP3 | 6 | Imputed | [15] |
| rs42041 | CDK6 | 7 | 500K | [7] |
| rs2280714 | IRF5 | 7 | Imputed | [16] |
| rs11761231 | N/A | 7 | 500K | [6] |
| rs2812378 | CCL21 | 9 | 500K | [7] |
| rs1953126 | PHF19 | 9 | Imputed | [17] |
| rs10818488 | TRAF1 | 9 | Imputed | [18] |
| rs3761847 | TRAF1-C5 | 9 | Imputed | [19] |
| rs2104286 | IL2RA | 10 | 500K | [6] |
| rs4750316 | PRKCQ | 10 | 500K | [7] |
| rs1678542 | KIF5A-PIP4K2C | 12 | 500K | [7] |
| rs1324913 | KLF12 | 13 | Imputed | [20] |
| rs9550642 | N/A | 13 | 500K | [6] |
| rs4810485 | CD40 | 20 | 500K | [7] |
| rs2837960 | N/A | 21 | 500K | [6] |
| rs743777 | C1QTNF6 | 22 | 500K | [6] |

References

1 Breiman L: Bagging predictors. Machine Learning 1996;24:123-140.
2 Petersen ML, Molinaro AM, Sinisi SE, Van der Laan MJ: Cross-validated bagged learning. Journal of Multivariate Analysis 2007;98:1693-1704.
3 Breiman L: Random Forests. Machine Learning 2001;45:5-32.
4 Bureau A, Dupuis J, Falls K, Lunetta KL, Hayward B, Keith TP, Van EP: Identifying SNPs predictive of phenotype using random forests. Genet Epidemiol 2005;28:171-182.
5 Maller J, George S, Purcell S, Fagerness J, Altshuler D, Daly MJ, Seddon JM: Common variation in three genes, including a noncoding variant in CFH, strongly influences risk of age-related macular degeneration. Nat Genet 2006;38:1055-1059.
6 WTCCC: Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:661-678.
7 Raychaudhuri S, Remmers EF, Lee AT, Hackett R, Guiducci C, Burtt NP, Gianniny L, Korman BD, Padyukov L, Kurreeman FA, Chang M, Catanese JJ, Ding B, Wong S, van der Helm-van Mil AH, Neale BM, Coblyn J, Cui J, Tak PP, Wolbink GJ, Crusius JB, van der Horst-Bruinsma IE, Criswell LA, Amos CI, Seldin MF, Kastner DL, Ardlie KG, Alfredsson L, Costenbader KH, Altshuler D, Huizinga TW, Shadick NA, Weinblatt ME, de VN, Worthington J, Seielstad M, Toes RE, Karlson EW, Begovich AB, Klareskog L, Gregersen PK, Daly MJ, Plenge RM: Common variants at CD40 and other loci confer risk of rheumatoid arthritis. Nat Genet 2008;40:1216-1223.
8 Plenge RM, Padyukov L, Remmers EF, Purcell S, Lee AT, Karlson EW, Wolfe F, Kastner DL, Alfredsson L, Altshuler D, Gregersen PK, Klareskog L, Rioux JD: Replication of putative candidate-gene associations with rheumatoid arthritis in >4,000 samples from North America and Sweden: association of susceptibility with PTPN22, CTLA4, and PADI4. Am J Hum Genet 2005;77:1044-1060.
9 Lee AT, Li W, Liew A, Bombardier C, Weisman M, Massarotti EM, Kent J, Wolfe F, Begovich AB, Gregersen PK: The PTPN22 R620W polymorphism associates with RF positive rheumatoid arthritis in a dose-dependent manner but not with HLA-SE status. Genes Immun 2005;6:129-133.
10 Jacq L, Garnier S, Dieude P, Michou L, Pierlot C, Migliorini P, Balsa A, Westhovens R, Barrera P, Alves H, Vaz C, Fernandes M, Pascual-Salcedo D, Bombardieri S, Dequeker J, Radstake TR, Van RP, van de Putte L, Lopes-Vaz A, Glikmans E, Barbet S, Lasbleiz S, Lemaire I, Quillet P, Hilliquin P, Teixeira VH, Petit-Teixeira E, Mbarek H, Prum B, Bardin T, Cornelis F: The ITGAV rs3738919-C allele is associated with rheumatoid arthritis in the European Caucasian population: a family-based study. Arthritis Res Ther 2007;9:R63.
11 Orozco G, Alizadeh BZ, Delgado-Vega AM, Gonzalez-Gay MA, Balsa A, Pascual-Salcedo D, Fernandez-Gutierrez B, Gonzalez-Escribano MF, Petersson IF, van Riel PL, Barrera P, Coenen MJ, Radstake TR, van Leeuwen MA, Wijmenga C, Koeleman BP, Alarcon-Riquelme M, Martin J: Association of STAT4 with rheumatoid arthritis: a replication study in three European populations. Arthritis Rheum 2008;58:1974-1980.
12 Zhernakova A, Alizadeh BZ, Bevova M, van Leeuwen MA, Coenen MJ, Franke B, Franke L, Posthumus MD, van Heel DA, van der Steege G, Radstake TR, Barrera P, Roep BO, Koeleman BP, Wijmenga C: Novel association in chromosome 4q27 region with rheumatoid arthritis and confirmation of type 1 diabetes point to a general risk locus for autoimmune diseases. Am J Hum Genet 2007;81:1284-1288.
13 Gorman JD, David-Vaudey E, Pai M, Lum RF, Criswell LA: Particular HLA-DRB1 shared epitope genotypes are strongly associated with rheumatoid vasculitis. Arthritis Rheum 2004;50:3476-3484.
14 Vignal C, Bansal AT, Balding DJ, Binks MH, Dickson MC, Montgomery DS, Wilson AG: Genetic association of the major histocompatibility complex with rheumatoid arthritis implicates two non-DRB1 loci. Arthritis Rheum 2009;60:53-62.
15 Plenge RM, Cotsapas C, Davies L, Price AL, de Bakker PI, Maller J, Pe'er I, Burtt NP, Blumenstiel B, DeFelice M, Parkin M, Barry R, Winslow W, Healy C, Graham RR, Neale BM, Izmailova E, Roubenoff R, Parker AN, Glass R, Karlson EW, Maher N, Hafler DA, Lee DM, Seldin MF, Remmers EF, Lee AT, Padyukov L, Alfredsson L, Coblyn J, Weinblatt ME, Gabriel SB, Purcell S, Klareskog L, Gregersen PK, Shadick NA, Daly MJ, Altshuler D: Two independent alleles at 6q23 associated with risk of rheumatoid arthritis. Nat Genet 2007;39:1477-1482.
16 Han SW, Lee WK, Kwon KT, Lee BK, Nam EJ, Kim GW: Association of polymorphisms in interferon regulatory factor 5 gene with rheumatoid arthritis: a metaanalysis. J Rheumatol 2009;36:693-697.
17 Chang M, Rowland CM, Garcia VE, Schrodi SJ, Catanese JJ, van der Helm-van Mil AH, Ardlie KG, Amos CI, Criswell LA, Kastner DL, Gregersen PK, Kurreeman FA, Toes RE, Huizinga TW, Seldin MF, Begovich AB: A large-scale rheumatoid arthritis genetic study identifies association at chromosome 9q33.2. PLoS Genet 2008;4:e1000107.

18 Kurreeman FA, Padyukov L, Marques RB, Schrodi SJ, Seddighzadeh M, Stoeken-Rijsbergen G, van der Helm-van Mil AH, Allaart CF, Verduyn W, Houwing-Duistermaat J, Alfredsson L, Begovich AB, Klareskog L, Huizinga TW, Toes RE: A candidate gene approach identifies the TRAF1/C5 region as a risk factor for rheumatoid arthritis. PLoS Med 2007;4:e278.
19 Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, Liew A, Khalili H, Chandrasekaran A, Davies LR, Li W, Tan AK, Bonnard C, Ong RT, Thalamuthu A, Pettersson S, Liu C, Tian C, Chen WV, Carulli JP, Beckman EM, Altshuler D, Alfredsson L, Criswell LA, Amos CI, Seldin MF, Kastner DL, Klareskog L, Gregersen PK: TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study. N Engl J Med 2007;357:1199-1209.
20 Julia A, Ballina J, Canete JD, Balsa A, Tornero-Molina J, Naranjo A, Alperi-Lopez M, Erra A, Pascual-Salcedo D, Barcelo P, Camps J, Marsal S: Genome-wide association study of rheumatoid arthritis in the Spanish population: KLF12 as a risk locus for rheumatoid arthritis susceptibility. Arthritis Rheum 2008;58:2275-2286.