Identifying Biologically Relevant Amino Acids in Immunogenetic Studies


Richard M. Single
Department of Mathematics and Statistics, University of Vermont

Outline
- HLA background and nomenclature
- Asymmetric Linkage Disequilibrium (ALD)
  - Motivation, Definition & Example
- Amino acid level analyses of HLA disease associations
  - SFVT Analysis & Pairwise allele level analyses
  - Conditional Haplotype analyses & ALD
- Identifying units of selection
  - ALD as a tool

HLA molecules are cell-surface proteins that present peptide fragments to T-cells
[Figure: HLA class I and HLA class II molecules presenting a peptide fragment to the T-cell receptor (TCR). Legend: TCR = T-cell receptor; β2m = β2-microglobulin.]
- HLA molecules bind specific sets of peptides (based on structure).
- Any given HLA allele codes for a molecule that presents a subset of the available peptides to T-cells.

HLA Allele Nomenclature
Example: HLA-A*24:02:01:02L
- Locus: HLA-A
- Field 1 (2-digit): serological level (where possible)
- Field 2 (4-digit): peptide level (amino acid difference)
- Field 3 (6-digit): nucleotide level [silent] (synonymous substitutions)
- Field 4 (8-digit): intron level (3' or 5' polymorphism)
- Expression suffix: N = null, L = low, S = soluble
For most analyses, we want to distinguish among unique peptide sequences, i.e., the 2-field ("4-digit") level. This level of resolution treats alleles with the same peptide sequence for exons 2 & 3 (class I) or exon 2 (class II) as being equivalent ("binning" alleles).
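As an illustration of the field structure above, the following minimal sketch splits a colon-delimited allele name into its parts and truncates it to the 2-field level. The names `parse_allele` and `two_field` are hypothetical helpers, and the regular expression only accepts the three expression suffixes listed on the slide (N, L, S); real data would need broader handling (other suffixes, ambiguity codes, G-codes).

```python
import re
from typing import NamedTuple, Optional, Tuple


class HLAAllele(NamedTuple):
    locus: str                  # e.g. "A" or "DRB1"
    fields: Tuple[str, ...]     # e.g. ("24", "02", "01", "02")
    expression: Optional[str]   # "N", "L", "S", or None


# Assumption: colon-delimited names with 1 to 4 fields and an optional
# expression suffix restricted to the letters named on the slide.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*"
    r"(?P<fields>\d{2,3}(?::\d{2,3}){0,3})"
    r"(?P<suffix>[NLS])?$"
)


def parse_allele(name: str) -> HLAAllele:
    """Split an allele name into locus, fields, and expression suffix."""
    match = _ALLELE_RE.match(name.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognised allele name: {name!r}")
    return HLAAllele(
        locus=match["locus"],
        fields=tuple(match["fields"].split(":")),
        expression=match["suffix"],
    )


def two_field(name: str) -> str:
    """Reduce an allele name to the 2-field (peptide) level."""
    allele = parse_allele(name)
    if len(allele.fields) < 2:
        raise ValueError(f"{name!r} has fewer than two fields")
    return f"{allele.locus}*{allele.fields[0]}:{allele.fields[1]}"


print(two_field("HLA-A*24:02:01:02L"))  # A*24:02
```

Note that this truncation drops the expression suffix; whether a null (N) allele should be binned with its expressed counterparts is an analysis decision that should be documented.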

HLA Nomenclature and why it matters
Challenges for HLA data management and analysis:
- The HLA genes are very polymorphic.
- HLA nomenclature is complicated.
- There are multiple ways to generate HLA data.
- All common typing systems generate ambiguous data.
- There are multiple ways to report alleles and ambiguities.
These issues make meta-analyses of HLA data from different sources very difficult.

Extending STREGA to Immunogenomic Studies
- The STrengthening the REporting of Genetic Association studies (STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies.
- The IDAWG (immunogenomics.org) has proposed an extension of STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS).

From STREGA to STREIS
Extensions to the STREGA guidelines for immunogenomic data include:
- Describing the system(s) used to store, manage, and validate genotype and allele data
- Documenting all methods applied to resolve ambiguity
- Defining any codes used to represent ambiguities, e.g., NMDP codes:
  A*0201/0209/0266 = A*02AJEY
  A*0201/0209/0266/0275/0289 = A*02BSFJ
- Describing any binning or combining of alleles into common categories, e.g., G-codes:
  A*0201/0209/0243N/0266/0275/0283N/0289 = A020101g
- Avoiding the use of subjective terms (e.g., "high-resolution typing") that may change over time
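To make the "define any codes used" point concrete, here is a small sketch that expands an ambiguous typing into its explicit allele list. The lookup table `NMDP_CODES` is hypothetical and contains only the two codes quoted above; a real pipeline would load the complete, versioned code list and record which version was used.

```python
from typing import Dict, List

# Hypothetical lookup holding only the two NMDP codes quoted on the slide.
NMDP_CODES: Dict[str, List[str]] = {
    "A*02AJEY": ["A*0201", "A*0209", "A*0266"],
    "A*02BSFJ": ["A*0201", "A*0209", "A*0266", "A*0275", "A*0289"],
}


def expand(typing: str) -> List[str]:
    """Return the explicit list of alleles represented by a typing result."""
    typing = typing.replace(" ", "")
    if typing in NMDP_CODES:
        return list(NMDP_CODES[typing])
    if "/" in typing:
        # Slash-delimited ambiguity string such as 'A*0201/0209/0266':
        # later entries inherit the locus prefix of the first one.
        first, *rest = typing.split("/")
        prefix = first.split("*")[0] + "*"
        return [first] + [r if "*" in r else prefix + r for r in rest]
    return [typing]  # already a single allele


def is_ambiguous(typing: str) -> bool:
    """True when the typing result stands for more than one allele."""
    return len(expand(typing)) > 1


print(expand("A*02AJEY"))  # ['A*0201', 'A*0209', 'A*0266']
```

Keeping the expansion explicit makes it possible to report exactly which alleles were treated as equivalent in a given analysis.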

Resources for HLA Data Validation & Analysis
Immunology Database and Analysis Portal (www.immport.org)
- Developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
- Data validation pipeline
- Analysis tools
- Standardized ambiguity reduction tools
- Data from a large number of immunogenomic studies
ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org, www.igdawg.org)
- An international collaborative group working to facilitate the sharing of immunogenomic data (HLA, KIR, etc.) and foster consistent analysis and interpretation of immunogenomic data


Asymmetric Linkage Disequilibrium (ALD)
- Standard LD measures give an incomplete description of the correlation of genetic variation at two loci when there are different numbers of alleles at the loci.
- We developed a pair of conditional asymmetric LD (ALD) measures that more accurately capture this information.
- For disease association studies, the ALD can help to identify when stratification analyses can be applied to detect primary disease predisposing genes.
- For evolutionary studies, the ALD can be informative for the study of forces such as selection acting on individual amino acids, or other loci in high LD.
- For SNP studies, ALD measures can be used for analyses of LD between haplotype blocks, for SNP-gene LD, and for haplotype block-gene LD.

Linkage Disequilibrium (LD) Measures
The two most common measures of the strength of LD are:
(1) the normalized measure of the individual LD values, $D'_{ij} = D_{ij} / D_{\max}$ (Lewontin 1964); and
(2) the correlation coefficient $r$ for bi-allelic data, which is most often reported as $r^2 = D^2 / (p_{A1} p_{A2} p_{B1} p_{B2})$.
$r = 1$ only when the allelic variations at the two loci show 100% correlation.
Their multi-allelic extensions, for loci with $I$ and $J$ alleles and allele frequencies $p_i$ and $q_j$, are:
$$D' = \sum_{i=1}^{I} \sum_{j=1}^{J} p_i q_j \, |D'_{ij}|$$
$$W_n = \left[ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} D_{ij}^2 / (p_i q_j)}{\min(I-1,\, J-1)} \right]^{1/2} = \left[ \frac{X^2_{LD}}{2N \min(I-1,\, J-1)} \right]^{1/2}$$
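The two multi-allelic measures can be computed directly from a table of haplotype frequencies. The sketch below is illustrative only: the function names are hypothetical, the table is assumed to hold haplotype frequencies that sum to 1 with every allele at non-zero frequency, and $D_{\max}$ follows the usual sign-dependent definition.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Table = Sequence[Sequence[float]]  # f[i][j] = frequency of haplotype A_i B_j


def margins(f: Table) -> Tuple[List[float], List[float]]:
    """Allele frequencies at locus A (rows) and locus B (columns)."""
    p = [sum(row) for row in f]
    q = [sum(col) for col in zip(*f)]
    return p, q


def d_prime_overall(f: Table) -> float:
    """Overall D' = sum_i sum_j p_i q_j |D'_ij|."""
    p, q = margins(f)
    total = 0.0
    for i, row in enumerate(f):
        for j, f_ij in enumerate(row):
            d = f_ij - p[i] * q[j]
            if d < 0:
                d_max = min(p[i] * q[j], (1 - p[i]) * (1 - q[j]))
            else:
                d_max = min(p[i] * (1 - q[j]), (1 - p[i]) * q[j])
            if d_max > 0:
                total += p[i] * q[j] * abs(d) / d_max
    return total


def w_n(f: Table) -> float:
    """W_n = sqrt( sum_ij D_ij^2 / (p_i q_j) / min(I-1, J-1) )."""
    p, q = margins(f)
    s = sum(
        (f[i][j] - p[i] * q[j]) ** 2 / (p[i] * q[j])
        for i in range(len(p))
        for j in range(len(q))
    )
    return sqrt(s / min(len(p) - 1, len(q) - 1))


# Bi-allelic toy table (made-up frequencies): here W_n equals |r|.
toy = [[0.4, 0.1],
       [0.1, 0.4]]
print(round(w_n(toy), 3), round(d_prime_overall(toy), 3))
```

For two bi-allelic loci $W_n$ reduces to $|r|$, which gives a quick sanity check for any implementation.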

Asymmetric LD measures: $W_{A/B}$ and $W_{B/A}$
When there are different numbers of alleles at two loci, the direct correlation property for the $r$ measure is not retained. The asymmetric LD (ALD) measures more accurately reflect covariation at two loci.
- $W_{A/B}$ and $W_{B/A}$ describe variation observed at the 1st locus conditioned on the 2nd.
Example (two and three alleles at the A and B loci):
  $f(A_1 B_1) = 0.3$, $f(A_2 B_2) = 0.5$, $f(A_2 B_3) = 0.2$
  $W_n = 1$, $W_{A/B} = 1$, and $W_{B/A} = 0.73$
  There is variation at the B locus on haplotypes containing the $A_2$ allele, so there is not 100% correlation.
- ALD measures indicate that, with appropriate sample size, stratification analyses could be carried out for some comparisons.
- $W_n = 1$ could result in passing over these data for conditional analyses.
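The worked example can be checked numerically. This sketch assumes the homozygosity-based definition $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$ from Thomson and Single (2014); the function name `ald` is hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def ald(f: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (W_A/B, W_B/A) for haplotype frequencies f[i][j] of A_i B_j."""
    p_a = [sum(row) for row in f]          # allele frequencies at locus A
    p_b = [sum(col) for col in zip(*f)]    # allele frequencies at locus B
    f_a = sum(p * p for p in p_a)          # single-locus homozygosity F_A
    f_b = sum(p * p for p in p_b)          # single-locus homozygosity F_B
    n_a, n_b = len(p_a), len(p_b)
    # Weighted haplotype-specific homozygosity:
    # F_A/B = sum_j p_Bj * sum_i (f_ij / p_Bj)^2 = sum_ij f_ij^2 / p_Bj
    f_a_given_b = sum(f[i][j] ** 2 / p_b[j]
                      for i in range(n_a) for j in range(n_b) if p_b[j] > 0)
    f_b_given_a = sum(f[i][j] ** 2 / p_a[i]
                      for i in range(n_a) for j in range(n_b) if p_a[i] > 0)
    # max(0, ...) guards against tiny negative values from rounding.
    w_ab = sqrt(max(0.0, (f_a_given_b - f_a) / (1 - f_a)))
    w_ba = sqrt(max(0.0, (f_b_given_a - f_b) / (1 - f_b)))
    return w_ab, w_ba


# Slide example: rows = A1, A2; columns = B1, B2, B3.
example = [[0.3, 0.0, 0.0],
           [0.0, 0.5, 0.2]]
w_ab, w_ba = ald(example)
print(round(w_ab, 2), round(w_ba, 2))  # 1.0 0.73
```

Knowing the B allele determines the A allele exactly ($W_{A/B} = 1$), but knowing $A_2$ leaves two possible B alleles, which is what $W_{B/A} = 0.73$ captures.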

Standard LD measures D' and Wn
Standard LD measures (overall D' and Wn) assume/force symmetry, even though with more than 2 alleles per locus that is not the case.
Data source: ImmPort Study SDY26: Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine

Asymmetric Linkage Disequilibrium (ALD)
[Figure: ALD matrix; row gene conditional on column gene.]
Interpretation:
- ALD for HLA-DRB1 conditioning on HLA-DQA1: $W_{DRB1/DQA1} = 0.58$. The overall variation for DRB1 is relatively high given specific DQA1 alleles.
- ALD for HLA-DQA1 conditioning on HLA-DRB1: $W_{DQA1/DRB1} = 0.95$. The overall variation for DQA1 is relatively low given specific DRB1 alleles.

Asymmetric Linkage Disequilibrium (ALD)
Table 1. Linkage disequilibrium and genetic diversity measures
Description: Definition of measure
1. Single locus homozygosity (F): $F_A = \sum_i p_{Ai}^2$
2. Haplotype specific homozygosity (HSF): $F_{A/Bj} = \sum_i (f_{ij} / p_{Bj})^2$
3. Overall weighted HSF values, $F_{A/B}$ (and $F_{B/A}$): $F_{A/B} = \sum_j F_{A/Bj} \, p_{Bj} = F_A + \sum_i \sum_j D_{ij}^2 / p_{Bj}$
4. Multi-allelic ALD squared, $W_{A/B}^2$ (and $W_{B/A}^2$): $W_{A/B}^2 = (F_{A/B} - F_A) / (1 - F_A)$
If both loci are bi-allelic:
$W_{A/B}^2 = \left[ \sum_i \sum_j (D_{ij}^2 / p_{Bj}) \right] / (1 - F_A) = D^2 / (p_{A1} p_{A2} p_{B1} p_{B2}) = r^2$, since $|D_{11}| = |D_{12}| = |D_{21}| = |D_{22}| = |D|$.
Thomson and Single (2014) Genetics

Other Conditional Measures of LD
Other measures of LD that are conditional have been proposed (Nei and Li, 1980; Chakravarti et al., 1984; Hudson, 1985; Kaplan and Weir, 1992; Guo SW, 1997).
- They measure association between alleles at a marker locus (locus B) and alleles at a disease locus (locus A).
- They were developed to account for study designs in which individuals are not randomly sampled from a single population, but where sampling intensity varies within disease categories.
- They are equivalent to Somers' D statistic defined on the contingency table relating two categorical variables.
In contrast, our statistic is a population-based measure that does not depend on a specific patient sampling scheme.

ALD & tag SNPs in the HLA region
de Bakker et al. (2006) identified tag SNPs based on $r^2$ for SNPs with recoded HLA alleles (recoded as presence/absence of each specific HLA allele).
de Bakker et al. (2006) Nature Genetics

ALD & tag SNPs in the HLA region
Thomson and Single (2014) Genetics


Juvenile Idiopathic Arthritis, oligoarticular persistent (JIA-OP)
Common HLA-DRB1 alleles

Risk category  DRB1    patients  controls  OR
I              *08:01  102       13        6.9
I              *11:04  57        11        4.3
II             *13:01  90        38        1.9
II             *11:01  60        36        1.3
II             *01:01  74        50        1.2
II             *03:01  89        61        1.1
II             *13:02  28        23        0.9
III            *04:04  7         16        0.3
III            *15:01  38        80        0.3
III            *07:01  30        65        0.3
III            *04:01  21        47        0.3
sum                    596       440
total                  708       546

Overall p-value < 2.6E-27
AA 86 implicated via pairwise within-serogroup analysis.
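The odds ratios in this table are consistent with a simple 2x2 comparison of each allele against all other alleles, using the totals of 708 patient and 546 control alleles. The sketch below recomputes them under that assumption (the slides do not state the formula explicitly); `odds_ratio` is a hypothetical helper name.

```python
from typing import Dict, Tuple

PATIENT_TOTAL = 708   # total patient allele count from the table
CONTROL_TOTAL = 546   # total control allele count from the table

# allele -> (patient count, control count), copied from the table
COUNTS: Dict[str, Tuple[int, int]] = {
    "*08:01": (102, 13), "*11:04": (57, 11), "*13:01": (90, 38),
    "*11:01": (60, 36),  "*01:01": (74, 50), "*03:01": (89, 61),
    "*13:02": (28, 23),  "*04:04": (7, 16),  "*15:01": (38, 80),
    "*07:01": (30, 65),  "*04:01": (21, 47),
}


def odds_ratio(pat: int, con: int,
               pat_total: int = PATIENT_TOTAL,
               con_total: int = CONTROL_TOTAL) -> float:
    """Odds of carrying the allele in patients divided by odds in controls."""
    return (pat / (pat_total - pat)) / (con / (con_total - con))


for allele, (pat, con) in COUNTS.items():
    print(f"DRB1{allele}: OR = {odds_ratio(pat, con):.1f}")
```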

Sequence Feature Variant Type (SFVT) Analysis - Overview
An exploratory approach for genetic association studies that uses combinations of amino acid (AA) residues as the unit of analysis.
Goal: To identify biologically relevant AA residues that account for the major disease risk attributable to HLA.
Genes/proteins are sub-divided into biologically relevant units affecting gene expression and/or protein function (i.e., Sequence Features):
- Polymorphic AAs (single AA sites)
- Structural features (e.g., beta 1 domain, alpha-helix 2, ...)
- Functional features (e.g., peptide binding, T-cell interacting, ...)
- Combinational (e.g., alpha-helix 2 & peptide binding, ...)

www.immport.org

Summary of SFVT Analysis
1. HLA typing (allele-level)
2. Group HLA alleles based on structural/functional sequence motifs (Sequence Features)
3. Perform disease association tests based on sequence motifs (Sequence Feature-level): ORs & p-values
4. Choose the top Sequence Features associated with disease risk for further study
5. Identify individual AAs & combinations of AAs directly involved in disease risk: LD patterns, conditional/stratification analyses

Representative Sequence Features: HLA-DRB1

Sequence Feature ID | Sequence Feature Name | Sequence Feature Type | Amino Acid Position(s) | # of Variant Types
HLA-DRB1_SF1 | allele | Standard Allele Designation | NA | 497
HLA-DRB1_SF4 | mature protein | Structural - Complete protein | 1..237 | 52
HLA-DRB1_SF5 | beta 1 domain | Structural - Domain | 1..95 | 69
HLA-DRB1_SF12 | loop between beta-strands 1 & 2 | Structural - Secondary structure motif | 19, 20, 21, 22 | 5
HLA-DRB1_SF13 | beta-strand 2 | Structural - Secondary structure motif | 23..32 | 28
HLA-DRB1_SF21 | alpha-helix 2 | Structural - Secondary structure motif | 65..72 | 29
HLA-DRB1_SF128 | T cell receptor binding | Functional | 60, 64, 65, 66, 67, 69, 70, 71, 73, 76, 77, 78, 80, 81, 82, 84, 85 | 81
HLA-DRB1_SF137 | peptide antigen binding pocket 7 | Functional | 28, 30, 47, 61, 67, 71 | 53
HLA-DRB1_SF163 | alpha-helix 2_peptide antigen binding | Structural_Functional Combination | 67, 70, 71 | 21
HLA-DRB1_SF164 | alpha-helix 2_T cell receptor binding | Structural_Functional Combination | 65, 66, 67, 69, 70, 71 | 24

Table from Karp et al. (2010) Hum Mol Genet

Variant Types for HLA-DRB1_SF153 (beta-strand 2_peptide antigen binding)
5 of 11 Variant Types (VTs) for Sequence Feature 153 (SF153):
DRB1_SF153_VT1 (LEC): DRB1*0101, 0102, 0103, 0104, 0105, ...
DRB1_SF153_VT2 (FEL): DRB1*0113, 0701, 0703, 0704, 0705, ...
DRB1_SF153_VT3 (YDY): DRB1*0301, 0304, 0305, 0306, 0308, ...
Karp et al. (2010) Hum Mol Genet

SFVT analysis: DRB1 summary for JIA-OP

DRB1 feature       | Amino Acids                 | p-value  | ORmax | ORmin
AA position 13     | 13                          | 2.00E-28 | 4.9   | 0.33
Pocket 6           | 11, 13, 30                  | 4.00E-28 | 7.1   | 0.31
Pocket 4           | 13, 26, 28, 70, 71, 74, 78  | 6.00E-28 | 6.8   | 0.28
DRB1 allele        | 9..86                       | 1.00E-27 | 9.4   | 0.28
Pocket 7           | 28, 30, 47, 61, 67, 71      | 9.00E-27 | 9.4   | 0.28
AA positions X-LD  | [11, 12, 10, 16]            | 9.00E-25 | 3.2   | 0.33
AA position 67     | 67                          | 3.00E-17 | 3.4   | 0.54
Pocket 9           | 9, 37, 57                   | 4.00E-16 | 3.9   | 0.33
AA position 74     | 74                          | 4.00E-16 | 6.8   | 0.33
AA position 37     | 37                          | 4.00E-13 | 1.8   | 0.34
AA position 57     | 57                          | 6.00E-13 | 3.9   | 0.44
...
AA position 86     | 86                          | ns       | 1.1   | 0.9

DRB1: AAs 13, 67, 37, 57, 74, 86 in binding pockets 6, 4, 7, and 9.
AAs underlined have a potential effect on disease risk; the effect of those in italics may be explained by LD with AA 13.
Note that AA 86 is not significant (ns) by SFVT analysis.

SFVT Analysis - Summary
An exploratory approach for identifying biologically relevant AAs in HLA association studies.
Pros
- Utilizes information about the inter-relationships among HLA alleles
- Covers more extended protein regions than single amino acid-based analyses
Cons
- Care is needed to address complex patterns of LD among AAs and SFs in order to identify AAs directly involved in disease
- Due to multiple comparisons with highly correlated SFs, appropriate p-value adjustments are necessary
- The effects of some amino acids (or combinations) may be missed, so complementary analyses are useful

Conditional Haplotype Analysis of JIA-OP: DRB1 Amino Acids 13 and 67

13-67   patients  controls  OR
G - F   108       14        6.8
S - F   130       49        2.3
S - I   131       71        1.5
G - I   13        8         1.3
S - L   102       80        1.0
R - I   44        91        0.2
others  270       233

Overall p < 2E-28
- p < 8E-9: AA 13 involved, or an AA in LD with it
- p < 0.002: AA 67 involved, or an AA in LD with it
- p < 0.001: AA 67 involved, or an AA in LD with it
An extensive set of conditional haplotype (CH) analyses is required, as well as consideration of LD patterns.


LD for DRB1 AAs
[Figure: two panels for JIA controls. Left: Wn (symmetric). Right: asymmetric LD (ALD), row gene conditional on column gene.]

Conditional Haplotype Analysis of JIA-OP

11_13  Cases  Controls  OR
S-G    121    22        4.89   p<3.6e-06
S-S    363    200       1.81
D-F    9      6         1.15   ns
L-F    87     66        1.01
V-H    46     84        0.38
P-R    50     99        0.34
G-Y    30     65        0.33
Total  708    546

12_13  Cases  Controls  OR
T-G    121    22        4.91   p<3.6e-06
T-S    363    200       1.82
K-F    98     76        0.994
K-H    46     84        0.382  p<1.2e-05
K-R    50     99        0.343
K-Y    30     65        0.327
Total  708    546
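A single conditional comparison can be sketched as a 2x2 test between two haplotypes that share the residue at one position and differ at the other. The example below compares S-G with S-S from the 11_13 table (both carry S at position 11, so any difference points to position 13 or a site in LD with it). The choice of a Pearson chi-square test with 1 degree of freedom is an assumption; the slides do not name the test used, and `chi2_2x2` is a hypothetical helper.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value with 1 df)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def conditional_or(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of haplotype 1 vs haplotype 2 (cases vs controls)."""
    return (a * d) / (b * c)


# Rows: S-G and S-S; columns: cases, controls (from the 11_13 table).
stat, p = chi2_2x2(121, 22, 363, 200)
print(f"chi2 = {stat:.2f}, p = {p:.1e}, OR = {conditional_or(121, 22, 363, 200):.2f}")
```

With small cell counts (for example the D-F row) an exact test would be a safer choice than the chi-square approximation.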

Common DRB1 Alleles & AAs in JIA-OP

                 AA position
OR   Allele      13  67  74  86  37  57
6.9  DRB1*0801   G   F   L   G   Y   S
4.3  DRB1*1104   S   F   A   V   Y   D
1.9  DRB1*1301   S   I   A   V   N   D
1.3  DRB1*1101   S   F   A   G   Y   D
1.2  DRB1*0101   F   L   A   G   S   D
1.1  DRB1*0301   S   L   R   V   N   D
0.9  DRB1*1302   S   I   A   G   N   D
0.3  DRB1*0404   H   L   A   V   Y   D
0.3  DRB1*1501   R   I   A   V   S   D
0.3  DRB1*0701   Y   I   Q   G   F   V
0.3  DRB1*0401   H   L   A   G   Y   D

These alleles show the strongest evidence for direct involvement in JIA-OP disease risk.
The 6 identified AA sites uniquely define each allele, preventing further stratification analyses.
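The grouping step behind SFVT and conditional haplotype analyses can be illustrated with this table: alleles sharing the same residues at a chosen set of positions form one variant type. The sketch below uses only the residues listed above; `variant_types` is a hypothetical helper, not the ImmPort implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

POSITIONS = (13, 67, 74, 86, 37, 57)  # column order of the table

# Residues at POSITIONS for each common DRB1 allele (from the table).
ALLELES: Dict[str, str] = {
    "DRB1*0801": "GFLGYS", "DRB1*1104": "SFAVYD", "DRB1*1301": "SIAVND",
    "DRB1*1101": "SFAGYD", "DRB1*0101": "FLAGSD", "DRB1*0301": "SLRVND",
    "DRB1*1302": "SIAGND", "DRB1*0404": "HLAVYD", "DRB1*1501": "RIAVSD",
    "DRB1*0701": "YIQGFV", "DRB1*0401": "HLAGYD",
}


def variant_types(positions: Sequence[int]) -> Dict[str, List[str]]:
    """Group alleles by their residue motif at the given AA positions."""
    idx = [POSITIONS.index(pos) for pos in positions]
    groups: Dict[str, List[str]] = defaultdict(list)
    for allele, residues in ALLELES.items():
        motif = "-".join(residues[i] for i in idx)
        groups[motif].append(allele)
    return dict(groups)


# Positions 13 and 67 give the motifs used in the conditional analysis.
for motif, members in variant_types((13, 67)).items():
    print(motif, members)

# All six positions together: every allele has its own motif.
print(len(variant_types(POSITIONS)) == len(ALLELES))  # True
```

Because the six sites together separate all eleven alleles, no two alleles can be compared while holding all six residues fixed, which is why stratification cannot go further.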


Balancing Selection Operates at Most HLA Loci
Balancing selection can result from:
- Overdominance/heterozygote advantage
- Frequency-dependent selection
- Selective regimes that change over time/space
For HLA, the common factor in these models is rare allele advantage, which is consistent with a pathogen-directed frequency-dependent selection model.
At the amino acid (AA) level we see:
- High AA variability at antigen recognition sites (ARS)
- Relatively even AA frequencies at ARS
- Higher rates of non-synonymous vs. synonymous changes at ARS

Homozygosity (F) and the Normalized Deviate (Fnd)
$$F = \sum_{i=1}^{k} p_i^2 \qquad F_{nd} = (F_{OBS} - F_{EQ}) / SD(F_{EQ})$$
[Figure: three bar charts of allele frequency by allele.
- Neutrality: $F_{OBS} \approx F_{EQ}$, $F_{nd} \approx 0$
- Directional selection: $F_{OBS} > F_{EQ}$, $F_{nd} > 0$
- Balancing selection: $F_{OBS} < F_{EQ}$, $F_{nd} < 0$]
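A minimal sketch of the two quantities follows. The neutral expectation $F_{EQ}$ and its standard deviation depend on the sample size and number of alleles and are normally obtained from neutral-model tables or simulation; here they are simply passed in, and the numbers in the example are made up for illustration.

```python
from typing import Iterable


def homozygosity(freqs: Iterable[float]) -> float:
    """Expected homozygosity F = sum of squared allele frequencies."""
    return sum(p * p for p in freqs)


def f_nd(f_obs: float, f_eq: float, sd_eq: float) -> float:
    """Normalized deviate: (F_OBS - F_EQ) / SD(F_EQ)."""
    return (f_obs - f_eq) / sd_eq


# Four equally frequent alleles: the most even distribution for k = 4.
f_obs = homozygosity([0.25, 0.25, 0.25, 0.25])

# Hypothetical neutral expectation and SD, for illustration only.
print(f_obs, f_nd(f_obs, f_eq=0.40, sd_eq=0.10))
```

A more even frequency distribution than neutrality predicts lowers $F_{OBS}$ and pushes Fnd below zero, the pattern associated with balancing selection.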

Fnd for DRB1 AA sites in JIA Controls Fnd << 0 gives evidence of possible balancing selection. Fnd >> 0 gives evidence of possible directional selection.

Fnd for DRB1 AA sites (Meta-Analysis) Fnd for all polymorphic sites in a meta-analysis of 57 populations Fnd << 0 gives evidence of possible balancing selection. Fnd >> 0 gives evidence of possible directional selection.

LD for DRB1 AAs
[Figure: two panels. Wn (symmetric): JIA controls. Asymmetric LD (ALD): JIA controls, row gene conditional on column gene.]

Acknowledgements
- University of Sao Paulo: Diogo Meyer
- University of Graz: Wolfgang Helmberg
- Cincinnati Children's Hospital: Susan Thompson, David Glass
- University of Texas: Nishanth Marthandan, Paula Guidry, David Karp, Richard Scheuermann
- Children's Hospital Oakland Research Inst.: Steven J. Mack, Jill A. Hollenbach
- Harvard Medical School: Alex Lancaster
- UC Berkeley: Glenys Thomson
- UC San Francisco: Owen Solberg
- Roche Molecular Systems: Henry A. Erlich
- Anthony Nolan Research Inst.: Steven G.E. Marsh, Matthew Waller
- NCBI/NIH: Mike Feolo
- NGIT: Jeff Wiser, Patrick Dunn, Tom Smith

Distributions of Fnd values Results from a meta-analysis of 497 HLA population studies in ten geographic regions

Distributions of Fnd values Solberg et al., 2008

Evidence of Balancing Selection at HLA-DPB1
Cano & Fernandez-Vina (2009) described two sequence dimorphisms that define the primary immunodominant serological epitopes for HLA-DPB1.
All DPB1 alleles can be divided into four serologic categories (DP1, DP2, DP3, and DP4):

Serological   AA position
category      56  85  86  87
DP1           A   E   A   V
DP2           E   G   P   M
DP3           E   E   A   V
DP4           A   G   P   M

Global Distribution of DP serological categories

Fnd for DPB1 Alleles & DP Serological Categories
[Figure: Fnd values plotted for DPB1 alleles and for DP serological categories.]

Evidence of Balancing Selection at HLA-DPB1
We constructed a randomization test (random binning of alleles into 4 categories) to ensure that the effect was not driven by differences in the observed number of variants at the allele level vs. the serotype level.
Randomization tests confirmed the results for European populations more often than for other geographic regions.
- A possible ascertainment bias? (Many common alleles were first identified in European populations.)
- Could natural selection favoring DPB1 diversity at the serologic level be greater in Europe?
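The idea of the random-binning test can be sketched as follows. This is a simplified illustration, not the authors' procedure: it uses the homozygosity F of the binned frequencies as the test statistic (the slides work with Fnd), it preserves the observed category sizes by shuffling labels, and the allele names, frequencies, and category assignments are all made up.

```python
import random
from typing import Dict, Tuple


def binned_homozygosity(freqs: Dict[str, float],
                        labels: Dict[str, int], k: int = 4) -> float:
    """Homozygosity after pooling allele frequencies into k categories."""
    totals = [0.0] * k
    for allele, p in freqs.items():
        totals[labels[allele]] += p
    return sum(t * t for t in totals)


def random_binning_test(freqs: Dict[str, float], observed: Dict[str, int],
                        k: int = 4, n_iter: int = 2000,
                        seed: int = 0) -> Tuple[float, float]:
    """Return (observed F, fraction of random binnings with F <= observed)."""
    rng = random.Random(seed)
    alleles = list(freqs)
    f_obs = binned_homozygosity(freqs, observed, k)
    hits = 0
    for _ in range(n_iter):
        shuffled = [observed[a] for a in alleles]
        rng.shuffle(shuffled)  # random binning with the same category sizes
        labels = dict(zip(alleles, shuffled))
        if binned_homozygosity(freqs, labels, k) <= f_obs + 1e-12:
            hits += 1
    return f_obs, (hits + 1) / (n_iter + 1)


# Hypothetical allele frequencies and serological category assignments.
toy_freqs = {"a1": 0.30, "a2": 0.20, "a3": 0.15, "a4": 0.10,
             "a5": 0.10, "a6": 0.05, "a7": 0.05, "a8": 0.05}
toy_labels = {"a1": 0, "a2": 1, "a6": 1, "a3": 2, "a5": 2,
              "a4": 3, "a7": 3, "a8": 3}
f_obs, p_value = random_binning_test(toy_freqs, toy_labels)
print(round(f_obs, 3), p_value)
```

A small fraction means few random binnings are as even as the observed serological grouping, which is the pattern one would expect if selection acts at the serologic level rather than on individual alleles.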

Evidence of Balancing Selection at HLA-DPB1
Supplementary Figure S1. Mean Fnd values for trios of variant DPB1 exon 2 amino acid positions: mean Fnd values in variable sets of 3 amino-acid positions vs. 36/56/85 paired trios.
[Plot: mean Fnd (y-axis) against amino-acid position trio (x-axis).]
