Handling Immunogenetic Data Managing and Validating HLA Data


Handling Immunogenetic Data: Managing and Validating HLA Data
Steven J. Mack, PhD
Children's Hospital Oakland Research Institute
16th IHIW & Joint Conference, Sunday 3 June, 2012

Overview
  1. Master Analytical Dataset
  2. The Perils of Using MS Excel
  3. HLA Ambiguity
  4. Internal Validation and Standardization of Datasets
  5. Analytical Validation of HLA Population Data

Create a Master Dataset for Analysis/Sharing
  - Create a single master data file for research datasets.
      - Integrate demographic, phenotypic, and genotypic information.
      - Each analytical element (sample) is a row.
      - Each variable (genotype, phenotype, measurement) is a column.
      - Where possible, encode variable data numerically; this avoids errors and ambiguity of meaning, and saves space.
  - A data dictionary accompanies and explains the master data file.
      - Describe the meaning and significance of each variable.
      - Define the numerical codes for each variable (e.g. 1 = affected, 0 = control).
      - Define the code(s) for missing values (e.g. -9 = unknown).
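A minimal sketch of how a data dictionary can be used to check a master data file. The column names, the `sex` codes, and the three sample rows are hypothetical; only the `1 = affected`, `0 = control`, `-9 = unknown` codes come from the slide.

```python
import csv
import io

# Hypothetical data dictionary: the allowed codes for each coded variable.
DATA_DICTIONARY = {
    "status": {"1": "affected", "0": "control", "-9": "unknown"},
    "sex": {"1": "male", "2": "female", "-9": "unknown"},  # assumed codes
}

# Hypothetical master data file: one sample per row, one variable per column.
MASTER_TSV = (
    "id\tstatus\tsex\tDRB1_1\tDRB1_2\n"
    "001\t1\t2\tDRB1*04:01:01\tDRB1*13:01:01\n"
    "002\t0\t1\tDRB1*15:01:01\tDRB1*15:01:01\n"
    "003\t7\t-9\tDRB1*03:01:01\tDRB1*07:01:01\n"
)


def undefined_codes(tsv_text, dictionary):
    """Return (id, column, value) for each coded value absent from the dictionary."""
    problems = []
    # csv keeps every field as a string, so allele names are never reinterpreted.
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column, codes in dictionary.items():
            if row[column] not in codes:
                problems.append((row["id"], column, row[column]))
    return problems


print(undefined_codes(MASTER_TSV, DATA_DICTIONARY))
```

Sample 003 carries a `status` value of 7, which the dictionary does not define, so it is the only row reported.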

Should I Store My Data in Excel?
Microsoft Excel doesn't speak HLA.

  IMGT/HLA db version   Allele Name   Excel Error
  1.*.*                 01011         1011
  2.*.*                 010101        10101
  3.*.*                 01:01:01      1:01:01 AM
  3.*.*                 01:61         0.0840278

Opening a text file by right-clicking and selecting Excel will result in Excel errors like these.
Use Excel at your own risk; store data using a text format; consider using database software.
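The first two rows of the table can be imitated in a few lines. This is only a rough illustration of numeric coercion, assuming a tool that converts any all-digit cell to an integer; it is not a model of Excel's actual parsing rules, and `naive_numeric_coercion` is a hypothetical helper.

```python
def naive_numeric_coercion(cell):
    """Imitate a tool that turns digit-only text into a number (assumed behavior)."""
    try:
        return str(int(cell))  # leading zeros are lost here
    except ValueError:
        return cell  # non-numeric text is left alone in this sketch


for name in ["01011", "010101"]:
    print(name, "->", naive_numeric_coercion(name))
```

Keeping allele names as text strings from input to output avoids this class of error.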

Consider Short-Term and Long-Term Needs for HLA Data
  Short-Term Need: Interim Reporting, Submission Deadline, Data Analysis
    - May often require an abstracted best guess to summarize the data.
    - But that best guess may only be useful at the time it is made.
  Long-Term Need: Archiving, Storage, Deposition into Public Domain Registry, Meta-Analysis, Multi-cycle projects
    - Need to be able to make full use of these data in the future.
    - The goal is to maximize available information, which often requires that primary or raw data be maintained.
  You should store both the cleaned and the ambiguous data.

HLA Data Ambiguity
  - Allele ambiguity results when the polymorphisms that distinguish alleles fall outside of the regions assessed by the genotyping system.
      A*02:03:01/A*02:253/A*02:264 (identical exon 2 & 3 sequence)
  - Genotype ambiguity results from an inability to establish chromosomal phase between identified polymorphisms.
      DRB1*04:01:01+DRB1*13:01:01 or
      DRB1*04:01:01+DRB1*13:117 or
      DRB1*04:13+DRB1*13:02:01 or
      DRB1*04:14+DRB1*14:21 or
      DRB1*04:35+DRB1*13:40 or
      DRB1*04:38+DRB1*13:20
      (identical heterozygous exon 2 sequence)
  - An HLA genotype can include both allele and genotype ambiguity.
      A*02:03:01/A*02:253/A*02:264+A*03:01:02 or
      A*02:171:01+A*03:50 or
      A*02:171:02+A*03:66
  http://www.ebi.ac.uk/imgt/hla/ambig.html

Coding Ambiguous HLA Data: NMDP Allele Codes
  - The ambiguous genotype
      DRB1*04:02:01+DRB1*11:20 or DRB1*04:14+DRB1*11:16
    can be represented as
      DRB1*04:BK+DRB1*11:YR
      - DRB1*04:BK represents DRB1*04:02/DRB1*04:14
      - DRB1*11:YR represents DRB1*11:20/DRB1*11:16
  - But these codes also represent excluded genotypes:
      DRB1*04:02:01+DRB1*11:16 and DRB1*04:14+DRB1*11:20
  - Allele codes increase ambiguity and omit information in the 3rd and 4th fields.
  http://bioinformatics.nmdp.org/hla/allele_codes/allele_codes.aspx
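The loss of information can be made concrete by expanding the coded genotype back out. The sketch below hard-codes the two code definitions given on the slide (it does not query the NMDP service) and works at two-field resolution; `expand` is a hypothetical helper name.

```python
from itertools import product

# Code definitions copied from the slide, at two-field resolution.
ALLELE_CODES = {
    "DRB1*04:BK": ["DRB1*04:02", "DRB1*04:14"],
    "DRB1*11:YR": ["DRB1*11:20", "DRB1*11:16"],
}


def expand(coded_genotype):
    """List every genotype implied by a '+'-joined pair of allele codes."""
    parts = coded_genotype.split("+")
    combos = product(*(ALLELE_CODES[p] for p in parts))
    return ["+".join(c) for c in combos]


# The two genotypes the typing actually allowed (third field dropped).
typed = {"DRB1*04:02+DRB1*11:20", "DRB1*04:14+DRB1*11:16"}

implied = expand("DRB1*04:BK+DRB1*11:YR")
extra = [g for g in implied if g not in typed]
print(len(implied), "implied genotypes; not supported by the typing:", extra)
```

Two typed genotypes become four implied ones, which is the increase in ambiguity the slide describes.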

Coding Ambiguous HLA Data: Allele Groups
  - G groups: alleles with identical nucleotide sequence in the exons encoding the peptide binding domain (exon 2 for class II alleles, and exons 2 & 3 for class I alleles).
      A*02:03:01G = A*02:03:01/A*02:253/A*02:264
      A*02:07:01G = A*02:07:01/A*02:07:02/A*02:15N/A*02:265
      http://hla.alleles.org/alleles/g_groups.html
  - P groups: alleles with identical peptide sequence in the peptide binding domain, with the exclusion of null alleles.
      A*02:03P = A*02:03:01/A*02:03:02/A*02:03:03/A*02:03:04/A*02:253/A*02:264
      A*02:07P = A*02:07:01/A*02:07:02/A*02:265
      http://hla.alleles.org/alleles/p_groups.html

Recording Ambiguous HLA Data: Genotype Strings
Genotype List String (GL String)
Uses specific operators to describe the relationships between alleles, allowing genotype data to be recorded in a single line.

  Order of Precedence   Delimiter   Description
  1                     ^           Gene/Locus
  2                     |           Genotype list
  3                     +           Genotype
  4                     ~           Haplotype
  5                     /           Alleles

http://www.ebi.ac.uk/ipd/kir/standards.html

GL String Representation of Ambiguous HLA Data
  - Allelic ambiguity delimiter (forward slash, /): defines an allele list.
      A*23:26/A*23:39
      (two possible alleles at locus A)
  - Haplotype delimiter (tilde, ~): applied in cis to identify a haplotype.
      DRB5*01:02~DRB1*15:04
      (an allele at locus DRB5 and an allele at locus DRB1 on the same chromosome)
  - Genotype delimiter (plus sign, +): identifies alleles on different chromosomes (trans), forming a genotype. It also delimits haplotypes, and may indicate gene duplication (ambiguous cis).
      A*02:302+A*23:26/A*23:39
      (an HLA-A genotype in which the second allele carries allelic ambiguity)
  - Genotype list delimiter (pipe, |): distinguishes ambiguous genotypes.
      A*02:69+A*23:30|A*02:302+A*23:26/A*23:39
      (two possible genotypes, the second including an ambiguous allele list)
  - Locus delimiter (caret, ^): distinguishes loci.
      A*02:69+A*23:30|A*02:302+A*23:26/A*23:39^B*44:02:13+B*49:08
      (an ambiguous genotype list for HLA-A, followed by an HLA-B genotype)
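Because the delimiters have a fixed order of precedence, a GL String can be unpacked by splitting on each delimiter in turn. The sketch below is a hypothetical helper, not an official GL String library; it performs no validation of allele names.

```python
def parse_gl_string(gl_string):
    """Split a GL String by precedence: ^ then | then + then ~ then /.

    Returns nested lists: loci > genotypes > chromosomes > loci in cis > alleles.
    """
    return [
        [
            [
                [allele_list.split("/") for allele_list in haplotype.split("~")]
                for haplotype in genotype.split("+")
            ]
            for genotype in locus.split("|")
        ]
        for locus in gl_string.split("^")
    ]


example = "A*02:69+A*23:30|A*02:302+A*23:26/A*23:39^B*44:02:13+B*49:08"
parsed = parse_gl_string(example)

print(len(parsed), "loci")                      # HLA-A and HLA-B
print(len(parsed[0]), "possible genotypes at the first locus")
print(parsed[0][1][1][0])                       # the ambiguous allele list
```

Splitting from the lowest-precedence delimiter inward keeps each level of the relationship (locus, genotype list, genotype, haplotype, allele list) in its own layer of nesting.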

Alternative Format for Recording Ambiguous HLA Data: UNIFORMAT
Also allows ambiguous genotype data to be recorded in a single line, using different operators (colons, commas, spaces, tabs) than in GL String.

  identifier {tab mark} allele,allele [{space} allele,allele...][{tab mark} allele,allele...][{tab mark}#comments]

  - identifier: sample identifier
  - first tab-delimited field: ambiguous alleles at one locus
  - further tab-delimited fields: ambiguous alleles at additional loci
  - #comments: comments

http://geneva.unige.ch/generate/

Know Your Nomenclature Version
  - Identify the IMGT/HLA database release number applicable to your data: http://www.ebi.ac.uk/imgt/hla/
  - The release number identifies which alleles should and should not be in your data.

  IMGT/HLA db rel   Allele
  1.13              A*2416
  1.14              A*3108
  1.15              B*1522
  1.16              B*3543
  2.28              DPB1*0502
  3.0.0             DPB1*104:01

  - This information allows you to check your data for naming/recording errors.

Dataset Validation
The Allele Name Translation Tool (ANTT) can be used to validate/update the allele names in column-formatted datasets against/to any IMGT/HLA db release.
  - Parses forward-slash (/) allele ambiguity delimiters; the next version will parse any delimiters (e.g. all GL String delimiters).
  - Documents right-truncated allele names and unrecognized allele names.
  - Identifies the id and row-column position of errors.

Example messages:
  DRB1*08:02:00 could not be found in the HLA-DRB1.upd translation file. [id = 003][Row = 4 Column = 2]
  DRB1*04:08 appears to be a truncated version of the DRB1*04:08:01 allele, and was translated to DRB1*0408. [id = 005][Row = 6 Column = 2]

http://immunogenomics.org/software.html

Internal Standardization of Data: Data Consistency
Record data consistently across individuals (and datasets).
  For individuals typed:
  - Record homozygotes as diploid.
      A*02:03:01/A*02:253/A*02:264+A*02:03:01/A*02:253/A*02:264
  - Use a code to identify missing data.
      ****, -9, missing, etc.
  - Use a code to identify absent loci when recording structural variants.
      DRB1*01:01:01~DRB3*BLANK~DRB4*BLANK~DRB5*BLANK+DRB1*15:01:01~DRB3*BLANK~DRB4*BLANK~DRB5*01:01:01

Internal Standardization of Data: Data Modification
Document any post-typing modifications made to data.
  - How did A*02:69+A*23:30|A*02:302+A*23:26/A*23:39 become A*02:302+A*23:39?
For analysis across individuals and datasets:
  - Analyze allele names at the same level of polymorphism.
      - Avoid A*01:01:01:01, A*01:01:01, A*01:01, and A*01 in the same analysis.
      - You will have to throw out some data/information.
  - Analyze alleles at the same sequence level.
      - For exon 2 (class II) or exon 2 & 3 (class I) testing, don't analyze alleles in the same G group separately.
      - Analyze DRB1*14:01:01 and DRB1*14:54 as DRB1*14:01:01G.
  - Analyze allele names in the same nomenclature context.
      - Don't analyze A*2416 and A*3108 together with colon-delimited names; update to a single nomenclature.
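Bringing allele names to the same level of polymorphism usually means truncating longer names to a common number of fields. The sketch below assumes colon-delimited names and ignores expression suffixes such as N; `truncate_fields` is a hypothetical helper, not part of ANTT or any other tool named here.

```python
def truncate_fields(allele, n_fields=2):
    """Truncate a colon-delimited allele name to at most n_fields fields.

    Names that already have fewer fields are returned unchanged; they cannot
    be extended, which is why some data may have to be dropped or down-coded.
    """
    locus, _, name = allele.partition("*")
    fields = name.split(":")
    return locus + "*" + ":".join(fields[:n_fields])


names = ["A*01:01:01:01", "A*01:01:01", "A*01:01", "A*01"]
print([truncate_fields(n) for n in names])
```

The one-field name `A*01` stays at one field, so it still cannot be pooled with the two-field names without losing the information they carry.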

Analytical Validation of HLA Population Data
The Hardy-Weinberg (HW) model can give you insights into data quality.
  Hardy-Weinberg Equilibrium
  - The frequency of the alleles should predict the frequency of the genotypes.
  - If it does not (HW deviation), you may have problems with your data.
  Multi-locus HW deviation
  - Sampling error
      - Related individuals
      - Populations mixed together (admixture)
      - Solution: review inclusion criteria and remove individuals
  - Critical typing problem (uncommon)
  Single-locus HW deviation
  - Typing error
      - Excess of homozygotes due to missed alleles
      - Excess of heterozygotes due to poor assignment
      - Solution: review and redo typings
  - Selection (unexpected in control populations)
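The statement that allele frequencies should predict genotype frequencies can be sketched as a calculation of expected genotype counts. The function name and the toy two-allele data below are illustrative assumptions; this is not the procedure PyPop uses, only the textbook Hardy-Weinberg expectation.

```python
from collections import Counter


def hw_expected_counts(genotypes):
    """Expected genotype counts under Hardy-Weinberg proportions.

    genotypes: list of (allele, allele) tuples, one per individual.
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freq = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freq)
    expected = {}
    for i, a in enumerate(alleles):
        for b in alleles[i:]:
            p = freq[a] * freq[b]
            # Homozygotes: n * p^2; heterozygotes: n * 2pq.
            expected[(a, b)] = n * (p if a == b else 2 * p)
    return expected


# Toy data: 3 AA, 4 AB, 3 BB individuals.
toy = [("A", "A")] * 3 + [("A", "B")] * 4 + [("B", "B")] * 3
expected = hw_expected_counts(toy)
print(expected)
```

Comparing these expected counts with the observed counts, genotype by genotype, is what reveals an excess of homozygotes or heterozygotes.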

Example of HW Data Validation
13 DQB1 alleles in a population of n=109

  Genotype                       Observed Count   Expected Count   p-value
  DQB1*03:03:02+DQB1*02:01:01    0                3.137            0.0493
  DQB1*03:03:02+DQB1*03:03:02    3                0.743            0.0223

Chen's test of individual genotypes in PyPop (http://www.pypop.org)
HW deviation due to poor detection of DQB1*02:01:01 in the presence of DQB1*03:03:02, resulting from a SNP in DQB1*02:01:01 under a PCR primer.
For population/control datasets, the first analysis done should be a HW test.

Much Thanks To
  Pierre-Antoine Gourraud, Jill A. Hollenbach, Thomas Barnetche, Richard Single, Steven J. Mack
    Standard Methods for the Management of Immunogenetic Data. In: Frank T. Christiansen and Brian D. Tait (eds.), Immunogenetics: Methods and Applications in Clinical Practice, Methods in Molecular Biology, vol. 882, pages 197-213. 2012. doi: 10.1007/978-1-61779-842-9_12
  Children's Hospital Oakland
    Henry A. Erlich, Janelle Noble, Elizabeth Trachtenberg
  Immunogenetics Colleagues
    Glenys Thomson, Alicia Sanchez-Mazas, Owen D. Solberg, Martin Maiers, Carolyn Hurley, Marcel Tilanus, Christien Voorter
  Immunogenetics Community

Handling Immunogenetic Data: Managing Highly Polymorphic Data for Disease Association Studies
Jill A. Hollenbach, PhD, MPH
Children's Hospital Oakland Research Institute
16th IHIW & Joint Conference, Sunday 3 June, 2012

Immunogenetic data require special handling in disease association studies
  1. Highly polymorphic loci
      - Many rare alleles, leading to sparse cells
      - Need to identify all disease-associated alleles
  2. Strong linkage disequilibrium
      - Identify which loci are primary
  3. Immunogenetic loci show strong population structure

Case-Control Study: Statistical tests
  - First step: population analyses
      - Tests for fit to Hardy-Weinberg equilibrium proportions (HWEP)
      - Calculation of allele and haplotype frequencies
  - Association tests
      - Contingency tables / chi-squared test
      - Logistic regression
      - Other tests/special cases (Cochran-Armitage test for trend; survival analysis; etc.)

Case-Control Study: Contingency tables
  - Test the difference (independence) of frequency distributions for categorical variables between groups: the χ² test.
  - Always constructed with raw counts, not frequency data.
  - Analyses can be performed at the allele, genotype, haplotype, amino acid, or other levels.

Sparse cells in contingency tables
Chi-squared test statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

where O is the observed cell count and E is the expected cell count,

$$E = \frac{\text{row total} \times \text{column total}}{2N}$$

The χ² test is inappropriate if any expected count is less than 1, or if the expected count is less than five in more than 20% of all cells in a contingency table (aka sparse cells).
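A minimal sketch of the statistic and the sparse-cell rule, written in plain Python. The function names are hypothetical, and the grand total of the table plays the role of 2N when the cells are allele counts. The two example tables collapse DRB1 counts from this talk's case-control data into "allele versus all other alleles".

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum of (O - E)^2 / E over all cells."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


def is_sparse(table):
    """True if any expected count < 1, or more than 20% of cells have expected < 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return min(cells) < 1 or sum(e < 5 for e in cells) > 0.2 * len(cells)


common = [[44, 21], [268, 265]]   # DRB1*0701 vs. all other alleles (case, control)
rare = [[1, 0], [311, 286]]       # DRB1*0103 vs. all other alleles

print(round(chi_squared(common), 2), is_sparse(common))
print(is_sparse(rare))
```

The rare-allele table fails the rule because its smallest expected count is below 1, so the test should not be applied to it as it stands.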

Sparse cells in contingency tables
DRB1 allele counts in cases and controls:

  DRB1   case   control
  0101   4      9
  0102   14     13
  0103   1      0
  0301   18     24
  0302   16     23
  0401   8      7
  0403   2      0
  0404   1      2
  0405   1      5
  0407   0      3
  0701   44     21
  0801   0      1
  0802   3      6
  0803   1      1
  0804   12     12
  0806   1      1
  0901   4      11
  1001   7      3
  1101   30     28
  1102   14     11
  1104   1      1
  1201   22     11
  1301   21     12
  1302   19     21
  1303   9      4
  1304   1      2
  1401   7      3
  1402   0      2
  1501   5      9
  1502   2      2
  1503   36     35
  1602   8      3
  2n     312    286

Sparse cells in contingency tables
The same data with the rare alleles combined into a single "binned" category, sorted by case count, with a p-value for each category:

  DRB1     case   control   p-value
  0701     44     21        0.01
  1503     36     35        0.81
  1101     30     28        0.95
  1201     22     11        0.10
  1301     21     12        0.20
  1302     19     21        0.56
  0301     18     24        0.24
  0302     16     23        0.17
  0102     14     13        0.97
  1102     14     11        0.71
  0804     12     12        0.83
  1303     9      4         0.23
  0401     8      7         0.93
  1602     8      3         0.18
  1001     7      3         0.27
  1401     7      3         0.27
  1501     5      9         0.23
  0101     4      9         0.13
  0901     4      11        0.05
  0802     3      6         0.27
  0405     1      5         0.09
  binned   10     15        0.23
  2n       312    286
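The slides do not say which test produced the per-allele p-values. One plausible reading, sketched below as an assumption, is an uncorrected 2x2 chi-squared test of each allele against all other alleles combined; it gives values close to those shown but may not reproduce every entry exactly. With 1 degree of freedom the p-value can be computed with `math.erfc`, so no external library is needed.

```python
import math


def allele_p_value(case_count, control_count, case_total, control_total):
    """p-value of a 2x2 chi-squared test (1 df, no continuity correction)
    for one allele versus all other alleles."""
    a, b = case_count, control_count
    c, d = case_total - a, control_total - b
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


p_0701 = allele_p_value(44, 21, 312, 286)
p_0901 = allele_p_value(4, 11, 312, 286)
print(round(p_0701, 3), round(p_0901, 3))
```

Both values round to the 0.01 and 0.05 shown for DRB1*0701 and DRB1*0901.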

Identifying all disease-associated alleles
Relative predispositional effects method (RPE; Payami et al. 1989)
  - A method to identify all heterogeneity in disease risk at the locus of interest.
  - Contingency table testing reveals an overall difference in allele frequency distributions at a locus.
  - But we want to identify all alleles that contribute significantly.
  - The alleles with the strongest predisposing or protective effects are sequentially removed from the analysis until no further heterogeneity in risk effects is seen.

The binned DRB1 data, sorted by p-value:

  DRB1     case   control   p-value
  0701     44     21        0.01
  0901     4      11        0.05
  0405     1      5         0.09
  1201     22     11        0.10
  0101     4      9         0.13
  0302     16     23        0.17
  1602     8      3         0.18
  1301     21     12        0.20
  1501     5      9         0.23
  1303     9      4         0.23
  binned   10     15        0.23
  0301     18     24        0.24
  0802     3      6         0.27
  1001     7      3         0.27
  1401     7      3         0.27
  1302     19     21        0.56
  1102     14     11        0.71
  1503     36     35        0.81
  0804     12     12        0.83
  0401     8      7         0.93
  1101     30     28        0.95
  0102     14     13        0.97

Identifying the primary locus

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved.

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03)

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) (0.10)/(0.04)=

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) (0.10)/(0.04)=2.5

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) f cont (07:01~02:01)/f cont (07:01~03:03) (0.10)/(0.04)=2.5

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) f cont (07:01~02:01)/f cont (07:01~03:03) (0.10)/(0.04)=2.5 (0.05)/(0.02)=

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) f cont (07:01~02:01)/f cont (07:01~03:03) (0.10)/(0.04)=2.5 (0.05)/(0.02)=2.5

Identifying the primary locus The frequencies of two haplotypes of an allele at the predisposing locus may differ between patients and controls DRB1~DQB1 Haplotype Case Control 07:01~02:01 0.10 0.05 07:01~03:03 0.04 0.02 HOWEVER: the relative frequency of their ratios will be the same if the second locus is not involved. f case (07:01~02:01)/f case (07:01~03:03) f cont (07:01~02:01)/f cont (07:01~03:03) (0.10)/(0.04)=2.5 (0.05)/(0.02)=2.5


Identifying the primary locus (second example)

DRB1~DQB1 haplotype    Case    Control
07:01~02:01            0.12    0.05
07:01~03:03            0.02    0.02

f_case(07:01~02:01) / f_case(07:01~03:03) = 0.12 / 0.02 = 6
f_cont(07:01~02:01) / f_cont(07:01~03:03) = 0.05 / 0.02 = 2.5

Here the ratios differ between cases and controls (6 versus 2.5), which suggests that the second locus (DQB1) is involved.

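The ratio comparison on these slides can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the haplotype frequencies from the second example; the function name `haplotype_ratio` is hypothetical, and a real analysis would work from haplotype counts and apply a formal statistical test rather than comparing two rounded ratios.

```python
def haplotype_ratio(freqs, hap_a, hap_b):
    """Ratio of the frequencies of two haplotypes within one group."""
    return freqs[hap_a] / freqs[hap_b]


# DRB1~DQB1 haplotype frequencies taken from the slide (second example).
case = {"07:01~02:01": 0.12, "07:01~03:03": 0.02}
control = {"07:01~02:01": 0.05, "07:01~03:03": 0.02}

ratio_case = haplotype_ratio(case, "07:01~02:01", "07:01~03:03")
ratio_control = haplotype_ratio(control, "07:01~02:01", "07:01~03:03")

# Round before comparing to avoid floating-point noise.
ratios_match = round(ratio_case, 2) == round(ratio_control, 2)

print(round(ratio_case, 2), round(ratio_control, 2), ratios_match)
```

With the frequencies from the first example (0.10 and 0.04 in cases, 0.05 and 0.02 in controls) both ratios come out at 2.5, so `ratios_match` would be true.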

Population substructure in disease association studies

[Figure: frequency (f) of an allele at LocusA in cases and controls, shown for PopX, for PopY, and for the combined sample PopXY. Within PopX and within PopY: no association. In the combined PopXY sample, cases versus controls: p<.05.]
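The effect shown on this slide can be reproduced with invented numbers. In the sketch below, all counts are hypothetical and chosen only for illustration: the allele frequency is identical in cases and controls within PopX (0.8) and within PopY (0.2), but the two populations contribute different proportions of cases and controls, so pooling them produces an apparent association. The helper names `chi_square_2x2` and `pool` are assumptions, and the test is a plain Pearson chi-square without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def pool(*pops):
    """Sum (allele present, allele absent) counts across populations."""
    return {
        grp: tuple(sum(p[grp][i] for p in pops) for i in range(2))
        for grp in ("cases", "controls")
    }


CRITICAL_05 = 3.841  # chi-square critical value for 1 df at alpha = 0.05

# Hypothetical counts of (allele present, allele absent) at LocusA.
pop_x = {"cases": (160, 40), "controls": (80, 20)}   # f = 0.8 in both groups
pop_y = {"cases": (20, 80), "controls": (40, 160)}   # f = 0.2 in both groups
pop_xy = pool(pop_x, pop_y)                          # f = 0.6 vs 0.4

chi_x = chi_square_2x2(*pop_x["cases"], *pop_x["controls"])
chi_y = chi_square_2x2(*pop_y["cases"], *pop_y["controls"])
chi_xy = chi_square_2x2(*pop_xy["cases"], *pop_xy["controls"])

for name, chi in (("PopX", chi_x), ("PopY", chi_y), ("PopXY", chi_xy)):
    verdict = "p<.05" if chi > CRITICAL_05 else "no association"
    print(name, round(chi, 2), verdict)
```

Analysing each population separately (or otherwise stratifying by population) removes the spurious signal in this toy example.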


Immunogenetic data require special handling in disease association studies

Highly polymorphic loci
    Combine low frequency alleles: binning
    Need to identify all associated alleles: relative predispositional effects
Strong linkage disequilibrium
    Identify which loci are primary: condition within haplotypes
Control for population substructure
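As an illustration of the binning step, the sketch below combines low frequency alleles into a single category before a case-control comparison. The rule used here (bin any allele whose expected count in cases or controls falls below 5) is an assumption chosen for the example, not a rule stated on the slide; the function name `bin_rare_alleles`, the `"binned"` label, and all allele counts are likewise hypothetical.

```python
from collections import Counter


def bin_rare_alleles(case_counts, control_counts, min_expected=5.0, label="binned"):
    """Combine alleles whose expected count in either group is below
    min_expected into one category. The threshold is an assumed convention."""
    n_case = sum(case_counts.values())
    n_control = sum(control_counts.values())
    total = n_case + n_control
    binned_case, binned_control = Counter(), Counter()
    for allele in sorted(set(case_counts) | set(control_counts)):
        obs_case = case_counts.get(allele, 0)
        obs_control = control_counts.get(allele, 0)
        allele_total = obs_case + obs_control
        # Expected counts under no association between allele and status.
        exp_case = allele_total * n_case / total
        exp_control = allele_total * n_control / total
        key = allele if min(exp_case, exp_control) >= min_expected else label
        binned_case[key] += obs_case
        binned_control[key] += obs_control
    return dict(binned_case), dict(binned_control)


# Hypothetical allele counts at one locus (100 chromosomes per group).
cases = {"01:01": 30, "03:01": 50, "04:01": 15, "11:04": 3, "14:54": 2}
controls = {"01:01": 40, "03:01": 25, "04:01": 30, "11:04": 4, "13:02": 1}

binned_cases, binned_controls = bin_rare_alleles(cases, controls)
print(binned_cases)
print(binned_controls)
```

The three rare alleles end up in one combined category, so every cell of the resulting table has an expected count of at least 5 in this example.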

For further discussion see: Hollenbach JA, Mack SJ, Thomson G, Gourraud PA. Analytical methods for disease association studies with immunogenetic data. Methods Mol Biol. 2012;882:245-66. DOI: 10.1007/978-1-61779-842-9_14

Thank you!