In silico methods for rational design of vaccine and immunotherapeutics

In silico methods for rational design of vaccine and immunotherapeutics Morten Nielsen, CBS, Department of Systems Biology, DTU and Instituto de Investigaciones Biotecnológicas, Universidad de San Martín, Argentina

Bioinformatics in a nutshell
List of peptides that have a given biological feature:
YMNGTMSQV GILGFVFTL ALWGFFPVV ILKEPVHGV ILGFVFTLT LLFGYPVYV GLSPTVWLS WLSLLVPFV FLPSDFFPS CVGGLLTMV FIAGNSAYE
Mathematical model (neural network, hidden Markov model)
Search databases for other biological sequences with the same feature/property:
>polymerase
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD
KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY
EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
...

Influenza A virus (A/Goose/Guangdong/1/96(H5N1))
Genome:
>Segment
agcaaaagcaggtcaattatattcaatatggaaagaataaaagaactaagagatctaatg
tcgcagtcccgcactcgcgagatactaacaaaaaccactgtggatcatatggccataatc
aagaaatacacatcaggaagacaagagaagaaccctgctctcagaatgaaatggatgatg
gcaatgaaatatccaatcacagcagacaagagaataatggagatgattcctgaaaggaat
and 3350 other nucleotides on 8 segments
Proteins:
>polymerase
MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITAD
KRIMEMIPERNEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPTTSTVHYPKVYKTYFE
KVERLKHGTFGPVHFRNQVKIRRRVDINPGHADLSAKEAQDVIMEVVFPNEVGARILTSE
SQLTITKEKKEELQDCKIAPLMVAYMLERELVRKTRFLPVAGGTSSVYIEVLHLTQGTCW
EQMYTPGGEVRNDDVDQSLIIAARNIVRRATVSADPLASLLEMCHSTQIGGIRMVDILRQ
NPTEEQAVDICKAAMGLRISSSFSFGGFTFKRTNGSSVKKEEEVLTGNLQTLKIKVHEGY
EEFTMVGRRATAILRKATRRLIQLIVSGRDEQSIAEAIIVAMVFSQEDCMIKAVRGDLNF
...
and 9 other proteins
9mer peptides:
MERIKELRD ERIKELRDL RIKELRDLM IKELRDLMS KELRDLMSQ ELRDLMSQS LRDLMSQSR RDLMSQSRT DLMSQSRTR LMSQSRTRE
and 4376 other 9mers
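The step from protein sequence to 9mer peptides on this slide is a sliding window of length 9 moved one residue at a time. A minimal sketch, assuming a hypothetical helper named `kmers` (not part of any tool named in the talk) and using only the first 18 residues of the polymerase sequence shown above:

```python
def kmers(protein: str, k: int = 9) -> list[str]:
    """Return every overlapping peptide of length k, in sequence order."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


# First 18 residues of the polymerase sequence from the slide.
polymerase_start = "MERIKELRDLMSQSRTRE"
peptides = kmers(polymerase_start)

for peptide in peptides:
    print(peptide)
```

A protein of length L yields L - k + 1 peptides, so the 18-residue fragment gives the ten 9mers listed on the slide.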

Recent benchmark studies
Class I
Peters B, Bui HH, Frankild S et al. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol 2006; 2:e65.
Lin HH, Ray S, Tongchusak S, Reinherz EL, Brusic V. Evaluation of MHC class I peptide binding prediction servers: applications for vaccine research. BMC Immunol 2008; 9:8.
Zhang GL, Ansari HR, Bradley P, et al. Machine learning competition in immunology - Prediction of HLA class I binding peptides. J Immunol Methods. 2011 Nov 30;374(1-2):1-4. Epub 2011 Sep 29.
Class II
Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 2008; 4:e1000048.
Lin HH, Zhang GL, Tongchusak S, Reinherz EL, Brusic V. Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research. BMC Bioinformatics 2008; 9(Suppl. 12):S22.
Zhang L, Udaka K, Mamitsuka H, Zhu S. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools. Briefings in Bioinformatics. 09/2011; DOI: 10.1093/bib/bbr060

Prediction of epitopes. Summary
Cytotoxic T cell epitope (A_ROC ~ 0.95): will a given peptide bind to a given MHC class I molecule?
Helper T cell epitope (A_ROC ~ 0.85): will a part of a peptide bind to a given MHC class II molecule?
B cell epitope (A_ROC ~ 0.80): will a given part of a protein bind to one of the billions of different B cell receptors?

Validation of binding predictions

Epitope identification rates
Pathogen                        Predicted  Binding  % Binding  Epitopes  % Epitopes
Influenza virus A                     327      194         59        35          11
Vaccinia virus                        177      139         79         8           5
Mycobacterium tuberculosis            207      157         76         6           3
West Nile virus                       161      122         76        21          13
Human immunodeficiency virus          184       ND         ND       114          62
75-90% of predictions turn out to be binders
Only 5-60% are epitopes

Limits in the success rate? Response diversity. Hoof et al., JI, 2010

What defines a T cell epitope? MHC binding; processing (proteasomal cleavage, TAP); other proteases; T cell repertoire (similarity to self) and cross-reactivity; MHC:peptide complex stability; source protein abundance, cellular location and function

Is there anything beyond MHC binding?

NO

MHC Class I pathway What about the other players? Figure by Eric A.J. Reits

MHC ligands prediction NetCTL (Larsen et al, 2005) Immuno proteasome TAP MHC

Evaluation. MHC ligands from SYFPEITHI. Sort on binding. Top rank: AUC=1.0. Random rank: AUC=0.5
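The AUC used in this evaluation can be read as the probability that a randomly chosen ligand is scored above a randomly chosen non-ligand. A minimal sketch of that rank-based definition, assuming a hypothetical function name `auc` and made-up toy scores (not data from the talk):

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy prediction scores, invented for illustration only.
perfect = auc([0.9, 0.8], [0.3, 0.1])   # every ligand ranked above every non-ligand
mixed = auc([0.9, 0.2], [0.5, 0.1])     # one ligand ranked below a non-ligand
print(perfect, mixed)
```

Ligands all ranked at the top give 1.0, and scores that carry no information about ligand status give 0.5 on average, matching the two reference points on the slide.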

Processing Does proteasomal cleavage and TAP matter? NetCTL and MHC-pathway said yes (in 2005)

NetCTL, 2005 Wcl=0.05, Wt=0. (AUC)

2010 says: processing has little impact. Wcl=0, Wt=0 (AUC): NetMHCpan. Wcl=0.225, Wt=0.025 (AUC0.): NetCTLpan. Stranzl et al., Immunogenetics. 2010 Jun;62(6):357-68.
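The weights Wcl and Wt quoted on these slides scale the contribution of proteasomal cleavage and TAP transport relative to MHC binding. A minimal sketch, assuming the combined score is a plain weighted sum of the three predictions (the function name `combined_score` and the input values are hypothetical; only the weights come from the slide):

```python
def combined_score(mhc: float, cleavage: float, tap: float,
                   w_cl: float, w_t: float) -> float:
    """Weighted sum of MHC binding, C-terminal cleavage and TAP transport scores."""
    return mhc + w_cl * cleavage + w_t * tap


# Invented prediction scores for a single peptide.
mhc, cleavage, tap = 0.5, 0.8, 0.4

binding_only = combined_score(mhc, cleavage, tap, w_cl=0.0, w_t=0.0)
with_processing = combined_score(mhc, cleavage, tap, w_cl=0.225, w_t=0.025)
print(binding_only, with_processing)
```

With both weights at zero the score reduces to the MHC binding prediction alone, which is the NetMHCpan setting listed above.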

Why is this? MHC class I pathway co-evolution. Nielsen, Kesmir, Immunogenetics (2005) 57:33-41

What defines a T cell epitope? MHC binding; processing (proteasomal cleavage, TAP); other proteases; T cell repertoire (similarity to self) and cross-reactivity; MHC:peptide complex stability; source protein abundance, cellular location and function

Other proteases: trimming prior to binding
[Figure: % cytotoxicity versus peptide concentration (ng/ml) for the peptides QRSPMFEGTL, RSPMFEGTL, RSPMFEGTLG, RSPMFEGT and SPMFEGTL; annotations: QRSPMFEGTL Rank=.8%, RSPMFEGTL Rank=0%]
BoLA class I epitopes, work by Ivan Morrison and co-workers

What defines a T cell epitope? MHC binding; processing (proteasomal cleavage, TAP); other proteases; T cell repertoire (similarity to self) and cross-reactivity; MHC:peptide complex stability; source protein abundance, cellular location and function

T cell cross-reactivity. Cross-reactivity is predictable (Pearson's r = 0.35-0.6). T cell cross-reactivity to self: amino acid similarity accounts for T cell cross-reactivity and for "holes" in the T cell repertoire. Frankild et al., PLoS One, 2008

What defines a T cell epitope? MHC binding; processing (proteasomal cleavage, TAP); other proteases; T cell repertoire (similarity to self) and cross-reactivity; MHC:peptide complex stability; source protein abundance, cellular location and function

MHC class I assay technology: affinity and stability
[Figure: assay schematic (MHC, β2m, peptide) and two panels. Affinity panel: [pMHC] (nM) versus [peptide] (nM) for a high-affinity and a low-affinity peptide. Stability panel: CPM versus Δt (hours at 37 °C) for a stable and an unstable complex.]
Harndahl M, Rasmussen M, Roder G, Buus S. J Immunol Methods. 2010 Oct 31.

MHC:peptide stability. Two peptides, a T-cell epitope (FLTSVINRV, filled circles) and a non-immunogenic peptide (NQNDNEETV, open circles), were compared with respect to affinity and stability. A) Peptide titration of the two peptides, which bind with equal affinity to HLA-A*02:01, KD = 7 nM. B) The stability of the same two peptides measured at 37 °C. M.N. Harndahl et al., Eur J Immunol. 2012 Jun;42(6):1405-16

Assarsson et al, JI, 2007 73 9-mer binders classified into 2 immunogenic, 6 subdominant, 29 cryptic and 26 non-immunogenic

MHC:peptide stability. Stability of 2 pairs (paired on affinity) of immunogenic and non-immunogenic peptides. M.N. Harndahl et al., Eur J Immunol. 2012 Jun;42(6):1405-16

Predicting stability. HLA-A*02:01 peptides, PCC=0.72

Predicting MHC:peptide stability (292 HLA-A*02:01 ligands). Sort on binding; ligand/non-ligand pairs. 292 affinity-matched HLA-A*02:01 SYFPEITHI ligands

Predicting MHC:peptide stability (290 HLA-A*02:01 ligands). P<0.00. 292 affinity-matched HLA-A*02:01 SYFPEITHI ligands

The challenge of rational epitope selection
We have more than 2500 MHC molecules.
We often have more than 500 different pathogenic strains.
How do we design a method to select a small pool of peptides that will cover both the MHC polymorphism and the pathogen diversity?
No peptide will bind to all MHC molecules, and few (maybe even no) peptides will be present in all pathogenic strains.

Vaccine discovery - HIV case story
10 HIV proteins
> 2,000,000 different peptides exist within the known HIV clades
Patient diversity: more than 2500 different MHC molecules
The challenge: select 100 (0.005%) peptides with optimal genomic and HLA coverage

HIV Gag phylogeny Clade C Few peptides conserved between all viral strains Clade AE Clade D Clade A Clade B

Epitope identification 56 (.5%) 9mer are conserved among all Clade A gag sequences

Polyvalent vaccines
Select epitopes so that together they cover all strains.
[Figure: two epitope selections over Strain 1 and Strain 2, each with average coverage = 2: one with uneven coverage and one with even coverage]

EpiSelect. Pathogen diversity
$$S_j^{G} = \sum_i \frac{P_i^j}{\delta + C_i}$$
Pérez CL. J Immunol. 2008

HLA polymorphism - frequencies
Phenotype frequencies
Supertypes          Caucasian  Black  Japanese  Chinese  Hispanic  Average
A2, A3, B7               83 %   86 %      88 %     88 %      86 %     86 %
+ A1, A24, B44          100 %   98 %     100 %    100 %      99 %     99 %
+ B27, B58, B62         100 %  100 %     100 %    100 %     100 %    100 %
Sette et al., Immunogenetics (1999) 50:201-212

Response of 3 HIV infected patients to 84 predicted HIV epitopes Perez et al., JI, 2008

All HIV infected patients respond to at least one of nine peptides Perez et al., JI, 2008

HLA supertypes

More on supertypes! [Figure: supertype clusters labeled A3, A24, A26, A1 and A2]

Problems with HLA supertypes
Supertypes are not perfect, i.e. HLA-A*11:01 and HLA-A*03:01 do not bind the same set of peptides.
Supertype representatives are not optimal representatives in all populations.
Guinea Bissau: A2402.5%, A230 6% A030 6%, A0 0%
Hong Kong: A2402 5%, A230 0% A030 0.8%, A0 29%

PopCover: 2D searching
> 2,000,000 different peptides exist within the known HIV clades
22709 peptides with predicted binding affinity stronger than 500 nM to any MHC molecule
5608 (tat), 2096 (nef), 3848 (gag), 42748 (pol), 25926 (env)
No Gag peptides are found in all clades, and 92% of all Gag peptides are shared only between 0-5% of all clades
The challenge: select 64 (less than 0.01%) peptides with optimal genomic and HLA coverage
tat (4), nef (15), gag (15), pol (15), env (15)

EpiSelect and PopCover
EpiSelect:
$$S_j^{G} = \sum_i \frac{P_i^j}{\delta + C_i}$$
The sum is over all genomes i. P_i^j is 1 if epitope j is present in genome i. C_i is the number of times genome i has been targeted in the already selected set of epitopes.
PopCover:
$$S_j^{A+G} = \sum_i \sum_k \frac{R_{ki}^j \, f_k \, g_i}{\beta + E_{ki}}$$
The sum is over all genomes i and HLA alleles k. R_{ki}^j is 1 if epitope j is present in genome i and is presented by allele k, E_{ki} is the number of times allele k has been targeted by epitopes in genome i by the already selected set of epitopes, f_k is the frequency of allele k in a given population, and g_i is the frequency of genome i.
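The PopCover score lends itself to a greedy loop: score every remaining epitope, pick the best one, increment the coverage counts for the allele/genome pairs it hits, and repeat. The sketch below is an illustration under stated assumptions, not the published implementation: the function name `popcover_select`, the data layout, the value of `beta`, the alphabetical tie-breaking and the toy inputs are all hypothetical.

```python
from collections import defaultdict


def popcover_select(hits, allele_freq, genome_freq, n_select, beta=0.01):
    """Greedy selection using S_j = sum_i sum_k R_ki^j * f_k * g_i / (beta + E_ki).

    hits[j] is the set of (allele k, genome i) pairs for which epitope j is
    present in genome i and presented by allele k (the pairs where R = 1).
    """
    covered = defaultdict(int)          # E_ki: times pair (k, i) is already targeted
    candidates = set(hits)
    selected = []
    while candidates and len(selected) < n_select:
        def score(j):
            return sum(allele_freq[k] * genome_freq[i] / (beta + covered[(k, i)])
                       for (k, i) in hits[j])
        # Sorting makes tie-breaking deterministic (assumption: alphabetical).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for pair in hits[best]:
            covered[pair] += 1
    return selected


# Toy input: two alleles, two genomes, three candidate epitopes.
allele_freq = {"A": 0.5, "B": 0.5}
genome_freq = {"g1": 0.5, "g2": 0.5}
hits = {
    "e1": {("A", "g1"), ("A", "g2")},
    "e2": {("A", "g1")},
    "e3": {("B", "g1"), ("B", "g2")},
}
selected = popcover_select(hits, allele_freq, genome_freq, n_select=2)
print(selected)
```

After the first pick, pairs that are already covered contribute far less to the score, so the second pick goes to the epitope that reaches the still-uncovered allele rather than to one that repeats existing coverage.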

Virtual validation
Select 5 peptides from a pool of 300 HCV genomes with optimal genome and HLA class I coverage
Make a set of 1000 virtual patients, each infected with a random virus and with an HLA haplotype matching a given population
Calculate the success rate as the fraction of patients presenting one or more of the 5 peptides
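The virtual validation described above can be sketched as a small Monte Carlo simulation. This is a simplified illustration, not the procedure used in the talk: it assumes each virtual patient carries two alleles drawn independently from the population frequencies at a single locus, and the function name `success_rate` and the toy inputs are hypothetical.

```python
import random


def success_rate(peptide_hits, genomes, allele_freq, n_patients=1000, seed=0):
    """Fraction of virtual patients presenting at least one selected peptide.

    peptide_hits is the set of (allele, genome) pairs for which at least one
    selected peptide is present in the genome and presented by the allele.
    """
    rng = random.Random(seed)
    alleles = list(allele_freq)
    weights = [allele_freq[a] for a in alleles]
    protected = 0
    for _ in range(n_patients):
        genome = rng.choice(genomes)                             # random infecting virus
        patient = rng.choices(alleles, weights=weights, k=2)     # simplified HLA type
        if any((a, genome) in peptide_hits for a in patient):
            protected += 1
    return protected / n_patients


# Toy population: two genomes, two alleles; the selected peptides cover allele A only.
genomes = ["g1", "g2"]
allele_freq = {"A": 0.3, "B": 0.7}
hits_a_only = {("A", "g1"), ("A", "g2")}
print(success_rate(hits_a_only, genomes, allele_freq))
```

The same function can be run on peptide sets chosen by different selection strategies to compare their coverage in a given population.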

Experimental validation of HIV class II epitopes. Tat, Nef, Gag, Pol, Env. 38 HIV infected individuals. An average of 4,79 recognized peptides per patient. Buggert M, PLoS One. 2012

Experimental validation. Buggert M, PLoS One. 2012

Vaccine design. Polytope construction
[Figure: polytope laid out from NH2 to COOH as M, epitopes and linkers, annotated with C-terminal cleavage, cleavage within epitopes, and new epitopes]

Polytope starting configuration. Immunological Bioinformatics, The MIT Press.

Polytope optimal configuration. Immunological Bioinformatics, The MIT Press.

Functional clustering of ligand binding domains

Using alignment
Align A68:01 (365) versus A68:02 (365). Aln score 2454.000 Aln len 365 Id 0.9863
A68:01   0 MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAA
           ::::::::::::::::::::::::::::::::::: :::::::::::::::::::::::::::::
A68:02   0 MAVMAPRTLVLLLSGALALTQTWAGSHSMRYFYTSMSRPGRGEPRFIAVGYVDDTQFVRFDSDAA
A68:01  65 SQRMEPRAPWIEQEGPEYWDRNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQMMYGCDVGSD
           ::::::::::::::::::::::::::::::::::::::::::::::::::::::: ::::::: :
A68:02  65 SQRMEPRAPWIEQEGPEYWDRNTRNVKAQSQTDRVDLGTLRGYYNQSEAGSHTIQRMYGCDVGPD
A68:01 130 GRFLRGYRQDAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQWRAYLEGTCVEWLRRY
           ::::::: : :::::::::::::::::::::::::::::::::::::::::::::::::::::::
A68:02 130 GRFLRGYHQYAYDGKDYIALKEDLRSWTAADMAAQTTKHKWEAAHVAEQWRAYLEGTCVEWLRRY
A68:01 195 LENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPA
           :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A68:02 195 LENGKETLQRTDAPKTHMTHHAVSDHEATLRCWALSFYPAEITLTWQRDGEDQTQDTELVETRPA
A68:01 260 GDGTFQKWVAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVITGA
           :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
A68:02 260 GDGTFQKWVAVVVPSGQEQRYTCHVQHEGLPKPLTLRWEPSSQPTIPIVGIIAGLVLFGAVITGA
A68:01 325 VVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSLTACKV
           ::::::::::::::::::::::::::::::::::::::::
A68:02 325 VVAAVMWRRKSSDRKGGSYSQAASSDSAQGSDVSLTACKV
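The "Id" value reported for this alignment is the fraction of aligned positions carrying the same residue. A minimal sketch for gap-free alignments, assuming a hypothetical helper named `identity` and using a 10-residue excerpt around the first difference in the alignment above:

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Ten residues spanning the V/M difference in the first alignment block.
excerpt_a6801 = "YFYTSVSRPG"
excerpt_a6802 = "YFYTSMSRPG"
print(identity(excerpt_a6801, excerpt_a6802))

# The full alignment shows 5 mismatches over 365 positions.
print(round(360 / 365, 4))
```

Two molecules that are almost identical at the sequence level can still differ in the few positions that shape the binding groove, which is why the talk moves on to clustering by binding specificity.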

Sequence-based clustering
[Figure: tree of 14 HLA-A and HLA-B alleles clustered by sequence, including HLA-A*68:01 and HLA-A*68:02]

Sequence logos: HLA-A*68:02 and HLA-A*68:01. Seq2Logo: http://www.cbs.dtu.dk/biotools/seq2logo

NetMHCpan - a pan-specific method NetMHC NetMHCpan NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence. Nielsen et al. PLoS ONE 2007

HLA-A*02:01 versus HLA-A*68:02
Peptide    A0201  A6802
ISCDEGRFK  0.022  0.032
TDRAAQTRE  0.03   0.09
IAPLRMSAT  0.065  0.8
KPAFKTGEE  0.09   0.025
GVERHIHIF  0.060  0.038
TYGWAWLLK  0.036  0.028
AEDIAKTVA  0.034  0.02
MSGNEIYDH  0.025  0.038
EDVERGQVV  0.028  0.7
ILVEHARVE  0.066  0.039
QKPTLTVML  0.055  0.40
AQKTIEWAQ  0.037  0.026
VEHPNVYKM  0.060  0.048
EERASSSKN  0.03   0.07
EDRKGHDRR  0.04   0.020
LQGTTDVTP  0.032  0.025
NIGVILLLT  0.7    0.54
MRLAHDPDA  0.055  0.035
GEYLKEKIR  0.09   0.0
IPRCSPPPP  0.05   0.023
[Figure: scatter plot of the predicted binding scores for the two alleles, both axes from 0 to 0.8. PCC: 0.6]

HLA-A*02:01 versus HLA-A*68:01. PCC: 0.09

Clustering and binding motifs
d = PCC(A, B)
[Figure: binding-motif logos and clustering tree for HLA-A*68:02, HLA-A*68:01, HLA-A*02:01 and HLA-A*03:01]
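Specificity-based clustering compares two MHC molecules by correlating their predicted binding scores over a common set of peptides. The sketch below assumes the distance is taken as 1 - PCC, so that highly correlated molecules end up close together; that conversion is an assumption here (the slide text only shows "d = PCC(A, B)"), and the function names and toy score vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def specificity_distance(x, y):
    """Assumed distance between two binding profiles: 1 - PCC."""
    return 1.0 - pearson(x, y)


# Invented predicted scores for three molecules over the same four peptides.
mol_a = [0.10, 0.20, 0.60, 0.80]
mol_b = [0.15, 0.25, 0.55, 0.85]   # similar profile to mol_a
mol_c = [0.80, 0.60, 0.20, 0.10]   # opposite preference

print(specificity_distance(mol_a, mol_b))
print(specificity_distance(mol_a, mol_c))
```

A matrix of such pairwise distances can then be passed to any standard hierarchical clustering method to produce a tree.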

Specificity-based clustering (with logos)
[Figure: tree with binding-motif logos for 14 HLA-A and HLA-B alleles, including HLA-A*68:01 and HLA-A*68:02]

[Figure: clustering tree of HLA (human) and Patr (chimpanzee) MHC class I molecules, with branch values and supertype clusters labeled A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, B59 and B62]
The selective sweep theory and the lost chimpanzee alleles: A26, B27, B62 and A2 are lost from the chimp repertoire

SLA (swine) and men
[Figure: clustering tree of swine (SLA) and human (HLA) MHC class I molecules, with branch values]

BoLA
[Figure: clustering tree of BoLA (bovine) MHC class I molecules]

BoLA+Human
[Figure: clustering tree of BoLA and HLA class I molecules]

BoLA+Human (MHCcluster-2.0)

What defines a T cell epitope? MHC binding; processing (proteasomal cleavage, TAP); other proteases; T cell repertoire (similarity to self) and cross-reactivity; MHC:peptide complex stability; source protein abundance, cellular location and function

But, can we find the haystack?

MTB (Mycobacterium tuberculosis)
Bacterial genome coding for more than 4000 proteins
700 known epitopes, found in only 30 proteins (ORFs)

MTB (Mycobacterium tuberculosis)
Bacterial genome coding for more than 4000 proteins
700 known epitopes, found in only 30 proteins (ORFs)
Is this biology, or history?
More than 500.000 unique 9mer peptides
Where to start? Each HLA allele will bind ~5000 of these peptides.

Functional bias in TB epitope proteins. Tang et al. J Immunol. 2011 Jan 15;186(2):1068-80.

Functional bias in TB epitope proteins. All selections equally good, except for conservation. Tang et al. J Immunol. 2011 Jan 15;186(2):1068-80.

CBS immunology web servers www.cbs.dtu.dk/services

Acknowledgements
Immunological Bioinformatics group, CBS, DTU: Ole Lund (group leader); Claus Lundegaard (databases, HLA binding predictions)
Collaborators
IMMI, University of Copenhagen: Søren Buus (MHC binding)
La Jolla Institute of Allergy and Infectious Diseases: A. Sette, B. Peters (epitope database)
Karolinska Institutet, Stockholm: Annika Karlsson (HIV data)
and many, many more

And some of the people who really did the work Ilka Hoof Nicolas Rapin Hao Zhang Mette Voldby Larsen Thomas Stranzl Massimo Andreatta Leon Jessen Edita Karosiene Martin Thomsen Jens Kringelum www.cbs.dtu.dk/services