Use of BONSAI decision trees for the identification of potential MHC Class I peptide epitope motifs.

C.J. SAVOIE, N. KAMIKAWAJI, T. SASAZUKI
Dept. of Genetics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-Ku, Fukuoka 812 Japan

S. KUHARA
Graduate School of Genetic Resources Technology, Kyushu University, Hakozaki, Fukuoka 812 Japan

Recognition of short peptides (8- to 10-mers) bound to MHC class I molecules by cytotoxic T lymphocytes (CTL) forms the basis of cellular immunity. While the sequence motifs necessary for binding of intracellular peptides to MHC have been well studied, little is known about sequence motifs that may cause preferential affinity to the T cell receptor (TCR) and/or preferential recognition and response by T cells. Here we demonstrate that computational learning systems can be useful to elucidate sequence motifs that affect T cell activation. Knowledge of T cell activation motifs could be useful for targeted vaccine design or immunotherapy. With the BONSAI computational learning algorithm, using a database of previously reported MHC-bound peptides that had positive or negative T cell responses, we were able to identify sequence motif rules that explain 70% of positive T cell responses and 84% of negative T cell responses.

1 Introduction

1.1 MHC Class I Peptide Motifs

MHC class I molecules bind short peptides (8- to 10-mers) that are primarily derived from endogenous proteins. MHC class I molecules possess peptide binding preferences at certain amino acid positions that are referred to as binding anchor residues. The bound peptides are recognized by the T cell receptors of CTL and are the primary antigenic determinants of the cellular immune response. The effect of certain amino acid residues of MHC class I bound peptides on binding to MHC has been well characterized by studies of naturally bound, eluted peptides and binding affinity studies of synthesized peptides [1,2].
This has facilitated the prediction of which peptides might bind to MHC molecules with high affinity. Such knowledge has been useful in reducing the number of synthesized peptides that must be produced and screened in the search for peptide epitope candidates. However, while affinity to MHC is necessary for recognition by the TCRs of cytotoxic T cells, MHC affinity alone is insufficient to cause activation and an immune response by T cells. Even among peptides that bind to MHC with equally high affinities, there can be great variation in T cell responsiveness. The effects of single amino acid substitutions on responses by particular CTL clones have been well reported, but generalized sequence motif rules unrelated to binding affinity that influence or predict T cell responses have yet to be reported, although T cell responses to large synthesized peptide libraries indicate the existence of donor-independent activation motifs [3]. Identification of such motifs would be extremely useful for targeted vaccine development and epitope prediction in the investigation of immune responses in viral infection, cancer and autoimmune disease.

The idea of mining potential MHC peptide epitopes based on the notion that T cell epitopes in proteins tend to be concentrated into epitope-rich regions has been suggested [4,5]. However, this approach does not take account of the different binding characteristics of different MHC molecules or allelic variants, nor does it discriminate between binding affinity for MHC and T cell activation. To elucidate motifs that accurately predict the likelihood that a particular bound peptide will elicit a response, which would be useful for mining candidate epitopes from biological databases, independent, MHC allele-specific knowledge of both the sequence motifs responsible for MHC affinity and the motifs that affect T cell activation is critical.

1.2 Identification of Peptide Motifs with the BONSAI Program

Knowledge acquisition from amino acid sequences by learning algorithms has been useful in the prediction of functional motifs in proteins, such as transmembrane domains [6,10,11]. The BONSAI program is based on computational learning and is used to elucidate sequence motif rules that explain the distinguishing sequence properties between two groups of amino acid sequences. In this paper, we used the BONSAI program to investigate the motif rules that predict T cell activation from a data set of peptides with reported high binding affinity to the same MHC class I molecule (HLA-A*0201).

2 Data and Methods

2.1 MHC Class I Peptide Sequence Data Set

The data samples used for this work were obtained from the MHCPEP database [8]. This database comprises over 13,000 MHC-bound peptides that have been previously described in the literature. The database entries include fields for the corresponding MHC molecule, binding affinity and the presence or lack of T cell activity in response to a given peptide. Of the peptides in the database, 243 were reported to exhibit high binding affinity to HLA-A*0201 and to have either a positive or a negative T cell response. Peptides for which the T cell response characteristics were unknown were excluded from our study. Of the high binders, 174 were reported to elicit positive cytotoxic T cell responses and 69 were reported to be negative for T cell activation. These 243 peptides comprised the study data set.

2.2 The BONSAI Program

The BONSAI algorithm is based on an elementary formal system algorithm which uses dynamic indexing to determine the appropriate regular expression clusters that produce optimum rules to explain differences between positive and negative amino acid sequence samples. Decision tree algorithms have been shown to be an effective approach for the identification of motifs from protein sequences, such as the transmembrane regions in proteins. Unlike ID3 [7,9], in BONSAI the index attributes of variable regular expressions are not predefined. Rather, the index expressions are defined at runtime by optimization of the discrimination resolution between positive and negative examples. Only the allowable maximum number of index expressions is predefined outside the system. An allowable maximum for variables is necessary because it is known that the class of regular pattern languages is not polynomial-time learnable unless NP = RP. The proofs for the algorithms upon which the BONSAI program is based are described in detail elsewhere [10,11]. Fig. 1 describes the algorithm in brief.

For a decision tree T over regular patterns, let nodes(T) be the number of nodes in T, and let τ(T) be the set of trees constructed by replacing a leaf v of T by the tree of Figure 1 (a) or Figure 1 (b) for some pattern π. The score function Score(T, P, N) balances the information gains in classification and is defined as

$$\mathrm{Score}(T, P, N) = \frac{|P \cap L(T)|}{|P|} \cdot \frac{|N \cap \overline{L(T)}|}{|N|},$$

where L(T) denotes the set of strings that T classifies as positive and \overline{L(T)} is its complement. The algorithm DT(P, N, MaxNode) checks all leaves at each phase of node generation. It is noise-tolerant in that it allows conflicts between positive and negative training examples.
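The greedy procedure DT(P, N, MaxNode) and the score it maximizes can be sketched in Python as follows. This is only an illustration under stated assumptions, not the BONSAI implementation: the search over alphabet indexings is omitted, each regular pattern of the form xwy is approximated by the test "w occurs as a substring" (which assumes variables may match the empty string), and the names `classify`, `nodes`, `score`, `expansions`, `grow`, `substrings` and the toy index strings are hypothetical.

```python
# A tree is either a leaf label (1 = positive, 0 = negative) or a tuple
# (pattern, yes_subtree, no_subtree).
# Assumption: a regular pattern "x w y" is approximated by "w is a substring".

def classify(tree, s):
    """Walk the tree until a leaf label is reached."""
    while not isinstance(tree, int):
        pattern, yes, no = tree
        tree = yes if pattern in s else no
    return tree

def nodes(tree):
    """Number of nodes (tests plus leaves) in the tree."""
    if isinstance(tree, int):
        return 1
    return 1 + nodes(tree[1]) + nodes(tree[2])

def score(tree, pos, neg):
    """Fraction of positives accepted times fraction of negatives rejected."""
    accepted = sum(classify(tree, s) == 1 for s in pos)
    rejected = sum(classify(tree, s) == 0 for s in neg)
    return (accepted / len(pos)) * (rejected / len(neg))

def expansions(tree, patterns):
    """Yield every tree obtained by replacing one leaf with a one-test subtree."""
    if isinstance(tree, int):
        for p in patterns:
            # Both leaf labelings are tried, cf. Figure 1 (a) and (b).
            yield (p, 0, 1)
            yield (p, 1, 0)
    else:
        p, yes, no = tree
        for t in expansions(yes, patterns):
            yield (p, t, no)
        for t in expansions(no, patterns):
            yield (p, yes, t)

def grow(pos, neg, max_nodes, patterns):
    """Greedy tree growth following the control flow of DT(P, N, MaxNode)."""
    if not neg:
        return 1
    if not pos:
        return 0
    tree = 1
    while nodes(tree) < max_nodes and score(tree, pos, neg) < 1:
        tree = max(expansions(tree, patterns), key=lambda t: score(t, pos, neg))
    return tree

def substrings(strings, max_len=3):
    """Candidate patterns: all substrings of length 2..max_len (an assumption)."""
    return sorted({s[i:i + n] for s in strings
                   for n in range(2, max_len + 1)
                   for i in range(len(s) - n + 1)})

# Toy index strings (made up for illustration, not taken from MHCPEP).
toy_pos = ["0212", "2201"]
toy_neg = ["0332", "3310"]
toy_patterns = substrings(toy_pos + toy_neg)
toy_tree = grow(toy_pos, toy_neg, max_nodes=7, patterns=toy_patterns)
print(toy_tree, score(toy_tree, toy_pos, toy_neg))
```

Because every candidate expansion is rescored against all training strings, the sketch is quadratic at best and only suitable for small examples; it is meant to make the control flow of the algorithm concrete rather than to be efficient.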

        π                 π
       / \               / \
      0   1             1   0
       (a)               (b)

function DT(P, N: sets of strings; MaxNode: int): tree;
begin
  if N = ∅ then return (CREATE(1, null, null))
  else if P = ∅ then return (CREATE(0, null, null))
  else begin
    T ← CREATE(1, null, null);
    while (nodes(T) < MaxNode and Score(T, P, N) < 1) do
    begin
      find T_max ∈ τ(T) that maximizes Score(T_max, P, N);
      T ← T_max
    end;
    return (T)
  end
end

Figure 1. BONSAI decision tree algorithm.

3 Results

3.1 Indexing Clusters

Figure 2 shows the decision tree generated by BONSAI for the HLA-A*0201-binding peptide sequences. The decision tree for the panels of T cell immunoreactivity negative (TCI-) and positive (TCI+) peptides resulted in four index groups of amino acids: [0: A, K, M], [1: C, D, T, V], [2: E, F, H, I, L, N, P, Q] and [3: G, R, S, W, Y]. These were chosen by dynamic optimization of the maximum score in the BONSAI algorithm. That is, they are free from the artificial bias imposed from outside the algorithm in other implementations of ID3-based elementary formal systems, in which the index clusters are predetermined by a limited set of attributes. In the case of amino acids, these attributes could include features such as hydrophobicity or structural similarities.

This cluster list, chosen by optimizing the rules generated to explain differences between positive and negative examples of HLA-A*0201 peptide epitopes, is interesting in that it does not reflect the well-characterized features of HLA binding preferences. It is well established, for example, that the binding anchor motif strongly prefers V or L residues at P2 and at P9 of HLA-A associated peptides. However, according to the index produced by this data set, L and V are in separate index clusters. Therefore, the grouping must reflect features other than a gross inability of the peptides in the negative example set to bind appropriately to MHC molecules. More subtle differences in affinity to MHC and/or the TCR may be involved in the differentiation between negative and positive sequences and may thus generate this grouping. Alternatively, features unrelated to binding affinity may be driving this selection, such as structure or differences in the size of the pool of T cells which recognize certain structural motifs of the peptide.
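To make the indexing step concrete, the short Python sketch below translates a peptide into its index string using the four clusters reported above. The names `INDEX_GROUPS`, `AA_TO_INDEX` and `to_index_string` are hypothetical, and the example 9-mer is made up for illustration; it is not a peptide from the study data set.

```python
# Index clusters as reported in Section 3.1 for the HLA-A*0201 data set.
INDEX_GROUPS = {
    "0": "AKM",
    "1": "CDTV",
    "2": "EFHILNPQ",
    "3": "GRSWY",
}

# Invert the clusters into a residue -> index lookup table.
AA_TO_INDEX = {aa: idx for idx, group in INDEX_GROUPS.items() for aa in group}

def to_index_string(peptide):
    """Translate a peptide into its index string.

    Raises KeyError for characters outside the 20 standard amino acids.
    """
    return "".join(AA_TO_INDEX[aa] for aa in peptide.upper())

example_peptide = "KLGRSVAMY"  # hypothetical 9-mer, not from MHCPEP
example_indexed = to_index_string(example_peptide)

# The anchor-preferred residues L and V fall into different clusters.
same_cluster_LV = AA_TO_INDEX["L"] == AA_TO_INDEX["V"]
print(example_indexed, same_cluster_LV)
```

The decision tree then operates on these index strings rather than on the raw amino acid sequences, which is why a single pattern such as [3,3] covers every pair of adjacent residues drawn from G, R, S, W and Y.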
Indexing:
  A C D E F G H I K L M N P Q R S T V W Y
  0 1 1 2 2 3 2 2 0 2 0 2 2 2 3 3 1 1 3 3

Decision Tree
---------------------------
x33y
  [YES] *** NEG *** ( 27, 29)
  [NO ] x100y
          [YES] *** NEG *** ( 24, 24)
          [NO ] x103y
                  [YES] *** NEG *** (  2,  5)
                  [NO ] *** POS *** (121, 11)

Score
-----------------------------------
POS : 121 / 174 = 69.540 %
NEG :  58 /  69 = 84.058 %
Max Score... 0.58454

Figure 2. Decision tree for T cell reactivity motif. A motif explaining 84.058% of negative cases was identified using the BONSAI program. Each leaf lists the number of positive and negative peptides, in that order, that reach it.
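The summary figures under the tree can be recomputed from the leaf counts, which is a useful consistency check. The Python sketch below does this arithmetic; it assumes that the first number in each pair counts TCI+ peptides and the second counts TCI- peptides (the pair sums match the 174 positive and 69 negative peptides of the data set), and the variable names are hypothetical.

```python
# Leaf counts read from the decision tree as (pattern, positives, negatives).
# Assumption: first number = TCI+ peptides, second number = TCI- peptides.
NEG_LEAVES = [("x33y", 27, 29), ("x100y", 24, 24), ("x103y", 2, 5)]
POS_LEAF = (121, 11)

n_pos = sum(p for _, p, _ in NEG_LEAVES) + POS_LEAF[0]
n_neg = sum(n for _, _, n in NEG_LEAVES) + POS_LEAF[1]

# Fraction of positives reaching the POS leaf and of negatives reaching a NEG leaf.
pos_explained = POS_LEAF[0] / n_pos
neg_explained = sum(n for _, _, n in NEG_LEAVES) / n_neg

# The reported maximum score is the product of the two fractions.
max_score = pos_explained * neg_explained

# Cumulative coverage of the first two patterns, [3,3] and [1,0,0].
neg_with_first_two = (29 + 24) / n_neg
pos_with_first_two = (27 + 24) / n_pos

print(f"POS {pos_explained:.3%}  NEG {neg_explained:.3%}  score {max_score:.5f}")
print(f"[3,3] or [1,0,0]: {neg_with_first_two:.1%} of negatives, "
      f"{pos_with_first_two:.1%} of positives")
```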

3.2 Identification of Adverse Motifs Affecting T cell Recognition

Of the 69 sequences negative for T cell reactivity, 29 contained the index sequence [3,3]. Of the remaining sequences, 24 contained the motif [1,0,0]. Cumulatively, 76.8% (53/69) of the negative peptides therefore had the sequence [3,3] or [1,0,0]. Conversely, only 29.3% (51/174) of the positive sequences contained either of these sequence motifs, indicating that the presence of these sequence combinations is inversely correlated with the potential for T cell reactivity. Furthermore, 121/174 (69.5%) of the sequences positive for T cell reactivity possessed none of the sequences [3,3], [1,0,0] or [1,0,3], whereas 58/69 (84%) of the negative sequences possessed at least one of them. A peptide sequence that contains these adverse motif amino acid combinations is therefore much less likely to fall in the group of potential T cell epitopes than peptides without these sequences. Conversely, sequences that do not contain these motifs are more likely to elicit a T cell response.

4 Discussion

While the present data set from which the motif rules were derived represents a limited number of peptides compared to the complete set of peptides that could theoretically associate with MHC class I molecules, the present results do indicate the existence of sequence characteristics that affect the probability of a given bound peptide being an epitope. Only extensive experimental data can validate these rules or elucidate the mechanisms underlying such T cell recognition preferences. Such validation could potentially be aided by examining T cell responses to large combinatorial peptide libraries.
Nevertheless, rules that could limit the number of peptides to be screened for potential immunogenicity would greatly aid the work of immunologists, who presently must either synthesize all potential target epitopes (which is not always feasible) or make educated guesses as to which proteins might be likely targets and limit their investigation to suspect proteins, such as oncogenes in the case of cancer or envelope proteins in the case of viral immunity. These rules will also assist in the prioritization of screening even in cases where exhaustive screening is necessary. Further work in our laboratory will focus on the validation of the preliminary findings with experimental data, the investigation of T cell preference motifs for other MHC molecules (including class II molecules), and the generation of larger and more inclusive MHC peptide reactivity data using combinatorial chemistry to generate large peptide libraries.
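As an illustration of how such rules might be used to prioritize screening, the Python sketch below enumerates the 9-mers of a protein and moves those containing one of the three adverse index motifs to the end of the list. It is a sketch under assumptions, not a validated predictor: the rules were derived only from peptides already known to bind HLA-A*0201 with high affinity, so a real pipeline would first need a separate binding prediction step, which is not shown. The function names and the toy protein sequence are hypothetical.

```python
# Index table and adverse patterns copied from the decision tree in Figure 2.
INDEX = str.maketrans("ACDEFGHIKLMNPQRSTVWY", "01122322020222331133")
ADVERSE = ("33", "100", "103")

def windows(protein, length=9):
    """All overlapping peptides of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]

def has_adverse_motif(peptide):
    """True if the index string contains [3,3], [1,0,0] or [1,0,3]."""
    indexed = peptide.upper().translate(INDEX)
    return any(motif in indexed for motif in ADVERSE)

def prioritize(protein, length=9):
    """Candidate peptides with motif-free ones first.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(windows(protein, length), key=has_adverse_motif)

# Made-up sequence for illustration only; it is not a real protein.
toy_protein = "LLFEHIPQVKLGRSVAMY"
ranked = prioritize(toy_protein)
motif_free = [p for p in ranked if not has_adverse_motif(p)]
print(len(ranked), "candidates,", len(motif_free), "without adverse motifs")
```

Sorting rather than filtering keeps every candidate available, which matches the use of the rules for prioritization when exhaustive screening is still required.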

Acknowledgments

This work has been supported in part by grants-in-aid from the Ministry of Education, Science, Sports and Culture, Japan.

References

1. T. Sudo, N. Kamikawaji, A. Kimura, Y. Date, C.J. Savoie, H. Nakashima, E. Furuichi, S. Kuhara, and T. Sasazuki, Differences in MHC Class I self peptide repertoires among HLA-A2 subtypes. J. Immunol. 155: 4749-4756 (1995)
2. H.G. Rammensee, T. Friede, S. Stevanovic, MHC ligands and peptide motifs: first listing. Immunogenetics 41: 178-228 (1995)
3. T. Tana, N. Kamikawaji, C.J. Savoie, T. Sudo, Y. Kinoshita, T. Sasazuki, A HLA binding motif-aided peptide epitope library: A novel library design for the screening of HLA-DR4-restricted antigenic peptides recognized by CD4+ T cells. J. Human Genet. 43: 14-21 (1998)
4. G.E. Meister, C.G.P. Roberts, J.A. Berzofsky, A.S. De Groot, Two novel T cell epitope prediction algorithms based on MHC-binding motifs; comparison of predicted and published epitopes from Mycobacterium tuberculosis and HIV protein sequences. Vaccine 13: 581-591 (1995)
5. G.E. Meister, C.G.P. Roberts, B.T. Edelson, J.A. Berzofsky, A.S. De Groot, Two novel MHC-binding motif-based T cell epitope prediction algorithms; prediction of epitopes for six Mycobacterium tuberculosis protein antigens, in Vaccines 95, 219-226 (Cold Spring Harbor Laboratory Press, Plainview, 1995)
6. S. Arikawa, S. Kuhara, S. Miyano, A. Shinohara, T. Shinohara, A learning algorithm for elementary formal systems and its experiments on identification of transmembrane domains. Proc. Twenty-fifth Hawaii International Conference on System Sciences, 675-684 (1992)
7. J.R. Quinlan, Induction of decision trees. Machine Learning 1: 81-106 (1986)
8. V. Brusic, G. Rudy, L.C. Harrison, MHCPEP, a database of MHC-binding peptides: update 1997. Nucleic Acids Research 26, Issue 1 (1998)
9. P.E. Utgoff, Incremental induction of decision trees. Machine Learning 4: 161-186 (1989)
10. S. Arikawa, S. Kuhara, S. Miyano, Y. Mukouchi, A. Shinohara, T. Shinohara, A machine discovery from amino acid sequences by decision trees over regular patterns. Proc. of the International Conference on Fifth Generation Computer Systems, 618-625 (1992)
11. S. Shimozono, A. Shinohara, T. Shinohara, S. Miyano, S. Kuhara, S. Arikawa, Knowledge acquisition from amino acid sequences by machine learning system BONSAI. Trans. of Information Processing Society of Japan 35: 2009-2018 (1994)