NeuroBayes in and outside the Ivory Tower of High Energy Physics. Particle Physics Seminar, University Bonn, January 13, 2011


NeuroBayes in and outside the Ivory Tower of High Energy Physics. Particle Physics Seminar, University Bonn, January 13, 2011. Michael Feindt, IEKP / KCETA, Karlsruhe Institute of Technology (KIT), and Scientific Advisor, Phi-T GmbH. KIT: University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association. www.kit.edu

Agenda of this talk: What is NeuroBayes? Where does it come from? How to use NeuroBayes classification. Robustness, speed, generalisation ability, ease of use. NeuroBayes output is a Bayesian posterior probability. How to find data-Monte Carlo disagreements with NeuroBayes. How to train NeuroBayes from data when no good MC model is available. Examples of successful NeuroBayes applications in physics: full B reconstruction at B factories. Examples from industry applications.

History of NeuroBayes. 1993-2003: M.F. & co-workers gain experience with neural networks in DELPHI, developing many packages: ELEPHANT, MAMMOTH, BSAURUS etc. 1999: invention of the NeuroBayes algorithm. 1997-now: extensive use of NeuroBayes in CDF II. 2000-2002: NeuroBayes specialisation for economic applications at the University of Karlsruhe, supported by the BMBF. 2002: Phi-T GmbH founded; industrial projects, further developments. 2008: foundation of the sub-company Phi-T products & services, second office in Hamburg. 2008-now: extensive use of NeuroBayes in Belle. 2010: LHCb decides to use NeuroBayes massively to optimise reconstruction code. Phi-T owns the exclusive rights for NeuroBayes. Staff (currently 40) are almost all physicists (mainly from HEP). Continuous further development of NeuroBayes.

Successful in competition with other data-mining methods. World's largest student competition, the Data-Mining-Cup: 2005: fraud detection in internet trading. 2006: price prediction in eBay auctions. 2007: coupon redemption prediction. 2008: lottery customer behaviour prediction.

Since 2009 there are new rules: only up to 2 teams per university. 2009 task: prognosis of the turnover of 8 books in 2500 book stores. Winner: Uni Karlsruhe II, with the help of NeuroBayes. 2010 task: optimisation of individual customer care measures in an online shop. Winner: Uni Karlsruhe II, with the help of NeuroBayes.

NeuroBayes task 1: Classification. Binary targets: each single outcome will be yes or no. The NeuroBayes output is the Bayesian posterior probability that the answer is yes (given that the inclusive rates are the same in training and test sample; otherwise a simple transformation is necessary). Examples: > This elementary particle is a K meson. > This jet is a b-jet. > This three-particle combination is a D+. > This event is real data and not Monte Carlo. > This neutral B meson was a particle and not an antiparticle at production time. > Customer Meier will cancel his contract next year.

NeuroBayes task 2: Conditional probability densities. Probability density for real-valued targets: for each possible (real) value a probability (density) is given. From that, all statistical quantities like mean value, median, mode, standard deviation etc. can be deduced (see the sketch after this list), including deviations from a normal distribution (e.g. a crash probability), the expectation value, the standard deviation (volatility) and the mode. Examples: > Energy of an elementary particle (e.g. a semileptonically decaying B meson with missing neutrino). > Q value (invariant mass) of a decay. > Lifetime of a decay. > Phi direction of an inclusively reconstructed B meson in a jet. > Turnover of an article next year (very important in industrial applications).
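A minimal sketch of this deduction (generic C++, not the NeuroBayes interface): given a binned density p[i] at bin centres x[i] with bin width dx, the expectation value, volatility and mode follow directly.

#include <cmath>
#include <vector>

struct Summary { double mean, sigma, mode; };

// Summary statistics of a binned, normalised probability density.
Summary summarize(const std::vector<double>& x,   // bin centres
                  const std::vector<double>& p,   // density values
                  double dx)                      // bin width
{
    double mean = 0.0, ex2 = 0.0, pmax = -1.0, mode = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean += x[i] * p[i] * dx;                 // expectation value
        ex2  += x[i] * x[i] * p[i] * dx;          // E[x^2]
        if (p[i] > pmax) { pmax = p[i]; mode = x[i]; }  // mode = density maximum
    }
    return { mean, std::sqrt(ex2 - mean * mean), mode };  // sigma = volatility
}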

One way to construct a one-dimensional test statistic from multidimensional input (an MVA method): neural networks, self-learning procedures copied from nature. (Slide figure: human brain with labelled regions: frontal lobe, motor cortex, parietal cortex, temporal lobe, brain stem, occipital lobe, cerebellum.)

Neural networks. The NeuroBayes classification core is based on a simple feed-forward neural network. The information (the knowledge, the expertise) is coded in the connections between the neurons. Each neuron performs fuzzy decisions. A neural network can learn from examples. Human brain: about 100 billion (10^11) neurons and about 100 trillion (10^14) connections. NeuroBayes: 10 to a few hundred neurons.

Neural Network basic functions

Neural network transfer functions
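Since the two preceding slides are figures, here is a minimal sketch of the kind of functions they show (generic C++, not the NeuroBayes internals): each neuron forms a weighted sum of its inputs and passes it through a sigmoid transfer function; a symmetric variant maps to (-1, 1), matching the NeuroBayes output range.

#include <cmath>
#include <vector>

double transfer(double a)                       // logistic sigmoid, output in (0, 1)
{
    return 1.0 / (1.0 + std::exp(-a));
}

double transferSymmetric(double a)              // symmetric variant, output in (-1, 1)
{
    return 2.0 / (1.0 + std::exp(-a)) - 1.0;
}

double neuron(const std::vector<double>& x,     // inputs
              const std::vector<double>& w,     // weights, w[0] = bias
              double (*f)(double))              // transfer function
{
    double a = w[0];                            // bias term
    for (std::size_t i = 0; i < x.size(); ++i)
        a += w[i + 1] * x[i];                   // weighted sum of inputs
    return f(a);                                // the neuron's fuzzy decision
}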

NeuroBayes classifications. The pipeline runs from input through preprocessing, with significance control, to postprocessing and output. The NeuroBayes Teacher learns complex relationships from existing data bases (e.g. Monte Carlo); the NeuroBayes Expert produces the prognosis for unknown data.

How it works: training and application. Training: a data set of historic or simulated data (variables a, b, c, ..., with known target t) is fed to the teacher. Application: for actual (new, real) data (a, b, c, ..., target t unknown), the expert system returns the probability that the hypothesis is correct (classification) or a probability density for the variable t.

Neural network training. Backpropagation (Rumelhart et al. 1986): calculate the gradient backwards by applying the chain rule and optimise using the gradient descent method, as in the sketch below. Step size??
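A minimal sketch of one backpropagation step for a single neuron with the symmetric sigmoid above and a squared-error loss E = 1/2 (y - t)^2; the step size eta is exactly the open choice the slide asks about.

#include <cmath>
#include <vector>

void gradientStep(std::vector<double>& w,        // weights, w[0] = bias
                  const std::vector<double>& x,  // one training input
                  double t,                      // its target
                  double eta)                    // learning rate / step size
{
    double a = w[0];
    for (std::size_t i = 0; i < x.size(); ++i) a += w[i + 1] * x[i];
    double y     = 2.0 / (1.0 + std::exp(-a)) - 1.0;  // forward pass
    double dyda  = 0.5 * (1.0 + y) * (1.0 - y);       // derivative of the sigmoid
    double delta = (y - t) * dyda;                    // chain rule: dE/da
    w[0] -= eta * delta;                              // gradient descent on the bias
    for (std::size_t i = 0; i < x.size(); ++i)
        w[i + 1] -= eta * delta * x[i];               // dE/dw_i = delta * x_i
}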

Neural network training. Difficulty: find the global minimum of a highly non-linear function in a high-dimensional (~ >100 dimensions) space. Imagine the task of finding the deepest valley in the Alps (just 2 dimensions): it is easy to find the next local minimum, but globally... impossible! This needs good preconditioning.

NeuroBayes strengths: NeuroBayes is a very powerful algorithm: excellent generalisability (does not overtrain), robust (always finds a good solution, even with erratic input data), fast, automatically selects the significant variables, output interpretable as a Bayesian a posteriori probability, can train with weights and background subtraction. NeuroBayes is easy to use: examples and documentation available, good default values for all options (fast start!), direct interface to TMVA available, introduction into ROOT planned.

<phi-t> NeuroBayes
> is based on 2nd generation neural network algorithms, Bayesian regularisation, optimised preprocessing with non-linear transformations and decorrelation of input variables, and linear correlation to the output.
> learns extremely fast due to 2nd order BFGS methods, and even faster with the 0-iteration mode.
> produces small expertise files.
> is extremely robust against outliers in the input data.
> is immune against learning statistical noise by heart.
> tells you if there is nothing relevant to be learned.
> delivers sensible prognoses already with small statistics.
> can handle weighted events, even negative weights.
> has advanced boost and cross validation features.
> is steadily further developed professionally.

Bayes' Theorem: $P(T|D) \neq P(D|T)$, but

$$P(T|D) = \frac{P(D|T)\,P(T)}{P(D)} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} = \text{Posterior}.$$

NeuroBayes internally uses Bayesian arguments for regularisation. NeuroBayes automatically makes Bayesian posterior statements.
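A quick worked illustration with invented numbers: take a prior P(T) = 0.01 and likelihoods P(D|T) = 0.9, P(D|not T) = 0.1. Then P(T|D) = (0.9 x 0.01) / (0.9 x 0.01 + 0.1 x 0.99) = 0.009 / 0.108 ≈ 0.083: even a likelihood ratio of 9 lifts a 1% prior only to about an 8% posterior, which is why P(T|D) must not be confused with P(D|T).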

Teacher code fragment (1)

#include "NeuroBayesTeacher.hh"

// create NeuroBayes instance
NeuroBayesTeacher* nb = NeuroBayesTeacher::Instance();

const int nvar = 14;          // number of input variables
nb->NB_DEF_NODE1(nvar+1);     // nodes in input layer
nb->NB_DEF_NODE2(nvar);       // nodes in hidden layer
nb->NB_DEF_NODE3(1);          // nodes in output layer
nb->NB_DEF_TASK("CLA");       // binomial classification
nb->NB_DEF_ITER(10);          // number of training iterations

nb->SetOutputFile("BsDsPiKSK_expert.nb");   // expertise file name
nb->SetRootFile("BsDsPiKSK_expert.root");   // histogram file name

Teacher code fragment (2)

// in the training event loop
nb->SetWeight(1.0);    // set weight of event
nb->SetTarget(0.0);    // set target: this event is BACKGROUND, else set to 1.

InputArray[0] = GetValue(back,"BsPi.Pt");             // define input variables
InputArray[1] = TMath::Abs(GetValue(back,"Bs.D0"));
...
nb->SetNextInput(nvar, InputArray);

// end of event loop
nb->TrainNet();        // perform training

Many options exist, but this simple code usually already gives very good results.

Expert code fragment

#include "Expert.hh"
...
Expert* nb = new Expert("../train/BsDsPiKSK_expert.nb", -2);
...
InputArray[0] = GetValue(signal,"BsPi.Pt");
InputArray[1] = TMath::Abs(GetValue(signal,"Bs.D0"));
...
Netout = nb->nb_expert(InputArray);
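For completeness, a hedged sketch of the surrounding pieces the fragment assumes (the array declaration and the cut value are invented for illustration); the raw output lies in [-1, 1] and maps to a probability as used on the following slides.

float InputArray[14];                         // same nvar = 14 as in the teacher
// ... fill InputArray[0..13] for one candidate, then:
float Netout  = nb->nb_expert(InputArray);    // raw network output in [-1, 1]
float pSignal = 0.5f * (Netout + 1.f);        // posterior signal probability
if (pSignal > 0.7f) { /* keep candidate */ }  // illustrative cut, not from the talk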

Input variables, ordered by relevance (standard deviations of additional information).

NeuroBayes training output (analysis file). NeuroBayes output distribution: red: signal, black: background. Signal purity S/(S+B) in bins of the NeuroBayes output: if on the diagonal, then P = (NBout + 1)/2 is the probability that the event actually is signal. This proves that NeuroBayes is always well calibrated in the training.

NeuroBayes training output (analysis file). Purity vs. signal efficiency plot for different NeuroBayes output cuts: should reach as far into the upper right corner as possible. The lower curve comes from cutting the wrong way round. Signal efficiency vs. total efficiency when cutting at different NeuroBayes outputs (lift chart): the area between the blue curve and the diagonal should be large. Physical region: white. Right diagonal: events randomly sorted, no individualisation. Left diagonal border: completely correctly sorted, first all signal events, then all background. Gini index: a classification quality measure; the larger, the better (see the sketch below).
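A minimal sketch of one common convention for such a Gini index (the exact NeuroBayes definition may differ): the area between the lift-chart curve and the diagonal, normalised to the area of a perfect ordering.

#include <algorithm>
#include <utility>
#include <vector>

double giniIndex(std::vector<std::pair<double,int>> ev)   // (NB output, isSignal 0/1)
{
    std::sort(ev.begin(), ev.end(),
              [](const std::pair<double,int>& a, const std::pair<double,int>& b)
              { return a.first > b.first; });              // best candidates first
    double nSig = 0.0;
    for (const auto& e : ev) nSig += e.second;             // total signal count
    const double n = static_cast<double>(ev.size());
    double cumSig = 0.0, area = 0.0, perfect = 0.0;
    for (std::size_t i = 0; i < ev.size(); ++i) {
        cumSig  += ev[i].second;
        area    += cumSig / nSig - (i + 1) / n;            // lift curve minus diagonal
        perfect += std::min((i + 1) / nSig, 1.0) - (i + 1) / n;  // ideal ordering
    }
    return area / perfect;                                 // ~1 perfect, ~0 random
}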

NeuroBayes training output (analysis file). Correlation matrix of the input variables. First row/column: the training target.

NeuroBayes training output (analysis file). Most important input variable; significance: 78 standard deviations; accepted for the training. Panels: probability-integral-transformed input variable distribution for signal and background (this is a binary variable!); signal purity as a function of the input variable (in this case: unordered classes); mean-0, width-1 transformation of the signal purity of the transformed input variable; purity-efficiency plot of this variable compared to that of the complete NeuroBayes.

NeuroBayes training output (analysis file). Second most important input variable: alone 67 standard deviations, but added after the most important variable is taken into account, only 11 sigma. Panels: probability-integral-transformed input variable distribution for signal and background (this is a largely continuous variable!); signal purity as a function of the input variable (in this case: spline fit); mean-0, width-1 transformation of the (fitted) signal purity of the input variable; purity-efficiency plot of this variable compared to that of the complete NeuroBayes.

NeuroBayes training output (analysis file). 39th most important input variable: alone 17 standard deviations, but only 0.6 sigma added after the more significant variables; ignored for the training. Panels: probability-integral-transformed input variable distribution for signal and background; for 3339 events this input was not available (delta function); signal purity as a function of the input variable (in this case: spline fit + delta); mean-0, width-1 transformation of the (fitted) signal purity of the input variable. Due to the preprocessing, the delta is mapped to 0, not to its purity.

The NeuroBayes output is a linear measure of the Bayesian posterior signal probability: $P_T(S) = (NB + 1)/2$. Let the signal-to-background ratio in the training set be $r_T = S_T/B_T$ and in the expert set $r_E = S_E/B_E$. If the training was performed with a different S/B than actually present in the expert data set, one can transform the signal probability:

$$P_E(S) = \frac{1}{1 + \left(\frac{1}{P_T(S)} - 1\right)\dfrac{r_T}{r_E}}$$
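A direct transcription of this transformation into code (a sketch; the function name is invented):

// Transform the training posterior P_T(S) into the expert-sample posterior
// P_E(S), given the S/B ratios rT (training) and rE (expert data).
double expertPosterior(double netout, double rT, double rE)
{
    double pT = 0.5 * (netout + 1.0);                 // P_T(S) = (NB + 1)/2
    return 1.0 / (1.0 + (1.0 / pT - 1.0) * rT / rE);  // odds rescaled by rE/rT
}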

Hunting data-MC disagreements with NeuroBayes:
1. Use data as signal, Monte Carlo as background.
2. Train a NeuroBayes classification network. If the MC model describes the data well, nothing should be learned!
3. Look at the most significant variables of this training. These give a hint where the MC is not good; it could e.g. be the pt spectrum or the invariant mass spectrum (width).
4. Decide whether the effects are due to physics modelling or to detector resolution/efficiency.
5. Reweight the MC by w = (1 + NBout)/(1 - NBout), as sketched below, or produce a more realistic MC, and go to 1.
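The reweighting in step 5 as a one-line sketch (the function name is invented):

// Per-event weight from the data-vs-MC network output (data trained as
// signal, netout in (-1, 1)) to reweight Monte Carlo towards data.
double mcWeight(double netout)
{
    return (1.0 + netout) / (1.0 - netout);   // w = (1 + NBout)/(1 - NBout)
}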

Scenario: MC for the signal is available, but not for the backgrounds. Idea: take the background from sidebands in the data. Check that the network cannot learn the mass (by training left sideband vs. right sideband: remove input variables until this net cannot learn anything any more). Works well if the data-MC agreement is quite good.

Scenario: neither a reliable signal nor a background Monte Carlo is available. Idea: training with background subtraction. Signal class: peak region with weight 1 and sideband region with weight -1 (statistical subtraction). Background class: sideband region with weight 1. This works very well (see the sketch below), also for the Y(2S) and Y(3S), although trained just on the Y(1S)!
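A sketch of how this weighting could look with the teacher interface from the code fragments above (the inPeak/inSideband flags and the loop bookkeeping are hypothetical helpers; the mass itself must of course not be among the inputs):

for (/* each data event */) {
    if (inPeak) {                        // peak region: signal, weight +1
        nb->SetWeight(1.0);
        nb->SetTarget(1.0);
        nb->SetNextInput(nvar, InputArray);
    } else if (inSideband) {
        nb->SetWeight(-1.0);             // sideband as "signal" with weight -1:
        nb->SetTarget(1.0);              // statistically subtracts the background
        nb->SetNextInput(nvar, InputArray);
        nb->SetWeight(1.0);              // and once more as background, weight +1
        nb->SetTarget(0.0);
        nb->SetNextInput(nvar, InputArray);
    }
}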

Example of a data-only training (on the first resonance).

NeuroBayes Bs -> J/ψ Φ selection without MC (two-stage background subtraction training process): all data -> soft preselection -> input to the first NeuroBayes training -> soft cut on net 1 -> input to the second NeuroBayes training -> cut on net 2.

Exploiting the S/B information more efficiently: the sPlot method. Fit signal and background in one distribution of the data (e.g. mass). Compute sPlot weights w_S for signal (they may be < 0 or > 1) as a function of the mass from the fit. Train the NeuroBayes network with each event treated both as signal with weight w_S and as background with weight 1 - w_S, as sketched below. A soft cut on the output enriches S/B considerably. Make sure the network cannot learn the mass! (Paper in preparation.)
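A sketch of the sPlot-weighted training with the same teacher interface (sWeight(mass) stands for the w_S from the mass fit and is a hypothetical helper):

for (/* each data event */) {
    double wS = sWeight(mass);          // sPlot weight, may be < 0 or > 1
    nb->SetWeight(wS);                  // once as signal with weight w_S
    nb->SetTarget(1.0);
    nb->SetNextInput(nvar, InputArray);
    nb->SetWeight(1.0 - wS);            // once as background with weight 1 - w_S
    nb->SetTarget(0.0);
    nb->SetNextInput(nvar, InputArray);
}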

More than 60 diploma and Ph.D. theses and many publications from the experiments DELPHI, CDF II, AMS, CMS and Belle used NeuroBayes or its predecessors very successfully. ATLAS and LHCb applications are also starting. Many of these can be found at www.neurobayes.de. Talks about NeuroBayes and applications: www-ekp.physik.uni-karlsruhe.de/~feindt (under Forschung).

Some NeuroBayes highlights: Bs oscillations, discovery of excited Bs states, X(3872) properties, single top quark production discovery, high mass Higgs exclusion.

Just a few examples: NeuroBayes soft electron identification for CDF II (thesis U. Kerzel), on the basis of the Soft Electron Collection; much more efficient than a cut selection or JetNet with the same inputs. Only after clever preprocessing by hand and a careful choice of the learning parameters could these also be as good as NeuroBayes.

Just a few examples: NeuroBayes selection.

Just a few examples: first observation of the B_s1 and the most precise measurement of the B_s2*; selection using NeuroBayes.

The Belle B factory ran very successfully 2000-2010. KIT joined the Belle Collaboration in 2008 and introduced NeuroBayes: continuum subtraction, flavour tagging, particle ID, S/B selection optimisation, full B reconstruction. NeuroBayes enhances the efficiency of the flavour tagging calibration reaction B -> D* l ν by 71% at the same purity.

Physics at a B factory (asymmetric e+e- collider at the Y(4S)): the Y(4S) decays into 2 B mesons almost at rest in the CMS. The decay products of the 2 Bs are not easily distinguishable. There are many 1000 exclusive decay chains per B. Reconstruct as many Bs as possible exclusively (tag side); then all other reconstructed particles belong to the other B (signal side), and the kinematics of the signal side are uniquely determined, which allows missing mass reconstruction.

Hierarchical Full Reconstruction

Example: D0 signals of very different purity with/without the NB cut.

Optimal combination of decay channels of very different purity using the NeuroBayes outputs. Precuts are chosen such that the number of additional background events per additional signal event is constant.

Full reconstruction of B mesons in 1042 decay chains: a hierarchical probabilistic reconstruction system with 71 NeuroBayes networks, fully automatic (NeuroBayes Factory). B+ efficiency increased by ~104% at the same (total) purity compared to the classical algorithm; this corresponds to many years of additional data taking.

Alternatively one can make the sample cleaner, e.g. at the same background level: B+ efficiency increased by +88% at the same background level. (Real data plots, about 8% of the full data set; curves: signal and background using NeuroBayes, signal with the classical algorithm.)

Alternatively one can make the sample much cleaner, e.g. at the same signal efficiency as the classical algorithm: B+ background suppression by a factor of 17! (Curves: background with the classical algorithm, background and signal using NeuroBayes.)

First application test on real data: select B0 -> D*+ l ν on the signal side, with a fully reconstructed B on the tag side, and calculate the missing mass squared on the signal side. A peak at 0 from the missing neutrino is expected, and seen. The efficiency is more than doubled with the new algorithm!

Flexibility: working with NeuroBayes allows a continuous choice of the working point in the purity-efficiency plane. NIM paper in preparation.

Customers & projects. Very successful projects for (among others): BGV and VKB (car insurance), Lupus Alpha (asset management), Otto Versand (mail order business), Thyssen Krupp (steel industry), AXA and Central (health insurance), dm drogerie markt (drugstore chain), Libri (book wholesale)... expanding.

Individual risk prognoses for car insurances: accident probability, cost probability distribution, large damage prognosis, contract cancellation probability. Very successful at the car insurers mentioned above.

Correlation among the input variables, target colour-coded (Ramler II plot).

Contract cancellations in a large financial institute: the real cancellation rate as a function of the cancellation rate predicted by NeuroBayes. Very good performance within statistical errors.

Near-future turnover predictions for chain stores: 1. time series modelling; 2. correction and error estimate using NeuroBayes.

Turnover prognosis for mail order business

Typical test results (always very successful). Colour code: NeuroBayes better / same / worse than classical methods, shown for the training seasons and the test seasons.

Prognosis of individual health costs: a pilot project for a large private health insurance. Prognosis of the costs in the following year for each insured person, with confidence intervals; 4 years of training, test on the following year. Results: a probability density for each customer/tariff combination (e.g. customer no. 00000, male, 44, tariff XYZ123, insured for about 17 years). Very good test results! Has potential for a real and objective cost reduction in health management.

Prognosis of financial markets (VDI-Nachrichten, 9.3.2007; Börsenzeitung, 6.2.2008): a NeuroBayes-based, risk-averse, market-neutral fund for institutional investors, the Lupus Alpha NeuroBayes Short Term Trading Fund. Fully automatic trading (2007-2009: 20 million; since 2010: 130 million).

Licenses. NeuroBayes is commercial software; all rights belong to Phi-T GmbH. It is not open source. CERN, Fermilab and KEK have licenses for use in high energy physics research. The Expert runs without a license (and can run on the grid!); a license is only needed for training networks. For purchasing additional teacher licenses (for computers outside CERN) please contact Phi-T. Bindings to many programming languages exist, and a code generator for easy usage exists.

Prognosis of sports events from historical data: NeuroNetzer. Results: probabilities for home - tie - guest.

Documentation. Basics: M. Feindt, A Neural Bayesian Estimator for Conditional Probability Densities, e-print archive, physics/0402093; M. Feindt, U. Kerzel, The NeuroBayes Neural Network Package, NIM A 559 (2006) 190. Web sites: www.phi-t.de (company web site, German & English); www-ekp.physik.uni-karlsruhe.de/~feindt (some NeuroBayes talks can be found here under Forschung); www.neurobayes.de (English site on physics results with NeuroBayes, all diploma and PhD theses using NeuroBayes, and a discussion forum and FAQ for usage in physics; please use this and also post your results here!).

The <phi-t> mouse game, or: even your "free will" is predictable. www.phi-t.de/mousegame