Adaptive Load Balancing: A Study in Multi-Agent Learning

Journal of Artificial Intelligence Research 2 (1995) 475-500. Submitted 10/94; published 5/95.
©1995 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Andrea Schaerf (aschaerf@dis.uniroma1.it)
Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Salaria 113, I-00198 Roma, Italy

Yoav Shoham (shoham@flamingo.stanford.edu)
Robotics Laboratory, Computer Science Department, Stanford University, Stanford, CA 94305, USA

Moshe Tennenholtz (moshet@ie.technion.ac.il)
Faculty of Industrial Engineering and Management, Technion, Haifa 32000, Israel

Abstract

We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm system efficiency.

1. Introduction

This article investigates multi-agent reinforcement learning in the context of a concrete problem of undisputed importance, namely load balancing. Real life provides us with many examples of emergent, uncoordinated load balancing: traffic on alternative highways tends to even out over time; members of the computer science department tend to use the most powerful of the networked workstations, but eventually find the lower load on other machines more inviting; and so on. We would like to understand the dynamics of such emergent load-balancing systems and apply the lesson to the design of multi-agent systems.

We define a formal yet concrete framework in which to study the issues, called a multi-agent multi-resource stochastic system, which involves a set of agents, a set of resources, probabilistically changing resource capacities, probabilistic assignment of new jobs to agents, and probabilistic job sizes. An agent must select a resource for each new job, and the efficiency with which the resource handles the job depends on the capacity of the resource over the lifetime of the job as well as the number of other jobs handled by the resource over that period of time. Our performance measure for the system aims at globally optimizing the resource usage in the system while ensuring fairness (that is, a system shouldn't be made efficient at the expense of any particular agent), two common criteria for load balancing.

How should an agent choose an appropriate resource in order to optimize these measures? Here we make an important assumption, in the spirit of reinforcement learning (Sutton, 1992): The information available to the agent is only its prior experience. In particular, the agent does not necessarily know the past, present, or future capacities of the resources,[1] and is unaware of past, current, or future jobs submitted by the various agents, not even the relevant probability distributions. The goal of each agent is thus to adapt its resource-selection behavior to the behavior of the other agents as well as to the changing capacities of the resources and to the changing load, without explicitly knowing what they are.

[1] In many applications the capacities of the resources are known, at least to some extent. This point will be discussed later. Basically, in this paper we wish to investigate how far one can go using only purely local feedback and without the use of any global information (Kaelbling, 1993; Sutton, 1992).

We are interested in several basic questions:

- What are good resource-selection rules?
- How does the fact that different agents may use different resource-selection rules affect the system behavior?
- Can communication among agents improve the system efficiency?

In the following sections we show illuminating answers to these questions. The contribution of this paper is therefore twofold. We apply multi-agent reinforcement learning to the domain of adaptive load balancing and we use this basic domain in order to demonstrate basic phenomena in multi-agent reinforcement learning.

The structure of this paper is as follows. In Section 2 we discuss our general setting. The objective of this section is to motivate our study and point to its impact. The formal framework is defined and discussed in Section 3. Section 4 completes the discussion of this framework by introducing the resource selection rule and its parameters, which function as the "control knobs" of the adaptive process. In Section 5 we present experimental results on adaptive behavior within our framework and show how various parameters affect the efficiency of adaptive behavior. The case of heterogeneous populations is investigated in Section 6, and the case of communicating populations is discussed in Section 7. In Section 8 we discuss the impact of our results. In Section 9 we put our work in the perspective of related work. Finally, in Section 10 we conclude with a brief summary.

2. The General Setting

This paper applies reinforcement learning to the domain of adaptive load balancing. However, before presenting the model we use and our detailed study, we need to clarify several points about our general setting. In particular, we need to explain the interpretation of reinforcement learning and the interpretation of load balancing we adopt.

Much work has been devoted in the recent years to distributed and adaptive load balancing. One can find related work in the field of distributed computer systems (e.g., Pulidas, Towsley, & Stankovic, 1988; Mirchandaney & Stankovic, 1986; Billard & Pasquale, 1993; Glockner & Pasquale, 1993; Mirchandaney, Towsley, & Stankovic, 1989; Zhou, 1988; Eager, Lazowska, & Zahorjan, 1986), in organization theory and management science (e.g., Malone, 1987), and in distributed AI (e.g., Bond & Gasser, 1988). Although some motivations of the above-mentioned lines of research are similar, the settings discussed have some essential differences.

Work on distributed computer systems adopts the view of a set of computers each of which controls certain resources, has an autonomous decision-making capability, and jobs arrive to it in a dynamic fashion. The decision-making agents of the different computers (also called nodes) try to share the system load and coordinate their activities by means of communication. The actual action to be performed, based on the information received from other computers, may be controlled in various ways. One of the ways adopted to control the related decisions is through learning automata (Narendra & Thathachar, 1989).

In the above-mentioned work each agent is associated with a set of resources, where both the agent and the related resources are associated with a node in the distributed system. Much work in management science and in distributed AI adopts a somewhat complementary view. In difference to classical work in distributed operating systems, an agent is not associated with a set of resources that it controls. The agents are autonomous entities which negotiate among themselves (Zlotkin & Rosenschein, 1993; Kraus & Wilkenfeld, 1991) on the use of shared resources. Alternatively, the agents (called managers in this case) may negotiate the task to be executed with the processors which may execute it (Malone, 1987).

The model we adopt has the flavor of models used in distributed AI and organization theory. We assume a strict separation between agents and resources. Jobs arrive to agents who make decisions about where to execute them. The resources are passive (i.e., do not make decisions). A typical example of such a setting in a computerized framework is a set of PCs, each of which is controlled by a different user and submits jobs to be executed on one of several workstations. The workstations are assumed to be independent of each other and shared among all the users.
The above example is a real-life situation which motivated our study and the terminology we adopt is taken from such a framework. However, there are other real-life situations related to our model in areas different from classical distributed computer systems. A canonical problem related to our model is the following one (Arthur, 1994): An agent, embedded in a multi-agent system, has to select among a set of bars (or a set of restaurants). Each agent makes an autonomous decision but the performance of the bar (and therefore of the agents that use it) is a function of its capacity and of the number of agents that use it. The decision of going to a bar is a stochastic process but the decision of which bar to use is an autonomous decision of the respective agent. A similar situation arises when a product manager decides which processor to use in order to perform a particular task. The model we present in Section 3 is a general model where such situations can be investigated. In these situations a job arrives to an agent (rather than to a node consisting of particular resources) who decides upon the resource (e.g., restaurant) where his job should be executed; there is a-priori no association between agents and resources.

We now discuss the way the agents behave in such a framework. The common theme among the above-mentioned lines of research is that load-balancing is achieved by means of communication among active agents or active resources (through the related decision-making agents). In our study we adopt a complementary view. We consider agents who act in a purely local fashion, based on purely local information as described in the recent reinforcement learning literature. As we mentioned, learning automata were used in the field of distributed computer systems in order to perform adaptive load balancing. Nevertheless, the related learning procedures rely heavily on communication among agents (or among decision-making agents of autonomous computers). Our work applies recent work on reinforcement learning in AI where the information the agent gets is purely local. Hence, an agent will know how efficient the service in a restaurant has been only by choosing it as a place to eat. We don't assume that agents may be informed by other agents about the load in other restaurants or that the restaurants will announce their current load. This makes our work strictly different from other work applying reinforcement learning to adaptive load balancing.

The above features make our model and study both basic and general. Moreover, the above discussion raises the question of whether reinforcement learning (based on purely local information and feedback) can guarantee useful load balancing. The combination of the model we use and our perspective on reinforcement learning makes our contribution novel. Nevertheless, as we mentioned above (and as we discuss in Section 9) the model we use is not original to us and captures many known problems and situations in distributed load balancing. We apply reinforcement learning, as discussed in the recent AI literature, to that model and investigate the properties of the related process.

3. The Multi-Agent Multi-Resource Stochastic System

In this section we define the concrete framework in which we study dynamic load balancing. The model we present captures adaptive load balancing in the general setting mentioned in Section 2. We restrict the discussion to discrete, synchronous systems (and thus the definition below will refer to $\mathbb{N}$, the natural numbers); similar definitions are possible in the continuous case. We concentrate on the case where a job can be executed using any of the resources. Although somewhat restricting, this is a common practice in much work in distributed systems (Mirchandaney & Stankovic, 1986).

Definition 3.1 A multi-agent multi-resource stochastic system is a 6-tuple $\langle A, R, P, D, C, SR \rangle$, where $A = \{a_1, \ldots, a_N\}$ is a set of agents, $R = \{r_1, \ldots, r_M\}$ is a set of resources, $P : A \times \mathbb{N} \to [0, 1]$ is a job submission function, $D : A \times \mathbb{N} \to \mathbb{R}$ is a probabilistic job size function, $C : R \times \mathbb{N} \to \mathbb{R}$ is a probabilistic capacity function, and $SR$ is a resource-selection rule.

The intuitive interpretation of the system is as follows. Each of the resources has a certain capacity, which is a real number; this capacity changes over time, as determined by the function C. At each time point each agent is either idle or engaged. If it is idle, it may submit a new job with probability given by P. Each job has a certain size which is also a real number. The size of any submitted job is determined by the function D. (We will use the unit token when referring to job sizes and resource capacities, but we do not mean that tokens come only in integer quantities.) For each new job the agent selects one of the resources. This choice is made according to the rule SR; since there is much to say about this rule, we discuss it separately in the next section.

In our model, any job may run on any resource. Furthermore, there is no limit on the number of jobs served simultaneously by a given resource (and thus no queuing occurs). However, the quality of the service provided by a resource at a given time deteriorates with the number of agents using it at that time. Specifically, at every time point the resource distributes its current capacity (i.e., its tokens) equally among the jobs being served by it. The size of each job is reduced by this amount and, if it drops to (or below) zero, the job is completed, the agent is notified of this, and becomes idle again. Thus, the execution time of a job j depends on its size, on the capacity over time of the resource processing it, and on the number of other agents using that resource during the execution of j.

Our measure of the system's performance will be twofold: We aim to minimize time-per-token, averaged over all jobs, as well as to minimize the standard deviation of this random variable. Minimizing both quantities will ensure overall system efficiency as well as fairness. The question is which selection rules yield efficient behavior; so we turn next to the definition of these rules.

4. Adaptive Resource-Selection Rules

The rule by which agents select a resource for a new job, the selection rule (SR), is the heart of our adaptive scheme and the topic of this section. Throughout this section and the following one we make an assumption of homogeneity. Namely, we assume that all the agents use the same SR. Notice that although the system is homogeneous, each agent will act based only on its local information. In Sections 6 and 7 we relax the homogeneity assumption and discuss heterogeneous and communicating populations.

As we have already emphasized, among all possible adaptive SRs we are interested in purely local SRs, ones that have access only to the experience of the particular agent. In our setting this experience consists of results of previous job submissions; for each job submitted by the agent and already completed, the agent knows the name $r$ of the resource used, the point in time, $t_{start}$, the job started, the point in time, $t_{stop}$, the job was finished, and the job size $S$. Therefore, the input to the SR is, in principle, a list of elements in the form $(r, t_{start}, t_{stop}, S)$. Notice that this type of input captures the general type of systems we are interested in.
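
The equal-sharing dynamics of Section 3 and the feedback tuple just described can be illustrated with a short Python sketch. It is only one possible reading of the model: the names Job and step are hypothetical, and the convention that a job finishing during time point t reports t_stop = t + 1 is an assumption that the text does not fix.

```python
from dataclasses import dataclass


@dataclass
class Job:
    agent: int        # index of the submitting agent
    resource: int     # index of the resource chosen by the SR
    size: float       # original job size S, in tokens
    remaining: float  # tokens still to be processed
    t_start: int      # time point at which the job was submitted


def step(jobs, capacities, t):
    """Advance all running jobs by one time point.

    Each resource splits its current capacity equally among the jobs it
    is serving. Returns (still_running, feedback), where feedback holds
    one (r, t_start, t_stop, S) tuple per job completed at this step.
    """
    load = {}
    for job in jobs:
        load[job.resource] = load.get(job.resource, 0) + 1

    still_running, feedback = [], []
    for job in jobs:
        job.remaining -= capacities[job.resource] / load[job.resource]
        if job.remaining <= 0:
            # Assumption: a job finishing during time point t has t_stop = t + 1.
            feedback.append((job.resource, job.t_start, t + 1, job.size))
        else:
            still_running.append(job)
    return still_running, feedback
```

For instance, two jobs of sizes 50 and 100 sharing a single resource of capacity 40 each receive 20 tokens per time point while both are running; once the smaller job completes, the larger one receives the full 40 tokens.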

Basically, we wish to assume as little as possible about the information available to an agent in order to capture real loosely-coupled systems where more global information is unavailable. Whenever agent i selects a resource for its job execution, i may get its feedback after non-negligible time, where this feedback may depend on decisions made by other agents before and after agent i's decision. This forces the agent to rely on a non-trivial portion of its history and makes the problem much harder.

There are uncountably many possible adaptive SRs and our aim is not to gain exhaustive understanding of them. Rather, we have experimented with a family of intuitive and relatively simple SRs and have compared them with some non-adaptive ones. The motivation for choosing our particular family of SRs is partially due to observations made by cognitive psychologists on how people tend to behave in multi-agent stochastic and recurrent situations. In principle, our set of SRs captures the two most robust aspects of these observations: "The law of effect" (Thorndike, 1898) and the "Power law of practice" (Blackburn, 1936). In our family of rules, called $\Omega$, which partially resembles the learning rules discussed in the learning automata literature (Narendra & Thathachar, 1989), and partially resembles the interval estimation algorithm (Kaelbling, 1993), agents do not maintain complete history of their experience. Instead, each agent, A, condenses this history into a vector, called the efficiency estimator, and denoted by $ee_A$. The length of this vector is the number of resources, and the i'th entry in the vector represents the agent's evaluation of the current efficiency of resource i (specifically, $ee_A(R)$ is a positive real number). This vector can be seen as the state of a learning automaton. In addition to $ee_A$, agent A keeps a vector $jd_A$, which stores the number of completed jobs which were submitted by agent A to each of the resources, since the beginning of time. Thus, within $\Omega$, we need only specify two elements:

1. How agent A updates $ee_A$ when a job is completed
2. How agent A selects a resource for a new job, given $ee_A$ and $jd_A$

Loosely speaking, $ee_A$ will be maintained as a weighted sum of the new feedback and the previous value of $ee_A$, and the resource selected will most probably be the one with the best $ee_A$ entry (that is, the lowest estimated time-per-token), except that with low probability some other resource will be chosen. These two steps are explained more precisely in the following two subsections.

4.1 Updating the Efficiency Estimator

We take the function updating $ee_A$ to be

$$ee_A(R) := W \cdot T + (1 - W) \cdot ee_A(R)$$

where T represents the time-per-token of the newly completed job and is computed from the feedback $(R, t_{start}, t_{stop}, S)$ in the following way:[2]

$$T = (t_{stop} - t_{start}) / S$$

We take W to be a real value in the interval [0, 1], whose actual value depends on $jd_A(R)$. This means that we take a weighted average between the new feedback value and the old value of the efficiency estimator, where W determines the weights given to these pieces of information. The value of W is obtained from the following function:

$$W = w + (1 - w) / jd_A(R)$$

In the above formula w is a real-valued constant. The term $(1 - w) / jd_A(R)$ is a correcting factor, which has a major effect only when $jd_A(R)$ is low; when $jd_A(R)$ increases, reaching a value of several hundreds, this term becomes negligible with respect to w.

[2] Using parallel processing terminology, T can be viewed as a stretch factor, which quantifies the stretching of a program's processing time due to multiprogramming (Ferrari, Serazzi, & Zeigner, 1983).

4.2 Selecting the Resource

The second ingredient of adaptive SRs in $\Omega$ is a function $pd_A$ selecting the resource for a new job based on $ee_A$ and $jd_A$. This function is probabilistic. We first define the following function

$$pd'_A(R) := \begin{cases} ee_A(R)^{-n} & \text{if } jd_A(R) > 0 \\ E[ee_A]^{-n} & \text{if } jd_A(R) = 0 \end{cases}$$

where n is a positive real-valued parameter and $E[ee_A]$ represents the average of the values of $ee_A(R)$ over all resources satisfying $jd_A(R) > 0$. To turn this into a probability function, we define $pd_A$ as the normalized version of $pd'_A$:

$$pd_A(R) := pd'_A(R) / \sigma$$

where $\sigma = \sum_R pd'_A(R)$ is a normalization factor.[3]

The function $pd_A$ clearly biases the selection towards resources that have performed well in the past. The strength of the bias depends on n; the larger the value of n, the stronger the bias. In extreme cases, where the value of n is very high (e.g., 20), the agent will always choose the resource with the best record. This strategy of "always choosing the best", although perhaps intuitively appealing, is in general not a good one; it does not allow the agent to exploit improvements in the capacity or load on other resources. We discuss this SR in the following subsection, and expand on the issue of exploration versus exploitation in Sections 6 and 7.

To summarize, we have defined a general setting in which to investigate emergent load balancing. In particular, we have defined a family of adaptive resource-selection rules, parametrized by a pair (w, n). These parameters serve as knobs with which we tune the system so as to optimize its performance. In the next section we turn to experimental results obtained with this system.

4.3 The Best Choice SR (BCSR)

The Best Choice SR (BCSR) is a learning rule that assumes a high value of n, i.e., one which always chooses the best resource at a given point. We will assume w is fixed to a given value while discussing BCSR. In our previous work (Shoham & Tennenholtz, 1992, 1994), we showed that learning rules that strongly resemble BCSR are useful for several natural multi-agent learning settings. This suggests that we need to carefully study it in the case of adaptive load balancing. As we will demonstrate, BCSR is not always useful in the load balancing setting.

The difference between BCSR and a learning rule where the value of n is low, is that in the latter case the agent gives relatively high probability for the selection of a resource that didn't give the best results in the past.
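
The rule family of Sections 4.1 and 4.2, including the high-n regime that BCSR corresponds to, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, resources are identified by list index, and incrementing the job counter before computing W is an assumption, chosen so that W = 1 for the first completed job on a resource and no division by zero occurs.

```python
import random


def update_estimator(ee, jd, r, t_start, t_stop, size, w):
    """Apply the Section 4.1 update in place for one completed job on resource r."""
    T = (t_stop - t_start) / size  # time-per-token of the completed job
    jd[r] += 1                     # assumption: count the job before computing W
    W = w + (1 - w) / jd[r]        # equals 1 on the first job, tends to w later
    ee[r] = W * T + (1 - W) * ee[r]


def selection_probabilities(ee, jd, n):
    """Return the distribution pd_A of Section 4.2 as a list of probabilities."""
    tried = [r for r in range(len(ee)) if jd[r] > 0]
    if not tried:
        # Very first job of the agent: uniform choice, as in the paper's footnote.
        return [1 / len(ee)] * len(ee)
    mean_ee = sum(ee[r] for r in tried) / len(tried)
    # Untried resources are scored with the average estimate of the tried ones.
    raw = [(ee[r] if jd[r] > 0 else mean_ee) ** -n for r in range(len(ee))]
    total = sum(raw)
    return [p / total for p in raw]


def select_resource(ee, jd, n, rng=random):
    """Sample a resource index according to pd_A."""
    probs = selection_probabilities(ee, jd, n)
    return rng.choices(range(len(ee)), weights=probs)[0]
```

With estimates [0.05, 0.10] for two resources that have both been tried, n = 1 selects the first resource with probability about 2/3, whereas n = 20 raises that probability above 0.999, which is effectively the "always choose the best" behavior.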

With a low value of n, the agent might be able to notice that the behavior of one of the resources has improved due to changes in the system. Note that the exploration of "non-best" resources is crucial when the dynamics of the system includes changes in the capacities of the resources. In such cases, the agent could not take advantage of possible increases in the capacity of resources if it uses the BCSR.

One might wonder, however, whether in cases where the main dynamic changes of the system stem from load changes, relying on BCSR is sufficient. If the latter is true, we will be able to ignore the parameter n and to concentrate only on the BCSR, in systems where the capacity of resources is fixed. In order to clarify this point, we consider the following example.

[3] If for all R we have $jd_A(R) = 0$ (i.e., if the agent is going to submit its very first job), then we assume the agent chooses a resource randomly (with a uniform probability distribution).

Suppose there are only two resources, $R_1$ and $R_2$, whose respective (fixed) capacities, $c_{R_1}$ and $c_{R_2}$, satisfy the equality $c_{R_1} = 2 c_{R_2}$. Assume now that the load of the system varies between a certain low value and a certain high one. If the system's load is low and the agents adopt BCSR, then the system will evolve in a way where almost all of the agents would be preferring $R_1$ to $R_2$. This is due to the fact that, in the case of low load, there are only few overlaps of jobs, hence $R_1$ is much more efficient. On the other hand, when the system's load is high, $R_1$ could be very busy and some of the agents would then prefer $R_2$, since the performance obtained using the less crowded resource $R_2$ could be better than the one obtained using the overly crowded resource $R_1$. In the extreme case of a very high load, we expect the agents to use $R_2$ one third of the time, in proportion to its share of the total capacity.

Assume now that the load of the system starts from a low level, then increases to a high value, and then decreases to reach its original value. When the load increases, the agents that were mostly using $R_1$ will start observing that $R_1$'s performance is becoming worse and, therefore, following the BCSR they will start using $R_2$ too. Now, when the load decreases, the agents which were using $R_2$ will observe an improvement in the performance of $R_2$, but the value they have stored for $R_1$ (i.e., $ee_A(R_1)$) will still reflect the previous situation. Hence, the agents will keep on using $R_2$, ignoring the possibility of obtaining much better results if they moved back to $R_1$. In this situation, the randomized selection makes the agents able to use $R_1$ (with a certain probability) and therefore some of them may discover that the performance of $R_1$ is better than that of $R_2$ and switch back to $R_1$. This will improve the system's efficiency in a significant manner.

The above example shows that the BCSR is, in the general case, not a good choice. This is in general true when the value of n is too high. In the above discussion we have assumed that the changes in the load are unforeseen. If we are able to predict the changes in the load, the agents can simply use the BCSR while the load is fixed and then use a low value of n during the changes.
In our case, instead, without even realizing that the system has changed in some way, the agents would need to (and, as we will see, would be able to) adapt to dynamic changes as well as to each other.

5. Experimental Results

In this section we compare SRs in $\Omega$ to one another, as well as to some non-adaptive, benchmark selection rules. The non-adaptive SRs we consider in this paper are those in which the agents partition themselves according to the capacities and the load of the system in a fixed predetermined manner and each agent always uses the same resource. Later in the paper, an SR of this kind is identified by a configuration vector, which specifies, for each resource, how many agents use it. When we test our adaptive SRs, we compare the performance against the non-adaptive SRs that perform best on the particular problem. This creates a highly competitive set of benchmarks for our adaptive SRs.

In addition, we compare our adaptive SRs to the load-querying SR, which is defined as follows: Each agent, when it has a new job, asks all the resources how busy they are and always chooses the least crowded one.
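
The two kinds of benchmark rules described above can be written down compactly. The following Python sketch is illustrative only: both function names are hypothetical, and it assumes that "how busy" a resource is means its current number of jobs, with ties broken in favor of the lowest resource index, neither of which is specified in the text.

```python
def fixed_assignment(config):
    """Map each agent index to a resource from a configuration vector.

    config[r] is the number of agents permanently assigned to resource r.
    """
    assignment = []
    for r, count in enumerate(config):
        assignment.extend([r] * count)
    return assignment


def load_querying_choice(jobs_per_resource):
    """Pick the resource currently serving the fewest jobs.

    Assumption: busyness is the raw job count; ties go to the lowest index.
    """
    return min(range(len(jobs_per_resource)), key=lambda r: jobs_per_resource[r])
```

A configuration vector such as [2, 0, 1] thus sends the first two agents to resource 0 and the third agent to resource 2 for every job they submit, while the load-querying rule re-evaluates its choice for each new job.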

5.1 An Experimental Setting

We now introduce a particular experimental setting, in which many of the results described below were obtained. We present it in order to be concrete about the experiments; however, the qualitative results of our experiments were observed in a variety of other experimental settings.

One motivation of our particular setting stems from the PCs and workstations problem mentioned in Section 2. For example, part of our study is related to a set of computers located at a single site. These computers have relatively high load with some peak hours during the day and a low load at night (i.e., the chance that a user of a PC submits a job is higher during the daytime on weekdays than at night and on weekends). Another part of our study is related to a set of computers spread all around the world, where the load has a quite random structure (i.e., due to differences in time zones, users may use PCs at unpredictable hours).

Another motivation of our particular setting stems from the restaurant problem mentioned in Section 2 (for discussion on the related "bar problem" see Arthur, 1994). For example, we can consider a set of snack bars located at an industrial park. These snack bars have relatively high loads with some peak hours during the day and low load at night (i.e., the chance that an employee will choose to go to a snack bar is higher during the day because there are more employees present during the day). Conversely, we can assume a set of bars near an airport where the load has a quite random structure (i.e., the airport employees may like to use these snack bars at quite unpredictable hours). Although these are particular real-life situations, we would like to emphasize the general motivation of our study and the fact that the related phenomena have been observed in various different settings.

We take N, the number of agents, to be 100, and M, the number of resources, to be 5. In the first set of experiments we take the capacities of the resources to be fixed. In particular, we take them to be $c_1 = 40$, $c_2 = 20$, $c_3 = 20$, $c_4 = 10$, $c_5 = 10$. We assume that all agents have the same probability of submitting a new job.
W also assum that all agnts hav th sam distribution ovr th siz of jobs thy submit; spcically, w assum it to b a uniform distribution ovr th intgrs in th rang [50,150]. For as of xposition, w will assum that ach point in tim corrsponds to a scond, and w consquntly count th tim in minuts, hours, days, and wks. Th hour is our main point of rfrnc; w assum, for simplicity, that th changs in th systm (i.., load chang and capacity chang) happn only at th bginning of a nw hour. Th probability of submitting a job at ach scond, which corrsponds to th load of th systm, can vary ovr tim; this is th crucial factor to which th agnts must adapt. Not that agnts can submit jobs at any scond, but th probability of such submission may chang. In particular w concntrat on thr dirnt valus of this quantity, calld L lo ; L hi and L pak, and w assum that th systm load switchs btwn thos valus. Th actual valus of L lo ; L hi and L pak in th following quantitativ rsults ar 0:1%, 0:3% and 1%, which roughly corrspond to ach agnt submitting 3.6, 10.8, and 36 jobs pr hour (pr agnt) rspctivly. 483
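The correspondence between the per-second submission probabilities and the hourly job counts quoted above can be checked directly. The sketch below is only an arithmetic aid; the constant and function names are hypothetical, and it assumes one independent submission opportunity per agent per second.

```python
SECONDS_PER_HOUR = 3600

# Per-second submission probabilities quoted in the text (0.1%, 0.3%, 1%).
LOADS = {"L_lo": 0.001, "L_hi": 0.003, "L_peak": 0.01}


def expected_jobs_per_hour(p_submit: float) -> float:
    """Expected number of jobs one agent submits in an hour, given a
    per-second submission probability."""
    return p_submit * SECONDS_PER_HOUR


def mean_job_size(low: int = 50, high: int = 150) -> float:
    """Mean of a uniform distribution over the integers low..high."""
    sizes = range(low, high + 1)
    return sum(sizes) / len(sizes)


if __name__ == "__main__":
    for name, p in LOADS.items():
        print(f"{name}: {expected_jobs_per_hour(p):.1f} jobs per agent per hour")
    print(f"mean job size: {mean_job_size():.1f}")
```

The three loads give 3.6, 10.8 and 36 jobs per agent per hour, matching the figures in the text, and the mean job size under the stated uniform distribution is 100.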

In the following, when measuring success, we will refer only to the average time-per-token.[4] However, the adaptive SRs that give the best average time-per-token were also found to be fair.

5.2 Fixed Load

We start with the case in which the load is fixed. This case is not the most interesting for adaptive behavior; however, a satisfactory SR should show reasonably efficient behavior in that basic case, in order to be useful when the system stabilizes. We start by showing the behavior of non-adaptive benchmark SRs in the case of fixed load.[5] Figure 1 shows those that give the best results, for each of the three loads. As we can see, there is a big difference between the three loads mentioned above. When the load is particularly high, the agents should scatter around all the resources at a rate proportional to their capacities; when the load is low they should all use the best resource.

    load      configuration            time-per-token
    L_lo      {100, 0, 0, 0, 0}        38.935
    L_hi      {66, 16, 16, 1, 1}       60.768
    L_peak    {40, 20, 20, 10, 10}     196.908

    Figure 1: Best non-adaptive SRs for fixed load

Given the above, it is easy to see that an adaptive SR can be effective only if it enables moving quickly from one configuration to the other.

In a static setting such as this, we can expect the best non-adaptive SRs to perform better than adaptive ones, since the information gained by the exploration of the adaptive SRs can be built into the non-adaptive ones. The experimental results confirm this intuition, as shown in Figure 2 for L_hi. The figure shows the performance obtained by the population when the value of n varies between 2 and 10 and for three values of w: 0.1, 0.3, and 0.5. Note that for the values of (n, w) that are good choices in the dynamic cases (see later in the paper, values in the intervals [3, 5] and [0.1, 0.5], respectively), the deterioration in the performance of the adaptive SRs with respect to the non-adaptive ones is small. This is an encouraging result, since adaptive SRs are meant to be particularly suitable for dynamic systems. In the following subsections we see that indeed they are.

[4] In the data shown later we refer, for convenience, to the time for 1000 tokens.
[5] The non-adaptive SRs are human-designed SRs that are used as benchmarks; they assume knowledge of the load and capacity, which is not available for the adaptive SRs we design.

5.3 Changing Load

We now begin to explore more dynamic settings. Here we consider the case in which the load on the system (that is, the probability of agents submitting a job at any time) changes over time. In this paper we present two dynamic settings: one in which the load changes according to a fixed pattern with only a few random perturbations, and another in which the load varies in some random fashion. Specifically, in the first case we fix the load to be L_hi
for ten consecutive hours, for five days a week, with two randomly chosen hours in which it is L_peak, and to be L_lo for the rest of the week. In the second case, we fix the number of hours in a week for each load as in the first case, and we distribute them completely randomly in a week. The results obtained for the two cases are similar.

[Figure 2: Performance of the adaptive Selection Rules for fixed load. Average time-per-token plotted against the exponent of the randomization function, n (from 2 to 10), with one curve for each weight w = 0.1, 0.3, and 0.5.]

Figure 3 shows the results obtained by the adaptive SRs in the case of random load. The best non-adaptive deterministic SR gives the time-per-token value of 69.201, obtained with the configuration (partition of agents) {52, 22, 22, 2, 2}; the adaptive SRs are superior. The load-querying SR instead gets the time-per-token value of 48.116, which is obviously better, but is not so far from the performances of the adaptive SRs.

We also observe the following phenomenon: Given a fixed n (resp. a fixed w) the average time-per-token is non-monotonic in w (resp. in n). This phenomenon is strongly related to the issue of exploration versus exploitation mentioned before and to phenomena observed in the study of Q-learning (Watkins, 1989).

We also notice how the two parameters n and w interplay. In fact, for each value of w the minimum of the time-per-token value is obtained with a different value of n. More precisely, the higher w is, the lower n must be in order to obtain the best results. This means that, in order to obtain high performance, highly exploratory activity (low n) should be matched with giving greater weight to the more recent experience (high w). This "parameter
matching" can be intuitively explained in the following qualitative way: The exploration activity pays because it allows the agent to detect changes in the system. However, it is more effective if, when a change is detected, it can significantly affect the efficiency estimator (i.e., if w is high). Otherwise, the cost of the exploration activity is greater than its gain.

[Figure 3: Performance of the adaptive Selection Rules for random load. Average time-per-token plotted against the exponent of the randomization function, n (from 2 to 10), with one curve for each weight w = 0.1, 0.3, and 0.5.]

5.4 Changing Capacities

We now consider the case in which the capacity of the resources can vary over time. In particular, we will demonstrate our results in the case of the previously mentioned setting. We will assume the capacities rotate randomly among the resources and, in five consecutive days, each resource gets the capacity of 40 for one day, 20 for 2 days, and 10 for the other 2 days.[6] The load also varies randomly.

The results of this experiment are shown in Figure 4. The best non-adaptive SR in this case gives the time-per-token value of 118.561, obtained with the configuration {20, 20, 20, 20, 20}.[7] The adaptive SRs give much better results, which are only slightly worse than in the case of fixed capacities. The phenomena we mentioned before are visible in this case too. See for example how a weight of 0.1 mismatches with the low values of n.

[Figure 4: Performance of the adaptive Selection Rules for changing capacities. Average time-per-token plotted against the exponent of the randomization function, n (from 2 to 10), with one curve for each weight w = 0.1, 0.3, and 0.5.]

[6] Usually the capacities will change in a less dramatic fashion. We use the above-mentioned setting in order to demonstrate the applicability of our approach under severe conditions.
[7] The load-querying SR gives the same results as in the case of fixed capacities, because such an SR is obviously not influenced by the change.

6. Heterogeneous Populations

Throughout the previous section we have assumed that all the agents use the same SR (the Homogeneity Assumption). Such an assumption models the situation in which there is a sort of centralized off-line controller which, in the beginning, tells the agents how to behave and then leaves the agents to make their own decisions. The situation described above is very different from having an on-line centralized controller which makes every decision. However, we would now like to move even further from that and investigate the situation in which each agent is able to make its own decision about which strategy to use and, maybe, adjust it over time.

As a step toward the study of systems of this kind, we drop the Homogeneity Assumption and consider the situation in which part of the population uses one SR and the other part uses a second one. In the first set of experiments, we consider the setting discussed in Subsection 5.1 and we set two populations (called 1 and 2) of the same size (50 agents each) against each other. Each population uses a different adaptive SR. The SR of population i (for i = 1, 2) will
be determined by the pair of parameters (w_i, n_i). The measure of success of population i will be defined as the average time-per-token of its members, and will be denoted by T_i. Figure 5 shows the result obtained for w_1 = w_2 = 0.3, and n_1 = 4, and for different values of n_2, in the case of randomly varying load.

[Figure 5: Performance of 2 populations of 50 agents with n_1 = 4 and w_1 = w_2 = 0.3. Average time-per-token of each population (T_1 and T_2) plotted against the exponent of the randomization function n_2 (from 2 to 10).]

Our results expose the following phenomenon: The two populations obtain different outcomes from the ones they obtain in the homogeneous case. More specifically, for n_2 between 4 and 6, the results obtained by the agents which use n_2 are generally better than the results obtained by the ones which use n_1, despite the fact that a homogeneous population which uses n_1 gets better results than a homogeneous population which uses n_2.

The phenomenon described above has the following intuitive explanation. For n_2 in the above-mentioned range, the population which uses n_2 is less "exploring" (i.e., more "exploiting") than the other one, and when it is left on its own it might not be able to adapt to the changes in a satisfactory manner. However, when it is joined with the other population, it gets the advantages of the experimental activity of agents in that population, without paying for it. In fact, the more exploring agents, in trying to unload the most crowded resources, do a service to the other agents as well.

It is worth observing in Figure 5 that when n_2 is low (e.g., n_2 ≤ 3) the agents that use n_2 take the role of explorers and lose a lot, while the agents that use n_1 gain from that situation. Conversely, for high values of n_2 (e.g., n_2 ≥ 7) the performances of the exploiters,
which use n_2, deteriorate. This means that if the exploiters are too static, then they hinder each other, and the explorers can take advantage of it.

For a better understanding of the phenomena involved, we have experimented with an asymmetric population, composed of one large group and one small one, instead of two groups of similar size. Figure 6 shows the results obtained using a setting similar to the one above, but where population 1 is composed of 90 members while population 2 consists of only 10 members. In this case, for every value of n_2 above 4, the exploiters do better than the explorers. The experiments also show that in this case, the higher n_2 is the better T_2 is, i.e., the more the exploiters exploit, the more they gain.

[Figure 6: Performance of 2 populations of 90/10 agents with n_1 = 4 and w_1 = w_2 = 0.3. Average time-per-token of each population (T_1 and T_2) plotted against the exponent of the randomization function n_2 (from 2 to 10).]

The above results suggest that a single agent gets the best results for itself by being non-cooperative and always adopting the resource with the best performance (i.e., using BCSR), given that the rest of the agents use an adaptive (i.e., cooperative) SR. However, if all of the agents are non-cooperative then all of them will lose.[8] In conclusion, the selfish interest of an agent does not match the interest of the population. This is contrary to results obtained in other basic contexts of multi-agent learning (Shoham & Tennenholtz, 1992).

[8] This is in fact an illuminating instance of the well-known prisoner's dilemma (Axelrod, 1984).

What we have shown is how, for a fixed value of w, coexisting populations adopting different values of n interact. Similar results are obtained when we fix the value of n and use two different values for w. In such cases, the agents adopting the lower value of w are in general the winners, as shown in Figure 7 for n_1 = n_2 = 4 and w_1 = 0.3. When w is very low, the corresponding agents get poor results and they are no longer the winners, as in the case of very high n in Figure 5.

[Figure 7: Performance of 2 populations of 50 agents with n_1 = n_2 = 4 and w_1 = 0.3. Average time-per-token of each population (T_1 and T_2) plotted against the weight of the estimator parameter w_2 (from 0.01 to 1).]

Another interesting phenomenon is obtained when confronting adaptive agents with load-querying agents. Load-querying agents are agents who are able to consult the resources about where they should submit their jobs. A load-querying agent will submit its job to the most unloaded resource at the given point. When confronting load-querying agents with adaptive ones, the results obtained by the adaptive agents are obviously worse than the results obtained by the load-querying ones, but are better than the results obtained by a complete population of adaptive agents. This means that load-querying agents do not play the role of "parasites", as the above-mentioned "exploiters" do; the load-querying agents help in maintaining the load balancing among the resources, and therefore help the rest of the agents. Another result we obtain is that agents who adopt deterministic SRs may behave as parasites and worsen the performance of adaptive agents.

These assertions are supported by the experiments described in Figure 8, where a population of 90 agents, each of which uses an adaptive SR with parameters (n, w), is faced with a minority of 10 agents which use different SRs, as stated above. In particular, in the four cases we consider, the minority behaves in the following ways: (i) they choose the resource
which gave best results, (ii) they are very conservative in updating the history, (iii) they are load-querying agents, (iv) they all deterministically use the resource with capacity 40 (in our basic experimental setting).

    90 agents (w, n)    10 agents (w, n)    T_1       T_2
    (.3, 4)             (.3, 20)            65.161    59.713
    (.3, 4)             (.1, 4)             64.630    63.818
    (.3, 4)             Load-querying       62.320    47.236
    (.3, 4)             Using Res. 0        65.499    55.818

    Figure 8: Performance of 2 populations of 90/10 agents with various SRs

7. Communication among Agents

Up to this point, we have assumed that there is no direct communication among the agents. The motivation for this was that we considered situations in which there were absolutely no transmission channels and protocols. This assumption is in agreement with the idea of multi-agent reinforcement learning. In systems where massive communication is feasible we are not so much concerned with multiple agent adaptation, and the problem reduces to supplying satisfactory communication mechanisms. Multi-agent reinforcement learning is most interesting where real life forces agents to act without a-priori arranged communication channels and we must rely on action-feedback mechanisms. However, it is of interest to understand the effects of communication on the system efficiency (as in Shoham & Tennenholtz, 1992; Tan, 1993), where the agents are augmented with some sort of communication capabilities. Our study of this extension led to some illuminating results, which we will now present.

We assume that each agent can communicate only with some of the other agents, which we call its neighbors. We therefore consider a relation neighbor-of and assume it is reflexive, symmetric and transitive. As a consequence, the relation neighbor-of partitions the population into equivalence classes, which we call neighborhoods.

The form of communication we consider is based on the idea that the efficiency estimators of agents within a neighborhood will be shared among them when a decision is made (i.e., when an agent chooses a resource). The reader should notice that this is a naive form of communication and that more sophisticated types of communication are possible. However, the above form of communication is most natural when we concentrate on agents that update their behavior based only on past information. In particular, this type of communication is similar to the ones used in the above-mentioned work on incorporating communication into the framework of multi-agent reinforcement learning.

We suppose that different SRs may be used by different agents in the same population, but we impose the condition that within a single neighborhood, the same SR is used by all its members. We also assume that each agent keeps its own history and updates it by itself in the usual way. The choice, instead, is based not only on the agent's efficiency estimator, but on
the average of the efficiency estimators of the agents in the corresponding neighborhood. Such an average is called the neighborhood efficiency estimator. The neighborhood efficiency estimator has no physical storage: Its value is recalculated each time a member needs it.

In order to compare the behavior of communicating agents and non-communicating ones, we assume that in a single population there might be, aside from the neighborhoods defined above, also some neighborhoods that do not allow the sharing of efficiency estimators among their members. The members of these neighborhoods behave as described in the previous sections, i.e., each agent relies only on its own history. The only thing that is common among the members of such a neighborhood is that all its members use the same SR. We call a neighborhood in which the efficiency estimators are shared when a decision is taken a communicating neighborhood (CN), and a neighborhood in which this is not done a non-communicating neighborhood (NCN).

The first set of experiments we ran concerns a population composed of only CNs, all of the same size. In particular, we considered CNs of various sizes, starting from 50 CNs of size 2 and going to 5 CNs of size 20. The load profile used is the random load change defined in Subsection 5.3, the value of w is taken to be 0.3, and n is taken to have various values. The results obtained are shown in Figure 9.

[Figure 9: Performance of the adaptive Selection Rules for random load profile for communicating agents. Average time-per-token plotted against the exponent of the randomization function, n (from 2 to 10), with one curve each for 5 CNs of 20 agents, 20 CNs of 5 agents, and 50 CNs of 2 agents.]
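The neighborhood efficiency estimator and the equal-sized partition into CNs used in this first experiment can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: each agent's efficiency estimator is represented as a list with one number per resource, agents are numbered consecutively, and the function names are hypothetical. How the resulting estimate is turned into a choice of resource depends on the SR and is not shown.

```python
from typing import Dict, List, Sequence


def make_neighborhoods(num_agents: int, size: int) -> List[List[int]]:
    """Partition agents 0..num_agents-1 into neighborhoods of equal size.

    Grouping consecutive agent indices is an arbitrary choice made for
    illustration; any partition into equal-sized classes would do.
    """
    if size <= 0 or num_agents % size != 0:
        raise ValueError("size must be positive and divide num_agents")
    return [list(range(start, start + size))
            for start in range(0, num_agents, size)]


def neighborhood_estimator(
    estimators: Dict[int, Sequence[float]],
    members: Sequence[int],
) -> List[float]:
    """Average the members' efficiency estimators, resource by resource.

    Nothing is stored: the average is recomputed on every call, mirroring
    the remark that the neighborhood estimator has no physical storage.
    """
    num_resources = len(estimators[members[0]])
    return [
        sum(estimators[agent][r] for agent in members) / len(members)
        for r in range(num_resources)
    ]
```

With 100 agents, `make_neighborhoods(100, 20)` yields 5 neighborhoods of 20 agents and `make_neighborhoods(100, 2)` yields 50 neighborhoods of 2 agents, the two extremes considered in the experiment.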

The results show that such communicating populations do not get good results. The reason for this is that members of a CN tend to be very conservative, in the sense that they mostly use the best resource. In fact, since they rely on an average of several agents, the picture they have of the system tends to be much more static. In particular, the bigger the CN is, the more conservative its members tend to be. For example, consider the values of (n, w) that give the best results for non-communicating agents: those values give quite bad performance for CNs, since they turn out to be too conservative.

Using more adaptive values of (n, w), the behavior of a communicating population improves and reaches a performance that is just slightly worse than the performance of a non-communicating population. Tuning the parameters using a finer grain, it is possible to obtain a performance that is equal to the one obtained by a non-communicating population. However, it seems clear that no obvious gain is achieved from this form of communication capability.

The intuitive explanation is that there are two opposite effects caused by the communication. On the one hand, the agents get a fairer picture of the system, which prevents them from using bad resources and therefore getting bad performance. On the other hand, since all of the agents in a CN have a "better" picture of the system, they all tend to use the best resources and thus they all compete for them. In fact, the agents behave selfishly and their selfish interest may not agree with the interest of the population as a whole. The interesting message that we get is that the fact that some agents may have a "distorted" picture of the system (which is typical for non-communicating populations) turns out to be an advantage for the population as a whole.

Sharing the data among agents leads to poorer performances also because in this case the agents have common views of loads and target jobs toward the same (lightly loaded) resources, which quickly become overloaded. In order to profitably use the shared data, we should allow for some form of reasoning about the fact that the data is shared. This problem, however, is out of the scope of this paper (see, e.g., Lesser, 1991).

In order to understand the behavior of the system when CNs and NCNs face each other, we consider an NCN of 80 agents together with a set of CNs of equal size, for different values of that size. The results of the corresponding experiments are shown in Figure 10. The members of the CNs, being more inclined to use the best resources, behave as parasites in the sense explained in Section 6. They exploit the adaptiveness of the rest of the population to obtain good performance from the best resources. For this reason they get better results than the rest of the population, as shown by the experimental results.

It is interesting to observe that when the NCN uses a very conservative selection rule, the CNs obtain even better results. The intuitive explanation for this behavior is that although all groups, i.e., both the communicating ones and the one with a high value of n, tend to be conservative, the communicating ones "win" because they are conservative in a more "clever" way, that is, making use of a better picture of the situation.

The conclusion we draw in this section is that the proposed form of communication between agents may not provide useful means to improve the performance of a population in our setting. However, we do not claim that communication between agents is completely useless. Nevertheless, we have observed that it does not provide a straightforward significant improvement. Our results support the claim that the sole past history of an agent is a
reasonable information on which to base its decision, assuming we do not consider available any kind of real-time information (e.g., current load of the resources).

    80 agents (w, n)    20 agents (w, n)    T_1       T_2
    (.3, 4)  1 NCN      (.3, 4)  1 CN       65.287    63.054
    (.3, 4)  1 NCN      (.3, 4)  2 CNs      65.069    63.307
    (.3, 4)  1 NCN      (.3, 4)  5 CNs      65.091    62.809
    (.3, 4)  1 NCN      (.3, 4)  10 CNs     64.895    63.840
    (.3, 10) 1 NCN      (.3, 4)  1 CN       68.419    60.018
    (.3, 10) 1 NCN      (.3, 4)  2 CNs      68.319    59.512
    (.3, 10) 1 NCN      (.3, 4)  5 CNs      68.529    60.674
    (.3, 10) 1 NCN      (.3, 4)  10 CNs     68.351    61.711

    Figure 10: Performance of CNs and NCNs together

8. Discussion

The previous sections were devoted to a report on our experimental study. We now synthesize our observations in view of our motivation, as discussed in Sections 1 and 2.

As we mentioned, our model is a general model where active autonomous agents have to select among several resources in a dynamic fashion and based on local information. The fact that the agents use only local information makes the possibility of efficient load balancing questionable. However, we showed that adaptive load balancing based on purely local feedback is a feasible task. Hence, our results are complementary to the ones obtained in the distributed computer systems literature. As Mirchandaney and Stankovic (1986) put it: "...what is significant about our work is that we have illustrated that is possible to design a learning controller that is able to dynamically acquire relevant job scheduling information by a process of trial and error, and use that information to provide good performance." The study presented in our paper supplies a complementary contribution where we are able to show that useful adaptive load balancing can be obtained using purely local information and in the framework of a general organizational-theoretic model.

In our study we identified various parameters of the adaptive process and investigated how they affect the efficiency of adaptive load balancing. This part of our study supplies useful guidelines for a systems designer who may force all the agents to work based on a common selection rule. Our observations, although somewhat related to previous observations made in other contexts and models (Huberman & Hogg, 1988), enable us to demonstrate aspects of purely local adaptive behavior in a non-trivial model.

Our results about the disagreement between the selfish interest of agents and the common interest of the population are in sharp contrast to previous work on multi-agent learning (Shoham & Tennenholtz, 1992, 1994) and to the dynamic programming perspective of earlier work on distributed systems (Bertsekas & Tsitsiklis, 1989). Moreover, we explore how the interaction between different agent types affects the system's efficiency as well as
the individual agent's efficiency. The related results can also be interpreted as guidelines for a designer who may have only partial control of a system.

The synthesis of the above observations teaches us about adaptive load balancing when one adopts a reinforcement learning perspective where the agents rely only on their local information and activity. An additional step we performed attempts to bridge some of the gap between our local view and previous work on adaptive load balancing by communicating agents, whose decisions may be controlled by learning automata or by other means. We therefore rule out the possibility of communication about the current status of resources and of joint decision-making, but enable a limited sharing of previous history. We show that such limited communication may not help, and may even deteriorate system efficiency. This leaves us with a major gap between previous work, where communication among agents is the basic tool for adaptive load balancing, and our work. Much is left to be done in attempting to bridge this gap. We see this as a major challenge for further research.

9. Related Work

In Section 2 we mentioned some related work in the field of distributed computer systems (Mirchandaney & Stankovic, 1986; Billard & Pasquale, 1993; Glockner & Pasquale, 1993; Mirchandaney et al., 1989; Zhou, 1988; Eager et al., 1986). A typical example of such work is the paper by Mirchandaney and Stankovic (1986). In this work learning automata are used in order to decide on the action to be taken. However, the suggested algorithms rely heavily on communication and information sharing among agents. This is in sharp contrast to our work. In addition, there are differences between the type of model we use and the model presented in the above-mentioned work and in other work on distributed computer systems.

Applications of learning algorithms to load balancing problems are given by Mehra (1992) and Mehra and Wah (1993). However, in that work as well, the agents (sites, in the authors' terminology) have the ability to communicate and to exchange workload values, even though such values are subject to uncertainty due to delays. In addition, differently from our work, the learning activity is done off-line. In particular, in the learning phase the whole system is dedicated to the acquisition of workload indices. Such load indices are then used in the running phase as threshold values for job migration between different sites.

In spite of the differences, there are some similarities between our work and the above-mentioned work. One important similarity is the use of learning procedures. This differs from the more classical work on parallel and distributed computation (Bertsekas & Tsitsiklis, 1989), which applies numerical and iterative methods to the solution of problems in network flow and parallel computing. Other similarities are related to our study of the division of the society into groups. This somewhat resembles work on group formation (Billard & Pasquale, 1993) in distributed computer systems. The information sharing we allow in Section 7 is similar to the limited communication discussed by Tan (1993).

In the classification of load-balancing problems given by Ferrari (1985), our work falls into the category of load-independent and non-preemptive pure load balancing. The problems we investigate can also be seen as sender-initiated problems, although in our case the sender is the agent and not the (overloaded) resource.