CRIKEY - A Temporal Planner Looking at the Integration of Scheduling and Planning

Keith Halsey and Derek Long and Maria Fox
University of Strathclyde
Glasgow, UK
keith.halsey@cis.strath.ac.uk

Abstract

For many temporal planning domains, the planning and scheduling problems are not tightly coupled and so can be solved separately. However, in some cases, where the problems do interact, this approach will fail. A domain is presented where this is the case. CRIKEY, a planner that separates out the logical and temporal reasoning, is introduced. It detects where they interact and the paper explains both how it detects them and also how, in these cases, CRIKEY solves the problems simultaneously. It will also look at CRIKEY as an architecture that uses a series of relaxations to find a plan. Preliminary results show its potential.

Separation of Problems

In temporal planning, classical planning moves closer to scheduling. This is demonstrated by PDDL2.1 (Fox & Long 2001), a language to express temporal planning problems. Actions, that can have a duration, must both be chosen for their logical and metric effects (planning) and then arranged in time so as not to break any constraints (scheduling). Conditions for the action are specified to hold at the start, at the end or over the duration of the action (these are called invariants). Effects are also specified to happen either at the start or at the end of the action.

This section briefly describes a system for separating out the logical and temporal reasoning with PDDL2.1 temporal domains. It was the precursor to CRIKEY and is described in more detail in (Halsey 2003) but is summarised in Figure 1. It takes a temporal PDDL2.1 planning domain and problem, and translates it into a classical PDDL (Ghallab et al. 1998) with a separate file to hold the temporal information.
This is done using the translator from LPGP (Long & Fox 2003) that splits a durative action into three separate actions: a start action, holding the at start conditions and effects; an invariant checking action, that holds the invariants as conditions; and an end action, holding the at end conditions and effects. There are further dummy predicates to ensure that an end action is not placed in the plan without its corresponding start and invariant action. An example of this split is shown in Figure 2. The STRIPS problem is then solved by a classical planner to produce a totally ordered plan, from which a partially ordered plan is lifted, which in turn is scheduled using the information in the temporal file and a Simple Temporal Network to produce a temporal plan. FF (Hoffmann & Nebel 2001) was the classical planner used, although this could have been replaced by any classical planner.

[Figure 1: An Architecture for Separating Planning and Scheduling. Components shown: PDDL2.1 Domain, PDDL2.1 Problem, Translator, PDDL Domain, PDDL Problem, Problem Duration File, Classical Planner, Total Order Plan, Total Order to Partial Order Converter, Partial Order Plan, Simple Temporal Network, Temporal Plan.]

This system separates out the planning and scheduling problems found in temporal problems (described in more detail in (Halsey, Long, & Fox 2003)) and solves each half separately. This has the advantage of making each problem on its own easier to solve and has proved successful in the domains of the IPC'02 (International Planning Competition 2002) (Fox & Long 2002). The problem with this approach is where planning and scheduling interact, but this will be focused on later in the paper.

    (:durative-action LOAD-TRUCK
      :parameters (?o - obj ?t - truck ?l - loc)
      :duration (= ?duration 2)
      :condition (and (over all (at ?t ?l))
                      (at start (at ?o ?l)))
      :effect (and (at start (not (at ?o ?l)))
                   (at end (in ?o ?t))))

    (:action LOAD-TRUCK-START
      :parameters (?o - obj ?t - truck ?l - loc)
      :precondition (at ?o ?l)
      :effect (and (not (at ?o ?l))
                   (load-inv ?o ?t ?l)))

    (:action LOAD-TRUCK-INV
      :parameters (?o - obj ?t - truck ?l - loc)
      :precondition (and (at ?t ?l)
                         (load-inv ?o ?t ?l))
      :effect (and (load-inv ?o ?t ?l)
                   (iload-inv ?o ?t ?l)))

    (:action LOAD-TRUCK-END
      :parameters (?o - obj ?t - truck ?l - loc)
      :precondition (and (load-inv ?o ?t ?l)
                         (iload-inv ?o ?t ?l))
      :effect (and (in ?o ?t)
                   (not (load-inv ?o ?t ?l))
                   (not (iload-inv ?o ?t ?l))))

Figure 2: Translation of a Durative Action into Three Instantaneous Actions

Copyright (c) 2004, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

CRIKEY

This system was unified into one planner, called CRIKEY, which is implemented in Java. However, there are some key differences in this re-implementation that shall now be explained.

The planning technology at its heart is a forward chaining heuristic planner based very closely on MetricFF (Hoffmann 2002), making use of a relaxed plan length (for its heuristic) and helpful actions (to aid its action selection). MetricFF performs Enforced Hill-Climbing and on finding a plateau in the search space will resort to Breadth First Search to exit the plateau, not expanding any states that have a lower value than the current state. If this fails to find a higher state it will resort to complete search from the initial state. CRIKEY works in a similar manner, but will, on finding a plateau, perform Best First Search. This means that it will explore the same states as FF to start off with, but on failing to find an exit, will try more states, expanding those which move away from the goal.
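The plateau behaviour described above can be illustrated with a small sketch. It assumes a heuristic `h` that estimates distance to the goal (smaller is better), and the function names, the toy state space and the decision to return `None` instead of falling back to complete search are all hypothetical choices for illustration, not CRIKEY's actual code.

```python
import heapq
from itertools import count


def escape_plateau(start, successors, h):
    """Best-first search from `start` for a state with a strictly better h value.

    Returns (state, actions) or None if the reachable space is exhausted.
    """
    target = h(start)
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(target, next(tie), start, [])]
    seen = {start}
    while frontier:
        value, _, state, path = heapq.heappop(frontier)
        if value < target:
            return state, path
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (h(nxt), next(tie), nxt, path + [action]))
    return None


def enforced_hill_climbing(start, successors, h, is_goal):
    """Repeatedly move to the first strictly better state that is found."""
    state, plan = start, []
    while not is_goal(state):
        found = escape_plateau(state, successors, h)
        if found is None:
            return None  # a real planner would fall back to complete search here
        state, steps = found
        plan.extend(steps)
    return plan


# Toy search space (hypothetical): p1 is a dead end on the plateau, and the
# only exit goes through p2, which looks worse than the current state.
H = {"s": 3, "p1": 3, "p2": 4, "x": 2, "g": 0}
GRAPH = {"s": [("a", "p1"), ("b", "p2")], "p1": [], "p2": [("c", "x")],
         "x": [("d", "g")], "g": []}

plan = enforced_hill_climbing("s", GRAPH.__getitem__, H.__getitem__,
                              lambda s: H[s] == 0)
print(plan)
```

Because the frontier is ordered by heuristic value, states on the plateau are tried first and states that move away from the goal are expanded only once those are exhausted, which is the behaviour attributed to CRIKEY above.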
CRIKEY does away with the invariant action, separating durative actions into only a start and end instantaneous action (in fact, sometimes it does not even do this, but that will be discussed later). During the construction of the relaxed plan, if an invariant is achieved by the start effects, then it becomes a condition of the end action, else it becomes a condition of both the start and end action. Much like SAPA (Do & Kambhampati 2001), for each state the planner keeps a list of invariants that must hold and actions are not applicable if they delete one of these invariants. When a start action is chosen, its corresponding invariants are added to this list, and then when the corresponding end action is chosen, the invariants are removed from the list. This reduces the size of both the plan graph and the search space by a third. It also prevents an invariant being broken and re-achieved before an end action is chosen.

Similarly, a list is kept of actions whose start actions have been selected but not their corresponding end action. An end action is only considered if it is in this list. A final state is not reached until not only are the goals achieved, but also this list of actions is empty. This removes the need for dummy predicates to be put into the instantaneous actions, whilst still ensuring that an end action is not chosen without its corresponding start action and vice versa.

This list of semi-complete actions (a durative action where the start action has been chosen but not its corresponding end action) is also considered when the relaxed plan is extracted for the heuristic value. The goal is changed to contain dummy propositions which can only be achieved by putting in the end actions. This makes for an interesting ripple effect on plateaux in the search space. If an action was required for its start effect, but not its end effect, then only the start action will appear in the relaxed plan.
However, if this action is chosen then the end action will now appear in the relaxed plan, increasing its size by one, and turning an otherwise flat plateau into one with a ripple on it. This can be combated by setting the cost of the end actions to zero. In fact, the cost of end actions is set to ε (the tolerance value), since otherwise there would be no incentive to put in an end action with no useful effects, and these must be in to make a valid plan. The start actions are given a cost which is equal to the duration of their durative action. During relaxed plan extraction, short actions are preferred to longer actions. This results in CRIKEY having a weak attempt at reducing the overall length of the plan.

Interaction of Planning and Scheduling

Whilst CRIKEY can perform competitively against other state of the art technology in the domains of IPC'02 (see (Halsey 2003) for results), the approach of separating the planning and scheduling has a major flaw, which does not occur in the IPC'02 domains. In these domains, all the solution plans could be sequentialised (the actions performed in a strictly ordered manner). That is to say that actions could happen in parallel if they didn't interfere, but that was a choice of the planner, and not enforced by the domain. However, it is possible to have domains where some actions must happen in parallel.

An example of this is in the Match Domain (Figure 3). In this domain, the goal is to mend fuses. To mend a fuse (which takes 5 time units) you must have a hand free (i.e. you can only mend one fuse at once) and there must be light (provided by striking a match that lasts 8 time units). It should be obvious that in a problem instance where there are two fuses to fix, two matches are needed. A total of ten time units worth of light is needed to mend the fuses sequentially, however a match only provides 8 time units of light. CRIKEY, as described above, will not realise this. Instead, a LIGHT MATCH START action will be chosen, then the actions necessary to mend both fuses, and then the LIGHT MATCH END action. Here an un-schedulable plan is produced and the temporal planner fails. In this case the sub-problems of planning and scheduling are more tightly coupled and are interdependent. The planning part critically affects the scheduling problem, and so in this case, the two cannot be separated.

    (define (domain matchcellar)
      (:requirements :typing :durative-actions)
      (:types match fuse)
      (:predicates (light)
                   (handfree)
                   (unused ?match - match)
                   (mended ?fuse - fuse))
      (:durative-action LIGHT_MATCH
        :parameters (?match - match)
        :duration (= ?duration 8)
        :condition (and (at start (unused ?match))
                        (over all (light)))
        :effect (and (at start (not (unused ?match)))
                     (at start (light))
                     (at end (not (light)))))
      (:durative-action MEND_FUSE
        :parameters (?fuse - fuse)
        :duration (= ?duration 5)
        :condition (and (at start (handfree))
                        (over all (light)))
        :effect (and (at start (not (handfree)))
                     (at end (mended ?fuse))
                     (at end (handfree)))))

Figure 3: The Match Domain

A more in-depth look at where planning and scheduling interact in temporal planning (with particular reference to PDDL2.1) can be found in (Halsey, Long, & Fox 2003). A very brief summary follows. Interactions between scheduling and planning occur where there are durative actions (called content actions) that must be executed in the time that another sequence of actions (called envelope actions) execute. In these cases, the minimum length of time for the content actions must be less than (or fit into) the maximum total length of time for the envelope actions. If this is not the case, then an un-schedulable plan is produced.
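The arithmetic behind the Match Domain failure can be written out as a short check. This is only a sketch under simplifying assumptions: content actions are either strictly sequential or fully parallel, the ε separation between actions is ignored, and the function name `fits_in_envelope` is made up for illustration.

```python
def fits_in_envelope(content_durations, envelope_duration, parallel=False):
    """Return True if the content actions can execute while the envelope lasts."""
    if parallel:
        needed = max(content_durations, default=0)
    else:
        needed = sum(content_durations)
    return needed <= envelope_duration


MATCH_LIGHT = 8  # duration of LIGHT_MATCH in Figure 3
MEND_FUSE = 5    # duration of MEND_FUSE in Figure 3

print(fits_in_envelope([MEND_FUSE], MATCH_LIGHT))
print(fits_in_envelope([MEND_FUSE, MEND_FUSE], MATCH_LIGHT))
print(fits_in_envelope([MEND_FUSE, MEND_FUSE], MATCH_LIGHT, parallel=True))
```

One fuse fits (5 within 8), two sequential fuses do not (10 exceeds 8), and two fuses would fit only if they could be mended in parallel, which the single free hand rules out in this domain.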
During this discussion, it would be perhaps fairer to refer to envelopes as potential envelopes. No effort (at this stage) is made to examine the domain further to see if the content actions could in fact be placed outside their envelopes, or indeed, if there are any content actions in the domain for the envelope. However, on this last point, whilst this could lead to extra envelope reasoning being performed, it does not alter the completeness of the algorithm. The envelope will in effect be empty. The work presented is conservative in this aspect, with the aim being to only produce schedulable plans. For conciseness, during the discussion, potential envelopes are simply referred to as envelopes.

All this reasoning is particular to PDDL2.1, and its corresponding model of time. Logical change in PDDL2.1 happens only at the start and end of actions. However, it is still possible, through simple compilation of the domain and the use of dummy actions and predicates, to model more complex models of time (including change happening part way through an action) and also more complex temporal constraints (for example, enforcing actions to overlap to a degree). The reasoning that follows does not explicitly tackle these issues, but if the constraint is compiled into PDDL2.1, then the reasoning will hold (since now all change once again happens at the start and end of actions) and a valid, schedulable plan can be produced.

Detecting and Acting with Envelopes

The basic principle behind CRIKEY is to separate planning and scheduling where possible, and solve the two problems together only where this is strictly necessary. This section describes how CRIKEY distinguishes between the two situations, and how it acts in each case. Both eventualities affect two places in the algorithm. The first is during the calculation of the heuristic (i.e. during the construction of the relaxed plan graph and the extraction of the relaxed plan) and the second is during action selection. There are two effects of envelopes.
One is to allow the further compression of durative actions, and the second, that only affects action selection, is to integrate scheduling into the algorithm at certain points. It should be noted throughout the following reasoning that CRIKEY does not allow negative conditions at any point in a durative action.

Compressing Actions in the Relaxed Plan Graph

During the calculation of the heuristic, delete effects are ignored, and so can also be ignored in the reasoning here. In this part of the algorithm, a durative action is considered an envelope action if there are any at end conditions not met by an at start add effect. If this is not the case (i.e. $cond_{end} \setminus add_{start} = \emptyset$) then rather than splitting the durative action up into two separate start and end actions, it is treated as a single condensed action. The at start and at end add effects are combined to form the effects of the single action. The conditions to the new action are the at start conditions of the durative action combined with any invariants of the durative action which are not met by the at start add effects. More precisely (where $D$ is the durative action and $A$ is the newly condensed action), if

$$D_{cond_{end}} \setminus D_{add_{start}} = \emptyset$$

then

$$A_{cond} = D_{cond_{start}} \cup (D_{cond_{invs}} \setminus D_{add_{start}})$$
$$A_{add} = D_{add_{start}} \cup D_{add_{end}}$$
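The condensing rule above maps directly onto set operations. The following is a minimal sketch, assuming facts are represented as strings and that delete effects are simply omitted (as they are in the relaxed plan graph). The class and function names are hypothetical, and the fact `p` in the second example is an assumed stand-in for a start effect needed by another action.

```python
from dataclasses import dataclass

EMPTY = frozenset()


@dataclass(frozen=True)
class DurativeAction:
    name: str
    cond_start: frozenset = EMPTY
    cond_inv: frozenset = EMPTY
    cond_end: frozenset = EMPTY
    add_start: frozenset = EMPTY
    add_end: frozenset = EMPTY


@dataclass(frozen=True)
class CondensedAction:
    name: str
    cond: frozenset
    add: frozenset


def condense_for_rpg(d):
    """Return the single condensed action for d, or None if d must stay split."""
    if d.cond_end - d.add_start:
        return None  # some at end condition is not met by an at start add effect
    return CondensedAction(
        name=d.name,
        cond=d.cond_start | (d.cond_inv - d.add_start),
        add=d.add_start | d.add_end,
    )


# MEND_FUSE from Figure 3, grounded for one fuse.
mend = DurativeAction(
    "MEND_FUSE",
    cond_start=frozenset({"handfree"}),
    cond_inv=frozenset({"light"}),
    add_end=frozenset({"mended", "handfree"}),
)
condensed = condense_for_rpg(mend)

# An action with an at end condition q that its own start effects do not achieve.
needs_q = DurativeAction("A", add_start=frozenset({"p"}), cond_end=frozenset({"q"}))
not_condensed = condense_for_rpg(needs_q)
```

Here `condensed` requires `handfree` and `light` and adds `mended` and `handfree` in one step, while `not_condensed` is `None` because the at end condition `q` has to be achieved by some other action.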

This action A is then treated as any other action in the relaxed plan graph. Suppose there are two planning graphs, one with the actions split and one where they are not. The fact layer n + 1 (n being where the start action can first be applied) in the graph where the action has been split is the same as layer n in the graph where the condensed action has been used, and thus the compaction is valid. Since both the end action in action layer n + 1 and the start action in layer n must be chosen, the algorithm saves time by effectively choosing both actions (and so both sets of add effects) at once. The graph is also now smaller (saving time in both construction and extraction) as it is (potentially) one layer shorter. In the case where all actions can be condensed, the graph is half the size it is when the actions are not condensed (or a third of the size of the original system, which also had invariant actions).

The actions cannot be condensed if they have at end conditions, since to meet those conditions might require the use of at start add effects (or rather, an action that achieves the at end conditions might itself have conditions that are met by the at start add effects) as in Figure 4. A cannot be put in the RPG without B and B cannot be put in without A.

[Figure 4: Example of failure of compressing the action for an RPG (diagram of actions A and B)]

Compressing Actions During Action Selection

Similar reasoning to the above can be used in detecting envelopes during action selection, but now delete effects must also be taken into account. During action selection, if you put in a start action of a durative action, then at some point you must put in its end action. Therefore, when a start action is put in, the end action may as well be put in immediately afterwards. However, this logic fails where an action could be applied only in the state after the start action but not in the state after the end action and also not in the state preceding the placing of the start action (as would happen in Figure 5).
[Figure 5: Example of failure of compressing the action for state selection]

We shall name three states: s1, the state immediately before the application of the start action; s2, the state immediately after the application of the start action; and s3, the state immediately after the application of the end action. An action applicable in s2 and not in s1 must have been achieved by the at start add effects (since there are no negative conditions, it could not have been achieved by an at start delete effect). Taking it further, there are no actions that could be applied in s2 and not in s3 which could not have been applied in s1, apart from those achieved by the at start add effects and then deleted by the at end delete effects. The other case where actions cannot be applied in immediate succession is the same as for actions in the relaxed plan graph, that is, where there is an at end condition which is not met by an at start add effect. If neither of these cases holds, then both the start and the end action can be applied one after the other without any risk to completeness, with no further search needed. More precisely, this can happen only where:

$$(cond_{end} \setminus add_{start} = \emptyset) \land (add_{start} \cap del_{end} = \emptyset) \land (del_{start} \cap cond_{end} = \emptyset)$$

The applicability of both actions in state $S$ must be determined at the same time. They are applicable if:

$$cond_{start} \cup (cond_{end} \setminus add_{start}) \cup cond_{invs} \subseteq S_{facts} \;\land\; (del_{start} \cup del_{end}) \cap S_{invs} = \emptyset$$

It is possible that another action, different to the envelope action, could have achieved the content action. This is trivial to test for in forward search since that proposition will already be true, and in this case the content action will not have to go inside the envelope.

Ensuring Schedulability

Exactly where the split actions cannot be applied one after the other is where the planning and scheduling problems interact and the two problems cannot be solved independently. If planning is done with no consideration for scheduling, then the risk is run of producing an un-schedulable plan.
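To make this boundary concrete, the test for when the split actions can be applied one after the other, together with the joint applicability check, can be sketched with set operations. This is an illustrative reading of the two conditions given above, not CRIKEY's implementation: the class and function names are hypothetical and facts are represented as plain strings.

```python
from dataclasses import dataclass

EMPTY = frozenset()


@dataclass(frozen=True)
class Durative:
    cond_start: frozenset = EMPTY
    cond_inv: frozenset = EMPTY
    cond_end: frozenset = EMPTY
    add_start: frozenset = EMPTY
    del_start: frozenset = EMPTY
    add_end: frozenset = EMPTY
    del_end: frozenset = EMPTY


def can_apply_back_to_back(d):
    """True if the start and end actions may be applied in immediate succession."""
    return (not (d.cond_end - d.add_start)
            and not (d.add_start & d.del_end)
            and not (d.del_start & d.cond_end))


def both_applicable(d, facts, open_invariants):
    """Joint applicability of the start and end actions in one state."""
    needed = d.cond_start | (d.cond_end - d.add_start) | d.cond_inv
    deleted = d.del_start | d.del_end
    return needed <= facts and not (deleted & open_invariants)


# Grounded actions from the Match Domain (Figure 3).
light_match = Durative(
    cond_start=frozenset({"unused"}), cond_inv=frozenset({"light"}),
    add_start=frozenset({"light"}), del_start=frozenset({"unused"}),
    del_end=frozenset({"light"}),
)
mend_fuse = Durative(
    cond_start=frozenset({"handfree"}), cond_inv=frozenset({"light"}),
    del_start=frozenset({"handfree"}),
    add_end=frozenset({"mended", "handfree"}),
)

print(can_apply_back_to_back(light_match))  # light is added at start, deleted at end
print(can_apply_back_to_back(mend_fuse))
```

Under this reading LIGHT_MATCH fails the test, because `light` is added at its start and deleted at its end, so it is a potential envelope. MEND_FUSE passes and can be applied as a pair whenever `both_applicable` holds in the current state.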
To solve this, whilst still leaving the scheduling until after planning, potential envelopes must be detected during planning and then scheduling only done for them. An envelope is an action where the split actions cannot be applied one after the other:

$$(cond_{end} \setminus add_{start} \neq \emptyset) \lor (add_{start} \cap del_{end} \neq \emptyset) \lor (del_{start} \cap cond_{end} \neq \emptyset)$$

To recap, there must be time for a sequence of content actions to be executed whilst an envelope action executes. Any time a start action from an envelope is selected, a mini-scheduler is associated with that action. Any potential content action then selected after that is tested to see if it must fit in between the envelope's start and end action, and also ordered with regard to any other content actions in the envelope. If so, then it is tested to see if there is enough time to execute the action. If so, only then is it selected and added to the contents of the envelope, or else that action is not applicable in the current state. Once the envelope's end action is selected, then the mini-scheduler for the action is discarded. Each envelope has associated with it its start action, its end action, a Simple Temporal Network, and a list of content actions that must be inside the envelope.
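A mini-scheduler check of this kind can be sketched as follows, assuming the envelope's Simple Temporal Network is stored as a distance graph and tested for negative cycles by Bellman-Ford relaxation from a virtual source. The class name, the node labels, the helper `match_envelope` and the omission of the ε separation between actions are all assumptions made for illustration.

```python
class SimpleTemporalNetwork:
    """Distance graph in which an edge (u, v, w) encodes t_v - t_u <= w."""

    def __init__(self):
        self.nodes = set()
        self.edges = []

    def add_constraint(self, u, v, lower, upper=None):
        """Require lower <= t_v - t_u, and t_v - t_u <= upper when given."""
        self.nodes.update((u, v))
        self.edges.append((v, u, -lower))
        if upper is not None:
            self.edges.append((u, v, upper))

    def is_consistent(self):
        """True if the distance graph contains no negative cycle."""
        # Starting every node at 0 acts as a virtual source connected to all nodes.
        dist = dict.fromkeys(self.nodes, 0)
        for _ in range(len(self.nodes) + 1):
            changed = False
            for u, v, w in self.edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    changed = True
            if not changed:
                return True
        return False  # still relaxing after |V| + 1 passes: negative cycle


def match_envelope(num_fuses, match_duration=8, mend_duration=5):
    """STN for one lit match with `num_fuses` mended one after another."""
    stn = SimpleTemporalNetwork()
    stn.add_constraint("match_start", "match_end", match_duration, match_duration)
    previous_end = "match_start"
    for i in range(num_fuses):
        start, end = f"mend{i}_start", f"mend{i}_end"
        stn.add_constraint(start, end, mend_duration, mend_duration)
        stn.add_constraint(previous_end, start, 0)  # light is on and the hand is free
        stn.add_constraint(end, "match_end", 0)     # finished before the light goes out
        previous_end = end
    return stn


print(match_envelope(1).is_consistent())
print(match_envelope(2).is_consistent())
```

With one fuse the network is consistent, so the content action would be accepted. With two sequential fuses the constraints form a negative cycle (8 - 5 - 5 = -2), so the second mend action would be rejected as not applicable while this match's envelope is open.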

This Simple Temporal Network need only test for consistency and so uses Bellman-Ford's algorithm to test for negative cycles from appropriate sources (those nodes where there have been edges added to it). An algorithm for deciding whether an action is applicable in a state including these envelope actions is described in Figure 6. By pruning out actions that are not applicable, all plans produced are schedulable.

1. Check $A_{cond}$ are satisfied.
2. Check $A_{del}$ do not delete any currently open invariants.
3. If A is a start of an envelope
   (a) Create a new envelope for A and add to list of envelopes.
4. Else If A is an end of an envelope
   (a) Remove A's envelope from the list of envelopes.
5. For Each envelope E in the list
   (a) Get orderings for A in E.
   (b) If no orderings, return true.
   (c) Add orderings to the STN.
   (d) return the consistency of the STN.

Figure 6: Algorithm to decide whether an action A is applicable

To convert to the partial order plan, only those actions that interact need be ordered. These arise in two cases: firstly, where one action achieves another, and secondly, where one action threatens another by deleting a precondition. The partial order lifter used is a modified version of the one described in (Moreno et al. 2002). It is a greedy algorithm that works backwards through the totally ordered plan. All threatening actions that appear before an action in the totally ordered plan are constrained to appear before it in the partially ordered plan, and the closest action that achieves it in the totally ordered plan must be ordered before it in the partially ordered plan. Although this will not find an optimal plan, that is to say one that exploits all concurrency possible, it is complete and sound. This is modified to also be able to cope with metrics, such that all consumers of a resource are ordered before an action that has a greater than precondition using that resource, and enough producers are also ordered before that action.
(The producers and consumers reverse roles in the case of a less than precondition.) A Simple Temporal Network (?) with Floyd-Warshall's algorithm run once computes the actual timings of the actions to produce a valid temporal plan.

Example

In the case of the match domain (Figure 3), assuming it is part of a bigger domain, CRIKEY will search forward ignoring temporal information. When it comes to put in the start action of the light match action (an envelope), it will instantiate a new mini-scheduler. It will then test to see if a fix fuse action need go in this mini-scheduler, and if so, if it is consistent. Indeed, it fits, so the action is applicable and chosen for the plan. It will then test the second fix fuse action. This is not consistent with the mini-scheduler (there is not enough time left to fix it before the match runs out), so cannot be put in the plan. (If the fuses could be fixed in parallel, then this second action would be consistent and so chosen.) The end of the light action could then be chosen and the mini-scheduler and envelope would be closed. CRIKEY would then go on to either light a second match (and so start a new mini-scheduler) or solve another part of the problem. In this way a schedulable plan is produced.

Bigger Envelopes

Rather than having a content action achieve the end condition of the envelope action, a further envelope action could achieve it in its start effects. In this case, the envelope is from the start of the first envelope to the end of the second envelope and the mini-scheduler is not closed until this second envelope action finishes (i.e. the end action is chosen). There can, of course, be many envelopes making up the big envelope. It is also important to note that content actions can themselves be envelope actions.

Relaxations

An alternative way of looking at CRIKEY, as a series of relaxations, is detailed in Figure 7. The first relaxation is in the relaxed plan where delete effects are taken out of the problem.
This guides the search both in the heuristic distance to the goal state and also by providing helpful actions. Where this relaxation fails, it resorts to full search without helpful actions. The second relaxation is in the non-temporal sequential plan, which forms a skeleton for the temporal parallel plan, where temporal effects are taken out. Relaxation also occurs where the durative actions are condensed into instantaneous actions where there is no advantage in splitting them. However, where the relaxation is too severe (i.e. when leaving out the temporal information leads to no solution) the relaxation is tightened up, and the temporal information taken into account. The skill of CRIKEY is in recognising where this is the case.

[Figure 7: Gradual Relaxations of CRIKEY. Labels in the figure: non-temporal plan provides a skeleton; scheduling; scheduling ignored where possible; provide heuristic distance and helpful action; planning; actions condensed; heuristic guide; delete effects ignored.]

Preliminary Results

A more comprehensive set of results, comparing the old system based on FF described at the beginning of this paper with other planners on some problems from IPC'02, can be found in (Halsey 2003). Tables 1 & 2 compare this old system with the newly implemented CRIKEY and its differences and also with SAPA (Do & Kambhampati 2001). It would be expected that CRIKEY would perform polynomially better than the FF system, since it compresses all the actions in this domain, and so has a third of the actions. However the results show the opposite, a polynomial decrease in performance. Primarily, this is put down to a different implementation language (Java rather than C++). However, CRIKEY is more expressive, not only being able to tackle numerical problems but of course also domains where the planning and scheduling interact more. This carries an overhead which may also account for the decrease in performance. CRIKEY does perform quicker on problem 10. This is because whereas the FF System resorts to planning from the initial state if it finds a dead end, CRIKEY will start complete search from where it entered the plateau which led to the dead end.

Table 1: DriverLog SimpleTime Times

    file   FFSystem   CRIKEY   SAPA
    2           620     1150   7500
    4           600     1880   8550
    6           740     5990   2080
    8          2580    14350   2180
    10         2510      840   1560

Table 2: DriverLog SimpleTime Plan Quality

    file   FFSystem   CRIKEY     SAPA
    2        100.03   126.07   104.09
    4         89.03   121.04    98.12
    6        109.04   114.06    64.07
    8         51.03   111.04    75.08
    10        91.04    71.02    36.05

CRIKEY is also compared against SAPA since this planner is also implemented in Java and can plan with metric values and find plans where the logical and temporal constraints interact. As can be seen, generally SAPA is slower but finds better quality plans. This is to be expected since it performs A* search, therefore searching more states, but will most likely find a better plan. It is quicker on problem 8 since this is where CRIKEY's forward heuristic search gets stuck in a dead end and it must resort to complete search.

A domain has been written called the lift-match domain to test CRIKEY's abilities. In it, electricians must travel round a building fixing fuses as in the match domain.
It not only contains the problem where the scheduling and planning interact, but also a simple lift scheduling problem, a logistics problem and resource management, all embedded in. If an electrician has already lit a match in a room, then there is enough light for another electrician to fix a fuse by. Timed Initial Literals (as in PDDL2.2 (Edelkamp & Hoffmann 2003)) can be introduced by having fuses blowing at certain times. As of yet this domain has not been tested extensively with other planners, but CRIKEY can solve instances whereas it is thought that some other planners may not be able to. CRIKEY hopes to compete fully in the International Planning Competition 2004, and so the results from that will hopefully give a better idea of CRIKEY's performance compared with other planners.

Conclusions and Further Work

CRIKEY shows that it is possible to separate out temporal and logical reasoning, whilst combining them where necessary. CRIKEY performs the reasoning necessary to do this. It also demonstrates that it is possible to perform a series of relaxations in order to find a plan, and tighten these relaxations where necessary. It is hoped that these themes will be explored further in the near future. The scheduler is of particular interest since at the moment it only performs a greedy search. It is thought that the metric reasoning and logical reasoning can also be separated out in the scheduler, with the scheduler also performing some search to improve the quality of the plan, depending on the metric of the original problem.

References

Do, M. B., and Kambhampati, S. 2001. Sapa: a domain-independent heuristic metric temporal planner. In Proceedings from the 6th European Conference of Planning (ECP).

Edelkamp, S., and Hoffmann, J. 2003. PDDL2.2: The language for the classical part of the 4th international planning competition. Technical report, Fachbereich Informatik, Germany and Institut für Informatik, Germany.

Fox, M., and Long, D. 2001. PDDL2.1: An extension to PDDL for expressing temporal planning domains. Technical report, University of Durham, UK.

Fox, M., and Long, D. 2002. The third international planning competition: Temporal and metric planning. In Proceedings from the 6th International Conference on Artificial Intelligence Planning and Scheduling (AIPS'02), 115-117.

Ghallab, M.; Howe, A.; Knoblock, C.; McDermott, D.; Ram, A.; Veloso, M.; Weld, D.; and Wilkins, D. 1998. PDDL - the planning domain definition language.

Halsey, K.; Long, D.; and Fox, M. 2003. Isolating where planning and scheduling interact. In Proceedings from the 22nd UK Planning and Scheduling Special Interest Group (PlanSIG'03), 104-114.

Halsey, K. 2003. Temporal planning with a non-temporal planner. In Doctoral Consortium at the 13th International Conference on Automated Planning and Scheduling (ICAPS), 48-52.

Hoffmann, J., and Nebel, B. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research 14:253-302.

Hoffmann, J. 2002. Extending FF to numerical state variables. In Proceedings of the 15th European Conference on Artificial Intelligence (ECAI-02), 571-575.

Long, D., and Fox, M. 2003. Exploiting a graphplan framework in temporal planning. In Proceedings of the 13th International Conference on Automated Planning and Scheduling (ICAPS), 52-61.

Moreno, D.; Oddi, A.; Borrajo, D.; Cesta, A.; and Meziat, D. 2002. Integrating hybrid reasoners for planning and scheduling. In Proceedings from the 21st UK Planning and Scheduling Special Interest Group (PlanSIG'02), 179-189.