PLANNING HEURISTICS FOR CONJUNCTIVE AND DISJUNCTIVE GOALS. A Preliminary Report. Submitted to the Faculty. Purdue University.


PLANNING HEURISTICS FOR CONJUNCTIVE AND DISJUNCTIVE GOALS

A Preliminary Report
Submitted to the Faculty of Purdue University
by Lin Zhu

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

December

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Introduction
    Planning Heuristics for Conjunctive Goals
    Ongoing Work and Future Directions
Background
    Strips Planning
    Planning Heuristics
    Relaxed Plans
    The FF Planner
    Landmarks
Heuristics for Conjunctive Goals
    Combination via SUM
    Combination via MIN
    Combination via MAX
    Empirical Results
        Planners
        Domains
        Blocksworld, Blocksworld-no-arm, and Depots
        Twenty-four Puzzle
        Freecell
        Grid
        Logistics, Rovers, and Tireworld

    Chapter Summary
Heuristics for Disjunctive Goals (Future Direction)
    Generalized Mean Values
    Properties of Generalized Mean as Planning Heuristics
    Recursive Computation of Planning Heuristics
Random Synthesis of Planning Problems: A Quantitative Model of Subgoal Interactions (Future Direction)
    Visualizing Conjunctive Heuristics
    A Generative Model for Random Planning Problems
    Empirical Properties of the Randomly Generated Planning Problems

LIST OF TABLES

Number of random instances tested on, runtime limit (in seconds), average runtime (in seconds), and average plan length (enclosed in parentheses). An entry of "-" represents that no instance is solved. The success ratio corresponding to any entry can be found in Figure .
Examples of generalized mean values.

LIST OF FIGURES

Success Ratio for Blocksworld
Success Ratio for 24-puzzle
Success Ratio for Freecell
Success Ratio for Grid
Generalized Mean of Two Subgoal Heuristics: Contour and Gradient
Influence of Progress Rate: Contour and Gradient
Influence of Positive Interactions: Contour and Gradient
Influence of Negative Interactions: Contour and Gradient
Influence of Solved Subgoals: Contour and Gradient

ABSTRACT

Zhu, Lin. Ph.D., Purdue University, December. Planning Heuristics for Conjunctive and Disjunctive Goals. Major Professor: Robert L. Givan.

Automated planning is an area of Artificial Intelligence (AI) that studies the reasoning process of anticipating the expected outcomes of action sequences and deliberately organizing them to achieve specified goals. Heuristic search based planners have been dominantly successful in recent years. Planning goals can typically be recursively decomposed into conjunctive and disjunctive sets of subgoals, with potential interactions between subgoals. We study the problem of building effective heuristics for achieving conjunctive and disjunctive goals from heuristics for individual subgoals.

A major difficulty of planning with conjunctive sets of subgoals is that solutions to them may interfere with each other, yet the subgoals must be simultaneously achieved. Conjunctive subgoals are sometimes best solved sequentially and sometimes concurrently, depending on the specific domain structure. Previous methods do not generally accurately reflect interactions between subgoals. We consider a straightforward method for building conjunctive heuristics that smoothly trades off between previous common methods. In addition to first explicitly formulating the problem of designing conjunctive heuristics, our major contribution is the discovery that this straightforward method substantially outperforms previously used methods across a wide range of domains. Based on a single positive real parameter k, our heuristic measure sums the individual heuristic values for the subgoal conjuncts, each raised to the kth power. Varying k allows loose approximation and combination of the previous min, max, and sum approaches, while mitigating some of the weaknesses of those approaches. Our empirical work shows that for many benchmark planning domains there exist fixed parameter values that perform well; we give evidence that these values can be found automatically by training. Our method, applied to top-level conjunctive goals, shows dramatic improvements over the heuristic used in the FF planner across a wide range of planning competition benchmarks. Also, our heuristic, without computing landmarks, consistently improves upon the success ratio of a recently published landmark-based planner, FF-L.

We then extend our research to heuristics for disjunctive sets of subgoals. Only one of the subgoal disjuncts must be achieved, so previous methods predominantly estimate disjunctive heuristics as the min of the subgoal heuristics. However, when the disjunctive goal itself is a subgoal of a conjunctive goal, the number of backup plans for the disjunctive goal is an important indicator of robustness against interference from other conjunctive subgoals. Our ongoing work examines extensions of the aforementioned straightforward method that can reflect such collective difficulty of achieving disjunctive goals.

For a final component, we develop an approach to generate random planning problems. By collecting real solution lengths for these problems, we develop a quantitative model that relates simple statistics reflecting subgoal interactions with appropriate parameter values of our conjunctive heuristic method. This model deepens our understanding of our method and leads to on-line algorithms for automatically adjusting its parameters.

INTRODUCTION

Automated planning is an area of Artificial Intelligence (AI) that studies the reasoning process of anticipating the outcomes of actions and deliberately organizing a sequence of actions to transform an initial state into a goal state. Planning is known to be computationally hard. Yet most planning application domains contain structure that can be exploited, making efficient planning possible. Most planners search for solutions in various spaces of potential plans [Blum and Furst; Hoffmann and Nebel; McAllester and Rosenblitt]. A mechanism to informatively guide the search, such as a heuristic function, is critical, as those search spaces are all exponential in size. A heuristic function estimates the distance to the goal from each search state. Good heuristic functions have been an important factor in the success of many modern planners [Bonet and Geffner; Gerevini et al.; Helmert; Hoffmann and Nebel].

Planning goals can typically be recursively decomposed into conjunctive and disjunctive sets of subgoals, with potential interactions between subgoals. We study the problem of building effective heuristics for achieving conjunctive and disjunctive goals from heuristics for individual subgoals. We call the resulting heuristics conjunctive heuristics and disjunctive heuristics, respectively.

Planning Heuristics for Conjunctive Goals

The need to seek a conjunction of subgoals occurs in many contexts in planning. Typically, planning goals and preconditions of actions are directly represented as a conjunctive set of literals. Furthermore, in partial-order planning, a given partial plan implicitly represents the conjunction of the subgoals of resolving all threats and supporting all open conditions [McAllester and Rosenblitt]. Other regression

approaches, such as the backward phase of GraphPlan [Blum and Furst], maintain at any time a conjunctive set of open subgoals.

For a further example of conjunctive goals, we consider pattern databases [Culberson and Schaeffer], a proven successful source of heuristics in recent years from the search community. A pattern database is a pre-computed lookup table of solution lengths for a relaxation of the problem domain. Multiple pattern databases, i.e., multiple different relaxations, are typically combined to form a single heuristic that seeks to drive each database heuristic to zero, often by computing the maximum or sum over the available pattern databases [Korf]. We view the resulting heuristic as seeking a conjunction of subgoals, each stating that a pattern database heuristic is zero. Edelkamp used automatically constructed pattern databases in planning.

A final example arises with planning landmarks. Hoffmann et al. introduced landmarks: these are propositions that every correct solution achieves (makes true) at some point during execution. Planning goals can be usefully extended by adding a conjunct for each landmark asserting that that landmark has been achieved, leading again to conjunctive goals.

A major difficulty of planning with conjunctive sets of subgoals is that solutions to them may interfere with each other, yet the subgoals must be simultaneously achieved. Conjunctive subgoals are sometimes best solved sequentially and sometimes concurrently, depending on the specific domain structure. Most planners in the literature can be viewed as constructing a conjunctive heuristic, either implicitly or explicitly. In previous work, a conjunctive heuristic is often computed as either the sum [Bonet and Geffner; Hoffmann and Nebel; Korf; Nguyen and Kambhampati], the minimum [Hoffmann et al.], or the maximum [Edelkamp; Haslum and Geffner; Korf] of the subgoal heuristics.
(The use of the minimum subgoal heuristic requires combination with the number of unachieved subgoals, as discussed below.) These methods do not generally accurately reflect interactions

between subgoals. In consequence, the resulting conjunctive heuristic may be misleading even if the true distance is provided directly as each subgoal heuristic. For instance, the sum of the subgoal heuristics gives a large plateau in many transportation domains, where approaching one subgoal (e.g., a city) requires simultaneously moving away from another. Here, negative subgoal interaction is being ignored in the summation. Even so, each of these simple conjunctive heuristics performs well in some subset of the standard planning benchmark domains.

We consider a straightforward method for building conjunctive heuristics that smoothly trades off between the previous methods. In addition to first explicitly formulating the problem of designing conjunctive heuristics, our major contribution here is the discovery that this straightforward method substantially outperforms previously used methods across a wide range of domains. Based on a single positive real parameter k, our heuristic measure sums the individual subgoal heuristics, each raised to the kth power. Different values of k allow the loose approximation of each previous method (sum, min, or max) discussed above, and intermediate values of k select a tradeoff between these methods: for example, focusing on the closest subgoal, but not without regard for the effect on the further subgoals. We will demonstrate empirically that intermediate values of k can be very important to performance, and that the loose approximations of sum, min, or max typically outperform sum, min, or max, respectively, for subtle reasons discussed below. We expect this discovery to be relevant to performance improvement for a wide range of planners.

Our experimental results show that, while the ideal value of k varies from domain to domain, for typical benchmark planning domains there exist fixed parameter values that perform well; we give evidence below that these values can then be found automatically by training on the domain.
Our method significantly improves FF's performance on a set of domains known to be hard for FF [Hoffmann], as well as on a representative search domain, the 24-puzzle, while retaining FF's performance in other domains. We also show, strikingly, that the use of our heuristic, without computing or using landmarks, consistently improves upon

the success ratio of a recently published planner, FF-L, that computes and exploits landmarks [Hoffmann et al.], on the domains that FF-L was demonstrated on.

Ongoing Work and Future Directions

We then extend our research to heuristics for disjunctive sets of subgoals. Only one of the subgoal disjuncts must be achieved, so previous methods predominantly estimate disjunctive heuristics as the min of the subgoal heuristics. However, when the disjunctive goal itself is a subgoal of a conjunctive goal, the number of backup plans for the disjunctive goal is an important indicator of robustness against interference from other conjunctive subgoals. Our ongoing work examines extensions of the aforementioned straightforward method that can reflect such collective difficulty of achieving disjunctive goals. Our method of computing heuristics will favor states with more backup plans. It can potentially be very helpful to relaxed-plan based heuristics: those heuristics are computed with the negative effects of actions ignored, and thus have much trouble recognizing negative interactions.

For a final component, we develop an approach to generate random planning problems. By collecting real solution lengths for these problems, we develop a quantitative model that relates simple statistics reflecting subgoal interactions with appropriate parameter values of our conjunctive heuristic method. This model deepens our understanding of our method and leads to on-line algorithms to automatically adjust its parameters.

This report is organized as follows. We first introduce the necessary background. We then review previous methods of computing conjunctive heuristics, qualitatively discuss the merits of our method, and present a broad range of empirical results. Finally, we introduce our ongoing work on designing disjunctive heuristics and on generating random planning problems based on a quantitative model of subgoal interaction.

BACKGROUND

Strips Planning

We first introduce Strips, the most widely used representation of planning problems. Let X be a finite set of propositions. A state S is a finite subset of X. An action o is a triple o = (Pre(o), Add(o), Del(o)), where Pre(o) is the set of preconditions, Add(o) is the add list, and Del(o) is the delete list, each being a set of propositions. The result Result(S, (o1, ..., on)) of applying an action sequence (o1, ..., on) to a state S is given by Result(Result(S, (o1, ..., on-1)), (on)); for n = 1, the result is undefined unless Pre(o1) ⊆ S, and is (S ∪ Add(o1)) \ Del(o1) otherwise. A planning task P is a set of actions containing the distinguished actions Start and Finish, where Pre(Start), Del(Finish), and Add(Finish) are all empty. We refer to Pre(Finish) as the goal region and Add(Start) as the initial state. A linear solution or plan for a task P is an ordered action sequence o, beginning with Start and ending with Finish, such that Result(∅, o) is defined.

Planning Heuristics

A heuristic function is a function mapping state-goal pairs to a totally ordered set with minimal and maximal elements 0 and ∞. This totally ordered set is typically the non-negative real numbers together with ∞, denoted R+ ∪ {∞}. We assume throughout that a heuristic h(s, g) is 0 if and only if the goal g is achieved in state s, and is ∞ only if g is unreachable from s. We will sometimes omit the goal argument g to a heuristic function when it is assumed known implicitly.
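The Strips semantics above can be sketched directly in code. This is an illustrative sketch only; the `Action` type and the toy proposition names (`clear_b`, `on_a_b`) are ours, not from the report.

```python
from typing import FrozenSet, NamedTuple, Optional, Sequence

class Action(NamedTuple):
    pre: FrozenSet[str]     # Pre(o): preconditions
    add: FrozenSet[str]     # Add(o): add list
    delete: FrozenSet[str]  # Del(o): delete list

def result(state: FrozenSet[str], plan: Sequence[Action]) -> Optional[FrozenSet[str]]:
    """Result(S, (o1, ..., on)): apply actions in order; None when undefined."""
    current = set(state)
    for o in plan:
        if not o.pre <= current:                # Pre(o) must hold in the current state
            return None                         # result undefined
        current = (current | o.add) - o.delete  # (S ∪ Add(o)) \ Del(o)
    return frozenset(current)

# A toy action: stacking block a on b requires b to be clear, and un-clears it.
stack_a_on_b = Action(pre=frozenset({"clear_b"}),
                      add=frozenset({"on_a_b"}),
                      delete=frozenset({"clear_b"}))
```

Applying `stack_a_on_b` to the state `{clear_b}` yields `{on_a_b}`, while applying it to a state missing the precondition returns `None`, matching the undefinedness in the definition.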

Relaxed Plans

We also consider the relaxed planning task P^R, which ignores delete effects, given by {(Pre(o), Add(o), ∅) | (Pre(o), Add(o), Del(o)) ∈ P}. A solution to a relaxed planning task is called a relaxed plan. Relaxed plans have become an important source of planning heuristics in recent years [Bonet and Geffner; Hoffmann and Nebel].

The FF Planner

The success of FF [Hoffmann and Nebel] comes mainly from its efficient and accurate heuristic, and from its unique search strategy, enforced hill-climbing, which is incomplete but often very fast. Unlike pure hill-climbing, which iteratively selects single actions with the best one-step-lookahead heuristic value and often has difficulty with local minima and plateaus, enforced hill-climbing iteratively uses breadth-first search to find action sequences that lead to states with heuristic values strictly better than that of the current state.

An ideal search heuristic would be the optimal length of a complete plan. Since this heuristic is not tractably computable, FF approximates it by two levels of relaxation. In the following discussion, we denote the set of plans for task P by Plans(P) and the set of relaxed plans by RelaxedPlans(P). Obviously Plans(P) ⊆ RelaxedPlans(P).

First, FF considers the relatively easier problem of computing RelaxedPlans(P), and approximates (and lower-bounds) the optimal length among Plans(P) by the optimal length among RelaxedPlans(P). Empirical [Hoffmann] and theoretical [Hoffmann] results show that optimal relaxed plan length (applied with enforced hill-climbing) is a good heuristic for a large variety of planning domains, and often leads to polynomial search complexity. In the rare case that enforced hill-climbing fails, FF resorts to an expensive but complete search.
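The delete relaxation P^R has a one-line construction. The sketch below (with an `Action` tuple of our own devising) just drops every delete list, which is why any plan for P remains a plan for P^R and the optimal relaxed plan length lower-bounds the optimal plan length.

```python
from typing import FrozenSet, List, NamedTuple

class Action(NamedTuple):
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]

def relax(task: List[Action]) -> List[Action]:
    """Build the relaxed task P^R: keep preconditions and add lists, drop deletes."""
    return [Action(o.pre, o.add, frozenset()) for o in task]

# Removing delete lists can only make more preconditions satisfiable over time,
# so every solution of P is also a solution of P^R.
```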

Second, since it is still difficult to compute an optimal relaxed plan, FF extracts one relaxed plan to obtain an approximation of the optimal relaxed plan length, using various heuristic considerations to encourage near-optimality. Empirical results [Hoffmann] show that the length of the relaxed plan extracted this way is often a good approximation of the optimal relaxed plan length. FF uses this length as its heuristic.

Landmarks

Hoffmann et al. introduced the concept of landmarks for the purpose of automatically decomposing planning tasks into sub-tasks. We call a proposition l ∈ X a causal landmark of a planning task P if, for every solution o of P, l ∈ Pre(o') for some action o' included in o. Causal landmarks are landmarks in the sense of Hoffmann et al.: the planning problem cannot be solved if the actions adding such a proposition are removed. However, not every landmark is a causal landmark: some landmarks are just incidental effects of the action that adds them. Consider a problem where the agent must travel in the rain to solve the problem. Getting wet is a non-causal landmark, as it is a necessary effect of an essential action, but setting getting wet as a sub-task would be misleading. Thus we consider non-causal landmarks misleading and inappropriate as subgoals for the planning task. Hoffmann et al. showed the problem of finding landmarks for a planning task to be PSPACE-hard. The proof can be trivially extended to the case of causal landmarks.
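Since finding landmarks is PSPACE-hard in general, the causal-landmark definition can only be illustrated, not decided, by code. The sketch below checks the definition against an explicitly enumerated collection of solutions; the `Action` tuple and the proposition names are our illustrative assumptions.

```python
from typing import FrozenSet, NamedTuple, Sequence

class Action(NamedTuple):
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]

def is_causal_landmark_over(prop: str, plans: Sequence[Sequence[Action]]) -> bool:
    """prop is a causal landmark with respect to the given plans if every plan
    contains some action with prop among its preconditions."""
    return all(any(prop in o.pre for o in plan) for plan in plans)

# Toy example: every listed solution opens the door using the key, so "have_key"
# is a causal landmark over these plans; "wet" never appears as a precondition.
open_door = Action(frozenset({"have_key"}), frozenset({"door_open", "wet"}), frozenset())
plans = [[open_door]]
```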

HEURISTICS FOR CONJUNCTIVE GOALS

The problem we consider is to build an effective conjunctive heuristic H(s, G) for a conjunctive goal G, given a vector of heuristics h(s, g_i) for each subgoal g_i ∈ G. Here, we focus on the total quasi-order on states induced by the heuristic functions of interest. (We use the term total quasi-order for a total, reflexive, transitive relation; such a relation totally orders the equivalence classes of its domain elements.) Recent previous work [Bonet and Geffner; Hoffmann and Nebel] has also primarily exploited the ordering information in heuristic functions. This limited focus on ordering derives from the best-first or greedy goal-seeking approach characteristic of sub-optimal planning; for optimal planning, an A*-like quantitative use of the exact heuristic values is suggested. Given an implicit, fixed goal g, we use ≤_h to represent the quasi-ordering induced by h: s1 ≤_h s2 iff h(s1) ≤ h(s2).

Several commonly considered conjunctive heuristics are sum, max, count, and min:

    H_sum(s, G) = Σ_i h(s, g_i)
    H_max(s, G) = max_i h(s, g_i)

H_count(s, G) and H_min(s, G) are best defined in words. H_count(s, G) is simply the number of unachieved subgoals in G, as long as they are all reachable, and is ∞ otherwise. Informally, H_min then returns this count, breaking ties with the distance to the closest subgoal. More formally, H_min(G) returns values in R × R, which are totally ordered lexicographically: H_min(G) returns the pair of H_count(G) and the smallest non-zero subgoal distance. This lexicographic ordering approach is necessary to reward actions that achieve subgoals even though they also force the heuristic to start focusing on a further subgoal.
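The four basic combinations can be written down over a vector of precomputed subgoal heuristic values. A sketch under our own encoding: the input `hs` stands for the vector h(s, g_i), and `h_min` returns a pair compared lexicographically, as in the definition above.

```python
import math
from typing import Sequence, Tuple

def h_sum(hs: Sequence[float]) -> float:
    return sum(hs)

def h_max(hs: Sequence[float]) -> float:
    return max(hs)

def h_count(hs: Sequence[float]) -> float:
    """Number of unachieved subgoals, or infinity if any is unreachable."""
    if math.inf in hs:
        return math.inf
    return sum(1 for h in hs if h > 0)

def h_min(hs: Sequence[float]) -> Tuple[float, float]:
    """(count, distance to closest unachieved subgoal); Python tuples compare
    lexicographically, matching the ordering in the text."""
    unachieved = [h for h in hs if h > 0]
    return (h_count(hs), min(unachieved) if unachieved else 0.0)
```

For subgoal distances (0, 2, 5), these give H_sum = 7, H_max = 5, H_count = 2, and H_min = (2, 2): one subgoal is achieved, and the closest unachieved one is at distance 2.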

We argue below that these methods are often unsuitable even if the true distance is provided directly for each subgoal heuristic. Here, we mix these methods by applying a parameterized nonlinear transformation to the subgoal heuristic values before summation. The transformation we explore is simply to raise the subgoal heuristic values to the kth power, for a positive real parameter k. So, we define

    H_k(s, G) = Σ_i h(s, g_i)^k

By varying k, the heuristic H_k can loosely resemble each of the basic heuristics H_sum, H_max, and H_min. For large values of k, H_k(G) is most sensitive to the heuristic value of the furthest subgoal g_i, like H_max. When k is one, H_k(G) equals H_sum. Finally, as k approaches zero, H_k(G) is most sensitive to the reversal of achieved subgoals, and then to the heuristic value of the closest unachieved subgoal, just as is H_min. (As a minimal example of the behavior of H_k as k approaches zero: let (i, j) represent a state with an ordered pair of subgoal distances i and j; H_min and H_k with small k order such states in the same sequence.) Intermediate values of k give behaviors combining the properties of the above heuristics. We will discuss the approximation of sum, min, and max by H_k further below.

Before presenting experimental results for the use of H_k, we first discuss the qualitative strengths and weaknesses of each basic conjunctive heuristic, suggesting reasons why a tradeoff heuristic like H_k may outperform each basic heuristic. We also mention key planning systems that employ variants of each basic heuristic.

Combination via SUM

Perhaps the most popular conjunctive heuristic is H_sum, used, for example, in RePOP [Nguyen and Kambhampati] and in HSP [Bonet and Geffner] when selecting one frequently successful setting. The heuristic used in the very successful planner FF [Hoffmann and Nebel] counts the steps in a plan that achieves the conjunctive goal, ignoring the delete lists

of all actions. Because steps may be shared in plans for different subgoals, the FF heuristic is generally smaller than H_sum, while still resembling H_sum.

The heuristic H_sum is a summary of all the subgoal heuristics, resulting in a tradeoff of progress between all subgoals. However, there are domains in which subgoals are best solved sequentially or concurrently; H_sum's neutral position between these two approaches can be a critical weakness, as we will discuss in the following two subsections.

Often H_sum is uninformative when there is negative interaction between subgoals, so that advances on one subgoal lead to matching degradations on another. This problem can be caused, for instance, by competition for non-consumable resources. For one example, in domains where a truck needs to travel to multiple locations, the truck is a resource in the sense that it cannot be present simultaneously at several locations. As the truck moves closer to one subgoal location, it may be moving further from another, resulting in no net effect on H_sum. This transportation theme appears in many benchmark domains [Helmert]. For another example, in benchmark domains such as Freecell, Airport, and Sokoban, free space is another form of resource, which cannot be occupied simultaneously by multiple objects. In such cases, H_sum can be misleading because it fails to take into account the interaction between competing subgoals; in particular, H_sum can fail to encourage seeking the subgoals one at a time to minimize resource competition.

Combination via MIN

H_count addresses these issues partially. However, this heuristic alone does not order enough state pairs, as many useful actions do not achieve subgoals immediately. For this reason, tie-breaking by measuring the distance to the nearest subgoal, as in H_min, is necessary for an effective heuristic focusing on subgoal count. H_min-style approaches to conjunctive goals avoid a central problem of H_sum by concentrating

on a single subgoal at a time. This idea underlies the approach generally used in early planners, as discussed by Sacerdoti. The recent planner FF-L [Hoffmann et al.] applies an H_min-style approach to a set of subgoals found by automatically extracting planning landmarks for the problem; these landmarks include the top-level goal conjuncts. Landmarks are propositions which must be made true at some point by any successful plan. Hoffmann et al. have developed methods for automatically finding planning landmarks. Unlike the top-level subgoals, a landmark need not be maintained once it is achieved. However, landmarks can be treated like conjunctive subgoals with a small problem modification, as we discussed in the introduction. Consequently, our techniques can be directly applied to leveraging automatically discovered planning landmarks. As discussed in more detail below in the empirical results section, FF-L employs a heuristic approach similar to H_min to seek a conjunction of landmarks, showing a substantial empirical gain over state-of-the-art planners that do not leverage landmarks, across a wide range of benchmarks.

A major weakness of H_min is that it is insensitive to positive and negative effects on subgoals that are not currently closest. Another form of this weakness lies in the heuristic focusing too absolutely on retaining previously achieved subgoals. This is a critical problem in domains where negative interaction between subgoals is strong, and a sequence of irreversibly achieved subgoals is hard or impossible to find [Korf]. Our simple method of minimizing H_k addresses these weaknesses, while retaining approximately the strength described just above: with values of k less than 1, H_k trades off between the behaviors of H_min and H_sum. (The H_min-style approach in FF-L avoids the weakness of focusing too absolutely on retaining achieved subgoals, as discussed below.)
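The tradeoff that k < 1 buys can be made concrete with a small example of our own: a truck on a line must visit cities at positions 0 and 10, so at position x the subgoal distances are (x, 10 - x). H_sum (that is, H_1) is a plateau, scoring 10 everywhere, while k = 0.5 rewards committing to one city, yet unlike H_min still registers changes in the further subgoal.

```python
from typing import Sequence

def h_k(hs: Sequence[float], k: float) -> float:
    """Sum of subgoal heuristic values, each raised to the k-th power."""
    return sum(h ** k for h in hs)

# Truck-on-a-line illustration (positions and distances are our toy numbers):
#   at x = 5 the distances are (5, 5); at x = 0 they are (0, 10).
#   H_1 scores both states 10 (the H_sum plateau), while H_0.5 scores
#   x = 0 about 3.16 and x = 5 about 4.47, preferring to finish one subgoal.
at_city = [0.0, 10.0]
midway = [5.0, 5.0]
```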

Combination via MAX

In domains such as Blocksworld or the 24-puzzle there are negative interactions between subgoals, but unlike the transportation domains, a subgoal far from completion suggests the need to destroy many already-achieved subgoals. Work to complete nearby subgoals while such distant subgoals remain distant is often wasted effort. The H_sum conjunctive heuristic may wrongly reward progress toward a subgoal that must later be destroyed. The above-mentioned subgoal count measure does not handle this well either, because it becomes very important which subgoal is achieved next, not just that one is achieved. Work to order the subgoals to avoid this problem (i.e., so that achieved subgoals need not generally be reversed) has met only limited success [Koehler and Hoffmann; Korf].

In such domains, H_max may reflect the true distance more accurately. Note, for instance, that greedily reducing the difficulty of the hardest subgoal optimally solves the famous Sussman anomaly in the Blocksworld (as described, e.g., by Korf). H_max represents a strategy to solve the subgoals concurrently, switching focus from a subgoal once enough progress is made that it is no longer the furthest. A critical need for concurrency occurs in some domains, typically those with consumable, non-exclusive resources such as fuel or time. For example, on problems of scheduling interdependent jobs with a deadline and limited facility resources, following the H_max heuristic is equivalent to focusing on the critical path.

Previously, H_max has not been explicitly noted for the above advantages, but has been favored mainly for its admissibility, finding application in optimal planning. For an example use of H_max, one setting of the heuristic planner HSP uses H_max [Bonet and Geffner]; this setting was found less successful than H_sum on most domains. This is also the type of combination heuristic used implicitly in GraphPlan's backward phase [Blum and Furst; Haslum and Geffner].
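How H_k with large k refines H_max can be shown with two toy subgoal vectors of our own, (3, 1) and (3, 2): H_max ties them, but comparing the distance vectors sorted in non-increasing order, or equivalently H_k with a large k, prefers the first, since its secondary subgoal is closer.

```python
from typing import Sequence, List

def h_k(hs: Sequence[float], k: float) -> float:
    return sum(h ** k for h in hs)

def rsort(hs: Sequence[float]) -> List[float]:
    """Sort a heuristic vector in non-increasing order; lists compare
    lexicographically, giving an ordering strictly finer than max alone."""
    return sorted(hs, reverse=True)

a = [3.0, 1.0]  # furthest subgoal at 3, other subgoal nearly done
b = [3.0, 2.0]  # same furthest subgoal, but the other one lags
```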

A third example is the typical use of multiple pattern databases in search [Korf] and planning [Edelkamp], as discussed above.

A significant weakness of H_max is that, similarly to min again, it generally ignores positive or negative interactions with subgoals that are not as distant as the furthest subgoal. Our simple method H_k, with values of k greater than 1, addresses this weakness, while retaining approximately the strength described just above, by mixing the behaviors of H_max and H_sum.

H_k with large values of k approximates a natural extension of H_max: s1 ≤_maxsort s2 iff rsort(h(s1)) ≤_lex rsort(h(s2)), where the function rsort(v) sorts a vector v in non-increasing order, h(s) represents the vector (h(s, g_1), ..., h(s, g_|G|)), and ≤_lex represents the lexicographic order. The ordering maxsort makes more distinctions than max does; our results suggest that this difference can be critical. maxsort can be approximated by H_k with large values of k: if s1 <_maxsort s2, then there exists δ such that for all k > δ, s1 <_{H_k} s2.

Empirical Results

Planners

We evaluate the performance of our conjunctive heuristics and of previous methods using the same search algorithm that is implemented in FF [Hoffmann and Nebel]. The core of FF's search algorithm is enforced hill-climbing, which is incomplete but often very fast. (In case enforced hill-climbing fails, which does not happen often, FF resorts to an expensive but complete search.) Unlike pure hill-climbing, which repeatedly selects single actions with the best one-step-lookahead heuristic value (which may be worse than the heuristic value of the current state) and often has difficulty with local minima and plateaus, enforced hill-climbing repeatedly uses breadth-first search to find action

sequences that lead to states with heuristic values that are strictly better than that of the current state.

We replace the heuristic function of FF in turn with H_min, H_max, or H_k, and call the resulting planners MIN, MAX, and K, respectively. We compare the performance of these planners with FF, as well as with the recent landmark-based planner FF-L [Hoffmann et al.]. FF-L computes and utilizes landmarks, as introduced in the section Combination via MIN above. FF-L also computes a partial order <_L on landmarks. FF-L maintains a set L of landmarks that have not yet been achieved at least once, and directs a base planner to achieve the landmarks in L, as follows. FF-L constructs a disjunction of all the unachieved landmarks that are minimal in the ordering <_L, i.e., the disjunction over {l ∈ L | there is no m ∈ L with m <_L l}. After the base planner achieves this disjunction, the achieved landmarks are removed from L, and the process is repeated until L is empty. In the final stage, FF-L uses the base planner to simultaneously achieve all the original goal conjuncts again, as some may have been reversed since they were achieved. This last step is necessary because FF-L does not force the base planner to retain achieved landmarks. When the base planner is FF, the heuristic for the disjunction of unachieved landmarks is the minimum of the heuristics for the individual landmarks.

FF-L's behavior is thus similar to MIN, with five distinctions. First, FF-L considers all the landmarks, which form a superset of the subgoals MIN considers. Second, FF-L does not have MIN's strong bias toward retaining achieved subgoals. Third, unlike MIN, FF-L fails to realize that a problem is unsolvable unless every landmark is unreachable (according to the heuristic) from the current state. Thus, FF-L exhibits a stronger form of MIN's critical weakness of insensitivity to effects on subgoals that are not currently closest: FF-L will happily select actions to achieve nearby subgoals while rendering further subgoals unreachable.
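The disjunction FF-L constructs, over the <_L-minimal unachieved landmarks, can be sketched as a set computation. The landmark names and the pair encoding of the order below are our illustrative assumptions, not from FF-L's implementation.

```python
from typing import Set, Tuple

def minimal_unachieved(L: Set[str], order: Set[Tuple[str, str]]) -> Set[str]:
    """Return {l in L | there is no m in L with m <_L l}.
    `order` holds pairs (m, l) meaning landmark m is ordered before l."""
    return {l for l in L if not any((m, l) in order for m in L)}

# Toy landmark chain: have_key <_L door_open <_L at_goal.
L = {"have_key", "door_open", "at_goal"}
order = {("have_key", "door_open"), ("door_open", "at_goal")}
```

On the toy chain, the first disjunction is just `{have_key}`; once it is achieved and removed from L, the next disjunction is `{door_open}`, mirroring the achieve-and-remove loop described above.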
Fourth, FF-L's heuristic computation is generally quicker than MIN's, because MIN does a separate heuristic computation for every subgoal, each of which takes at least the time of the heuristic computation

of the disjunction. Fifth, in FF-L the branching factor for search is typically greatly reduced, because FF's heuristic computation has a side effect of recognizing helpful actions [Hoffmann and Nebel, ], so that only a small fraction of all actions are considered during search. In FF-L only the actions useful for the closest subgoal will be considered helpful actions. FF-L computes and orders a set of landmarks; none of the other planners we evaluate here use such a computation, relying instead on direct heuristic measures of distance to the top-level goal conjuncts. Yet, surprisingly, our planner K consistently improves upon FF-L in success ratio. This suggests that landmarks are not the critical element in achieving the reported performance of FF-L.

.. Domains

We test the planners mentioned above on a wide range of benchmark planning domains encoded in PDDL. Our test suite covers all the domains that FF-L was demonstrated on [Hoffmann et al., ]: Blocksworld, Blocksworld-no-arm, Depots, Freecell, Grid, Logistics, Rovers, and Tireworld. Detailed descriptions of those domains can be found in the referenced paper. We also include a representative domain from the search community: the 24-puzzle. For each of the domains, we generate a random set of problem instances, and present the success ratio, average runtime, and average plan length for each of the planners. We set the runtime bound on each domain long enough for the success ratio to level off for most planners. The number of random instances and the runtime limit for each domain can be found in Table .. We use the same parameters as [Hoffmann et al., ] to generate random instances for the domains demonstrated there. For the other domain, the 24-puzzle, we generate initial states uniformly at random, and assume that half of the generated instances are solvable. When a domain scales in size, we have space only to present the

results on the largest size tested in [Hoffmann et al., ]. The results of the planners are typically less distinguishable on smaller-sized instances. Heuristic-search planners are generally fast enough when they succeed, and otherwise often either fail quickly or run out of memory slowly. We therefore focus here on increasing the success ratio in difficult domains.

Table .. Number of random instances tested on, runtime limit (in seconds), average runtime (in seconds), and average plan length (enclosed in parentheses), for FF-L, MIN, MAX, FF, and K with various values of k, on the domains Blocksworld, 24-Puzzle, Freecell, and Grid. An entry of - represents that no instance is solved. The success ratio corresponding to any entry can be found in Figure ..

The results on success ratio

are presented in Figure ., in which the x axis, representing the value of k used by K, is shown on a logarithmic scale. The results on runtime and plan length can be found in Table .. Runtimes and plan lengths should be interpreted carefully: a planner that appears slower may be solving far more problems, including some that are harder. Only solved problems are included in a planner's average runtime and plan length, giving an advantage to planners that fail on hard problems. We consider plan quality the least important of the three measures; but nonetheless neither FF nor FF-L consistently dominates K in plan length across the choices of k that yield superior success ratio.

Fig. .. Success Ratio for Blocksworld.

.. Blocksworld, Blocksworld-no-arm, and Depots

In the famous Blocksworld domain, a robot arm must rearrange towers of blocks. The Blocksworld-no-arm domain is an encoding variant of Blocksworld; without explicit reference to a robot arm, FF's heuristic presents a different local search topology [Hoffmann, ]. The Depots domain enriches Blocksworld with transportation, so that blocks can be delivered between any pair of locations by trucks. We present results only on Blocksworld, as the results on the other two domains are very similar. Each random problem instance consists of blocks. As shown in the left part of Figure ., H_max-style heuristics work much better than H_sum- and H_min-style heuristics on the Blocksworld domain. FF only solves % of the problems. When k equals one, K performs slightly better than FF. As k increases from one, the success ratio jumps up to %. With the help of landmarks, FF-L gets a significantly higher success ratio (%) than FF, but is still significantly less reliable than H_k with larger k, even though no landmarks are leveraged by the latter. Neither MAX nor MIN solves any instance. This suggests that H_max itself does not order enough pairs to be useful.

Fig. .. Success Ratio for 24-puzzle.
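The tie-breaking advantage of H_k over H_max can be illustrated with a small sketch (our own illustration, not code from the thesis; we take H_k to be the sum of the k-th powers of the subgoal heuristics, which induces the ordering discussed above):

```python
def h_k(subgoal_costs, k):
    # H_k combines subgoal heuristics as the sum of their k-th powers;
    # k = 1 gives H_sum, and large k orders states like maxsort.
    return sum(h ** k for h in subgoal_costs)

# Two states with the same most-distant subgoal (cost 5) but different
# remaining subgoals. H_max ties them at 5, while H_k with large k
# prefers s2, matching the maxsort (lexicographic-on-sorted-vectors) order.
s1 = [5, 4, 1]
s2 = [5, 2, 2]
print(h_k(s1, 1), h_k(s2, 1))    # k = 1: plain sum; s2 looks better (9 < 10)
print(h_k(s1, 8) > h_k(s2, 8))   # large k: s1 penalized for its larger second subgoal
```

The second comparison shows exactly the kind of pair that H_max leaves unordered but that larger k can distinguish.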

.. Twenty-four Puzzle

The sliding-tile puzzles have long been testbeds for domain-specific search. The 24-puzzle, which contains almost states, was first optimally solved by Korf and Taylor [Korf and Taylor, ] with human-designed domain-specific heuristics. Here we will show that K with suitable values of k, employing FF's domain-independent heuristic on the goal conjuncts, can solve most randomly generated instances suboptimally. We generate random instances and assume half of them are solvable. The success ratio results shown in the second part of Figure . indicate that a suitable trade-off between H_max and H_sum solves the 24-puzzle best. This is reasonable, since correctly positioned tiles must often be temporarily moved out of place to achieve other subgoals. % of the solvable instances were solved by K with k = . (likewise for k = .). The performance of K drops sharply for larger or smaller k. FF, at % success, outperforms H_sum (K with k = ) at %, suggesting that FF benefits from considering positive interactions. FF-L does not find useful landmarks other than the top-level subgoals, achieving a success ratio of %. Neither MIN nor MAX solves any instance. The performance difference between FF-L and MIN is likely due to FF-L's tolerance for destroying achieved subgoals, as discussed above.

.. Freecell

The Freecell domain is an adaptation of the Solitaire card game. Stacks of cards need to be rearranged, much as in Blocksworld, but certain rules must be respected when moving the cards. One of these rules introduces a non-consumable resource: only a limited set of free cells can be temporarily occupied by single cards, unlike in Blocksworld, where any number of blocks can be put on the table. In this sense, this domain yields neither to solving subgoals sequentially nor to solving them entirely concurrently (due to congestion).

Fig. .. Success Ratio for Freecell.

In each of our test instances, there are four suits and each suit has cards. They initially form up to ten stacks, and need to be rearranged into four goal stacks, with the help of five free cells. As shown in the third part of Figure ., the success ratio is % for MIN, % for FF, .% for MAX, and .% for FF-L. Neither a sequential (MIN) nor a concurrent (MAX) strategy works particularly well on the Freecell domain. The trade-off method K does not yield significant improvement either: the success ratio of K is slightly higher than that of MIN for small values of k, and around that of MIN for larger k.

.. Grid

In the Grid domain a robot needs to transport keys on a two-dimensional grid. Additionally, some grid positions are locked and must be opened with matching keys to allow the robot to pass through. Each test instance is randomly generated with keys on a grid. The right part of Figure . shows that both MIN and K with small values of k solve all problem instances. FF-L appears less reliable, solving %. For

larger k, the success ratio of K drops dramatically. The success ratio of FF (%) is slightly worse than K's (%) when k = . These results clearly demonstrate that the Grid domain is best solved by attacking the subgoals sequentially.

Fig. .. Success Ratio for Grid.

.. Logistics, Rovers, and Tireworld

Unlike the domains presented above, the Logistics, Rovers, and Tireworld domains are solved with % success ratio by FF, K, or FF-L given enough time. Since our focus is the success ratio, we will only summarize our findings on these domains. On Logistics and Rovers, K with small values of k yields minor speed improvements over FF. The performance of K drops dramatically as k goes up from one. On Tireworld, K with k = is slightly slower than FF, and gets worse as k becomes greater or smaller. On all three domains, FF-L is much quicker than FF or K. Both Logistics and Rovers share the transportation theme with Grid. But unlike in Grid, where two locations can be arbitrarily far apart, in these domains each subgoal can be achieved in a small number of steps. This also holds for

Tireworld, where a number of flat tires must be replaced, each requiring a fixed procedure consisting of several steps. In consequence, enforced hill-climbing solves a problem in these domains by a sequence of searches, each bounded in depth by a small constant. The factors limiting scaling performance are then the heuristic computation for each state and the branching factor of the search. As discussed in the Planners subsection above, FF-L has an advantage on both of these factors.

. Chapter Summary

Subgoal interactions have been widely studied in planning, e.g. in [Blum and Furst, ; Cheng and Irani, ; Kambhampati et al., ; Koehler and Hoffmann, ; Korf, ]. Here, we first explicitly formulated this difficult problem as one of building conjunctive heuristics. We showed that a straightforward trade-off method adapts to a wide range of domains with different characteristics of subgoal interactions. Applied to top-level subgoals, our method significantly outperforms two state-of-the-art planners, one of which employs much more complex techniques than ours. It is apparent from Figure . that a simple search over values of the parameter k will easily identify, for each domain considered here, a value that leads to state-of-the-art planning performance. Our method can potentially be applied beyond the scope of classical planning, as heuristics can measure not only the number of plan steps but also other forms of cost, such as time or resources (as explored by, e.g., [Hoffmann, ]), as well as expected cost [Bonet and Geffner, ] in probabilistic planning.

. HEURISTICS FOR DISJUNCTIVE GOALS (FUTURE DIRECTION)

In this chapter, we generalize the method of the previous chapter to compute heuristics for both conjunctive and disjunctive goals. The extended method, referred to as the generalized mean, has been studied in the mathematics literature [Hardy et al., ] and utilized, for various reasons, in such areas as minimax search [Rivest, ] and dynamic programming [Singh, ]. However, using generalized-mean-based heuristics to cope with subgoal interaction is our novel contribution. We will discuss the properties of this extension and its use to compute heuristics in an HSP-style recursive fashion [Bonet and Geffner, ].

Disjunctive goals have different properties from conjunctive goals. Only one of the subgoal disjuncts must be achieved, so previous methods predominantly estimate disjunctive heuristics as the minimum of the subgoal heuristics. However, when the disjunctive goal is itself a subgoal of a conjunctive goal, the number of backup plans for the disjunctive goal is an important indicator of robustness against interference from the other conjunctive subgoals. Our method of computing heuristics favors states with more backup plans, and thus can reflect the collective difficulty of achieving disjunctive goals. It can potentially be very helpful to relaxed-plan-based heuristics: those heuristics are computed with the negative effects of actions ignored, and thus have much trouble recognizing negative interactions.

. Generalized Mean Values

As mentioned in Section ., heuristic values are typically drawn from the totally ordered set of non-negative real numbers together with ∞, denoted R̄+ = [0, ∞]. A heuristic

h(s, g) is 0 if and only if the goal g is achieved in state s, and is ∞ only if g is unreachable from s. We can partially extend the arithmetic operations of [0, ∞) to R̄+ as follows:

a + ∞ = ∞
a · ∞ = ∞ if a > 0
a / ∞ = 0 if a < ∞
a / 0 = ∞ if 0 < a < ∞
∞ / a = ∞ if 0 < a < ∞

Let ā = (a_1, a_2, ..., a_n) be a vector of n numbers from R̄+, and let p be a non-zero real number. We can define the generalized p-mean of ā, M_p(ā), by

M_p(ā) = ( (1/n) Σ_{i=1}^{n} a_i^p )^{1/p}.    (.)

The case p = 1 yields the arithmetic mean and the case p = -1 yields the harmonic mean. We also have the following two facts (see [Hardy et al., ] for proofs and other facts about generalized mean values):

lim_{p→∞} M_p(ā) = max_i a_i
lim_{p→−∞} M_p(ā) = min_i a_i

As an example illustrating the above facts, the values of M_p(ā) are given for ā = (, , , ) and various p in Table ..

Table .. Examples of generalized mean values.
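As a concrete illustration (a sketch of our own, with our own function names, not code from the thesis), the generalized p-mean under the conventions above can be computed as follows:

```python
import math

def ext_pow(a, p):
    # Power on [0, inf] under the conventions above: 0**p = inf and
    # inf**p = 0 when p < 0; the symmetric cases hold when p > 0.
    if a == 0:
        return 0.0 if p > 0 else math.inf
    if a == math.inf:
        return math.inf if p > 0 else 0.0
    return a ** p

def generalized_mean(a, p):
    # M_p(a) = ((1/n) * sum_i a_i**p) ** (1/p), for non-zero p.
    assert p != 0, "p must be non-zero"
    return ext_pow(sum(ext_pow(x, p) for x in a) / len(a), 1.0 / p)
```

For p = 1 this is the arithmetic mean and for p = -1 the harmonic mean; as p grows toward +∞ or −∞ it approaches the maximum and minimum, respectively. Note also the behavior at the extremes: with p > 0 a single ∞ entry makes the mean ∞, while with p < 0 a single 0 entry makes the mean 0, which is exactly what the next section exploits.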

. Properties of the Generalized Mean as Planning Heuristics

For a set of conjunctive or disjunctive subgoals with heuristics h̄ = (h_1, h_2, ..., h_n), we can use M_p(h̄) as the composite heuristic. A conjunctive goal is considered achieved if and only if all subgoals are achieved, and unreachable if and only if any of the subgoals is unreachable. The composite heuristic function M_p(h̄) meets these requirements when p > 0, since under this condition M_p(h̄) = 0 if and only if ∀i, h_i = 0, and M_p(h̄) = ∞ if and only if ∃i, h_i = ∞. In comparison, a disjunctive goal is considered achieved if and only if any of the subgoals is achieved, and unreachable if and only if all the subgoals are unreachable. The composite heuristic function M_p(h̄) meets these requirements when p < 0, since under this condition M_p(h̄) = 0 if and only if ∃i, h_i = 0, and M_p(h̄) = ∞ if and only if ∀i, h_i = ∞.

The generalized mean of subgoal heuristics can be viewed as a generalization of our method H_k of the previous chapter, in the sense that M_p induces the same ordering as H_k does when p = k > 0. Actually, M_p simply adds a wrapper to H_k that scales the composite heuristic values back to the scale of the original subgoal heuristics.

. Recursive Computation of Planning Heuristics

Our method H_k of the previous chapter has been applied to top-level conjunctive goals. Here we apply the extended method M_p(h̄) recursively to goals at all levels. We use this setting mainly because disjunctive goals are not typically present at the top level in planning. As introduced in Section ., in STRIPS, the most widely used planning problem representation, a planning goal is a conjunctive set of literals. Each of the literals can possibly be achieved by any of a set of actions. This set of actions can then be viewed as a disjunctive set of subgoals for the literal. To perform any of the actions, a set of literals must be simultaneously achieved as preconditions. In this way, a

planning problem can be recursively decomposed into conjunctive or disjunctive sets of sub-problems. Our approach is to compute a heuristic for the planning problem recursively from the sub-problems. Formally, a state s is a set of literals, and the set of goal literals is defined as the preconditions of a special Finish action. Let H_r(s, Finish) be the heuristic estimate from the state s to the goal, and let H̄_r(s, g) represent the vector of heuristic estimates from the state s to each of the subgoals in g. Then

H_r(s, o) = 1 + M_p(H̄_r(s, Pre(o))),    (.)

where o is an action and p > 0, and

H_r(s, l) = 0 if l ∈ s, and M_p(H̄_r(s, Adder(l))) otherwise,    (.)

where l is a literal, Adder(l) is the set of actions that include l in their add lists, and p < 0.

Special cases of our method have been used in the HSP planner [Bonet and Geffner, ], where only MIN has been considered as a composite heuristic for disjunctive goals, and only SUM and MAX have been considered as composite heuristics for conjunctive goals. Our method can approximate the methods used in HSP and is flexible enough to address their weaknesses while approximately retaining their strengths.

Our very preliminary experiments with the above recursive method have not shown consistent improvements over the state-of-the-art planners. We attribute this mainly to the possibility that positive interactions between subgoals accumulate recursively, significantly enough to obscure the benefits of our method. We are considering approaches that incorporate methods from the literature that can reduce the effect of positive interactions, such as the relaxed-plan-based heuristic used in the FF planner [Hoffmann and Nebel, ].
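The recursion above can be sketched in executable form (our own illustration: the toy action set, the parameter choices p_conj = 2 and p_disj = -2, and the value-iteration scheme for reaching a fixpoint are all hypothetical, not taken from the thesis):

```python
import math

def p_mean(values, p):
    # Generalized p-mean on [0, inf] with the conventions 0**p = inf
    # and inf**p = 0 for p < 0 (see the Generalized Mean Values section).
    def ext_pow(a, q):
        if a == 0:
            return 0.0 if q > 0 else math.inf
        if a == math.inf:
            return math.inf if q > 0 else 0.0
        return a ** q
    return ext_pow(sum(ext_pow(v, p) for v in values) / len(values), 1.0 / p)

# A toy STRIPS fragment: pick up block a, then stack it on b.
ACTIONS = {
    "pickup_a":  {"pre": {"clear_a", "handempty"}, "add": {"holding_a"}},
    "stack_a_b": {"pre": {"holding_a", "clear_b"}, "add": {"on_a_b", "handempty"}},
}

def recursive_heuristic(state, goal, p_conj=2.0, p_disj=-2.0, iterations=50):
    # H_r: action cost = 1 + p_conj-mean over precondition costs (conjunction);
    # literal cost = p_disj-mean over the costs of its adding actions (disjunction).
    # Computed by simple value iteration rather than direct recursion, so that
    # cyclic achiever structures still converge to a fixpoint.
    literals = set(state) | set(goal)
    for a in ACTIONS.values():
        literals |= a["pre"] | a["add"]
    h = {l: (0.0 if l in state else math.inf) for l in literals}
    for _ in range(iterations):
        for l in literals:
            if l in state:
                continue  # H_r(s, l) = 0 when l already holds in s
            adders = [a for a in ACTIONS.values() if l in a["add"]]
            if adders:
                costs = [1.0 + p_mean([h[q] for q in a["pre"]], p_conj)
                         for a in adders]
                h[l] = p_mean(costs, p_disj)
    return p_mean([h[g] for g in goal], p_conj)
```

On the initial state {clear_a, clear_b, handempty} with goal {on_a_b}, this yields 1 + √(1/2) ≈ 1.71: one step for stacking plus the p_conj-mean of the costs of holding_a (one step away) and clear_b (already true).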

. RANDOM SYNTHESIS OF PLANNING PROBLEMS: A QUANTITATIVE MODEL OF SUBGOAL INTERACTIONS (FUTURE DIRECTION)

In this chapter we develop a model to synthesize random planning problems. This model will quantitatively reflect positive and negative interactions between conjunctive subgoals. We then relate the distribution of true solution lengths for problems generated from this model to our previous method of computing conjunctive heuristics. We introduce this model for two purposes. First, we aim at a deeper understanding of the characteristics of various conjunctive goals, especially those favoring a concurrent solution. Such understanding will also indicate whether our method is a general approach beyond the scope our empirical results have justified. Second, this model will serve as a foundation for an on-line learning algorithm that will automatically adapt the parameters of conjunctive heuristics. For simplicity, our model only considers planning problems with two subgoals. We believe the insights developed this way can be extended to problems with an arbitrary number of subgoals by considering pairwise subgoal interactions. In our problem setting, conjunctive heuristics are computed solely from the subgoal heuristics. In consequence, our model clusters planning states according to their pair of subgoal heuristic values, and considers states with the same pair of values indistinguishable. We then study the problem in two parts: how the expected true solution lengths relate to the subgoal heuristics, and how conjunctive heuristics can reflect this relation. In the following sections, we will first demonstrate the visual patterns of the conjunctive heuristics from Chapter with various parameters. We then generate random


More information

Outlier Analysis. Lijun Zhang

Outlier Analysis. Lijun Zhang Outlier Analysis Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Extreme Value Analysis Probabilistic Models Clustering for Outlier Detection Distance-Based Outlier Detection Density-Based

More information

Intelligent Systems. Discriminative Learning. Parts marked by * are optional. WS2013/2014 Carsten Rother, Dmitrij Schlesinger

Intelligent Systems. Discriminative Learning. Parts marked by * are optional. WS2013/2014 Carsten Rother, Dmitrij Schlesinger Intelligent Systems Discriminative Learning Parts marked by * are optional 30/12/2013 WS2013/2014 Carsten Rother, Dmitrij Schlesinger Discriminative models There exists a joint probability distribution

More information

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training. Supplementary Figure 1 Behavioral training. a, Mazes used for behavioral training. Asterisks indicate reward location. Only some example mazes are shown (for example, right choice and not left choice maze

More information

Artificial Intelligence Lecture 7

Artificial Intelligence Lecture 7 Artificial Intelligence Lecture 7 Lecture plan AI in general (ch. 1) Search based AI (ch. 4) search, games, planning, optimization Agents (ch. 8) applied AI techniques in robots, software agents,... Knowledge

More information

Using Diverse Cognitive Mechanisms for Action Modeling

Using Diverse Cognitive Mechanisms for Action Modeling Using Diverse Cognitive Mechanisms for Action Modeling John E. Laird (laird@umich.edu) Joseph Z. Xu (jzxu@umich.edu) Samuel Wintermute (swinterm@umich.edu) University of Michigan, 2260 Hayward Street Ann

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

RAG Rating Indicator Values

RAG Rating Indicator Values Technical Guide RAG Rating Indicator Values Introduction This document sets out Public Health England s standard approach to the use of RAG ratings for indicator values in relation to comparator or benchmark

More information

A Guide to Algorithm Design: Paradigms, Methods, and Complexity Analysis

A Guide to Algorithm Design: Paradigms, Methods, and Complexity Analysis A Guide to Algorithm Design: Paradigms, Methods, and Complexity Analysis Anne Benoit, Yves Robert, Frédéric Vivien To cite this version: Anne Benoit, Yves Robert, Frédéric Vivien. A Guide to Algorithm

More information

Programming with Goals (3)

Programming with Goals (3) Programming with Goals (3) M. Birna van Riemsdijk, TU Delft, The Netherlands GOAL slides adapted from MAS course slides by Hindriks 4/24/11 Delft University of Technology Challenge the future Outline GOAL:

More information

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH

PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH PubH 7470: STATISTICS FOR TRANSLATIONAL & CLINICAL RESEARCH Instructor: Chap T. Le, Ph.D. Distinguished Professor of Biostatistics Basic Issues: COURSE INTRODUCTION BIOSTATISTICS BIOSTATISTICS is the Biomedical

More information

Optimization and Experimentation. The rest of the story

Optimization and Experimentation. The rest of the story Quality Digest Daily, May 2, 2016 Manuscript 294 Optimization and Experimentation The rest of the story Experimental designs that result in orthogonal data structures allow us to get the most out of both

More information

Bayesian models of inductive generalization

Bayesian models of inductive generalization Bayesian models of inductive generalization Neville E. Sanjana & Joshua B. Tenenbaum Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 239 nsanjana, jbt @mit.edu

More information

IEEE SIGNAL PROCESSING LETTERS, VOL. 13, NO. 3, MARCH A Self-Structured Adaptive Decision Feedback Equalizer

IEEE SIGNAL PROCESSING LETTERS, VOL. 13, NO. 3, MARCH A Self-Structured Adaptive Decision Feedback Equalizer SIGNAL PROCESSING LETTERS, VOL 13, NO 3, MARCH 2006 1 A Self-Structured Adaptive Decision Feedback Equalizer Yu Gong and Colin F N Cowan, Senior Member, Abstract In a decision feedback equalizer (DFE),

More information

Learning to Identify Irrelevant State Variables

Learning to Identify Irrelevant State Variables Learning to Identify Irrelevant State Variables Nicholas K. Jong Department of Computer Sciences University of Texas at Austin Austin, Texas 78712 nkj@cs.utexas.edu Peter Stone Department of Computer Sciences

More information

Some Thoughts on the Principle of Revealed Preference 1

Some Thoughts on the Principle of Revealed Preference 1 Some Thoughts on the Principle of Revealed Preference 1 Ariel Rubinstein School of Economics, Tel Aviv University and Department of Economics, New York University and Yuval Salant Graduate School of Business,

More information

Learning with Rare Cases and Small Disjuncts

Learning with Rare Cases and Small Disjuncts Appears in Proceedings of the 12 th International Conference on Machine Learning, Morgan Kaufmann, 1995, 558-565. Learning with Rare Cases and Small Disjuncts Gary M. Weiss Rutgers University/AT&T Bell

More information

HOW IS HAIR GEL QUANTIFIED?

HOW IS HAIR GEL QUANTIFIED? HOW IS HAIR GEL QUANTIFIED? MARK A. PITT Department of Psychology, Ohio State University, 1835 Neil Avenue, Columbus, Ohio, 43210, USA JAY I. MYUNG Department of Psychology, Ohio State University, 1835

More information

Agents and Environments

Agents and Environments Agents and Environments Berlin Chen 2004 Reference: 1. S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Chapter 2 AI 2004 Berlin Chen 1 What is an Agent An agent interacts with its

More information

Learning and Adaptive Behavior, Part II

Learning and Adaptive Behavior, Part II Learning and Adaptive Behavior, Part II April 12, 2007 The man who sets out to carry a cat by its tail learns something that will always be useful and which will never grow dim or doubtful. -- Mark Twain

More information

ICS 606. Intelligent Autonomous Agents 1. Intelligent Autonomous Agents ICS 606 / EE 606 Fall Reactive Architectures

ICS 606. Intelligent Autonomous Agents 1. Intelligent Autonomous Agents ICS 606 / EE 606 Fall Reactive Architectures Intelligent Autonomous Agents ICS 606 / EE 606 Fall 2011 Nancy E. Reed nreed@hawaii.edu 1 Lecture #5 Reactive and Hybrid Agents Reactive Architectures Brooks and behaviors The subsumption architecture

More information

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH Why Randomize? This case study is based on Training Disadvantaged Youth in Latin America: Evidence from a Randomized Trial by Orazio Attanasio,

More information

Identifying Parkinson s Patients: A Functional Gradient Boosting Approach

Identifying Parkinson s Patients: A Functional Gradient Boosting Approach Identifying Parkinson s Patients: A Functional Gradient Boosting Approach Devendra Singh Dhami 1, Ameet Soni 2, David Page 3, and Sriraam Natarajan 1 1 Indiana University Bloomington 2 Swarthmore College

More information

Answers to end of chapter questions

Answers to end of chapter questions Answers to end of chapter questions Chapter 1 What are the three most important characteristics of QCA as a method of data analysis? QCA is (1) systematic, (2) flexible, and (3) it reduces data. What are

More information

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS) Chapter : Advanced Remedial Measures Weighted Least Squares (WLS) When the error variance appears nonconstant, a transformation (of Y and/or X) is a quick remedy. But it may not solve the problem, or it

More information

Preference Ratios in Multiattribute Evaluation (PRIME) Elicitation and Decision Procedures Under Incomplete Information

Preference Ratios in Multiattribute Evaluation (PRIME) Elicitation and Decision Procedures Under Incomplete Information IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS PART A: SYSTEMS AND HUMANS, VOL. 31, NO. 6, NOVEMBER 2001 533 Preference Ratios in Multiattribute Evaluation (PRIME) Elicitation and Decision Procedures

More information

Einführung in Agda. Peter Thiemann. Albert-Ludwigs-Universität Freiburg. University of Freiburg, Germany

Einführung in Agda.   Peter Thiemann. Albert-Ludwigs-Universität Freiburg. University of Freiburg, Germany Einführung in Agda https://tinyurl.com/bobkonf17-agda Albert-Ludwigs-Universität Freiburg Peter Thiemann University of Freiburg, Germany thiemann@informatik.uni-freiburg.de 23 Feb 2018 Programs that work

More information

Bayesian models of inductive generalization

Bayesian models of inductive generalization Bayesian models of inductive generalization Neville E. Sanjana & Joshua B. Tenenbaum Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 239 {nsanjana, jbt}@mit.edu

More information

Reactive agents and perceptual ambiguity

Reactive agents and perceptual ambiguity Major theme: Robotic and computational models of interaction and cognition Reactive agents and perceptual ambiguity Michel van Dartel and Eric Postma IKAT, Universiteit Maastricht Abstract Situated and

More information

Homo heuristicus and the bias/variance dilemma

Homo heuristicus and the bias/variance dilemma Homo heuristicus and the bias/variance dilemma Henry Brighton Department of Cognitive Science and Artificial Intelligence Tilburg University, The Netherlands Max Planck Institute for Human Development,

More information

Citation for published version (APA): Geus, A. F. D., & Rotterdam, E. P. (1992). Decision support in aneastehesia s.n.

Citation for published version (APA): Geus, A. F. D., & Rotterdam, E. P. (1992). Decision support in aneastehesia s.n. University of Groningen Decision support in aneastehesia Geus, Arian Fred de; Rotterdam, Ernest Peter IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to

More information

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Lecture II: Difference in Difference. Causality is difficult to Show from cross Review Lecture II: Regression Discontinuity and Difference in Difference From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1

INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1 INTERVIEWS II: THEORIES AND TECHNIQUES 5. CLINICAL APPROACH TO INTERVIEWING PART 1 5.1 Clinical Interviews: Background Information The clinical interview is a technique pioneered by Jean Piaget, in 1975,

More information

Vagueness, Context Dependence and Interest Relativity

Vagueness, Context Dependence and Interest Relativity Chris Kennedy Seminar on Vagueness University of Chicago 2 May, 2006 Vagueness, Context Dependence and Interest Relativity 1 Questions about vagueness Graff (2000) summarizes the challenge for a theory

More information

CS343: Artificial Intelligence

CS343: Artificial Intelligence CS343: Artificial Intelligence Introduction: Part 2 Prof. Scott Niekum University of Texas at Austin [Based on slides created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All materials

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Choose an approach for your research problem

Choose an approach for your research problem Choose an approach for your research problem This course is about doing empirical research with experiments, so your general approach to research has already been chosen by your professor. It s important

More information

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n.

Citation for published version (APA): Ebbes, P. (2004). Latent instrumental variables: a new approach to solve for endogeneity s.n. University of Groningen Latent instrumental variables Ebbes, P. IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document

More information

Visual book review 1 Safe and Sound, AI in hazardous applications by John Fox and Subrata Das

Visual book review 1 Safe and Sound, AI in hazardous applications by John Fox and Subrata Das Visual book review 1 Safe and Sound, AI in hazardous applications by John Fox and Subrata Das Boris Kovalerchuk Dept. of Computer Science Central Washington University, Ellensburg, WA 98926-7520 borisk@cwu.edu

More information

The Semantics of Intention Maintenance for Rational Agents

The Semantics of Intention Maintenance for Rational Agents The Semantics of Intention Maintenance for Rational Agents Michael P. Georgeffand Anand S. Rao Australian Artificial Intelligence Institute Level 6, 171 La Trobe Street, Melbourne Victoria 3000, Australia

More information

Work, Employment, and Industrial Relations Theory Spring 2008

Work, Employment, and Industrial Relations Theory Spring 2008 MIT OpenCourseWare http://ocw.mit.edu 15.676 Work, Employment, and Industrial Relations Theory Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Interpreting Instructional Cues in Task Switching Procedures: The Role of Mediator Retrieval

Interpreting Instructional Cues in Task Switching Procedures: The Role of Mediator Retrieval Journal of Experimental Psychology: Learning, Memory, and Cognition 2006, Vol. 32, No. 3, 347 363 Copyright 2006 by the American Psychological Association 0278-7393/06/$12.00 DOI: 10.1037/0278-7393.32.3.347

More information

KECERDASAN BUATAN 3. By Sirait. Hasanuddin Sirait, MT

KECERDASAN BUATAN 3. By Sirait. Hasanuddin Sirait, MT KECERDASAN BUATAN 3 By @Ir.Hasanuddin@ Sirait Why study AI Cognitive Science: As a way to understand how natural minds and mental phenomena work e.g., visual perception, memory, learning, language, etc.

More information

Exploring MultiObjective Fitness Functions and Compositions of Different Fitness Functions in rtneat

Exploring MultiObjective Fitness Functions and Compositions of Different Fitness Functions in rtneat Exploring MultiObjective Fitness Functions and Compositions of Different Fitness Functions in rtneat KJ Bredder and Cole Harbeck Abstract In machine learning, the fitness function is an incredibly important

More information

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch

CISC453 Winter Probabilistic Reasoning Part B: AIMA3e Ch CISC453 Winter 2010 Probabilistic Reasoning Part B: AIMA3e Ch 14.5-14.8 Overview 2 a roundup of approaches from AIMA3e 14.5-14.8 14.5 a survey of approximate methods alternatives to the direct computing

More information

Reinforcement Learning in Steady-State Cellular Genetic Algorithms

Reinforcement Learning in Steady-State Cellular Genetic Algorithms Reinforcement Learning in Steady-State Cellular Genetic Algorithms Cin-Young Lee and Erik K. Antonsson Abstract A novel cellular genetic algorithm is developed to address the issues of good mate selection.

More information

Soccer Player Tabata Workout. By: Shaan Dias #4, Zach Gabrielson #6, Adarsh Hiremath #10, Brandon Hom #12, Noah Moon #18, Kaushik Tota #31

Soccer Player Tabata Workout. By: Shaan Dias #4, Zach Gabrielson #6, Adarsh Hiremath #10, Brandon Hom #12, Noah Moon #18, Kaushik Tota #31 Soccer Player Tabata Workout By: Shaan Dias #4, Zach Gabrielson #6, Adarsh Hiremath #10, Brandon Hom #12, Noah Moon #18, Kaushik Tota #31 Our Athlete s Goals Skill goal: Our athlete wants to improve his

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Further Properties of the Priority Rule

Further Properties of the Priority Rule Further Properties of the Priority Rule Michael Strevens Draft of July 2003 Abstract In Strevens (2003), I showed that science s priority system for distributing credit promotes an allocation of labor

More information

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models

Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based Models Int J Comput Math Learning (2009) 14:51 60 DOI 10.1007/s10758-008-9142-6 COMPUTER MATH SNAPHSHOTS - COLUMN EDITOR: URI WILENSKY* Agents with Attitude: Exploring Coombs Unfolding Technique with Agent-Based

More information

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Supplement 2. Use of Directed Acyclic Graphs (DAGs) Supplement 2. Use of Directed Acyclic Graphs (DAGs) Abstract This supplement describes how counterfactual theory is used to define causal effects and the conditions in which observed data can be used to

More information

A HMM-based Pre-training Approach for Sequential Data

A HMM-based Pre-training Approach for Sequential Data A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University

More information

Coherence Theory of Truth as Base for Knowledge Based Systems

Coherence Theory of Truth as Base for Knowledge Based Systems Association for Information Systems AIS Electronic Library (AISeL) AMCIS 1996 Proceedings Americas Conference on Information Systems (AMCIS) 8-16-1996 Coherence Theory of Truth as Base for Knowledge Based

More information

How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience*

How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience* How are Journal Impact, Prestige and Article Influence Related? An Application to Neuroscience* Chia-Lin Chang Department of Applied Economics and Department of Finance National Chung Hsing University

More information

Probabilistically Estimating Backbones and Variable Bias: Experimental Overview

Probabilistically Estimating Backbones and Variable Bias: Experimental Overview Probabilistically Estimating Backbones and Variable Bias: Experimental Overview Eric I. Hsu, Christian J. Muise, J. Christopher Beck, and Sheila A. McIlraith Department of Computer Science, University

More information