Rational drug design as hypothesis formation

Alexander P.M. van den Bosch *
Department of Philosophy
A-weg 30, 9718 CW, Groningen
voice: 0031-50-3636946/6161
fax: 0031-50-3636160
email: alexander@philos.rug.nl
http://tcw2.ppsw.rug.nl/~vdbosch

April 28, 1997

1. Introduction

In this paper I argue that Peter Karp's proposal of 'hypothesis formation as design', together with his computational models Gensim and Hypgene (Karp, 1992; Karp, 1993), can be used to analyse and model rational drug design. Hypgene modifies theories and initial conditions by reasoning backwards from the difference between predictions and experimental observations. I investigate how rational drug design can in turn be seen as a form of hypothesis formation. The operators of Hypgene can be exploited to propose interventions that prevent or enable known biological processes. They do this by reasoning backwards from the difference between two sets: the conditions of a biological system explained by hypotheses about its pathology, and the desired properties of that system. This also makes it possible to employ the logic of the development of design research as explicated by Kuipers and Vos (Kuipers, Vos & Sie, 1992; Vos, 1991). To model rational drug design I am studying reasoning processes in research into Parkinson's disease. This research involves developing biological theories about neurochemical processes similar to those Karp studied, but it also addresses drug design problems that Karp did not consider.

* I thank Theo Kuipers and Rein Vos, who contributed to this paper with helpful discussions.
In the following two sections I first briefly discuss Karp's models Gensim and Hypgene, and the logic of design research as explicated by Kuipers and others. I then show how this explication suggests a computational model of rational drug design, together with a proper definition of progress evaluation in the search process. I end this paper with a simplified example from drug research for Parkinson's disease.

2. Hypothesis formation as design

Peter Karp investigated research on the biological process of attenuation (Karp, 1992; Karp, 1993). He encoded intermediate states of knowledge about biological objects and processes so that his simulator program Gensim could use them to simulate experiments and compute predictions. The Hypgene program takes these predictions as input and compares them with observations made during an experiment. If there is a discrepancy, Hypgene modifies the assumptions about the initial conditions or the processes to explain the difference between Gensim's prediction and the experimental observation. Karp considers hypothesis formation as performed by Hypgene to be a design problem, and a hypothesis to be an artefact that is synthesised subject to design constraints such as predictive success and simplicity.

The simulation program Gensim implements a qualitative biochemistry. It represents biological objects in a taxonomic hierarchy in a class knowledge base $C$. Theories about biological processes, such as chemical reactions, are represented as rules in a process knowledge base $T$. The rules define which classes of objects participate in a reaction and which preconditions must be true for the reaction to occur. The rules further specify which new objects are created if a rule's conditions are met. Gensim can simulate an experiment in a separate knowledge base by incrementally applying the process rules to the objects specified at the start of the experiment.
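To make the division between the class knowledge base $C$ and the process knowledge base $T$ concrete, here is a minimal Python sketch of a Gensim-like forward simulator. It is not Karp's implementation: the class names, the `Obj` and `Rule` types, the `active` flag and the single decarboxylation rule are hypothetical, and objects are treated only qualitatively (present or absent).

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Callable, Optional

# Hypothetical class knowledge base C: each class points to its parent class.
C: dict[str, Optional[str]] = {
    "object": None,
    "compound": "object",
    "amino-acid": "compound",
    "enzyme": "object",
    "decarboxylase": "enzyme",
}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls is ancestor or one of its descendants in the taxonomy C."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = C[cls]
    return False


@dataclass(frozen=True)
class Obj:
    name: str
    cls: str
    active: bool = True  # hypothetical flag, e.g. False for an inhibited enzyme


@dataclass(frozen=True)
class Rule:
    """A process rule in T: participant classes, a precondition, created objects."""
    name: str
    participants: tuple[str, ...]
    precondition: Callable[..., bool]
    creates: Callable[..., list[Obj]]


def simulate(initial: set[Obj], rules: list[Rule]) -> set[Obj]:
    """Apply the rules incrementally until no rule creates a new object."""
    state = set(initial)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Candidate participants: current objects whose class matches each slot.
            pools = [[o for o in state if is_a(o.cls, c)] for c in rule.participants]
            for combo in product(*pools):
                if rule.precondition(*combo):
                    new = set(rule.creates(*combo)) - state
                    if new:
                        state |= new
                        changed = True
    return state


# Hypothetical rule: an active decarboxylase turns an amino acid into an amine.
decarboxylation = Rule(
    name="decarboxylation",
    participants=("amino-acid", "decarboxylase"),
    precondition=lambda substrate, enzyme: enzyme.active,
    creates=lambda substrate, enzyme: [Obj(substrate.name + "-amine", "compound")],
)

initial = {Obj("aa1", "amino-acid"), Obj("dc1", "decarboxylase")}
print(sorted(o.name for o in simulate(initial, [decarboxylation])))
# ['aa1', 'aa1-amine', 'dc1']
```

Setting `active=False` on the enzyme, a crude stand-in for an inhibitor, leaves the initial objects unchanged; in this sketch the rule's precondition is therefore the place where an intervention enables or blocks a process.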
Hypgene takes five kinds of information as input: the set of initial objects and conditions $I_a$, the set of new or changed objects $P_a$ predicted by Gensim, the prediction error $\mathit{Error}_a$, and access to all elements of $C$ and $T$. In short, the input is the tuple $\langle I_a, P_a, \mathit{Error}_a, T, C \rangle$. By definition, $P_a$ is entailed by the initial conditions $I_a$ together with the rules in $T$:
$$I_a \cup T \models P_a$$

An empirical experiment is anomalous when the set of observed objects $O_a$ is not equal to the predicted set $P_a$. The prediction error $\mathit{Error}_a$ is by definition $P_a \,\triangle\, O_a$, the symmetric difference between prediction $P_a$ and observation $O_a$:

$$P_a \,\triangle\, O_a := (P_a - O_a) \cup (O_a - P_a)$$

The main goal of Hypgene is to correct $\mathit{Error}_a$. To achieve that goal, Hypgene reasons backwards from the difference between $P_a$ and $O_a$. Its subgoals are to remove objects from $P_a$ that are not in $O_a$, to modify properties of objects in $P_a$, to modify the quantity of objects in $P_a$, and to add objects from $O_a$ that were not in $P_a$. Two main types of design operators serve these subgoals: those that redesign elements of $I_a$ into $I'_a$, and those that modify $T$ into $T'$, in such a way that ideally:

$$I'_a \cup T' \models P'_a \quad\text{and}\quad P'_a = O_a$$

It may seem odd to change $I_a$, but $I_a$ represents an assumption about which objects are present during an experiment. In biological practice, knowledge of initial conditions is often uncertain because of the complexity of the objects under study and the sometimes unpredictable effects of laboratory techniques. Karp found that it is normal practice in biology to take a closer look at the assumed initial conditions first, before changing hard-earned theories about processes.

3. The logic of design research

Kuipers and Vos (Kuipers, Vos & Sie, 1992) hold that the development of design research can best be described as a more or less systematic attempt to bring together the properties of available materials and the demands derived from intended applications. They proposed a set-theoretic model of this process. In this model the wished-for properties $W$ of an intended product form a subset of all relevant properties $RP$ of the product to be developed, so $RP - W$ are the unwanted properties. For each possible prototype $x$ there is an operational profile,
consisting of the operational properties $O(x)$. A problem state during development can now be described as the symmetric difference $W \,\triangle\, O(x)$: the set of unrealised wanted properties together with the set of realised properties that are not wanted:

$$W \,\triangle\, O(x) := (W - O(x)) \cup (O(x) - W)$$

$W \triangle O(x)$ denotes the set of problems, i.e. the qualitative deviation of $O(x)$ from $W$. The goal of design research is to develop a product $x'$ such that ideally $O(x') = W$. Kuipers and Vos gave a precise definition of transitions between problem states, which provides a basic criterion for assessing whether a transition is an improvement. Prototype $x_2$ is an improvement of $x_1$ in view of $W$ iff $O(x_2) \triangle W$ is a proper subset of $O(x_1) \triangle W$:

$$O(x_2) \,\triangle\, W \subsetneq O(x_1) \,\triangle\, W$$

So a new prototype is an improvement if it has an extra wished-for property or one unwished-for property fewer, without losing a wished-for property or gaining an unwished-for one; that is, when the set of properties of the new prototype is more similar to the set of desired properties.

In most design research the set of relevant properties can be divided into two complementary sets of structural and functional properties, $S$ and $F$. Often a functional profile $W_F$, specifying what the product is supposed to do, is determined first. The next question is how this can be realised. Since functionally equivalent structures are often possible, $W_F$ does not uniquely determine a structure. Hence we look for an appropriate structural profile $A_S$ that causally implies $W_F$. In drug research the determination of the desired functionality $W_F$ is normally guided by known characteristics of one or more diseases. A drug for a given unwanted characteristic can be useful for every disease that exhibits that characteristic. An improvement of a drug's structure and functionality can be defined analogously to the definition above.

4. Design as hypothesis formation

The drug design model of Kuipers and Vos reasons about the properties of one object, an appropriate drug.
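Before turning to the multi-object case, the single-object evaluation of section 3 can be sketched in a few lines of Python. This is a minimal illustration assuming that properties are plain string labels; the property names and the helpers `problem_state` and `is_improvement` are hypothetical. The same symmetric difference also yields Hypgene's $\mathit{Error}_a$ when applied to $P_a$ and $O_a$.

```python
def problem_state(wanted: frozenset[str], operational: frozenset[str]) -> frozenset[str]:
    """W Δ O(x): unrealised wanted properties plus realised unwanted ones."""
    return (wanted - operational) | (operational - wanted)


def is_improvement(wanted: frozenset[str], o_new: frozenset[str], o_old: frozenset[str]) -> bool:
    """x2 improves on x1 iff O(x2) Δ W is a proper subset of O(x1) Δ W."""
    return problem_state(wanted, o_new) < problem_state(wanted, o_old)


# Hypothetical wished-for profile W and operational profiles of three prototypes.
W = frozenset({"oral", "selective", "long-acting"})
O_x1 = frozenset({"oral", "sedating"})
O_x2 = frozenset({"oral", "selective"})
O_x3 = frozenset({"selective", "long-acting", "hepatotoxic"})

print(sorted(problem_state(W, O_x1)))  # ['long-acting', 'sedating', 'selective']
print(is_improvement(W, O_x2, O_x1))   # True: gains 'selective', loses 'sedating'
print(is_improvement(W, O_x3, O_x1))   # False: loses 'oral', gains 'hepatotoxic'
```

Because the test is a proper-subset relation, it yields only a partial order: in this example $x_3$ is neither better nor worse than $x_1$, which fits the qualitative character of the criterion.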
By contrast, the proposed rational drug design model considers and reasons about the properties of all objects contained in a biological system, inferring what
objects or conditions to introduce to make the system function in a desired way. The idea is that rational drug design involves making maximal use of known theories and knowledge about biological processes and their possible pathological conditions. A proper theory about a disease should be able to explain the disease's pathological characteristics. The proposed computational model should be able to use representations of those theories to search for conditions that could be brought about by a drug with appropriate structural properties, given the set of wished-for effects on the biological system.

So we are given a set $T_p$ of theories about pathological biological processes (corresponding to the functional properties of a biological system) and a set of initial objects and conditions $I_p$ (corresponding to the structural properties of the system) involved in the explanation of a pathological condition $E_p$ (corresponding to the functional consequences of the system's structure). The initial problem state is the symmetric difference between the pathological condition $E_p$ and a desired (healthy) condition $W_F$. Hypgene-like operators can now be employed to search for the initial conditions that the appropriate structure $A_S$ of a drug should bring about, reasoning with the processes in $T$ while leaving the assumptions about those processes unmodified. The goal is to find initial conditions that enable or prevent processes from $T$ in such a way that the desired condition $W_F$ is caused (see figure 1A). We know that the set of theories about pathological conditions $T_p$ of a biological system, which is a subset of $T$, together with $I_p$ explains a given pathological condition $E_p$, i.e.:

$$I_p \cup T_p \models E_p$$

And we ideally want to infer a proper set of initial conditions $I_{A_S}$, caused by a drug or combination of drugs, that according to the theories in $T$ will result in the desired condition $W_F$:

$$I_{A_S} \cup T \models W_F$$

The design goal is therefore to reduce, and ideally eliminate, the symmetric difference $E_p \,\triangle\, W_F$.
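The search for initial conditions described above can be sketched as a brute-force loop in Python. This is a toy under strong assumptions, not Hypgene's goal-directed operators: the compounds A, B, C and enzymes E1, E2 are hypothetical, objects are only present or absent, a drug can either add a compound (enabling a process) or inhibit an enzyme (modelled as removing it from the initial conditions, which prevents a process), and the relevant functional properties are just the presence of B and C.

```python
from itertools import combinations

# Hypothetical process knowledge base T: (required objects, created objects).
T = [
    (frozenset({"A", "E1"}), frozenset({"B"})),  # E1 converts A into B
    (frozenset({"B", "E2"}), frozenset({"C"})),  # E2 converts B into C
]
RELEVANT = frozenset({"B", "C"})  # hypothetical functional properties we evaluate


def simulate(initial: frozenset[str]) -> frozenset[str]:
    """Forward-chain over T until no new objects appear (presence only)."""
    state = set(initial)
    changed = True
    while changed:
        changed = False
        for needs, makes in T:
            if needs <= state and not makes <= state:
                state |= makes
                changed = True
    return frozenset(state)


def profile(initial: frozenset[str]) -> frozenset[str]:
    """Predicted functional profile: the relevant objects present after simulation."""
    return simulate(initial) & RELEVANT


def search(I_p: frozenset[str], W_F: frozenset[str],
           interventions: list[tuple[str, str]], max_size: int = 3) -> list[tuple]:
    """Smallest intervention sets whose predicted profile is closest to W_F."""
    scored = []
    for k in range(max_size + 1):
        for combo in combinations(interventions, k):
            added = {obj for kind, obj in combo if kind == "add"}
            inhibited = {obj for kind, obj in combo if kind == "inhibit"}
            predicted = profile(frozenset((I_p | added) - inhibited))
            scored.append((len(predicted ^ W_F), k, combo))
    best = min((d, k) for d, k, _ in scored)
    return [combo for d, k, combo in scored if (d, k) == best]


I_p = frozenset({"E1", "E2"})  # hypothetical pathology: A is missing, so no B
W_F = frozenset({"B"})
candidates = [("add", "A"), ("inhibit", "E1"), ("inhibit", "E2")]

print(sorted(profile(I_p | {"A"}) ^ W_F))  # ['C']: adding A alone brings an unwanted C
print(search(I_p, W_F, candidates))        # [(('add', 'A'), ('inhibit', 'E2'))]
```

Adding A alone already produces B, but because E2 then converts B into C, the predicted profile also contains the unwanted C; the search therefore prefers pairing A with an inhibitor of E2. Exhaustive enumeration grows exponentially with the number of candidate interventions, so this loop stands in for, rather than reproduces, Hypgene's operator-based search.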
The main difference from normal Hypgene operation is that, instead of altering a hypothesis about what the initial conditions were in order to explain an anomalous
observation, a hypothesis is created about what the initial conditions should be in order to cause a desired condition.

[Figure 1 diagram not reproduced: panels A and B, each divided into structural (S) and functional (F) properties.]

Figure 1. Problem and transition states in: (A) the explanation of pathological condition $E_p$ by theory $T_p$ and initial conditions $I_p$, and the inference to the initial conditions caused by the structural properties $A_S$ of an appropriate drug; (B) the prediction of the effects of prototype drug $x$, and the modification of theory $T$ about a biological system and of initial conditions $I_{O_S(x)}$ to explain observation $O$. $S$ is the set of relevant structural and $F$ the set of relevant functional properties of a biological system; $S \rightarrow F$ stands for the functional consequences of a system's structure.

Ideally, one would infer, within the context of the known $T$, a drug that causes only $W_F$. Since this is rarely achievable, we need a gradual criterion for evaluating the improvement of suggestions. Let the moderated design goal be to find a suggestion $I_{O_S(x)}$ for a drug $x$ such that its predicted operational profile $P_{O_F(x)}$ resembles the desired condition $W_F$ more than the pathological condition $E_p$ does, i.e. that:

$$P_{O_F(x)} \,\triangle\, W_F \subsetneq E_p \,\triangle\, W_F$$

That is, the drug should cause at least one desired condition in $W_F$ or remove at least one pathological condition in $E_p$, and it must not introduce side-effects outside $E_p \triangle W_F$. Several drug suggestions can be compared along the same lines as in the previous section: suggestion $x_2$ improves on $x_1$ iff $P_{O_F(x_2)} \triangle W_F \subsetneq P_{O_F(x_1)} \triangle W_F$.
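The moderated criterion can be checked mechanically once conditions are written as labels. The following Python sketch uses hypothetical condition labels that loosely anticipate the Parkinson example of section 5; they are simplifications for illustration, not a model of the pharmacology. A suggestion passes only if $P_{O_F(x)} \triangle W_F$ is a proper subset of $E_p \triangle W_F$.

```python
def meets_moderated_goal(predicted: frozenset[str], wanted: frozenset[str],
                         pathological: frozenset[str]) -> bool:
    """True iff P_OF(x) Δ W_F is a proper subset of E_p Δ W_F."""
    return (predicted ^ wanted) < (pathological ^ wanted)


# Hypothetical condition labels, loosely following the example in section 5.
E_p = frozenset({"basal ganglia dopamine low"})
W_F = frozenset({"basal ganglia dopamine normal"})

no_effect = E_p
levodopa_only = frozenset({"basal ganglia dopamine normal", "peripheral dopamine raised"})
levodopa_plus_peripheral_inhibitor = frozenset({"basal ganglia dopamine normal"})

print(meets_moderated_goal(no_effect, W_F, E_p))                           # False
print(meets_moderated_goal(levodopa_only, W_F, E_p))                       # False
print(meets_moderated_goal(levodopa_plus_peripheral_inhibitor, W_F, E_p))  # True
```

The second case fails even though the desired condition is reached, because the peripheral side-effect lies outside $E_p \triangle W_F$; the first fails because an unchanged profile is not a proper improvement.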
The resulting suggestion $O_S(x)$ for a drug can in turn be used to test the theories that were used to find it. If a prototype drug $x$ is created with an operational structure $O_S(x)$ that is as close to $A_S$ as possible, an experiment can be done and its resulting observation $O$ compared with the prediction $P_{O_F(x)}$. Any discrepancy can then be passed to Hypgene to redesign $T$ and the initial conditions $I_{O_S(x)}$ surrounding the drug (see figure 1B).

5. Parkinson's disease

In this last section a simplified example concerning Parkinson's disease is outlined. The disease profile of Parkinson's disease includes a shortage of the neurotransmitter dopamine in the basal ganglia of the brain (Kandel, Schwartz & Jessell, 1991; Timmerman, 1992). In the 1960s Birkmayer and Hornykiewicz reasoned that it might help to restore dopamine to normal levels via a drug. Providing dopamine itself as an intravenous drug was not effective, because dopamine cannot pass the blood-brain barrier. So the wished-for functionality of the drug includes passing the blood-brain barrier and increasing dopamine levels.

Consider the following simplified metabolic pathways in the synaptic terminal of a dopaminergic nerve cell. Writing "I, E → O" for a rule in which input compound I is converted by enzyme E into output compound O, $T$ includes two rules:

p1: L-DOPA, AADC → dopamine
p2: dopamine, MAO-B → DOPAC

The pathological condition $E_p$ includes decreased dopamine production, and the design goal includes a situation in which dopamine levels are increased. So we set the goal to increase the quantity of dopamine. A Hypgene operator of the type 'Modify initial conditions to increase quantity' can now suggest, based on p1, that L-DOPA should be available in higher quantities. L-DOPA turns out to pass the blood-brain barrier, so this seems a valid drug
suggestion. It was in fact the first successful approach to reducing Parkinson's symptoms. However, the enzyme AADC is also present in other parts of the body, so dopamine production there increases as well, causing unwanted side-effects. This suggestion therefore causes an unwanted response that is neither in $W_F$ nor part of the original pathological condition $E_p$. The next suggestion Hypgene could propose would therefore be a compound that inhibits AADC but does not pass the blood-brain barrier, so that the inhibition is confined to the rest of the body. This could be suggested by an operator of the type 'Modify initial conditions to decrease quantity'. Hypgene could apply the same operator to p2 to suggest inhibiting MAO-B, which decreases the breakdown of dopamine and hence increases its level.

Ongoing work on this model should yield detailed insight into the methodology of rational drug design research in practice, and the model might also be used to support that research in the future.

6. References

Kandel, E.R., Schwartz, J.H. & Jessell, T.M. (Eds.). (1991). Principles of Neural Science (3rd ed.). New Jersey: Prentice Hall.

Karp, P.D. (1992). Hypothesis Formation as Design. In J. Shrager & P. Langley (Eds.), Computational Methods of Scientific Discovery and Theory Formation (pp. 275-317). Palo Alto: Morgan Kaufmann Publishers, Inc.

Karp, P.D. (1993). Design Methods for Scientific Hypothesis Formation and Their Application to Molecular Biology. Machine Learning, 12, 89-116.

Kuipers, T.A.F., Vos, R. & Sie, H. (1992). Design Research Programs and the Logic of their Development. Erkenntnis, 37, 37-63.

Timmerman, W. (1992). Dopaminergic receptor agents and the basal ganglia: pharmacological properties and interactions with the GABA-ergic system. PhD thesis, Groningen University.

Vos, R. (1991). Drugs looking for diseases. Innovative drug research and the development of the beta blockers and the calcium antagonists. Dordrecht: Kluwer Academic Publishers.