Ontology-Based Diagnosis and Personalization of Medical Knowledge

Departament d'Enginyeria Informàtica i Matemàtiques

Author: Cristina Romero Tris
Director: David Riaño

Cristina Romero Tris
cristina.romero@urv.cat
Enginyeria Tècnica en Informàtica de Sistemes, Universitat Rovira i Virgili
June 2009

INDEX

CHAPTER 1. INTRODUCTION
  1.1 GENERAL OBJECTIVES
  1.2 CONTEXT OF THE PROJECT
  1.3 ORGANIZATION OF THE DOCUMENT
CHAPTER 2. STATE OF THE ART
  2.1 MEDICAL CONTEXT
  2.2 KNOWLEDGE REPRESENTATION CONTEXT
    2.2.1 ONTOLOGY
    2.2.2 OWL
    2.2.3 PROTÉGÉ
    2.2.4 CASE PROFILE ONTOLOGY
    2.2.5 JENA
    2.2.6 OWL WRAPPER
    2.2.7 ONTOLOGY MERGING
CHAPTER 3. SPECIFICATION
  3.1 DECISION SUPPORT SYSTEM
  3.2 PERSONALIZATION
CHAPTER 4. DEVELOPMENT
  4.1 GENERAL METHODOLOGY
  4.2 GENERAL CHARACTERISTICS OF THE PROCEDURE
    4.2.1 SELECTION OF NESTED CLASSES
    4.2.2 SELECTION OF CONFLICTING CLASSES
  4.3 THE CAPRAD TOOL
CHAPTER 5. EVALUATIONS
  5.1 CAPRAD COMMON USE
  5.2 ONTOLOGY TAILORING EVALUATION
  5.3 ATYPICAL HEALTH-CARE CASES CONSIDERED
    5.3.1 WRONG DIAGNOSIS
    5.3.2 COMORBIDITY
    5.3.3 MISSING DATA
    5.3.4 RELATED DISEASES AND PREVENTION
CHAPTER 6. CONCLUSIONS
  6.1 PROJECT OBJECTIVES ACHIEVEMENT
  6.2 OBSERVED PROBLEMS
  6.3 FUTURE WORK
ACKNOWLEDGEMENTS
APPENDIX
  SOURCE CODE STRUCTURE
  ALGORITHM SOURCE CODE

Chapter 1. INTRODUCTION

This project belongs to the field of Medical Informatics. This area comprises the computer science technologies and methodologies that are applied in the health domain to solve medical problems. One of these problems is diagnosis, the subject that this project addresses. Diagnosis is the result of a decision-making process in which physicians identify a patient's diseases from his or her signs and symptoms before deciding on the corresponding treatment. Depending on the type of patient and the number of diseases he or she suffers from, diagnosis can become a difficult process. For that reason, computer science can help physicians in diagnosing.

This project is about designing and developing a knowledge-based application that provides decision support to physicians during the diagnostic process. The application requires formal structures that represent medical knowledge in order to interact with the physician. The medical domain is characterized by an abundance of existing and constantly growing knowledge. To deal with this knowledge, the application works with ontologies, a specialized knowledge representation paradigm. Ontologies define the terms used to describe and represent an area of knowledge, together with the relationships between these terms. For this project, an ontology on the medical domain is used: the Case Profile Ontology.

The Case Profile Ontology (CPO) is the knowledge base of the application. It is used both to extract information and to provide it to the physician when needed.

1.1 GENERAL OBJECTIVES

The main objective of this project is to develop an application that guides physicians in the diagnostic process. The project has three specific objectives.

Support physicians in the diagnostic process

In this project, diagnosis is a process that takes into consideration several health care concepts: problem assessment, sign and symptom, disease, syndrome, social issue and intervention. In this context, the application is conceived to exploit the knowledge base to help physicians as they diagnose a patient. For example, when the physician introduces the set of signs and symptoms that a patient has, the application consults the knowledge base to give the physician a list of all the diseases that are defined by those signs and symptoms.

Validate the patient's condition as a whole

In this project, the patient condition is defined as the set of signs and symptoms, diseases, syndromes, and social issues that are observed for that patient. One of the objectives of the project consists in building a tool to validate the patient condition introduced by the physician. The function of this tool is to check whether there is any medical inconsistency between the introduced patient condition and the theoretical knowledge that the application extracts from the CPO. For example, if the physician selects a disease that, according to the CPO, is not related to any of the signs and symptoms introduced, the application informs the physician of a possible irregularity in the diagnosis. In addition, this objective includes proposing the disease that best fits the introduced signs and symptoms so that the physician can reconsider the diagnosis.

Personalize the knowledge in the Case Profile Ontology

From the computing point of view, the application is a program that works with ontologies, adapting general knowledge to a particular case. This particular case is defined by the set of problem assessments, signs and symptoms, diseases, syndromes, social issues and interventions of the patient. These concepts describe medical knowledge that is represented with an ontology. Therefore, while the CPO is a large ontology that contains many terms of the health care domain, the output of the application is a sub-ontology of the CPO which contains only the medical terms related to the targeted patient. The process of extracting from the CPO a sub-ontology with the knowledge of the patient's case will be called Tailoring of the CPO.

1.2 CONTEXT OF THE PROJECT

This project is integrated in the European Project K4CARE [1] (IST-2004-026968). K4CARE aims to combine the healthcare and the Information and Communication Technologies (ICT) experiences of several western and eastern EU countries to create, implement, and validate a knowledge-based health care model for the professional assistance to senior patients at home.

In modern societies, the care of chronically disabled patients at home involves lifelong treatment under continuous expert supervision, which saturates European national health services and increases related costs. K4CARE's main goal has been to design, implement and validate a new ICT knowledge-based Homecare Model by integrating skills, procedures and experiences of several eastern and western European countries, as a contribution to the new EU society to manage and respond to the needs of the increasing number of senior citizens requiring customized health care at home.

K4CARE is structured in work packages. Work Package 3 was devoted to providing a patient Case Profile Ontology (CPO) that gathers all the medical terminology related to diagnosis managed in the K4CARE Project. Now that this objective has been accomplished, the CPO can be used as a tool for new developments. Work Package 5 (WP05) is about the implementation of computer tools for knowledge tailoring. One of the tasks in WP05 is to instantiate the general knowledge in the CPO for the patient under treatment by developing intelligent mechanisms that merge several classes from the CPO into one sub-ontology representing a complex patient's case.

1.3 ORGANIZATION OF THE DOCUMENT

Chapter 2 explains the main concepts and terms used in the project. Chapter 3 focuses on the approach that was followed to achieve the objectives. Chapter 4 details the methodology and procedures used to design and build the application. The tests on this application appear in Chapter 5. Finally, Chapter 6 specifies the achieved objectives and the most relevant conclusions we have reached.

Chapter 2. STATE OF THE ART

This chapter describes the specific concepts that help to understand the subsequent project descriptions. These concepts are classified into two groups: the terms related to medical problems, and the terms that describe the means to represent medical knowledge in a computer.

2.1 MEDICAL CONTEXT

Medical diagnosis is defined as the identification of a disease by investigating a patient's signs, symptoms and history, which provides a solid basis for the treatment and prognosis of the individual patient [1]. In this process we may distinguish two consecutive phases: the diagnostic procedure and the diagnosis.

The diagnostic procedure consists of repeatedly gathering new information about the case to be diagnosed, so that the physician can narrow down the list of possible diseases that the patient suffers from. In this part of the diagnosis the physician must work with several health care elements, such as the available means of assessment of the patient condition, the signs and symptoms, the feasible diseases, syndromes and social issues, as well as the health care interventions. Once the diagnostic procedure arrives at acceptable evidence of the patient's ailments, the physician determines the condition which constitutes the diagnosis and saves it as a part of the patient's medical record.
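The narrowing-down of candidate diseases described in Section 2.1 can be pictured with a minimal Python sketch. It is only an illustration: the dictionary KNOWLEDGE, its placeholder disease and finding names, and the function narrow_down are hypothetical, and the rule that a candidate must be related to every finding gathered so far is a simplifying assumption, not the procedure used in this project.

```python
# Hypothetical toy knowledge: disease -> findings (signs and symptoms) related to it.
# The names are placeholders and carry no medical meaning.
KNOWLEDGE = {
    "DiseaseA": {"finding1", "finding2"},
    "DiseaseB": {"finding2", "finding3"},
    "DiseaseC": {"finding4"},
}

def narrow_down(findings, knowledge):
    """Return the list of candidate diseases after each newly gathered finding."""
    candidates = set(knowledge)
    history = []
    for finding in findings:
        # Assumption: keep only diseases related to the new finding.
        candidates = {d for d in candidates if finding in knowledge[d]}
        history.append((finding, sorted(candidates)))
    return history

for finding, remaining in narrow_down(["finding2", "finding1"], KNOWLEDGE):
    print(finding, remaining)
```

Each iteration corresponds to one round of information gathering; the procedure stops when the physician considers the remaining list acceptable evidence.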

In this setting, Medical Informatics faces two challenges: one is the construction of efficient computer tools that help physicians to improve the results of diagnostic procedures (i.e., prospective diagnosis); the other is the construction of computer tools to validate diagnoses (i.e., retrospective diagnosis). Because of the very nature of health care, the most promising approaches to these challenges are those based on an explicit representation of medical knowledge that can be interpreted by computers and used to guide prospective and retrospective diagnoses.

2.2 KNOWLEDGE REPRESENTATION CONTEXT

This section describes the main computing concepts in the domain of knowledge representation. The knowledge representation model of this project is an ontology called the Case Profile Ontology (CPO). The CPO represents know-what knowledge related to common pathologies: not only diseases, social issues and syndromes, but also the sort of interventions to deal with these problems, the signs and symptoms related to these problems, and the sort of assessment tests that can be used to evaluate these signs and symptoms.

The structure in Figure 1 represents the computer tools that define the framework of this project. This structure is based on an ontology, and it consists of several layers which contain relevant tools to work with ontologies. The lowest level is OWL, the language in which the ontologies used in this project are written so that a computer can read and modify them. OWL has an XML format, but some tools allow working without the XML tags, such as Protégé [16] (an application to create and edit ontologies) or Jena [18] (a set of libraries to work with OWL ontologies as a Java data structure). Specifically for this project, an application that gathers a subset of the functions that Jena offers has been used: the OWL Wrapper.

Thanks to these tools it is possible to create ontology-based applications like the one created for this project, the CAPRAD tool, which has two different tasks: to communicate with the physician as a decision support system during the diagnostic procedure, and to validate the final diagnosis. In addition, the final data obtained after the validation is used to tailor, or personalize, a medical ontology to a patient or patient prototype.

Figure 1. Ontology tools hierarchy

The next subsections explain the concept of ontology and the ontology which the application works with, the CPO. There are also descriptions of relevant concepts such as OWL, the Jena libraries, the Wrapper and Protégé. Finally, there is a description of some existing tools that can be helpful to perform the tailoring.

2.2.1 ONTOLOGY

Ontologies were created as a solution to the requirements of the Semantic Web. According to Sir Tim Berners-Lee [2], the Semantic Web is an extension of the World Wide Web in which the semantics of information on the web is defined. This makes it possible for the web to understand and satisfy the requests of people and machines to use the web content. The intention of the Semantic Web is to provide a language that expresses both data and rules for reasoning about the data.

The Semantic Web suggests that a new web concept is needed, so that better use can be made of all the possibilities offered by the web. A web search engine should be able not only to search, but also to reason. The difficulty lies in finding a language capable of doing so. A solution to this problem is provided by a basic component of the Semantic Web: collections of information called ontologies.

In philosophy, an ontology is a theory about the nature of existence, of what types of things exist. In Artificial Intelligence and the Web, however, researchers use the word ontology to refer to a document or file that formally defines the relations among terms.

An ontology defines the terms employed to describe and represent an area of knowledge. Ontologies are used by people, databases, and applications that need to share information. They include computer-usable definitions of the basic concepts in the domain and the relationships among them. In this way, they make that knowledge reusable.

Formally, an ontology can be defined as a structure O = {C, T, R, A, I}, consisting of:

C: classes. They represent general concepts in the many domains of interest.
T: taxonomic relations. They represent inheritance relations (is-a relations) between general concepts (father) and more specific concepts (son).
R: associative relations. They represent any other type of semantic association between concepts of the domain.
A: attributes. They associate a data value to a concept. Attributes are related to features that describe the concept.
I: instances. They are used to represent elements or individuals in the ontology. Each instance must belong to one or more of the concepts of the ontology.

Basically, the three main elements which establish an ontology are the classes or general concepts, the relationships that can exist among concepts, and the properties that those concepts may have.

Figure 2. Example of an ontology using a UML class diagram
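The structure O = {C, T, R, A, I} can be made concrete with a small Python sketch. This is an illustrative data structure, not the representation used by OWL or by this project: the class Ontology, its field layout and the example content (including the Person class and the writes relation) are assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    classes: set = field(default_factory=set)      # C: class names
    taxonomy: set = field(default_factory=set)     # T: (child, parent) is-a pairs
    relations: set = field(default_factory=set)    # R: (source, relation name, target)
    attributes: set = field(default_factory=set)   # A: (class, attribute name, datatype)
    instances: dict = field(default_factory=dict)  # I: instance name -> set of classes

    def ancestors(self, cls):
        """All classes reachable from cls by following is-a links upwards."""
        found, stack = set(), [cls]
        while stack:
            current = stack.pop()
            for child, parent in self.taxonomy:
                if child == current and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

# Hypothetical content, loosely inspired by a publishing domain.
onto = Ontology(
    classes={"Thing", "Person", "Author", "Title"},
    taxonomy={("Person", "Thing"), ("Author", "Person"), ("Title", "Thing")},
    relations={("Author", "writes", "Title")},
    attributes={("Title", "price", "float")},
    instances={"some_author": {"Author"}},
)
print(sorted(onto.ancestors("Author")))
```

Keeping T separate from R mirrors the definition above: inheritance can be followed transitively, while associative relations are plain labelled links.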

The diagram in Figure 2 [3] represents an ontology which contains four classes: Author, Publisher, Title, and Sales. The lines between classes represent association relationships. Each class has some properties, for example the ISBN, the name, the price or the publication date in the class Title. There are two kinds of properties: object properties and datatype properties. Object properties point at a class of the ontology. Datatype properties do not point at any class; instead, they have a specific value (a string, an integer, a boolean, etc.).

For example, if we had a string with the name of a publisher, the ontology of Figure 2 could give us all the information about the authors it published, and also all the titles that each author wrote. Or, if we had the title of a book, we could follow the relationships of the ontology to find the biography of the writer.

Ontologies are usually expressed in a logic-based language, so that accurate distinctions can be made among the classes, properties, and relations. Some ontology tools can perform automated reasoning using the ontologies, and thus provide advanced services to intelligent applications such as conceptual search and retrieval [4], decision support [5], natural language understanding [6], knowledge management [7], intelligent databases [8], and electronic commerce [9]. Using ontologies, tomorrow's applications can be "intelligent", in the sense that they can work more accurately at the human conceptual level [10].

2.2.2 OWL

The Web Ontology Language (OWL) is a family of knowledge representation languages for defining and instantiating ontologies. It is designed for applications that need to process the content of information instead of just presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML [11] or RDF [12] by providing additional vocabulary along with a formal semantics.

OWL has three sublanguages: OWL Lite, OWL DL, and OWL Full [13]. Although OWL is written in XML format, there are several applications that let us work with OWL and ontologies in a more intuitive way. Some of these applications are Protégé, Swoop [14] and Synaptica [15]. For example, Figure 3 and Figure 4 represent the same ontology, but while the first one gives a clear and quick impression of the ontology, the second one cannot be easily read.

Figure 3. Graphic view of an ontology example

Figure 4 shows how the ontology can be built in OWL. Some parts of the code have been highlighted. Rectangle 1 contains the code that defines the class called Money. Rectangle 2 contains the definition of the property cost, with its range Money and its domain PurchaseableItem. This means that a PurchaseableItem has a cost property that associates it with Money. Rectangle 3 encloses the definition of two more classes: Lens and Camera. Both are defined as subclasses of PurchaseableItem, which entails the creation of the is-a relationship between them that we can see in Figure 3.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://www.xfront.com/owl/ontologies/camera/#"
         xmlns:camera="http://www.xfront.com/owl/ontologies/camera/#"
         xml:base="http://www.xfront.com/owl/ontologies/camera/">

  <owl:Ontology rdf:about="">
    <rdfs:comment>Camera OWL Ontology. Author: Roger L. Costello</rdfs:comment>
  </owl:Ontology>

  <!-- Rectangle 1: definition of the class Money -->
  <owl:Class rdf:ID="Money">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>

  <owl:Class rdf:ID="Range">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>

  <owl:Class rdf:ID="PurchaseableItem">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
  </owl:Class>

  <!-- Rectangle 2: property cost, from PurchaseableItem to Money -->
  <owl:ObjectProperty rdf:ID="cost">
    <rdfs:domain rdf:resource="#PurchaseableItem"/>
    <rdfs:range rdf:resource="#Money"/>
  </owl:ObjectProperty>

  <owl:Class rdf:ID="Body">
    <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
  </owl:Class>

  <owl:Class rdf:ID="BodyWithNonAdjustableShutterSpeed">
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Body"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#shutter-speed"/>
        <owl:cardinality>0</owl:cardinality>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>

  <!-- Rectangle 3: classes Lens and Camera, subclasses of PurchaseableItem -->
  <owl:Class rdf:ID="Lens">
    <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
  </owl:Class>

  <owl:Class rdf:ID="Camera">
    <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
  </owl:Class>
</rdf:RDF>

Figure 4. Code of an ontology example
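To illustrate how a program can recover the is-a hierarchy from an RDF/XML document such as the one in Figure 4, the following sketch parses a shortened fragment with Python's standard xml.etree.ElementTree module. This is not how the project reads ontologies (it relies on Jena, described in Section 2.2.5); the function read_is_a and the constant OWL_FRAGMENT are hypothetical, and the fragment assumes the standard OWL capitalization (owl:Class, rdf:ID, rdfs:subClassOf).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Shortened fragment of the camera ontology, for illustration only.
OWL_FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="PurchaseableItem">
    <rdfs:subClassOf rdf:resource="{OWL}Thing"/>
  </owl:Class>
  <owl:Class rdf:ID="Lens">
    <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
  </owl:Class>
  <owl:Class rdf:ID="Camera">
    <rdfs:subClassOf rdf:resource="#PurchaseableItem"/>
  </owl:Class>
</rdf:RDF>"""

def read_is_a(owl_text):
    """Map each named class to the local names of its direct superclasses."""
    root = ET.fromstring(owl_text)
    hierarchy = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        if name is None:
            continue  # skip anonymous or rdf:about class descriptions
        hierarchy[name] = [
            sub.get(f"{{{RDF}}}resource").split("#")[-1]
            for sub in cls.findall(f"{{{RDFS}}}subClassOf")
            if sub.get(f"{{{RDF}}}resource")
        ]
    return hierarchy

print(read_is_a(OWL_FRAGMENT))
```

Working at the level of XML elements like this is exactly the file-oriented style of access that higher-level tools are meant to hide.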

2.2.3 PROTÉGÉ

Protégé is an open-source platform that provides a growing user community with a suite of tools to construct domain models and knowledge-based applications with ontologies. At its core, Protégé implements a set of knowledge-modeling structures and actions that support the creation, visualization, and manipulation of ontologies in various representation formats [16].

Figure 5. Screenshot of the Classes tab in Protégé

The Classes tab shown in Figure 5 is an ontology editor which can be used to define classes, class hierarchies, relationships between classes and properties of these relationships. On the left side we can see the ontology hierarchy, which shows the parent-child relationships. It is also important to remark that the ontology, as all existing ontologies do, has a most general class which, by convention, is called Thing. Details of the selected class are shown on the right side of the screen.

Figure 6. Screenshot of the Properties tab in Protégé

The Properties tab depicted in Figure 6 can be used to edit the characteristics of the properties in the model. The properties are shown on the left side of the tab. On the right side we can manipulate the properties and perform all the operations that can be applied to an ontology. In particular, Figure 6 shows that the selected property hasaccommodation is characterized by having Destination as its domain and Accommodation as its range. As an example, we could have Paris as the Destination and Ritz hotel as the Accommodation, connected by the hasaccommodation property.
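The domain and range of the hasaccommodation example can be pictured with a short Python sketch. The dictionaries PROPERTIES and INDIVIDUALS and the function respects_domain_and_range are hypothetical. Treating domain and range as a validity check is a simplification made for illustration: an OWL reasoner would instead use them to infer the classes of the connected individuals.

```python
# Hypothetical in-memory description of one object property and two individuals.
PROPERTIES = {
    "hasaccommodation": {"domain": "Destination", "range": "Accommodation"},
}
INDIVIDUALS = {
    "Paris": "Destination",
    "Ritz hotel": "Accommodation",
}

def respects_domain_and_range(subject, prop, obj):
    """True when the subject is in the property's domain and the object in its range."""
    spec = PROPERTIES[prop]
    return (
        INDIVIDUALS.get(subject) == spec["domain"]
        and INDIVIDUALS.get(obj) == spec["range"]
    )

print(respects_domain_and_range("Paris", "hasaccommodation", "Ritz hotel"))
print(respects_domain_and_range("Ritz hotel", "hasaccommodation", "Paris"))
```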

Figure 7. Screenshot of the representation tab in Protégé

Similarly to Figure 5, the tab in Figure 7 is used to view the ontology hierarchy, but in a more intuitive way. This feature is not included in the original Protégé installation, but the Protégé community has contributed numerous extensions of the base platform. One of the most popular of these extensions is OWLViz, which can be used to visualize OWL ontologies graphically. In Figure 7 each class is represented by an ellipse. The lines between them correspond to the is-a relationship, which denotes a parent-child connection. Primitive classes (classes that have no equivalent classes) are colored yellow. Defined classes (classes that have at least one equivalent class) are colored orange.

2.2.4 CASE PROFILE ONTOLOGY

The Case Profile Ontology (CPO) [17] is an OWL-compliant ontology developed in the K4CARE Project in order to provide a formal representation of all the health care concepts related to the care of the elderly at home (i.e., syndromes, diseases, social issues, signs and symptoms, problem assessments, and interventions) and the relationships and constraints between these concepts. The CPO is one of the components of the K4CARE Knowledge Model.

Figure 8. General concepts in the CPO

The most important classes and relationships in the CPO are depicted in Figure 8.

1. Syndrome: complex health situation in which a combination of signs and symptoms occurs more frequently than would be expected on the basis of chance alone, and which generates a functional decline.
2. Disease: in health care, a disease is a physiological or psychological dysfunction.
3. Social Issue: matters that can be explained only by factors outside an individual's control and immediate social environment, and which affect many individuals and a society. Common social issues include poverty, violence, justice, human rights, equality and crime.
4. Sign and Symptom: a symptom is a sensation or change in health function experienced by a patient. Thus, symptoms may be loosely classified as strong, mild or weak. A symptom can be considered a subjective report, as opposed to a sign, which is objective evidence of the presence of a disease or disorder. A symptom may be seen as a physical condition which indicates a particular illness or disorder and is noticed by the patient, while a sign is noticed by the physician or others.
5. Problem Assessment: problem assessment comprises the aspects that assess the condition of the patient during the first encounter and whenever a re-evaluation is required.
6. Intervention: action or series of actions undertaken to respond to the needs and problems of the HCP.

In addition, three other classes appear in the CPO, although they do not belong to the group of classes that are important for its tailoring.

7. ICD10: the International Statistical Classification of Diseases and Related Health Problems (ICD) provides codes to classify diseases and a wide variety of signs, symptoms, abnormal findings, complaints, social circumstances and external causes of injury or disease. The ICD is designed to promote international comparability in the collection, processing, classification, and presentation of these statistics.
8. ATC: the Anatomical Therapeutic Chemical Classification System is used for the classification of drugs. Drugs are divided into different groups according to the organ or system on which they act and/or their therapeutic and chemical characteristics.
9. Route of Administration: path by which a drug, fluid, poison or other substance is brought into contact with the body.

2.2.5 JENA

Jena is a Java framework to build Semantic Web applications. It provides a programmatic environment to extract data from and write to RDF [5] and OWL, and includes a rule-based inference engine [18]. The Jena libraries are important in the development of this project because they provide an OWL API to extract data from and write to OWL files. Without the Jena toolkit, we would not be able to work with ontologies written in OWL from an application written in Java.

2.2.6 OWL WRAPPER

OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. Because the project requires easy access to the data stored in OWL, allowing high-level queries, an OWL wrapper has been designed and implemented. This tool was authored by Aida Valls, Karina Gibert and Joan Casals.

The OWL Wrapper has been implemented using the Jena API. Jena provides access to any OWL file, but it does so in a non-intuitive way, because the functions offered by its API are oriented to the OWL file structure and not to the ontology structure. The OWL Wrapper offers easy, intuitive and powerful access to the different classes, properties and restrictions established in OWL ontologies, without the need to know the internal structure of the OWL code. Moreover, the Wrapper allows the modification of the ontology through a set of functions which prevent the user from making incorrect modifications or creating elements that could introduce inconsistencies in the ontology structure.
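The difference between file-oriented and ontology-oriented access can be sketched in a few lines of Python. The class OntologyWrapper below is purely hypothetical: it does not reproduce the API of the OWL Wrapper or of Jena, and the list TRIPLES merely stands in for the low-level content of an OWL file.

```python
# Low-level view: a flat list of (subject, predicate, object) statements.
TRIPLES = [
    ("Lens", "rdfs:subClassOf", "PurchaseableItem"),
    ("Camera", "rdfs:subClassOf", "PurchaseableItem"),
    ("cost", "rdfs:domain", "PurchaseableItem"),
    ("cost", "rdfs:range", "Money"),
]

class OntologyWrapper:
    """Hypothetical wrapper exposing ontology-level questions over raw statements."""

    def __init__(self, triples):
        self._triples = list(triples)

    def subclasses_of(self, cls):
        return sorted(s for s, p, o in self._triples
                      if p == "rdfs:subClassOf" and o == cls)

    def properties_of(self, cls):
        return sorted(s for s, p, o in self._triples
                      if p == "rdfs:domain" and o == cls)

    def add_subclass(self, child, parent):
        """Refuse a modification that would point at a name unknown to the ontology."""
        known = {s for s, _, _ in self._triples} | {o for _, _, o in self._triples}
        if parent not in known:
            raise ValueError(f"unknown parent class: {parent}")
        self._triples.append((child, "rdfs:subClassOf", parent))

wrapper = OntologyWrapper(TRIPLES)
print(wrapper.subclasses_of("PurchaseableItem"))
print(wrapper.properties_of("PurchaseableItem"))
```

The guarded add_subclass method illustrates, in a very reduced form, the idea of modification functions that keep the ontology structure consistent.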

2.2.7 ONTOLOGY MERGING

In order to help as many people as possible, the more cases considered in the project, the better. A common situation for an ill person is to have more than one disease. For a patient with two diseases, the knowledge about each disease is extracted from the CPO, which results in two sub-ontologies, each one related to one disease. By applying a technique called merging we could obtain the union of both sub-ontologies and, in this way, achieve the aim of the program: a single ontology which represents the patient's case.

Ontology merging is the act of bringing together two conceptually divergent ontologies, or the instance data associated with two ontologies. This merging process can be performed in a number of ways: manually, semi-automatically, or automatically. Manual ontology merging, although ideal, is extremely labour-intensive, and current research attempts to find semi-automated or entirely automated techniques to merge ontologies. These techniques are statistically driven, often taking into account the similarity of concepts and the raw similarity of instances through textual string metrics and semantic knowledge [19].

With the purpose of applying any of these techniques in the project, several existing tools have been studied: PROMPT and SMART (see [20]), Chimaera, the FCA-Merge toolset and the ONION tools (see [21]), Glue [22], Cupid [23] and OntoMerge.

PROMPT

This tool is a semi-automated method implemented as a plug-in for Protégé. The algorithm it uses appears in Figure 9.

First step: to make initial suggestions. The PROMPT plug-in searches for similarities between the term names in both ontologies, and it creates a list of possible matches. This list includes merging slots, merging classes, copying a whole class from one ontology, etc.

Second step: to select the operation. The user chooses all the operations that are going to be performed.

Third step: performing. The PROMPT plug-in performs all the selected operations and all the changes attached to each operation (for example, if we decide to merge a class, PROMPT includes not only the terms but also the slots associated with that class).

Fourth step: to find conflicts. Some conflicts can appear while merging, and PROMPT can identify them: name conflicts, dangling references, redundancy and wrong slot-value restrictions.

Fifth step: to make suggestions.

The last four steps can be performed iteratively until we get the final ontology.

Figure 9. The PROMPT algorithm (flow: Make initial suggestions, Select the next operation, Perform automatic updates, Find conflicts, Make suggestions)
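One pass through the PROMPT steps above can be sketched in Python. This is not the PROMPT plug-in: the functions initial_suggestions, perform_merge and find_conflicts, the dictionary representation of an ontology (class name mapped to the classes its slots refer to) and the example classes are all assumptions, name similarity is reduced to case-insensitive equality, and only dangling references are detected.

```python
def initial_suggestions(onto_a, onto_b):
    """First step: suggest merging classes whose names match, ignoring case."""
    names_b = {name.lower(): name for name in onto_b}
    return sorted((a, names_b[a.lower()]) for a in onto_a if a.lower() in names_b)

def perform_merge(merged, onto_a, onto_b, pair):
    """Third step: merge one pair, keeping the first name and the union of the slots."""
    a, b = pair
    merged[a] = set(onto_a[a]) | set(onto_b[b])

def find_conflicts(merged):
    """Fourth step: report dangling references (slots pointing at absent classes)."""
    return sorted((cls, ref) for cls, refs in merged.items()
                  for ref in refs if ref not in merged)

# Hypothetical source ontologies: class name -> classes referenced by its slots.
onto_a = {"Disease": {"Sign"}, "Sign": set()}
onto_b = {"disease": {"Intervention"}, "Intervention": set()}

merged = {}
for pair in initial_suggestions(onto_a, onto_b):  # second step: the user would choose here
    perform_merge(merged, onto_a, onto_b, pair)
conflicts = find_conflicts(merged)
print(merged)
print(conflicts)
```

The two dangling references found after this pass are what would drive the fifth step: new suggestions to copy Sign and Intervention into the merged ontology, followed by another iteration.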

SMART

SMART is another plug-in for Protégé, very similar to PROMPT. It starts the merging by comparing the class names of both ontologies and identifying the ones with identical names. After that, it proposes that the user merge the pairs of identically named classes or copy the entire classes with unique names.

The SMART window is composed of panels with different functions: one panel proposes suggestions to merge and explains the reason why they are suggested, another lets the user create his or her own merging operations, and a third panel at the bottom lists the identified conflicts and possible solutions.

SMART also records the original relations in the ontologies and, when the resulting merged ontology is obtained, it tries to restore those relations. The SMART algorithm represented in Figure 10 is very similar to the PROMPT algorithm.

Figure 10. SMART algorithm (flow: Setup: load files, set preferences, etc.; Initial suggestions: identical names, synonyms, superclasses for top-level classes in alignment; Select operation: choose from suggestion list, create a new operation, etc.; Execute operation: perform automatic updates, detect conflicts, create suggestions)

First step: to load the two ontologies.

Second step: to generate a list of suggestions. For each pair of identically named classes, we have the option of merging the classes or removing one of them. For each pair of classes with linguistically similar names, a link is established between them.

Third step: to select the operations to be performed.

Fourth step: to update relations and create new suggestions.

Steps three and four are repeated until the ontologies are fully merged.

Chimaera

Chimaera is a tool capable of merging two or more ontologies. It can deal with different input formats: Ontolingua, KF, Protégé, etc. This tool supports some steps of the merging process, such as giving a list of equivalent terms in the source ontologies or suggesting candidates to be merged. Unlike PROMPT, Chimaera does not guide users through the operations performed in each step of the merging.

Figure 11. Chimaera structure
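Returning to SMART, its second step (generating suggestions from class names) can be sketched as follows. The function smart_suggestions and its threshold are hypothetical, and the character-based ratio from Python's difflib module is only a stand-in for whatever linguistic similarity measure the tool actually applies.

```python
from difflib import SequenceMatcher

def smart_suggestions(classes_a, classes_b, threshold=0.8):
    """Suggest 'merge-or-remove' for identical names and 'link' for similar ones."""
    suggestions = []
    for a in sorted(classes_a):
        for b in sorted(classes_b):
            if a == b:
                suggestions.append(("merge-or-remove", a, b))
            elif SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                # Assumption: string similarity approximates linguistic similarity.
                suggestions.append(("link", a, b))
    return suggestions

# Hypothetical class names from two source ontologies.
classes_a = {"Disease", "Symptom"}
classes_b = {"Disease", "Symptoms", "Intervention"}
for suggestion in smart_suggestions(classes_a, classes_b):
    print(suggestion)
```

Classes that receive no suggestion, such as Intervention here, correspond to the classes with unique names that the user may copy as they are.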

FCA-Merge toolset

The FCA-Merge toolset implements the FCA-Merge algorithm. FCA-Merge needs two different inputs: on the one hand, the two ontologies to be merged and, on the other hand, a set of documents on the same domain as the ontologies. The algorithm follows a bottom-up merging design that goes through three steps.

Figure 12. FCA-Merge steps

First step: to get instances and generate the context. Instances from the two original ontologies are searched for in the set of documents. For each ontology, we get a table that relates the concepts of the ontology to the documents where they are found.

Second step: to generate the lattice structure. This step is based on the Stumme and Maedche definition of FCA (formal concept analysis), which works with the tables created in the first step. A node is defined as a combination of columns of the table, and it remains in the final structure if and only if it (or its child) is a concept from one of the original ontologies. Otherwise it is pruned and does not appear in the final ontology.

Third step: to get the merged ontology. This last step is performed by the user. Considering the resulting lattice structure, the user must decide whether each node really represents a concept or whether it represents a relation.
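The idea behind the second step can be illustrated with a naive enumeration of the formal concepts of a tiny context. This is a sketch of basic formal concept analysis, not of the FCA-Merge toolset: the function formal_concepts and the example data are hypothetical, the two per-ontology tables are assumed to be already joined into one context (with the prefixes A: and B: marking the source ontology), and the pruning rule is omitted.

```python
from itertools import combinations

def formal_concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    context maps each document to the set of ontology concepts found in it.
    The enumeration is exponential and only meant for toy examples.
    """
    all_attributes = frozenset().union(*context.values())
    intents = {all_attributes}
    for r in range(1, len(context) + 1):
        for docs in combinations(context, r):
            # The concepts shared by a group of documents form a closed intent.
            intents.add(frozenset.intersection(*(frozenset(context[d]) for d in docs)))
    concepts = []
    for intent in intents:
        extent = frozenset(d for d, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (len(c[1]), sorted(c[1])))

# Hypothetical context: which concepts of ontologies A and B occur in which document.
context = {
    "doc1": {"A:Disease", "B:Illness"},
    "doc2": {"A:Disease", "B:Illness", "A:Sign"},
    "doc3": {"A:Sign"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

In this toy context A:Disease and B:Illness always occur in the same documents, so they end up in the same lattice node, which is the kind of evidence the user examines in the third step when deciding what to merge.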

ONION tools

In ONION, the original ontologies are mapped onto an integrated ontology to allow interoperability between them. This model creates a library of ontologies from different sources (see Figure 13).

Figure 13. Evolution of Ontologies in ONION

First step: to create validated sources. Sources are selected taking into account their importance in the domain of the ontology.

Second step: to do the taxonomic analysis. The classifications or taxonomies extracted from the sources are considered.

Third step: to define the analysis locally. Each concept is given a natural language definition.

Fourth step: to define the analysis multi-locally. The links between the concepts are defined.

Fifth step: to build the integrated ontology library. Domain and generic ontologies are put together into one library.

Sixth step: to classify the library. Deal with polysemy, homonymy and synonymy.

Seventh step: to represent the ontology. The ontology must be implemented in the language of the library.

Eighth step: to formalize text.

Ninth step: to adjust concepts. Deal with new concepts, sibling concepts and the problems found in the previous steps.

Tenth step: mapping.

GLUE

Glue uses different techniques to deal with the information, taxonomy structure and data instances of the ontology. Some of these techniques are the Jaccard coefficient and relaxation labeling.

Figure 14. Development of a GLUE ontology
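As an illustration of the Jaccard coefficient mentioned above, the following sketch compares two concepts through the sets of instances classified under each of them. The function name and the instance identifiers are hypothetical, and the sketch does not reproduce how Glue estimates these sets.

```python
def jaccard(instances_a, instances_b):
    """Jaccard coefficient: |A intersect B| / |A union B| (0.0 for two empty sets)."""
    a, b = set(instances_a), set(instances_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical instance sets of two concepts from different ontologies.
similarity = jaccard({"i1", "i2", "i3"}, {"i2", "i3", "i4"})
```

A value close to 1 means that both concepts classify almost the same instances, while a value of 0 means that they share none.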

The difference between Glue and other merging tools is that Glue is based on the semantic content of the concepts and slots, not on their syntactic form.

CUPID

This method is based on the Cupid algorithm, which consists of three steps.

First step: to compute the linguistic similarity of element pairs.

Second step: to compute the structural similarity of element pairs.

Third step: to generate the mapping.

Linguistic similarity studies the similarity of the names, while structural similarity assumes that two nodes are structurally similar if they have similar contexts.

OntoMerge

OntoMerge is a tool for translating data using one (or more) ontologies into a form using a different ontology. It includes the OntoEngine inference engine and a syntax translator between OWL (DAML+OIL) and Web-PDDL.

Comparison of Ontology Merging tools

Table 15 includes a comparison of the above tools. The issues compared are the architecture used, the type of knowledge they can deal with, whether they result in a merged or a mapped ontology (see footnote 1), the use of natural language and the user involvement required to perform the process.

Footnote 1. Given two ontologies O1 and O2, mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, we try to find a corresponding entity, which has the same intended meaning, in ontology O2.

Tools conforming to a mediated mapping and merging architecture, such as ONION, will probably fail, since they need one or more reference ontologies that may not always be available or may be hard to construct. However, point-to-point tools miss the knowledge about structure and domain that a reference ontology can provide to determine the semantic similarity relationships between concepts. Bottom-up approaches have a higher degree of precision in their matching techniques than top-down approaches. The reason is that they are based on instances, which provide a better representation of the meaning of a concept in a domain.

Table 15. Issues concerning existing ontology merging tools

Regarding the kind of knowledge employed, all the tools but one, ONIONS, use the same types. This tool incorporates features to do the merging considering the semantic knowledge that concepts have. It is also important to know that some of these tools, such as PROMPT and GLUE, are not prepared to return a description of the merged ontology in natural language. This feature is relevant when the results of the merging are read by a person who is not a computer expert; in our case this person would be the physician.

Finally, the decision to choose one of these tools can be affected by the degree of user involvement required. The user is usually asked to decide upon merging strategies or to guide the process in case of inconsistency. In this project, the physician cannot be asked to help to do the ontology merging because it is not

part of his job and it would also delay his tasks. Consequently, CUPID would have been the most appropriate tool to use in this project.

However, none of these tools was finally used. The reason is that automatic merging tools have not yet shown that a consistent and automatic merging can be obtained. In addition, this project aims not only to merge ontologies but also to extract knowledge during the merging process to support the physician during the diagnostic process. Therefore, the final decision was to design a new algorithm, based on the ontology-merging idea, to obtain the patient's personalized ontology.

Chapter 3. SPECIFICATION

This chapter describes the observed problems and the decisions that have been made to achieve the objectives presented in the introduction of this project.

3.1 DECISION SUPPORT SYSTEM

In order to support a physician during the evaluation of a patient's case, it is necessary to build an application that can answer medical questions about the patient, such as: given a group of signs and symptoms that a patient has, which diseases may he suffer from?

Figure 16. Structure of the CPO

The sort of questions that can be answered is determined by the information contained in the knowledge base, the Case Profile Ontology (CPO). As Figure 16 shows, the CPO covers several matters: problem assessments, signs and symptoms, social issues, syndromes, diseases, interventions and also some relationships between them.

Looking at the social issue, syndrome and disease classes, it can be seen that all three have the same relationships within the ontology. Therefore, to facilitate understanding, a new concept called diagnostic, which includes diseases, syndromes and social issues, is created. Consequently, there are four sections represented in Figure 16 in which the knowledge contained in the CPO can be classified. These four sections are not only the personalization targets that appear in the final patient's ontology, but also terms that help the physician by providing suggestions when he is performing a prospective diagnosis.

Figure 17. CPO-Based Prospective Diagnosis

CPO-Based Prospective Diagnosis can start with the observation of a set of signs and symptoms (see Figure 17). From these data, the CPO can propose both a feasible diagnosis (i.e., diseases, syndromes and social issues that may affect the patient) and a recommended assessment (i.e., tests on the patient). This information is analyzed by the physician, who decides whether to follow all the indications, some of them, or none. For example, if asthenia and chest pain are observed in a patient, Chronic Obstructive Pulmonary Disease (COPD) and Heart Failure (HF) will be suggested as possible causes, and anamnesis and consultation will be the recommended assessments. At this point the physician can look for additional signs and symptoms (e.g., arrhythmia) to determine whether it is COPD or HF.

The process can also start when a set of diseases, syndromes or social issues is observed (see Figure 17). In this case, the physician is informed of the signs and symptoms the patient should have. If the physician is unaware of some of them, he can discard their appearance in the patient condition. In this process, the system also recommends a set of interventions that it finds appropriate for this patient. These recommendations may help the physician to decide whether the current treatment of the patient is both correct and complete.

Prospective Diagnosis can be applied in a continuous loop in which the physician finds out the signs and symptoms and obtains new possible diagnoses which, in turn, can drive the process to other possible signs and symptoms, and so on. This loop defines a continuous adaptation of the diagnostic procedure.

3.2 PERSONALIZATION

In the State of the Art of Chapter 2 we studied the existing ontology merging tools. This section explains why the existing tools, such as Chimaera, PROMPT, etc., are not suitable to perform the ontology merging we need.

The problem with these tools is that they do the merging by applying an algorithm to the two ontologies they want to merge. However, we do not have two ontologies, but one, the CPO. Our purpose is to merge two or more parts of the CPO (for example, we will probably need to merge the two sub-ontologies associated with two different diseases of the CPO). So, if we decided to use the existing merging tools, we would have to add a preliminary stage to extract the sub-ontologies from the CPO. This would make the whole process more complicated, so, to make the application simpler for the physician, these tools are not used. The final choice was to design a new algorithm, based on the steps proposed by the evaluated merging tools, whose result is the same as the result obtained using them.

Figure 18 represents the CPO, from which we extract two diseases (ontologies 1 and 2). By combining (merging) them, we obtain the resulting ontology (1+2), which represents the patient's case.

Figure 18. Procedure to extract a patient's case

Figure 19 modifies Figure 18 by adding a new component. In Figure 18, we extract the sub-ontologies 1 and 2 from the CPO, we merge them and we obtain the ontology 1+2. Instead, taking the CPO as the starting point and with no intermediate steps, it is possible to obtain the ontology 1+2, the combination of diseases 1 and 2, using the algorithm represented by the broken line.

Figure 19. Procedure to obtain the patient's case ontology from the CPO

It was decided to create an algorithm adapted to work with medical data that could be personalized for our own needs, but with the commitment of arriving at the same theoretical 1+2 ontology.

Figure 20. Personalization of Medical Knowledge

This algorithm is represented in the diagram of Figure 20. The resulting personalized ontology appears at the center of the diagram. This ontology includes the concepts PatientCase and Background to distinguish between the information on the patient and the general health care information that may affect the patient. So, for example, if the patient is diagnosed with HF but arrhythmia (one of the HF signs and symptoms in the CPO) is not observed, then both concepts appear in Background, but PatientCase only refers to HF as a current component of the patient condition, leaving arrhythmia as a future sign that the patient could develop.

In Figure 20, during personalization, the physician determines the patient diagnosis (i.e., set of diseases, syndromes, and social issues) or the patient signs and symptoms. This information is used to select the related knowledge in the CPO concerning problem assessment, signs and symptoms, diseases, syndromes, social issues, and interventions. At this point, the physician can confirm or reject part of this knowledge or incorporate new knowledge. All this knowledge will be part of the Background, but only the confirmed and the incorporated knowledge will be part of the PatientCase in the personalized ontology.
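The split between Background and PatientCase described above can be sketched with plain sets. This is a minimal illustration and not the implementation of the project: the function name, its parameters and the use of sets instead of ontology classes are assumptions.

```python
def personalize(suggested, rejected, added):
    """Split knowledge into Background and PatientCase.

    suggested: items selected from the CPO for this patient (assumed input)
    rejected:  suggested items that the physician rejects
    added:     new items incorporated by the physician
    """
    suggested, rejected, added = set(suggested), set(rejected), set(added)
    confirmed = suggested - rejected
    background = suggested | added    # all the knowledge, confirmed or not
    patient_case = confirmed | added  # only confirmed and incorporated items
    return background, patient_case


# The HF example from the text: arrhythmia is suggested but not observed.
background, patient_case = personalize(
    suggested={"HF", "Arrhythmia"}, rejected={"Arrhythmia"}, added=set()
)
```

With this input, both HF and Arrhythmia remain in the background set, while the patient case set only contains HF.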

Chapter 4. DEVELOPMENT

This chapter studies in depth the technical issues of the project. First of all, the general methodology to obtain sub-ontologies from the CPO is described. After that, we focus on the complexities found when applying the procedure. Finally, the tool created for the physician to introduce the medical data is explained.

4.1 GENERAL METHODOLOGY

The key factor of the application is the way in which the patient's subontology is obtained from the CPO. For that purpose, not only the CPO is needed, but also the decisions made by the physician.

Figure 21. Procedure to obtain the subontology

There are six main classes in the patient's ontology: Diseases, Social Issues, Syndromes, Signs & Symptoms, Interventions, and Problem Assessments. For each one, the sequence that is followed is represented in Figure 21.

The diagram of Figure 21 symbolizes an algorithm in which the inputs are the physician's choices and the CPO. The output is an ontology that contains knowledge extracted from the CPO about the items that the physician selected. Consequently, this procedure is employed once for each of the six main classes to obtain the six sub-ontologies that constitute the final patient's ontology. The six sub-ontologies correspond to the six circles of Figure 16, although the sequence that is followed consists of only four steps, which correspond to the four subdivisions of Figure 16.

For example, with the diseases selected by the physician and the knowledge that the CPO contains, we obtain the diseases sub-ontology. This sub-ontology is an extract of the CPO which only contains terms and relationships that are related to the chosen diseases.

Figure 22 shows the two possible sequences that the physician can follow. Following one path or the other depends only on the starting point that the physician chooses to develop the patient's case. However, choosing the first route or the second one makes no difference from the point of view of the resulting sub-ontology.

Figure 22. Sequence to generate the final ontology

Both sequences show that the general procedure is completed in four steps. Since the two of them are very similar, differing only in the order in which the elements are selected, we only detail the first one, in which the physician starts by introducing the diseases, syndromes and social issues.

The steps followed in the first sequence are illustrated in Figure 24, which represents the changes that the final ontology undergoes when new components are added.

1. At the first step, the physician chooses the diseases, social issues and/or syndromes. By applying the blue-box algorithm represented in Figure 21, where the inputs are these choices and the CPO, three different subontologies are obtained: the diseases subontology, the social issues subontology and the syndromes subontology. The three subontologies are hung under the new class called Diagnostic (Figure 24A).

2. At this step, the program follows the hassignandsymptom and hasintervention relationships in the CPO to extract the signs and symptoms and the interventions related to the diseases, social issues and syndromes that the physician chose at the first step. Considering the offered signs and symptoms and the interventions, the physician makes his choices. The algorithm is applied again and two new subontologies are obtained: the signs and symptoms subontology and the interventions subontology. As Figure 24B shows, both are hung in the general ontology as siblings of Diagnostic.

3. By following the isassessedby relationship between the signs and symptoms and the problem assessments, the program shows a list of all the problem assessments related to the signs and symptoms the physician chose at the second step. From that list, the physician selects those which belong to the patient's case. The result of applying the algorithm once again is a problem assessment subontology, which is also hung in the general ontology, as Figure 24C shows.

4. At this point the six subontologies associated with the six main classes have been obtained. All of them are put into a group called Background, as opposed to the class called PatientCase, as depicted in Figure 23.

Figure 23. Structure of the personalized ontology

The PatientCase class is the representation of the patient's condition in one single class. The semantic knowledge in this class is given by the relationships with the diseases, social issues, syndromes, signs and symptoms, interventions and problem assessments of the Background sector. With the PatientCase class settled, the ontology is completed, as Figure 24D shows.
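The way steps 2 and 3 follow the CPO relationships to produce suggestions can be sketched as follows. This is only an illustration under stated assumptions: the three dictionaries are a toy stand-in for the CPO with made-up associations, the names HAS_SIGN, HAS_INTERVENTION, IS_ASSESSED_BY, follow and suggest are hypothetical, and the real application works on OWL classes rather than on Python sets.

```python
# Toy stand-in for the CPO relationships; the associations are illustrative only.
HAS_SIGN = {"Arthritis": {"SwollenJoint", "AbnormalCRP"}}
HAS_INTERVENTION = {"Arthritis": {"MotorRehabilitation", "Analgesics"}}
IS_ASSESSED_BY = {"SwollenJoint": {"Anamnesis"}, "AbnormalCRP": {"Consultation"}}


def follow(relation, sources):
    """Collect every term reachable from the sources through one relationship."""
    targets = set()
    for source in sources:
        targets |= relation.get(source, set())
    return targets


def suggest(diagnostics, kept_signs=None):
    """Step 2: signs and interventions of the chosen diagnostics.
    Step 3: assessments of the signs that the physician keeps."""
    signs = follow(HAS_SIGN, diagnostics)
    interventions = follow(HAS_INTERVENTION, diagnostics)
    chosen = signs if kept_signs is None else signs & set(kept_signs)
    assessments = follow(IS_ASSESSED_BY, chosen)
    return signs, interventions, assessments


signs, interventions, assessments = suggest({"Arthritis"}, kept_signs={"SwollenJoint"})
```

In this sketch the physician keeps only one of the two suggested signs, so only the assessment related to that sign is proposed at the third step.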

Figure 24. Progress of the final ontology as the algorithm is applied (panels A, B, C and D)

4.2 GENERAL CHARACTERISTICS OF THE PROCEDURE

This section details the particularities of the algorithm used in the sequence described in the previous section. It includes the behavior of the algorithm when nested classes are selected, and the method to perform the validation.

4.2.1 SELECTION OF NESTED CLASSES

In the CPO all the health care terminology is structured in six main hierarchies of terms: problem assessment, signs and symptoms, diseases, syndrome, social issue, and intervention (see Figure 8). For any two terms $t_1$, $t_2$ of the same hierarchy, if they belong to different branches they represent disjoint terms (i.e., $t_1 \cap t_2 = \emptyset$), but if they belong to the same branch then one of them includes the other one (i.e., $t_1 \subset t_2$). For example, as Figure 25 depicts, the hierarchy of interventions contains the disjoint terms Vaccines and Psycholeptics, and the term BenzodiazepineDerivatives, which is a kind of psycholeptic and therefore appears in the same branch (i.e., BenzodiazepineDerivatives $\subset$ Psycholeptics).

When two terms $t_1$, $t_2$ that belong to the same branch ($t_1 \subset t_2$) appear in the description of a patient condition, this means that the patient has not only $t_1$ but also a $t'$ which is a type of $t_2$ different from $t_1$ (i.e., $t' = t_2 - t_1$). This $t'$ represents a new concept which is not represented by any of the CPO terms. The incorporation of this new term in the hierarchy of the personalized ontology makes this hierarchy different from the CPO hierarchy.

This transformation of hierarchies is explained with the previous example about the interventions BenzodiazepineDerivatives and Psycholeptics, in which a class representing all the psycholeptics except Benzodiazepine is introduced in the personalized ontology.

Let us imagine a patient's condition with several diseases that results in a treatment where two of the interventions are Psycholeptics and Benzodiazepine. In the CPO they appear as Figure 25 shows.

Figure 25. Portion of the CPO: the Interventions class and its children.

The two classes are related; in fact, Benzodiazepine turns out to be a kind of Psycholeptic. In a different field the ontology with the selected nested classes could be maintained, but in Medicine it is interesting to keep as much information as possible. Therefore, the creation of the personalized ontology makes explicit that these are different remedies, that is, that there are two interventions with Psycholeptics: the Benzodiazepine and an unknown one. About the latter, we do not know what it is, but we know what it is not: Benzodiazepine.

This entails that we need a class to represent the set of all the Psycholeptics excluding the Benzodiazepine. The CPO does not contain one, but we can create it and include it in the resulting patient's ontology. This particular new class will be placed above not only the children of Psycholeptics, but also the siblings of Benzodiazepine, the children of those siblings and so on, so recursion (see footnote 2) must be applied in the algorithm. Thus, that part of the ontology will result as we can see in Figure 26.

Figure 26. Representation of our ontology after applying the algorithm to the CPO

Figure 26 shows that a new class called Psycholeptics is added. From now on, the patient's treatment is an intervention named Benzodiazepine and an intervention named Psycholeptics.

The general idea of the algorithm is to divide the interventions that the physician has selected into two groups: a group of interventions which are not related to any of the other interventions, and a group with all the interventions that have any of their children among the selected interventions. The cases in the two groups are resolved in two different ways, as the pseudo-code included in the Appendix ALGORITHM SOURCE CODE illustrates.

Footnote 2. A computer programming technique involving the use of a procedure, subroutine, function, or algorithm that calls itself one or more times until a specified condition is met, at which time the rest of each repetition is processed from the last one called to the first.
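The division into two groups and the recursive collection of the members of the new class can be sketched as follows. This is not the pseudo-code of the Appendix: the function names are hypothetical, the hierarchy is a toy fragment in which every term prefixed with Hypothetical is invented for illustration, and the grouping rule (terms with and without a selected descendant) is one possible reading of the description above.

```python
def descendants(children, term):
    """Recursively collect every term below `term` in the hierarchy."""
    found = set()
    for child in children.get(term, []):
        found.add(child)
        found |= descendants(children, child)
    return found


def split_selection(children, selected):
    """Separate the selected terms that have no selected descendant from
    those that have at least one (the nested cases)."""
    selected = set(selected)
    independent, nested = set(), {}
    for term in selected:
        inner = descendants(children, term) & selected
        if inner:
            nested[term] = inner
        else:
            independent.add(term)
    return independent, nested


def complement_members(children, term, excluded):
    """Terms below `term` that are neither excluded nor below an excluded term."""
    members = set()
    for child in children.get(term, []):
        if child in excluded:
            continue  # skip the excluded term and its whole subtree
        members.add(child)
        members |= complement_members(children, child, excluded)
    return members


# Toy hierarchy; the Hypothetical* terms do not come from the CPO.
children = {
    "Interventions": ["Vaccines", "Psycholeptics"],
    "Psycholeptics": ["BenzodiazepineDerivatives", "HypotheticalSiblingA",
                      "HypotheticalSiblingB"],
    "HypotheticalSiblingA": ["HypotheticalChildA1"],
}
selected = {"Vaccines", "Psycholeptics", "BenzodiazepineDerivatives"}
independent, nested = split_selection(children, selected)
others = complement_members(children, "Psycholeptics", nested["Psycholeptics"])
```

Here the set named others plays the role of the class that represents all the psycholeptics except the benzodiazepine derivatives.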

4.2.2. SELECTION OF CONFLICTING CLASSES

This subsection focuses on the objective, proposed at the start, of building a tool to validate the coherence of the physician's choices as a whole.

The specific function of the tool is to compare the signs and symptoms that the physician selects with the knowledge stored in the CPO. If there are relevant inconsistencies between them, the validating tool informs the physician so that he can proceed accordingly.

For example, we can think of a physician who is reviewing a diagnosis from a patient's medical history which says that the patient had a specific disease. As explained in section 4.1, when he introduces the disease, the set of signs and symptoms related to that disease is shown. However, if the physician ignores the suggestions and changes them for new signs and symptoms, we may find that there is another disease that, according to the CPO, is more suitable for the resulting set of signs and symptoms. In this case, the validating tool proposes that disease as a diagnosis that is more consistent with the CPO.

The final outcome of the validating tool is a ranking with all the diseases that are likely to be defined by the signs and symptoms introduced by the physician. The difficulty that appears at this point is how to sort the list of diseases.

A priori, we can think of sorting the diseases by the number of their related signs and symptoms that match the ones introduced by the physician. With this technique, if the physician selects 5 signs and symptoms and 4 of them appear in one specific disease, that disease will be higher in the ranking than another disease that only has 3 signs and symptoms of the group. The problem with this technique is that it considers that all the signs and symptoms have the same importance, but this assumption does not hold.
For example, muscle weakness is a symptom that appears in many diseases, and therefore it contributes very little information. However, another symptom such as bradycardia is specific to a small group of diseases, so if we find it among the list of signs and symptoms we can be quite sure that the patient suffers from a disease of that group.

From this reasoning we can conclude that the key factor is giving a numerical importance (a weight) to each sign and symptom. The importance is given by the number of diseases in which the symptom appears. If a symptom is associated with few diseases, it will be very indicative and very relevant; but if it appears in many diseases, the weight of the symptom will be low.

For each sign and symptom $s$ in the CPO, we define $d_s$ as the number of diseases of which $s$ is a sign (issignof relationship), and $w_s$ as the weight of $s$ (see equation 1, where $d$ is the number of diseases in the CPO).

The formula is squared to emphasize the difference between weights when the number of diseases grows, as Figure 27 shows.

Figure 27. Graph of the curve of the weights
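The idea of weighting can be illustrated with a small sketch. The weight function below is an assumption made for illustration: it is squared and decreases as $d_s$ grows, as the text describes, but it is not necessarily the exact form of equation 1. The name sign_weight is hypothetical.

```python
def sign_weight(d_s, d):
    """Assumed stand-in for equation 1: squared, and lower when the sign
    appears in more diseases. Not necessarily the formula of the project.

    d_s: number of diseases in which the sign or symptom appears
    d:   number of diseases in the CPO
    """
    if not 1 <= d_s <= d:
        raise ValueError("d_s must be between 1 and d")
    return ((d - d_s + 1) / d) ** 2


# With the 14 diseases of the CPO: a very specific sign versus a common one.
specific = sign_weight(1, 14)  # appears in a single disease
common = sign_weight(10, 14)   # appears in ten diseases (hypothetical count)
```

Under this assumption a sign that appears in a single disease gets the maximum weight, and the weight drops quickly as the sign becomes more common.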

With the obtained weights of the signs and symptoms, the probabilities of each disease, which sort our ranking, can be calculated. Given a set of signs and symptoms $S$, their weights are used to calculate the relative weight $w_i$ of a disease $D_i$ conditioned on $S$, as equation 2 shows. These weights determine the position of the diseases in the ranking.

If the disease that appears at the top of the ranking is the one that the physician selected, it means that, according to the CPO, the validating tool agrees that it is the disease that the patient suffers from.

4.3 THE CAPRAD TOOL

This section describes the means that physicians can use to introduce the information about patients. CAPRAD is an acronym for Case Profile Adapter, a tool that works as an interface for physicians to introduce the information on the patient's condition.

The main function of CAPRAD is to work as a decision support system, providing physicians with suggestions during the diagnostic process. Simultaneously, the accepted and rejected suggestions and the items added in CAPRAD are used to build the personalized patient ontology. Therefore, there is a correspondence between the steps followed by the physician in CAPRAD and the steps followed in the personalization explained in section 4.1.

Figure 28. First window to choose the initial data

Initially, the program allows choosing between two different routes: Diseases, Syndromes and SocialIssues or Signs and Symptoms (Figure 28). The difference between them is the starting point. In the first one, the data introduced at the beginning are the patient's diseases, syndromes or social issues. In the second one, the starting point is the signs and symptoms the patient presents, with the aim of finding out the patient's disease. Choosing either of them involves following one of the two paths of Figure 22.

If the physician wants to follow the first path, in the window of Figure 29 he chooses, among all the existing diseases, syndromes and social issues, those which belong to the patient case. This is the first step in CAPRAD, which corresponds to the first step represented in Figure 24A. If the physician wants to follow the second path, CAPRAD shows a similar screen that only contains a list with the available signs and symptoms.

The window in Figure 29 is divided into three different sections: Diseases, Syndromes and Social Issues. The left side includes all the existing items and the right side contains the selected ones. The right boxes are empty because the physician is free to choose any Disease, Syndrome and Social Issue without any suggestions from the program.

Figure 29. Interface to make the patient's diagnosis

The sequence of Figure 22 indicates that, after these choices, the physician is given decision support to select the signs and symptoms and the interventions that are related to the Diseases, Syndromes and Social Issues. CAPRAD shows the suggestions to the physician using the window depicted in Figure 30. This is the second step in CAPRAD, which corresponds to the second step represented in Figure 24B.

Figure 30. Screen with the signs and symptoms and the interventions

In the right boxes of Figure 30, the application shows the suggested signs and symptoms and interventions associated with the previously selected disease. The left boxes contain all the existing signs and symptoms and interventions except for those contained in the right boxes. The physician can change the content of the boxes with the Add and Remove buttons.

In Figure 31, CAPRAD shows the assessments that are suggested according to the signs and symptoms selected in the previous step and the knowledge from the CPO.

Figure 31. Interface to choose the Problem Assessments

From all the suggestions that appear in the right box of Figure 31, the physician can accept some and reject others. He can also add available Problem Assessments from the left box that are not proposed by the decision support system. This is the third step in CAPRAD, which corresponds to the third step depicted in Figure 24C.

Finally, the validating tool builds the ranking of diseases that the patient is likely to suffer from. These are offered to the physician, who can ignore the suggestions or go back to the start to change his diagnosis if he considers that the suggestions are relevant.

When the physician finishes the review of the validation, the final ontology is saved to a file named with the current data. He can examine it with any application capable of working with ontologies, for example Protégé.
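The ranking built by the validating tool in this last step can be sketched as follows. Both formulas in the sketch are assumptions made for illustration and are not necessarily equations 1 and 2 of section 4.2.2: the sign weight is an assumed squared function of the number of diseases that share the sign, and the relative weight of a disease is assumed to be the sum of the weights of its matching signs, normalized to percentages. The disease and sign associations are a toy example and all function names are hypothetical.

```python
def sign_weights(disease_signs):
    """Assumed weight per sign: squared, lower when more diseases share it."""
    d = len(disease_signs)
    counts = {}
    for signs in disease_signs.values():
        for sign in signs:
            counts[sign] = counts.get(sign, 0) + 1
    return {sign: ((d - d_s + 1) / d) ** 2 for sign, d_s in counts.items()}


def rank_diseases(disease_signs, observed):
    """Return (disease, percentage) pairs sorted from most to least likely.

    Assumed scoring: sum of the weights of the observed signs that belong
    to the disease, normalized so that the percentages add up to 100.
    """
    weights = sign_weights(disease_signs)
    scores = {
        disease: sum(weights[s] for s in signs & set(observed))
        for disease, signs in disease_signs.items()
    }
    total = sum(scores.values())
    if total == 0:
        return [(disease, 0.0) for disease in sorted(scores)]
    ranking = [(disease, 100.0 * score / total) for disease, score in scores.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


# Toy associations, illustrative only.
toy_cpo = {
    "HF": {"Asthenia", "ChestPain", "Arrhythmia"},
    "COPD": {"Asthenia", "ChestPain"},
    "Arthritis": {"Asthenia", "SwollenJoint"},
}
ranking = rank_diseases(toy_cpo, {"Asthenia", "ChestPain", "Arrhythmia"})
```

In this toy case the disease at the top of the ranking is HF, because arrhythmia is the only observed sign that is specific to a single disease.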

Chapter 5. EVALUATIONS

This chapter evaluates the achievement of the objectives proposed at the start of this project. The three fields considered in this chapter are the use of CAPRAD as a Decision Support System in a common diagnostic process, the efficacy reached when doing ontology personalization, and the study of atypical health care cases that CAPRAD can help to detect.

5.1 CAPRAD COMMON USE

In order to evaluate the tool in a common diagnostic process, a made-up example is used in which a Doctor Zhivago uses the CAPRAD tool with an imaginary patient, Mr. Komarovsky.

Doctor Zhivago selects a group of problem assessments to start the diagnosis and to be able to evaluate the signs and symptoms that the patient presents. According to the resulting list of signs and symptoms, Doctor Zhivago comes to the conclusion that Mr. Komarovsky suffers from Arthritis.

This time Dr. Zhivago decides to use the CAPRAD tool in order to confirm his diagnosis and get some suggestions about the interventions for an accurate treatment. Figure 32 represents the medical data that Dr. Zhivago introduces in CAPRAD.

Figure 32. Representation of Mr. Komarovsky's condition.

When CAPRAD starts, Dr. Zhivago selects Diseases Syndromes Social Issues from the window of Figure 28. In the second window, represented in Figure 29, he chooses Arthritis from the available diseases.

Once the CAPRAD tool knows the disease of Mr. Komarovsky, the algorithms and the CPO are used to suggest the signs and symptoms and the interventions associated with Arthritis.

Dr. Zhivago observes that the suggested signs and symptoms include not only the symptoms that he detected in Mr. Komarovsky, but also some others that Mr. Komarovsky does not present:

SS-09.10-SwollenJoint
SS-11.02.01-ThoraxDeformity
SS-11.02.02-AbnormalVesicularSounds
SS-12.06-AbnormalSedimentionRate
SS-12.07.04-AbnormalCRP
SS-14.04-FluctuantingCourse
SS-16.02-AbnormalIADL

These signs and symptoms do not belong to Mr. Komarovsky's condition, and therefore Dr. Zhivago removes them.

With regard to the interventions that Dr. Zhivago targeted, CAPRAD suggests those that appear in the CPO as Arthritis treatment:

IN-02.01-PatientPositioning
IN-03.01-MotorRehabilitation
IN-07.05-AssistiveDevice
PT-61-AntiInflammatoryAndAntirheumaticProducts
PT-62-TopicalProductsForJointAndMuscularPain
PT-63-MuscleRelaxants
PT-64-AntigoutPreparations
PT-65-DrugsForTreatmentOfBoneDiseases
PT-66-OtherDrugsForDisordersOfTheMusculoSkeletalSystem
PT-68-Analgesics

After considering each one of the interventions, Dr. Zhivago agrees that they are good treatments for Mr. Komarovsky. However, he considers that it is not necessary to apply all of them, so he prescribes Anti Inflammatory and Antirheumatic Products and Motor Rehabilitation for Mr. Komarovsky.

At the next step, Dr. Zhivago introduces the Problem Assessments that he performed on Mr. Komarovsky.

Finally, the validation tool that follows the CAPRAD suggestions shows the ranking of diseases that the patient is likely to suffer from considering the introduced signs and symptoms. The percentages of each disease can be seen in Figure 33.

Figure 33. Mr. Komarovsky's diagnosis validation

The validation percentages agree with Dr. Zhivago that Arthritis is the disease that Mr. Komarovsky is most likely to suffer from.

Mr. Komarovsky's medical case is saved in an ontology. Consequently, Dr. Zhivago can review it later in case Mr. Komarovsky gets worse or the physician wants to add new information about the case.

Figure 34 is a Protégé view of Mr. Komarovsky's ontology. The ontology is divided into two sections: the Background class and the PatientCase class. The Background section contains the knowledge suggested by CAPRAD and the new data incorporated by Dr. Zhivago. The PatientCase class has no children and, therefore, it has no is-a relationships with any other class of the ontology.

Figure 34. Ontology representation of Mr. Komarovsky's case

5.2 ONTOLOGY TAILORING EVALUATION

In order to analyze the performance of the ontology personalization, several trials have been made. The general idea of these trials is to know what percentage of the CPO is activated when a disease is introduced (Figure 35). By activation of the CPO we refer to the number of classes from the CPO that appear in the final personalized ontology.

Figure 35. On the left, a scheme of the CPO. On the right, the red ones are the classes from the CPO that will appear in the final ontology, and whose percentages are going to be calculated.

Knowing the percentage of activation of the CPO is useful to evaluate two different aspects.

1. Obtain a SUB-ontology. The trials should let us compare the sizes of the CPO and the resulting ontology. If the size of the final patient's ontology approached the size of the CPO, the effectiveness of the created application would be dubious.

2. Analyze every disease. The objective of the trials is to let us know which parts of the CPO are activated when a disease is introduced. For example, we may have a case in which we introduce a disease that activates a small number of classes of the Intervention part of the CPO. This could mean that the disease has a low percentage of interventions which can treat it.

Each trial consists of emulating a patient's case by selecting a disease and some of its signs and symptoms, problem assessments and interventions. That process results in the patient's ontology. The trial returns the number of classes of each type (signs and symptoms, interventions and problem assessments) that the ontology contains, so that the percentages can be calculated.

The initial hypothesis is that, having 14 diseases in the CPO, each disease should activate at least 7% (1/14) of the CPO. The cases which exceed that figure will not be considered dangerous cases, because the excess can be attributed to two diseases sharing interventions, signs and symptoms or problem assessments. However, the cases in which the percentage is much lower than 7% should be considered cases to study.

The percentages have been included in Table 36.

Table 36. Table with the percentages of the classes in the ontology.

Table 36 contains all the diseases that can be found in the CPO. Those followed by a number in brackets are special cases, in which including that disease in the ontology causes the inclusion of N other diseases (N being the number in brackets). The second column gives the total number of classes that the final patient ontology contains. Since the starting ontology, the CPO, is composed of 1208 classes, the third column can be calculated: it gives the percentage of CPO classes that the final ontology contains. Leaving the special cases aside, the maximum percentage is 12.67% and the minimum is 5.46%; among the special cases, the worst case is 20.12%. From these data we can conclude that:

1. The program considerably reduces the number of classes with respect to the CPO. In that respect, the application created in this project has achieved the expected effectiveness.

2. Each disease should activate about 7% of the CPO. The minimum percentage obtained in the trials is 5.46%. As this percentage is not much lower than 7%, we can conclude that the methodology does not leave information out.

The next columns give the percentage of signs&symptoms in the patient's ontology with respect to the signs&symptoms in the CPO. The same procedure is applied to interventions and problem assessments. The percentages obtained lead us to two conclusions.

1. Including one of the diseases does not cause the inclusion of most of the signs&symptoms, interventions or problem assessments (the highest percentages are around 30%).

2. Some diseases activate a low percentage of signs&symptoms, interventions or problem assessments, while others activate a high percentage. For example, the percentage of interventions for Diabetes is 3.45%, while its percentage of problem assessments is 26.98%. These numbers suggest that Diabetes has many ways to be detected, but very few to be treated.
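The arithmetic behind the activation percentages and the 1/14 hypothesis can be illustrated with a short Python sketch. It is only an illustration, not the code used in the trials: the function names are hypothetical, the class counts 66 and 153 are not taken from Table 36 (they are chosen only so that the rounded results match the reported extremes of 5.46% and 12.67%), and reading "much lower than 7%" as "below half of the baseline" is an assumption.

```python
# Minimal sketch: helper names and the "much lower" cut-off are assumptions.
CPO_TOTAL_CLASSES = 1208  # size of the CPO stated in the text
NUM_DISEASES = 14         # number of diseases in the CPO stated in the text


def activation_percentage(activated_classes, total_classes=CPO_TOTAL_CLASSES):
    """Percentage of CPO classes that appear in the personalized ontology."""
    if total_classes <= 0:
        raise ValueError("total_classes must be positive")
    if not 0 <= activated_classes <= total_classes:
        raise ValueError("activated_classes must lie between 0 and total_classes")
    return 100.0 * activated_classes / total_classes


def baseline_percentage(num_diseases=NUM_DISEASES):
    """Expected share of the CPO per disease under the 1/N hypothesis."""
    return 100.0 / num_diseases


def is_case_to_study(percentage, factor=0.5):
    """Flag a disease whose activation is much lower than the baseline.

    `factor` is a hypothetical cut-off: "much lower" is read here as
    "below half of the baseline".
    """
    return percentage < factor * baseline_percentage()


if __name__ == "__main__":
    # Hypothetical class counts, chosen only so that the rounded results
    # match the extreme percentages reported in the text.
    for label, count in [("minimum", 66), ("maximum", 153)]:
        pct = activation_percentage(count)
        print(f"{label}: {pct:.2f}% (study further: {is_case_to_study(pct)})")
    print(f"baseline: {baseline_percentage():.2f}%")
```

Under these assumptions, neither extreme is flagged for further study, which is consistent with the conclusion that the methodology does not leave information out. A stricter cut-off would change which diseases are flagged, so the value of `factor` should be agreed on before interpreting the table.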

5.3 ATYPICAL HEALTH-CARE CASES CONSIDERED

In addition to the common use of CAPRAD, there are other cases that the application is prepared for. These cases were used to assess the capability of the tool to deal with important medical problems such as wrong diagnosis, comorbidity detection, missing data, and prevention.

5.3.1 WRONG DIAGNOSIS

A wrong diagnosis occurs when the physician diagnoses a patient with an incorrect condition.

Case 1: Doctor Lecter diagnoses Mr. Almasy with COPD. According to Figure 17, the system proposes the signs and symptoms related to COPD. The physician confirms some of them (Asthenia, Chest Pain, Decreased Exercise Tolerance, Malaise and Fatigue, Tachycardia, Dyspnea, Tachypnea, Cyanosis, Abnormal XRay Lung, and Fluctuating Course), rejects some others (Hypersomnia, Sleep Apnea, Apnea, Bradypnea, Cough, Hyperventilation, Hypoventilation, Intercostal Retraction, Pleuritic Chest Pain, Stridor, Use Accessory Muscle, Abnormal Thorax Examination, Abnormal Arterial Blood Gas, Abnormal Bacteriological Exams, Abnormal Coagulation, and Abnormal Hemogram) and incorporates new ones (Angina Pectoris, Arrhythmia, Palpitation, Edema, and Abnormal EKG). Dr. Lecter also adds the Problem Assessments and the Interventions. The resulting case for Mr. Almasy is shown in Figure 37.

Figure 37. Case 1 representation

According to the data that he introduced, Figure 38 shows that the tool does not consider COPD the most probable disease for Mr. Almasy; Chronic Ischaemic Heart Disease (CIHD) is. The difference between the percentages is not very high, so Dr. Lecter reconsiders his diagnosis and realizes that, given the signs and symptoms he observed in Mr. Almasy, Chronic Ischaemic Heart Disease, COPD or even Heart Failure are all plausible diseases. Finally, the physician decides to change the original diagnosis to CIHD.

Figure 38. Case 1 validation

5.3.2 COMORBIDITY

Comorbidity is defined as the presence of one or more diseases in addition to a primary disease, together with the effects of such additional diseases.

Case 2: Doctor Watson has the medical history of Mr. Holmes, who was diagnosed with Diabetes and started treatment for it months ago. However, Mr. Holmes has got worse, and the physician observes in the medical history some signs and symptoms, such as Headache and Dizziness, that are not related to Diabetes. Consequently, the system recommends some assessments, which the physician runs and which reveal new signs and symptoms such as Epistaxis and Palpitation. The whole case is represented in Figure 39.

Figure 39. Case 2 representation

The old and new signs and symptoms are then the basis on which the system calculates the percentages of the diseases. The result is the list that appears in Figure 40. It informs Dr. Watson that there is evidence of Hypertension as an undetected comorbidity for Mr. Holmes. Finally, Hypertension is confirmed.

Figure 40. Case 2 validation

5.3.3 MISSING DATA

When data are missing during a diagnosis, the condition of the patient is hard to define precisely.

Case 3: Doctor Who has little data about the patient, Mr. O'Connell. Problem assessments determine Chest Pain, Rigidity, PseudobulbarPalsy, Ischuria, Abrasion, Urticaria, Arthralgia, Bulimia, and AbnormalSerumAnalysis. As Figure 41 shows, Dr. Who does not know the diagnosis and, therefore, cannot decide the treatment.

Figure 41. Case 3 representation

The system proposes the ranking of diseases shown in Figure 42. According to it, Arthritis (6.76%), CIHD (5.73%), and Iatrogenic Cognitive Impairment (4.96%) are the most likely diseases, but they have low percentages.

Figure 42. Case 3 validation

Dr. Who suspects that some signs and symptoms are missing. He decides to select CIHD, and the system recommends that he look for and confirm some additional signs and symptoms: DecreasedExerciseTolerance, MalaiseAndFatigue, AnginaPectoris, Arrhythmia, etc. The physician confirms Arrhythmia and Abnormal EKG. The certainty of CIHD increases to 46.02%.

5.3.4 RELATED DISEASES AND PREVENTION

Not all diseases are independent. There are many diseases whose signs and symptoms, if not treated, can cause a new disease. When a patient is diagnosed with a disease, these related diseases must sometimes be considered as likely to be developed by the patient in the future.

Case 4: Doctor Jekyll diagnoses his patient, Mr. Hyde, with Anaemia. Using the problem assessments, he registered the signs and symptoms and the interventions that Figure 43 shows.

Figure 43. Case 4 representation

The ranking of diseases given to Dr. Jekyll is depicted in Figure 44. The system confirms that Anaemia is the most likely disease for Mr. Hyde (53.94%), but it also warns about the alternative diseases CIHD (30.54%) and HF (16.16%). Mr. Hyde does not suffer from these diseases yet. However, Dr. Jekyll considers that, given the signs and symptoms of his patient, they may develop in the future. The physician starts preventive measures to avoid them.

Figure 44. Case 4 validation
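The recommendation step of Case 3, in which the physician selects a disease and the system proposes further signs and symptoms to look for, can be pictured as a simple set difference. The Python sketch below is an assumption about the general idea and not CAPRAD's actual implementation: the function name `pending_signs` is hypothetical, the list of CIHD signs is partial (only the names mentioned in Case 3, with AbnormalEKG assumed to be among the recommended ones because the physician confirms it), and the scoring that yields the percentages is not reproduced.

```python
def pending_signs(disease_signs, confirmed, rejected=()):
    """Signs linked to a disease that are neither confirmed nor rejected.

    The original order of `disease_signs` is preserved.
    """
    seen = set(confirmed) | set(rejected)
    return [sign for sign in disease_signs if sign not in seen]


# Partial list: only the CIHD signs named in Case 3. The "etc." of the text
# is not expanded; AbnormalEKG is included as an assumption.
CIHD_SIGNS_NAMED_IN_TEXT = [
    "DecreasedExerciseTolerance",
    "MalaiseAndFatigue",
    "AnginaPectoris",
    "Arrhythmia",
    "AbnormalEKG",
]

# Signs and symptoms initially known for Mr. O'Connell (Case 3).
CASE_3_CONFIRMED = [
    "ChestPain", "Rigidity", "PseudobulbarPalsy", "Ischuria", "Abrasion",
    "Urticaria", "Arthralgia", "Bulimia", "AbnormalSerumAnalysis",
]

if __name__ == "__main__":
    # Before the physician looks for anything else.
    print(pending_signs(CIHD_SIGNS_NAMED_IN_TEXT, CASE_3_CONFIRMED))
    # After Arrhythmia and Abnormal EKG have been confirmed.
    updated = CASE_3_CONFIRMED + ["Arrhythmia", "AbnormalEKG"]
    print(pending_signs(CIHD_SIGNS_NAMED_IN_TEXT, updated))
```

The optional `rejected` argument mirrors Case 1, where the physician explicitly rejects some of the proposed signs and symptoms, so that rejected items are not proposed again.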