
Pilot Study: Clinical Trial Task Ontology Development

Introduction

A prototype ontology of common participant-oriented clinical research tasks and events was developed using a multi-step process, summarized in this section and illustrated in Figure 1.

Figure 1: Overview of the workflow used to create an ontology of common clinical trial tasks and events. The figure illustrates the knowledge sources, processes, and resulting intermediate and final knowledge collections used or created during this workflow.

Methodology

Step 1. A convenience sample of Phase I-III therapeutic clinical trial protocol documents targeting multiple treatment areas was drawn from both the Columbia University Clinical Trials Network (CTN) and the Chronic Lymphocytic Leukemia Research Consortium (CLLRC). These protocol documents were manually evaluated, and tasks or events that satisfied the following two criteria were abstracted:

i. The task or event was participant-centric (i.e., it described an activity that pertained to, or was applied to, the research participant) and would yield one or more elements of quantitative or qualitative data.
ii. The task or event occurred during the active phase of the protocol, defined as the time frame after initial screening and eligibility assessment but prior to long-term follow-up (i.e., the period after the completion of all therapeutic interventions).

Whenever possible, these tasks and events were abstracted from the temporal grids found in the protocol documents.

Step 2. Each task or event abstracted from the protocol documents was mapped to a unique UMLS concept using the free-text search engine available via the UMLS Knowledge Source (UMLSKS) server (http://umlsks.nlm.nih.gov). For tasks or events that did not yield an exact match using this approach, one of the following strategies was used to assign an adequate UMLS concept:

i. For compound concepts (e.g., "height and weight measurement"), the text was decomposed into its smallest semantically significant units (e.g., "height", "weight"), and the free-text matching algorithm was then applied to each component. The UMLS concepts found via this process were then subject to post-coordination.

ii. Any synonyms returned by the free-text matching algorithm were explored, and if a suitable semantic match was found, that concept was selected as the matching UMLS concept.

The resulting collection of unique UMLS concepts, identified by Concept Unique Identifiers (CUIs), was recorded together with each concept's occurrence frequencies at the protocol, treatment-group and aggregate levels for later use and analysis.

Step 3. A composite support metric [1], $S_A$, was calculated for each unique UMLS concept as the sum of three support statistics based on the aggregate, protocol and treatment-group occurrence frequencies (Equation 1):

$$S_A = \frac{n}{t_n} + \frac{p}{t_p} + \frac{g}{t_g} \qquad (1)$$

Equation 1: Composite support ($S_A$) for a given concept A, where n is the total number of occurrences of concept A, $t_n$ is the total number of occurrences of all concepts, p is the number of protocols in which concept A occurs, $t_p$ is the total number of protocols, g is the number of treatment groups in which concept A occurs, and $t_g$ is the total number of treatment groups.

Step 4. The unique concepts in the corpus were ranked in descending order of their $S_A$ values. The concepts that, taken in this rank order, accounted for 95% of the total number of task and event instances were selected for inclusion in the prototype clinical trial task ontology.

Step 5. Five subjects with backgrounds in the conduct of clinical research (e.g., physicians, nurses, study coordinators/managers) were recruited from the Columbia University Medical Center. Each performed an all-in-one card sort of the selected concepts using a Web-based application (www.websort.net), in which they could view a list of the concepts and place them into groups based on the similarity of their meanings or any other criteria of the sorter's choosing. The subjects were also asked to provide a descriptive name for each group they created.

Step 6. The results of the card sort were represented as a symmetric agreement matrix in which each cell holds the number of sorters who placed the two concepts indexed by its row and column in the same group. The agreement matrix was then analyzed using the techniques described in Steps 7-9.

Step 7. Agreement statistics were calculated to determine how many sorters agreed on each possible pair-wise grouping of a concept with each of the remaining concepts.

Step 8. Hierarchical cluster analysis was performed using an average linkage algorithm, as implemented in the JMP 5.0.1a statistics package [2], to generate consensus clusters of the sorted concepts.

Step 9. Thematic analysis was performed on the high-level group names assigned to the concepts that made up each consensus cluster. To enable this analysis, each group name assigned by the sorters was manually mapped to a semantically similar UMLS concept (using the method described in Step 2), thus providing a consistent nomenclature across sorters. The results of this thematic analysis were then used to organize the subsumed concepts into a basic ontology using parent-child relationships.

Results

Task and Event Abstraction

As described earlier, a convenience sample of 32 Phase I-III therapeutic protocols was selected from the library of protocols currently available within the Columbia University Clinical Trials Network (CTN) and the Chronic Lymphocytic Leukemia Research Consortium (CLLRC). The source organizations of these protocols were masked to prevent the disclosure of any proprietary sponsor information regarding therapeutic agent names. Each protocol could be assigned to one of six major treatment groups: Endocrine, Gastrointestinal (GI), Hypertension, Neurology, Oncology and Vascular Disease. The distribution of protocols by treatment group is shown in Figure 2.
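Equation 1 from Step 3 is simple enough to check by hand. The Python sketch below is an illustration, not the authors' code; the function name `composite_support` is hypothetical, and the totals plugged in are those reported in this study (522 task or event instances, 32 protocols, six treatment groups). A concept that occurs once, in a single protocol and a single treatment group, scores roughly 0.20, the lowest value that appears in Table 1.

```python
"""Composite support metric S_A from Equation 1 (illustrative sketch)."""


def composite_support(n: int, t_n: int, p: int, t_p: int, g: int, t_g: int) -> float:
    """Return S_A = n/t_n + p/t_p + g/t_g.

    n, t_n: occurrences of concept A / occurrences of all concepts
    p, t_p: protocols containing A / all protocols
    g, t_g: treatment groups containing A / all treatment groups
    """
    if min(t_n, t_p, t_g) <= 0:
        raise ValueError("totals t_n, t_p and t_g must be positive")
    if not (0 <= n <= t_n and 0 <= p <= t_p and 0 <= g <= t_g):
        raise ValueError("counts must lie between 0 and their totals")
    return n / t_n + p / t_p + g / t_g


# Corpus totals reported for this pilot study.
T_N, T_P, T_G = 522, 32, 6

if __name__ == "__main__":
    # A concept seen once, in one protocol, in one treatment group.
    print(round(composite_support(1, T_N, 1, T_P, 1, T_G), 2))  # 0.2
```

Because each of the three ratios is at most 1, $S_A$ is bounded above by 3; the treatment-group term (1/6 per group here) dominates for rarely seen concepts.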

Figure 2: Distribution of protocols used to develop the clinical trial task ontology, by treatment group (Oncology, n=17, 53%; GI, n=6, 19%; Endocrine, n=3, 9%; Neurology, n=3, 9%; Vascular, n=2, 6%; Hypertension, n=1, 3%).

From the 32 selected protocols, a total of 522 task or event instances satisfying the criteria enumerated earlier were abstracted and mapped to UMLS concepts. This process yielded 93 unique concepts identified by UMLS CUIs, together with the number of times each unique concept occurred within the overall set of 522 tasks or events. A composite support metric ($S_A$, Equation 1) was calculated for each unique concept; the resulting values ranged from 0.19 to 2, with a mean of 0.46 ± 0.4. The concepts were then ranked in descending order of composite support. A threshold value for $S_A$ was set at the point in the rank ordering at which 95% of the 522 task or event instances were represented. This threshold was found to be 0.19 (Figure 3).
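The threshold rule of Step 4 can be sketched as follows. How ties were handled is not stated, so this is an assumption: concepts are ranked by $S_A$, instance counts are accumulated down the ranking until they reach 95% of all instances, the $S_A$ of the concept at which that happens becomes the threshold, and every concept at or above it is kept. The `Concept` record, the function name `select_by_coverage` and the toy data are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Concept(NamedTuple):
    name: str
    support: float  # composite support S_A (Equation 1)
    instances: int  # task/event instances mapped to this concept


def select_by_coverage(concepts: list[Concept], coverage: float = 0.95) -> tuple[float, list[Concept]]:
    """Return (threshold, selected concepts) for a cumulative-coverage cut."""
    if not concepts:
        raise ValueError("no concepts to rank")
    ranked = sorted(concepts, key=lambda c: c.support, reverse=True)
    total = sum(c.instances for c in ranked)
    running = 0
    threshold = ranked[-1].support  # fallback: keep everything
    for c in ranked:
        running += c.instances
        if running >= coverage * total:
            threshold = c.support
            break
    # Ties at the threshold are kept ("equal to or greater than").
    selected = [c for c in ranked if c.support >= threshold]
    return threshold, selected


if __name__ == "__main__":
    toy = [  # hypothetical values, not data from this study
        Concept("A", 2.00, 50),
        Concept("B", 1.00, 30),
        Concept("C", 0.50, 16),
        Concept("D", 0.20, 3),
        Concept("E", 0.10, 1),
    ]
    threshold, kept = select_by_coverage(toy)
    print(threshold, [c.name for c in kept])  # 0.5 ['A', 'B', 'C']
```

Because ties at the threshold are retained, the selected set can cover somewhat more than the target percentage of instances.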

Figure 3: Distribution of the composite support metric ($S_A$) by unique UMLS concept versus contribution to the overall number of task and event instances. [Plot: composite support (left axis, 0 to 2) and percent of total concept instances (right axis, 0% to 100%) against number of concepts (1 to 93), with the 95% cutoff marked at $S_A$ = 0.19.]

Unique UMLS concepts with a support metric equal to or greater than the threshold value were selected for inclusion in the prototype clinical trial task ontology, resulting in a set of 67 concepts (Table 1).

Table 1: Concepts selected for use in the prototype clinical trial task and event ontology.

UMLS Concept Name                                 Composite Support Metric  % of Total Concept Instances (n = 522)
------------------------------------------------------------------------------------------------------------------
Clinical Examination                              2.00                      6.47
Medical History                                   1.92                      8.24
Electrocardiogram                                 1.50                      2.94
Hematology                                        1.33                      3.92
Blood Chemical Analysis                           1.30                      4.51
Adverse Effects                                   1.27                      2.94
Inclusion and Exclusion                           1.26                      2.55
Dispensing Medication                             1.17                      2.16
Obtain or Verify Patient's Informed Consent       1.10                      2.75
Laboratory Procedures                             1.09                      5.29
Pregnancy Tests                                   1.03                      2.16
Vital Signs                                       1.00                      2.16
Urinalysis                                        0.97                      1.76
Patient Outcome Assessment                        0.94                      2.35
Drug Compliance Checked                           0.94                      1.96
Blood Pressure Determination                      0.80                      1.18
Biological Markers                                0.80                      0.78
Demographics                                      0.73                      1.37
Random Allocation                                 0.70                      1.37
Blood Specimen Collection                         0.60                      0.59
Pulse Rate                                        0.60                      0.59
Body Weight                                       0.57                      1.37
Bone Marrow Biopsy                                0.56                      2.35
Assessment Procedure                              0.54                      2.35
Radiography, Thoracic                             0.53                      1.18
Drug Kinetics                                     0.47                      1.96
Genotype Determination                            0.47                      0.78
Glycosylated Hemoglobin A                         0.43                      0.78
Beta-2-microglobulin Measurement                  0.43                      1.57
Partial Thromboplastin Time                       0.40                      0.78
Flow Cytometry                                    0.40                      1.57
Screening Procedure                               0.40                      0.39
Intravascular HCV RNA, QUAL ASSAY (PCR) Test      0.37                      1.37
Biopsy of Liver                                   0.37                      1.18
Immunoglobulin Measurement                        0.37                      1.18
Questionnaires                                    0.31                      2.35
Therapeutic Procedure                             0.31                      1.37
Measurement                                       0.30                      0.98
Lymphocyte Marker                                 0.30                      0.78
Ophthalmic Examination and Evaluation             0.30                      0.78
Thyroid Stimulating Hormone Measurement           0.30                      0.78
Status                                            0.30                      0.78
Polymerase Chain Reaction                         0.27                      0.78
Cytomegalovirus                                   0.27                      0.59
Neoplasms                                         0.27                      0.59
Alpha One Fetoprotein Measurement                 0.27                      0.59
Test, Lipids Profile                              0.24                      0.98
Glucose Measurement, Fasting                      0.23                      0.59
Phospholipids                                     0.23                      0.39
Quality of Life                                   0.23                      0.39
X-Ray Computed Tomography                         0.23                      0.39
Urine Specimen Collection                         0.23                      0.39
Creatinine Measurement                            0.23                      0.39
Electrolytes Measurement                          0.23                      0.39
Early Morning Urine Sample                        0.23                      0.39
Waist Circumference                               0.23                      0.39
Registration Procedure                            0.23                      0.39
Diagnostic Radiologic Examination and Procedures  0.20                      0.59
Phlebotomy                                        0.20                      0.39
Blood Coagulation Tests                           0.20                      0.20
Diet                                              0.20                      0.20
Echocardiography                                  0.20                      0.20
Endoscopy                                         0.20                      0.20
Insulin                                           0.20                      0.20
Leukapheresis                                     0.20                      0.20
Lymph Nodes                                       0.20                      0.20
Oral Glucose Tolerance Test                       0.20                      0.20

Card Sorting

After the selection of the preceding concept set, five subjects were recruited to participate in a card sorting study using those concepts. The subjects (four women and one man) ranged in age from 30 to 57 years (mean age 42). All but one of the subjects had a graduate-level education in nursing, physiology or public health. All of the subjects had substantial experience in clinical research (18 years on average), serving as either research staff (e.g., study coordinator, data manager, research nurse) or clinical investigators. On a three-category scale (novice, occasional, expert), four subjects identified themselves as expert computer users and the remaining subject as an occasional computer user. These participants conducted an all-in-one card sort of the concept set, creating groups of concepts they considered similar and providing names for those groups. The card sort was performed using the Websort [3] Web application (Figure 4).
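Steps 6 and 7 can be made concrete with a small sketch. It assumes each sorter's result is a list of groups in which every concept appears exactly once, as in an all-in-one sort; each cell counts the sorters who put the row and column concepts in the same group, and dividing a cell by the number of sorters gives the pairwise agreement. The function names and the two-sorter example are hypothetical and do not reflect the Websort export format.

```python
from itertools import combinations


def agreement_matrix(concepts, sorts):
    """Symmetric matrix: cell [i][j] = number of sorters grouping i and j together.

    concepts: list of concept names (fixes row/column order)
    sorts:    one entry per sorter, each a list of groups (iterables of names)
    """
    index = {name: i for i, name in enumerate(concepts)}
    size = len(concepts)
    matrix = [[0] * size for _ in range(size)]
    for groups in sorts:
        for group in groups:
            members = sorted({index[name] for name in group})
            for i in members:
                matrix[i][i] += 1  # diagonal: sorters who placed the concept
            for i, j in combinations(members, 2):
                matrix[i][j] += 1
                matrix[j][i] += 1
    return matrix


def pairwise_agreement(matrix, n_sorters):
    """Fraction of sorters agreeing on each unordered concept pair (i < j)."""
    return {
        (i, j): matrix[i][j] / n_sorters
        for i, j in combinations(range(len(matrix)), 2)
    }


if __name__ == "__main__":
    concepts = ["Pulse Rate", "Vital Signs", "Endoscopy"]
    sorts = [  # hypothetical sorts by two sorters
        [{"Pulse Rate", "Vital Signs"}, {"Endoscopy"}],
        [{"Pulse Rate", "Vital Signs", "Endoscopy"}],
    ]
    m = agreement_matrix(concepts, sorts)
    print(m)  # [[2, 2, 1], [2, 2, 1], [1, 1, 2]]
    print(pairwise_agreement(m, n_sorters=len(sorts)))
```

Deduplicating each group with a set keeps a concept listed twice in one group from being counted twice.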

Figure 4: Example of the Websort Web application interface, used to perform an all-in-one card sort of clinical trial task and event concepts.

The participants in the card sorting study created 47 unique groups, ranging in size from 1 to 33 concepts, with a mean size of 6.9 ± 7 concepts (Figure 5). The observed aggregate agreement was 79.9 ± 13.6%. In comparison, the predicted aggregate agreement generated using the Simulated Agreement Matrix (SAM) application was 5.2 ± 9.9%. On average, the observed and predicted aggregate agreement differed by 5.2 standard deviations (Figure 6).

Figure 5: Distribution of group sizes created by sorters in Sub-Study One. [Bar chart: group size (0 to 35 concepts) for each of the 47 groups.]

Figure 6: Observed versus predicted aggregate agreement in Sub-Study One. [Plot: observed and predicted agreement (0% to 100%) against number of concepts; average difference = 5.2 SD.]

Cluster Analysis

Cluster analysis was performed using a hierarchical average linkage algorithm as implemented in the JMP 5.0.1a statistics package [2]. Twenty-six consensus clusters were generated, with a mean size of 3.5 ± 2.7 concepts (Figure 7). The average Euclidean distance between members of the consensus clusters was 3.89 ± 2.44, with distances ranging from 0 to 7.34 (Figure 8).

Figure 7: Distribution of "consensus cluster" sizes in Sub-Study One. [Bar chart: size (0 to 12 concepts) of each of the 26 clusters.]
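The clustering in Step 8 was done in JMP, whose exact settings (distance measure, stopping rule) are not given here. As a rough stand-in, and only under those assumptions, the sketch below treats each row of the agreement matrix as a feature vector, computes Euclidean distances between rows, and merges clusters by average (UPGMA) linkage until the closest pair of clusters is farther apart than a cutoff. The cutoff, the function names and the toy matrix are hypothetical; `mean_within_distance` mirrors the kind of average inter-member distance summarized in Figure 8.

```python
import math
from itertools import combinations


def row_distances(matrix):
    """Euclidean distance between every pair of agreement-matrix rows."""
    return [[math.dist(a, b) for b in matrix] for a in matrix]


def average_linkage(dist, cutoff):
    """Agglomerative clustering with average (UPGMA) linkage.

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance, stopping once that distance exceeds `cutoff`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)


def mean_within_distance(cluster, dist):
    """Average distance between members of one cluster (0.0 for singletons)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    agreement = [  # hypothetical agreement matrix for 4 concepts, 5 sorters
        [5, 5, 0, 0],
        [5, 5, 0, 0],
        [0, 0, 5, 4],
        [0, 0, 4, 5],
    ]
    dist = row_distances(agreement)
    clusters = average_linkage(dist, cutoff=3.0)
    print(clusters)  # [[0, 1], [2, 3]]
    print([round(mean_within_distance(c, dist), 2) for c in clusters])  # [0.0, 1.41]
```

This naive loop recomputes every linkage at each merge, which is fine for a few dozen concepts; a statistics package would use a more efficient update.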

Figure 8: Distribution of the average inter-member distances in the "consensus clusters" generated during Sub-Study One. [Bar chart: average inter-member distance (0 to 8) for each of the 26 clusters.]

Thematic Analysis

A thematic analysis of the consensus clusters, in conjunction with the group names the sorters assigned to their concepts, was undertaken to develop an organizing taxonomy for the selected concepts. On average, each consensus cluster had 7.42 ± 5.94 thematically unique group names associated with it. When only the group names given to a consensus cluster two or more times were retained, there were on average 1.57 ± 1 unique group names per cluster. Several examples of consensus clusters and their associated thematically unique group names are provided in Table 2.

Table 2: Example "consensus clusters" and associated thematically unique group names from Sub-Study One.

Concepts in Consensus Cluster            Avg Euclidean Distance   Thematically Unique Group Names
                                         Between Concepts
-----------------------------------------------------------------------------------------------
Creatinine Measurement                   0.73                     Laboratory Procedures
Electrolyte Measurement
Glucose Measurement, Fasting
Test, Lipids Profile

Random Allocation                        1.39                     Research Administrative Procedure
Screening Procedure                                               Screening Procedure

Pulse Rate                               2.12                     Measurement
Vital Signs
Waist Circumference

Endoscopy                                5.63                     Procedures
Ophthalmic Examination and Evaluation

For the purposes of the ontology construction phase of this analysis, only the seven most frequently occurring thematically unique group names (corresponding to the average number of themes per consensus cluster) were selected for use in organizing the concepts in the initial concept set. These group names are enumerated in Table 3.
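The theme selection just described can be sketched as two counting passes. This assumes the group names have already been mapped to UMLS concepts (Step 9), that a name is attached to a cluster only if sorters gave it at least twice, and that themes are ranked by the number of clusters they are attached to, as in the Occurrences column of Table 3. The function names and the three toy clusters are hypothetical, and ties in `Counter.most_common` are not broken in any meaningful order.

```python
from collections import Counter


def cluster_themes(names_per_cluster, min_count=2):
    """For each consensus cluster, keep the group names given >= min_count times.

    names_per_cluster: one list per cluster of the (UMLS-mapped) group names
    sorters assigned to that cluster's concepts, repeats included.
    """
    return [
        {name for name, count in Counter(names).items() if count >= min_count}
        for names in names_per_cluster
    ]


def top_themes(names_per_cluster, k=7, min_count=2):
    """Rank themes by the number of clusters they are attached to."""
    per_cluster = cluster_themes(names_per_cluster, min_count)
    counts = Counter(name for themes in per_cluster for name in themes)
    return counts.most_common(k)


if __name__ == "__main__":
    clusters = [  # hypothetical sorter-assigned names for three clusters
        ["Laboratory Procedures"] * 3 + ["Measurement"],
        ["Research Administrative Procedures"] * 2 + ["Screening Procedure"] * 2,
        ["Measurement"] * 2 + ["Laboratory Procedures"] * 2,
    ]
    print(cluster_themes(clusters))
    print(top_themes(clusters, k=1))  # [('Laboratory Procedures', 2)]
```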

Table 3: Seven most common thematically unique group names and number of occurrences associated with "consensus clusters" generated during Sub-Study One.

Theme Name                           Occurrences (% of Consensus   Subsumed Concepts (% of Initial
                                     Clusters, n = 26)             Concept Set, n = 67)
--------------------------------------------------------------------------------------------------
Laboratory Procedures                11 (42%)                      44 (66%)
Research Administrative Procedures   10 (38%)                      28 (42%)
Procedures                           8 (31%)                       35 (52%)
Screening Procedure                  5 (19%)                       15 (22%)
Diagnostic Radiologic Examination    3 (12%)                       10 (15%)
Measurement                          3 (12%)                       15 (22%)
Specimen                             1 (4%)                        10 (15%)

Ontology Construction

Given the preceding consensus clusters and associated thematic analysis results, a simple prototype ontology was constructed using parent-child relationships. These relationships were instantiated by assigning the role of parent to the seven group-name concepts selected via thematic analysis, and the role of child to all of the other subsumed concepts from the initial concept set. Multiple hierarchies (cases where a child has more than one parent) were allowed in this ontology. The resulting ontology was formalized as an ancestor-descendant table (Appendix J) and visualized using the open-source GraphViz application [4] (Figures 9 and 10).

Figure 9: Visualization of prototype clinical trial task and event ontology, showing concepts with the parents Measurement and/or Procedures.

Figure 10: Visualization of prototype clinical trial task and event ontology, showing concepts with the parents Research Administrative Procedures and/or Screening Procedure.
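The ancestor-descendant table and the GraphViz input can both be derived from the parent-child pairs. In the sketch below, edges are (parent, child) tuples, a child may have several parents, and the table is the transitive closure of those edges; for a two-level prototype like this one the closure is just the edge list, but the same code covers deeper hierarchies. The DOT text can be rendered with GraphViz's `dot` command (for example `dot -Tpng ontology.dot -o ontology.png`). The function names and example edges are hypothetical and are not the actual contents of Appendix J.

```python
from collections import defaultdict


def ancestor_descendant_table(edges):
    """Transitive closure of (parent, child) edges in a DAG, as sorted pairs."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    table = set()

    def visit(ancestor, node, seen):
        for child in children.get(node, ()):
            if child not in seen:  # skip descendants already reached by another path
                seen.add(child)
                table.add((ancestor, child))
                visit(ancestor, child, seen)

    for ancestor in list(children):
        visit(ancestor, ancestor, set())
    return sorted(table)


def to_dot(edges, graph_name="ClinicalTrialTasks"):
    """Emit a GraphViz DOT digraph with one arrow per parent-child edge."""

    def quote(label):
        return '"' + label.replace('"', '\\"') + '"'

    lines = [f"digraph {graph_name} {{", "  rankdir=LR;"]
    for parent, child in sorted(edges):
        lines.append(f"  {quote(parent)} -> {quote(child)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    edges = [  # hypothetical parent-child assignments
        ("Measurement", "Pulse Rate"),
        ("Measurement", "Waist Circumference"),
        ("Procedures", "Endoscopy"),
        ("Research Administrative Procedures", "Random Allocation"),
        ("Screening Procedure", "Random Allocation"),  # a second parent
    ]
    for ancestor, descendant in ancestor_descendant_table(edges):
        print(ancestor, "->", descendant)
    print(to_dot(edges))
```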

References

1. Han, J. and Kamber, M. Data Mining: Concepts and Techniques. San Diego: Academic Press, 2001. 550 pp.
2. SAS. JMP. SAS, 1999.
3. Wood, L.E. WebSort. 2006.
4. Low, G. Graphviz. Pixelglow, 2006.