Dmitriy Fradkin. Ask.com
1 Using cluster analysis to determine the influence of demographic features on medical status of lung cancer patients
Dmitriy Fradkin, Ask.com
Joint work with Dona Schneider (Bloustein School of Planning and Public Policy, Rutgers) and Ilya Muchnik (DIMACS)
2 Overview
- Epidemiology is interested in disease patterns in populations.
- One specific approach is to look for effects of demographic features on medical characteristics.
- We describe how cluster analysis can be used to discover such effects and relations.
- This is illustrated with an example analysis of lung cancer data from SEER.
- Previously, we looked for significant variables in this data by analyzing survival prediction models (DIMACS Technical Report).
- Here we look for groups of demographic variables that are indicative of certain values of medical variables.
3 Plan
- Our Approach
- Example: Lung Cancer Survival Data
- Data Preparation
- Cluster Analysis
- Analysis of Results
- Validation
4 Our Method
1. Identify a set of demographic features to be used in cluster analysis; keep the medical features for analysis.
2. Perform cluster analysis in the space of demographic features, obtaining a partition into k clusters.
3. Compute statistics (mean, standard deviation) on the distribution of both demographic and medical features in the clusters.
4. Examine the distributions of the medical features for significant differences or similarities across the clusters, and for interactions with demographic features.
5. Validate the observed relations using a hold-out set.
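Step 3 of the method can be illustrated with a minimal sketch. It assumes the cluster labels have already been obtained from the demographic features and that one medical feature is available as a list of 0/1 values; the function name `cluster_summary` and the toy data are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from statistics import mean, stdev


def cluster_summary(labels, values):
    """Return {cluster: (size, mean, standard deviation)} for one feature."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    summary = {}
    for label, vals in groups.items():
        # The sample standard deviation needs at least two observations.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        summary[label] = (len(vals), mean(vals), sd)
    return summary


# Toy data: cluster labels come from the demographic features,
# while "surgery" is a held-back binary medical feature.
labels = [0, 0, 0, 1, 1, 1]
surgery = [1, 0, 1, 0, 0, 1]
summary = cluster_summary(labels, surgery)
```

For a binary feature the per-cluster mean is simply the rate at which the feature occurs in that cluster, which is what the later comparisons between clusters operate on.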
5 Potential Things of Interest
Focusing on medical features, look for:
- Features whose distribution is different across (almost) all clusters
- Features whose distribution is the same in (almost) all clusters
- Clusters that are not separated by any features
- Clusters that are separable by (almost) all features
- Clusters that are separable from (almost) all others by a subset of features
6 Lung Cancer Data Analysis
- Pre-processing
- Extracting raw data
- Constructing features
- Handling missing data
- Applying cluster analysis
- Analysis of the Results
- Validation
7 About SEER Data
- The Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute is an authoritative source of information about cancer incidence and survival in the United States.
- Records are stored in rows of fixed width (166 characters), containing 77 fields of fixed length.
- Each patient is uniquely identified by the combination of the SEER registry and case number fields. (Sometimes there are multiple records for a patient.)
- The SEER database has evolved over time, and therefore certain kinds of information available in recent years are not present in older records.
- The year 1988 seems particularly significant, with the introduction of several new fields (such as extent of the disease) and of detailed schemes for several other fields.
- Information for each patient can be partitioned into two sets: demographic and medical.
8 Constructing Features
The fields in a SEER record can be grouped into 3 types:
- categorical: m possible values can be represented by m binary variables $x_1, \dots, x_m$, where $x_i$ has value 1 only if the i-th category occurred in the field.
- ordinal: the values in these fields can be ordered, but there is no distance function defined. An ordinal variable $v$ taking values in $\{1, \dots, m\}$ can be represented by an m-tuple of binary variables $v_i$, $i = 1, \dots, m$:
  $v_i = 1 \iff v \ge i$ (1)
- numeric (age): can be partitioned into m intervals and treated as an ordinal variable with m levels.
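The three encodings can be sketched in a few lines of Python. This is an illustration only: it assumes the cumulative reading of the ordinal coding (the i-th binary variable is 1 when the value is at least i), which matches feature names such as "Stage code 10 or higher" used later in the talk. The function names, the registry list and the age cut points are hypothetical.

```python
def one_hot(value, categories):
    """Categorical field: one binary variable per category."""
    return [1 if value == c else 0 for c in categories]


def ordinal_code(v, m):
    """Ordinal field with levels 1..m: the i-th bit is 1 when v >= i."""
    return [1 if v >= i else 0 for i in range(1, m + 1)]


def bin_numeric(x, cut_points):
    """Numeric field: map x to an ordinal level 1..len(cut_points) + 1."""
    return 1 + sum(x >= c for c in cut_points)


registry = one_hot("Iowa", ["Detroit", "Iowa", "Utah"])  # [0, 1, 0]
age_level = bin_numeric(70, [55, 65, 75])                # level 3 of 4
age_bits = ordinal_code(age_level, 4)                    # [1, 1, 1, 0]
```

With this coding, each binary variable of an ordinal or numeric field answers a question of the form "is the value at least this high?", so its cluster mean is directly interpretable as a rate.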
9 Missing Value Analysis
- If the value of a feature is missing in more than 25% of cases, it is removed.
- If a feature has the same value in 95% or more of the cases where the value is not missing, it is removed (constant feature).
- Those cases that are missing more than 25% of the feature values on the remaining features are removed as well.
- After this processing, 45 features (23 demographic and 22 medical) and 217,558 cases are left.
- The data were partitioned into a training set (120,318) and a validation set (97,240), based on the year of diagnosis ( , and ).
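The filtering rules above can be sketched as follows. This is a minimal illustration in plain Python, assuming each case is a list of feature values with `None` marking a missing value; the function name and the toy rows are hypothetical, and only the 25% and 95% thresholds come from the slide.

```python
def filter_features_and_cases(rows, max_missing=0.25, constant_share=0.95):
    """Drop sparse and near-constant features, then drop sparse cases."""
    n_features = len(rows[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        present = [v for v in column if v is not None]
        # Rule 1: feature missing in more than 25% of cases.
        if 1 - len(present) / len(column) > max_missing:
            continue
        # Rule 2: one value covers 95% or more of the non-missing cases.
        if present:
            top = max(present.count(v) for v in set(present))
            if top / len(present) >= constant_share:
                continue
        keep.append(j)
    kept_rows = []
    for row in rows:
        reduced = [row[j] for j in keep]
        # Rule 3: case missing more than 25% of the remaining features.
        if reduced and sum(v is None for v in reduced) / len(reduced) <= max_missing:
            kept_rows.append(reduced)
    return keep, kept_rows


rows = [
    [1, None, 0, 1],
    [0, None, 0, 1],
    [1, None, 0, 0],
    [0, 1, 0, None],
]
keep, kept_rows = filter_features_and_cases(rows)
```

In the toy data the second feature is dropped as mostly missing, the third as constant, and the last case is then dropped because half of its remaining values are missing.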
10 Applying Cluster Analysis
- We used K-Means [Forgy 1965, MacQueen 1967], which minimizes the sum of intra-cluster variances (weighted by cluster sizes).
- Specified k = 20.
- The average cluster size is 6,016.
- The largest and smallest clusters (11 and 16) have 10,735 and 3,086 points respectively.
- Computed mean values for all features (both demographic and medical) in each cluster.
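For readers unfamiliar with K-Means, the following is a minimal sketch of the standard alternating procedure (assign each point to its nearest center, then recompute each center as the mean of its points). It is not the implementation used in the study: the initialization by the first k points, the iteration cap and the toy data are assumptions made to keep the example short and deterministic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)),
               key=lambda c: squared_distance(point, centers[c]))


def k_means(points, k, iterations=100):
    centers = [list(p) for p in points[:k]]  # naive initialization
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, labels = k_means(points, k=2)
```

In the study the points would be the binary demographic feature vectors, and only those; the medical features are held back and summarized per cluster afterwards.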
11 Performing Comparisons
- We can make pairwise comparisons between the clusters for each medical feature and look for statistically significant differences using a t-test.
- There are $\binom{20}{2} = 190$ comparisons for each feature, and $190 \times 22 = 4{,}180$ comparisons altogether.
- At significance level $\alpha = 0.1$ we would expect 418 significant results by chance. (This is a rough calculation, ignoring dependencies between features and between comparisons based on the same features.)
- We have 1,141 significant results.
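The counting argument and a single comparison can be sketched as follows. The sketch takes the significance level to be 0.1, the value that reproduces the 418 chance results quoted on the slide. The slide does not say which t-test variant was used; Welch's unequal-variance statistic with a normal approximation for the p-value (reasonable for clusters of several thousand cases) is an assumption here, and the example rates are made up.

```python
from math import comb, sqrt
from statistics import NormalDist

k, n_medical, alpha = 20, 22, 0.1
pairs = comb(k, 2)                  # cluster pairs per feature
total = pairs * n_medical           # comparisons altogether
expected_by_chance = alpha * total  # false positives if nothing differs


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic computed from per-cluster summary statistics."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def two_sided_p_normal(t):
    """Two-sided p-value using the large-sample normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical rates of one medical feature in two clusters.
t = welch_t(0.30, 0.46, 10735, 0.25, 0.43, 3086)
significant = two_sided_p_normal(t) < alpha
```

Comparing the observed number of significant results with `expected_by_chance` is the rough check described on the slide; it does not correct individual comparisons for multiple testing.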
12 Focusing on Clusters
- Only two pairs of clusters do not differ in any medical features: 2 and 8; 12 and 5.
- For all other pairs of clusters, there is at least 1 medical feature whose distribution is significantly different in these clusters.
- The largest number of differences in medical feature distributions between clusters is 16 (out of 22), for clusters 11 and 19.
- Cluster 11: Detroit, White, age > 65, born in US, 61% male
- Cluster 19: Mixed registries, White, born in US, 80% under 55, 65% male
13 Focusing on Features
- No feature has different distributions in all clusters.
- Some features have the same distribution in all clusters: Laterality of the tumor, Extension code.
- Several features, such as Histology Code 807* and Surgery Performed, have different distributions in many pairs of clusters:
- Site-specific surgery (code 10 or higher): 122 pairs are different
- Histology code 807*: 95 pairs are different
- Surgery performed: 80 pairs are different
- Radiation Therapy: 79 pairs are different
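The per-feature and per-pair counts reported on these slides can be derived from the list of significant comparisons. The sketch below assumes the significant results are stored as (feature, cluster a, cluster b) triples with a < b; the data structure, function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tally(significant):
    """Count significant results per feature and per cluster pair."""
    per_feature = Counter(f for f, _, _ in significant)
    per_pair = Counter((a, b) for _, a, b in significant)
    return per_feature, per_pair


def unseparated_pairs(per_pair, k):
    """Cluster pairs (numbered 1..k) with no significant difference."""
    return [pair for pair in combinations(range(1, k + 1), 2)
            if per_pair[pair] == 0]


significant = {
    ("surgery", 1, 2),
    ("surgery", 1, 3),
    ("radiation", 1, 2),
}
per_feature, per_pair = tally(significant)
same = unseparated_pairs(per_pair, k=3)
```

Here `per_feature` plays the role of the "N pairs are different" counts, the largest value in `per_pair` identifies the most different pair of clusters, and `same` lists the pairs not distinguished by any feature.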
14 Interesting Clusters
"Interesting clusters" are those that have a subset of features with distributions that are significantly different from almost all other clusters.
Cluster 11:
- The cluster can be characterized as: Detroit, White, age > 65, born in US, 61% male
- Compared to other clusters: lowest rate of Site-specific surgery (code 10 or higher)
15 Interesting Clusters (cont'd)
Cluster 13:
- The cluster can be characterized as: cases from smaller registries, all above 65 years old, born in US, White, female
- Compared to other clusters: high rate of Surgery performed; relatively low rates of Stage codes 10 or higher, and 20 or higher
Cluster 14:
- The cluster can be characterized as: under 75, White, US born, Iowa, mostly male; born in North Central region
- Compared to other clusters: highest rate of Stage code 32 or higher; highest rate of Site-specific surgery (code 10 or higher); highest rate of Extension code
Cluster 18:
- The cluster can be characterized as: Detroit registry, almost all White, older than 65, largely born in East North Central
- Compared to other clusters: relatively low rate of Site-specific surgery; high rates of surgery and radiation therapy after surgery
16 Validation of Results
- Group together points of the validation set based on the nearest cluster center for the training data.
- Perform significance analysis on the validation set and compare results.
Observations:
- Average cluster size is 4,862.
- Cluster 16 is still the smallest (1,919), but 11 is no longer the largest.
- Features such as Laterality and Extension code still have the same distributions across clusters.
- Clusters 2 and 8, and 5 and 12, are still not distinguished by any features.
- The largest number of different features is still 16 (again, clusters 11 and 19).
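The first validation step, grouping hold-out points by the nearest training cluster center, can be sketched as below. Squared Euclidean distance is assumed, since that is the distance K-Means works with; the function name, centers and points are toy values.

```python
from collections import Counter


def assign_to_nearest(points, centers):
    """Label each point with the index of its nearest training center."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(centers)),
            key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


training_centers = [[0.0, 0.0], [1.0, 1.0]]
validation_points = [[0, 0], [1, 1], [0.9, 1], [0.2, 0]]
validation_labels = assign_to_nearest(validation_points, training_centers)
validation_sizes = Counter(validation_labels)
```

The centers are not re-estimated on the validation data, so cluster numbers keep the same meaning in both sets and the significance analysis can be compared cluster by cluster.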
17 Validation of Results (cont'd)
- Cluster 11: there are few significant differences with other clusters now.
- Cluster 13: still significant differences in frequencies of Stage code 10 or higher, and 20 or higher; Surgery performed.
- Cluster 14: the largest rate of Stage code 32 or higher (significantly different from many clusters), but the rate of Site-specific surgery is not significantly different from other clusters.
- Cluster 18: still a high rate of Surgery performed, but other characteristics are not significantly different.
18 Validation of Results (cont'd)
[Table with columns Training and Validation; the values are not preserved. Rows: Registry: Detroit; Registry: Iowa; Sex: Male; Age 65 or greater; Age 75 or greater; Extension code; Surgery was performed; Stage code 10 or higher; Stage code 31 or higher; Stage code 32 or higher]
19 Summary
- We proposed and illustrated a method that uses cluster analysis in a feature subspace to find relations between features, which are validated with statistical tests and hold-out data.
- Such an approach can be beneficial to epidemiology in finding relations between different types of features, such as demographic and medical ones, in studying changes in such relations, and in formulating focused studies.
20 Directions for Future Work
- Consider temporal effects on feature relations: which relations changed (appeared/disappeared), and whether they were "real"
- An in-depth epidemiological study of one of the "interesting" clusters
- Experimental work on other epidemiological datasets (other sources of data, other diseases)
- Experiments on synthetic data with various relations between features
More informationWarfarin Pharmacogenetics Reevaluated Subgroup Analysis Reveals a Likely Underestimation of the Maximum Pharmacogenetic Benefit by Clinical Trials
AJCP /ORIGINAL ARTICLE Warfarin Pharmacogenetics Reevaluated Subgroup Analysis Reveals a Likely Underestimation of the Maximum Pharmacogenetic Benefit by Clinical Trials Gary Stack, MD, PhD, 1,2 and Carleta
More informationbivariate analysis: The statistical analysis of the relationship between two variables.
bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for
More informationOutline. Practice. Confounding Variables. Discuss. Observational Studies vs Experiments. Observational Studies vs Experiments
1 2 Outline Finish sampling slides from Tuesday. Study design what do you do with the subjects/units once you select them? (OI Sections 1.4-1.5) Observational studies vs. experiments Descriptive statistics
More informationCHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL
127 CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL 6.1 INTRODUCTION Analyzing the human behavior in video sequences is an active field of research for the past few years. The vital applications of this field
More informationChapter 8. Learning Objectives 9/10/2012. Research Principles and Evidence Based Practice
1 Chapter 8 Research Principles and Evidence Based Practice 2 Learning Objectives Explain the importance of EMS research. Distinguish between types of EMS research. Outline 10 steps to perform research
More informationChapter 25. Paired Samples and Blocks. Copyright 2010 Pearson Education, Inc.
Chapter 25 Paired Samples and Blocks Copyright 2010 Pearson Education, Inc. Paired Data Data are paired when the observations are collected in pairs or the observations in one group are naturally related
More informationHelmut Schütz. Satellite Short Course Budapest, 5 October
Multi-Group and Multi-Site Studies. To pool or not to pool? Helmut Schütz Satellite Short Course Budapest, 5 October 2017 1 Group Effect Sometimes subjects are split into two or more groups Reasons Lacking
More informationThe Components of an Abstract. Alfreda Woods, BS, CHES, CTR Program Manager, DCCR DelMarVa-DC Educational Conference October 12, 2018
The Components of an Abstract Alfreda Woods, BS, CHES, CTR Program Manager, DCCR DelMarVa-DC Educational Conference October 12, 2018 1 Abstracts Should Tell A Story 2 The Components of an Abstract Who
More informationEPS 625 INTERMEDIATE STATISTICS TWO-WAY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY)
EPS 625 INTERMEDIATE STATISTICS TO-AY ANOVA IN-CLASS EXAMPLE (FLEXIBILITY) A researcher conducts a study to evaluate the effects of the length of an exercise program on the flexibility of female and male
More informationInstrument for the assessment of systematic reviews and meta-analysis
Appendix II Annex II Instruments for the assessment of evidence As detailed in the main body of the methodological appendix (Appendix II, "Description of the methodology utilised for the collection, assessment
More information8/10/2015. Introduction: HIV. Introduction: Medical geography
Introduction: HIV Incorporating spatial variability to generate sub-national estimates of HIV prevalence in SSA Diego Cuadros PhD Laith Abu-Raddad PhD Sub-Saharan Africa (SSA) has by far the largest HIV
More informationStudent name: SOCI 420 Advanced Methods of Social Research Fall 2017
SOCI 420 Advanced Methods of Social Research Fall 2017 EXAM 1 RUBRIC Instructor: Ernesto F. L. Amaral, Assistant Professor, Department of Sociology Date: October 12, 2017 (Thursday) Section 904: 2:20 3:35pm
More informationPolitical Science 15, Winter 2014 Final Review
Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically
More informationEstimates of the Prevalence of Opiate Use and/or Crack Cocaine Use, 2014/15: Sweep 11 report
September 2017 Updated July 2018 Estimates of the Prevalence of Opiate Use and/or Crack Cocaine Use, 2014/15: Sweep 11 report Gordon Hay 1, Anderson Rael dos Santos 2, Zoe Swithenbank 1 1. Public Health
More informationIdentifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang
Identifying Thyroid Carcinoma Subtypes and Outcomes through Gene Expression Data Kun-Hsing Yu, Wei Wang, Chung-Yu Wang Abstract: Unlike most cancers, thyroid cancer has an everincreasing incidence rate
More information