
This sheet starts from slide #83 to the end of slide #4. If you read this sheet you don't have to go back to the slides at all; they are included here.

Categorical Data (Qualitative data): Data that can be classified as belonging to a distinct number of categories >> result in categorical responses. This includes:

Binary data >> data that can be classified into one of 2 possible categories (yes/no, positive/negative).

Ordinal data >> data that can be classified into categories that have a natural ordering (e.g. levels of pain: none, moderate, intense). Pain can also be rated on a continuous scale.

Nominal data >> data that can be classified into more than 2 categories with no natural ordering (e.g. race: Arab, African, and others).

Note: Categories that are distinct from each other, such as gender, religion, marital status etc., are classified as nominal data (they cannot be ranked in order). In the ordinal type the categories can be ranked in increasing or decreasing order.

Proportions: Numbers by themselves may be misleading: they are on different scales and need to be reduced to a standard basis in order to compare them. We most frequently use proportions, that is, the fraction of items that satisfy some property, such as having a disease or being exposed to a dangerous chemical. "Proportions" are the same thing as fractions or percentages. In every case you need to know what you are taking a proportion of, that is, what the DENOMINATOR (n) of the proportion is.

** Proportion: p = x/n, where x is the number of items that have the property and n is the total number of items.
** Percentage = p x 100% = (x/n) x 100

We have to differentiate between the proportion and the probability.

Proportions and Probabilities: We often interpret proportions as probabilities. If the proportion with a disease is 1/10, then we also say that the probability of getting the disease is 1/10, or 1 in 10. Proportions are usually quoted for samples. Probabilities are almost always quoted for populations.

A probability is a hypothetical property. Proportions summarize observations. >> A proportion compares part of a quantity to the whole quantity (the base), whereas a probability is the chance or likelihood that an event happens. >> Probability is the PROPORTION of times the outcome would occur.

Example: Smoking among workers and its effect on disease and health (Case-Control Study).

Smoking   Workers   Cases   Controls
No        Yes         11       35
No        No          50      203
Yes       Yes         84       45
Yes       No         313      270

This study compares patients who have a disease (cases) with patients who do not have the disease or outcome (controls), and looks back retrospectively to compare how frequently the exposure to the risk factor is present in each group, in order to determine the relationship between the risk factor and the disease.

The calculation below uses the Smoking = Yes rows of the table, so it is done among smokers, and the exposure being compared is the Workers column (Yes = exposed, No = unexposed).

For the cases: Proportion exposed = number of exposed cases / total number of cases = 84/(84+313) = 0.212 or 21.2%

For the controls (who don't have the disease): Proportion exposed = number of exposed controls / total number of controls = 45/(45+270) = 0.143 or 14.3%
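The two proportions in the example above (84 out of 84+313 cases, 45 out of 45+270 controls) can be rechecked from the raw counts. This is only a small sketch; the dictionary layout and the function name proportion_exposed are assumptions made for illustration and are not part of the slides.

```python
# Counts taken from the example above (exposed vs. unexposed in each group).
cases = {"exposed": 84, "unexposed": 313}
controls = {"exposed": 45, "unexposed": 270}


def proportion_exposed(group):
    """Return exposed / (exposed + unexposed) for one group."""
    total = group["exposed"] + group["unexposed"]
    return group["exposed"] / total


p_cases = proportion_exposed(cases)
p_controls = proportion_exposed(controls)

print(f"Cases:    {p_cases:.3f} ({p_cases:.1%})")
print(f"Controls: {p_controls:.3f} ({p_controls:.1%})")
```

The denominator is always the whole group (all cases or all controls), which is the point the notes make about knowing what you are taking a proportion of.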

Prevalence: Disease prevalence = the proportion of people with a given disease at a given time (a specific point in time).

Disease prevalence = (Number of diseased persons at a given time) / (Total number of persons examined at that time, both healthy and sick)

>> For example, if we took a sample of 1000 people and screened them for diabetes and found that 100 are affected, then the prevalence of diabetes in that sample would be: prevalence = 100/1000 = 0.1 = 10%

Prevalence is usually quoted per 100,000 people, so the above proportion should be multiplied by 100,000 (here 0.1 x 100,000 = 10,000 per 100,000).

Now how do we interpret the prevalence?

Interpretation: At time t: Prevalence = Cases (old + new) / Total

For example, if we were talking about AIDS prevalence in the Jordanian population, the prevalent cases are all existing (new and old) cases who are present in the population at a particular point in time (time t) and have HIV infection, and we divide this number by the total population.

The main disadvantage of using a prevalent case series is that patients with a long course of disease tend to be over-represented, since all those with a short duration leave the pool of prevalent cases because of either recovery or death. It may also be difficult to establish temporality (when the person was actually exposed to the risk factor and when he developed the disease), especially in new cases.

Problem of exposure: The main challenge is to identify the appropriate control group; the distribution of exposure among the control group should be representative of the distribution in the population that gave rise to the cases. A consequence of this is that the control group can contain people with the disease under study when the disease has a high attack rate in the population (not a comparable measurement).
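The diabetes example (100 affected out of 1000 screened) can be written as a short calculation. This is a minimal sketch; the function name prevalence and the per-100,000 scaling variable are illustrative choices, not something taken from the slides.

```python
def prevalence(existing_cases, population):
    """Proportion of the examined population that has the disease at time t.

    existing_cases counts all current cases, old and new.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return existing_cases / population


p = prevalence(existing_cases=100, population=1000)
per_100k = p * 100_000  # same prevalence expressed per 100,000 people

print(f"Prevalence: {p:.2f} ({p:.0%}), i.e. {per_100k:,.0f} per 100,000")
```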

Remember: In statistics, the validity of an instrument (survey, test, questionnaire, etc.) is the extent to which the instrument measures what it is supposed to measure. It is rare, if not nearly impossible, for an instrument to be 100% valid, so validity is generally measured in degrees that assess the accuracy of an instrument.

External validity (causal, case): the extent to which the results of a study can be generalized from a sample to a population. An instrument that is externally valid helps obtain population generalizability, or the degree to which a sample represents the population.

Content validity: refers to the appropriateness of the content of an instrument. In other words, do the measures (questions, observations, etc.) accurately assess what you want to know? This is particularly important with exams. Consider a test developer who wants to maximize the validity of an exam for 7th grade mathematics. This would involve taking representative questions from each of the sections of the material involved and evaluating them against the desired outcomes.

Reliability can be thought of as consistency. (Does the instrument consistently measure what it is intended to measure?) A measure is said to have high reliability if it produces similar results under consistent conditions (it gives similar results for similar inputs).

Cronbach's alpha is a coefficient of internal consistency. It is commonly used as an estimate of the reliability of test scores. >> Most values fall within the range of 0.75 to 0.83. Zero = not reliable at all; closer to 1 is more reliable!!

Cronbach's alpha      Reliability
α ≥ 0.9               Excellent
0.7 ≤ α < 0.9         Good
0.6 ≤ α < 0.7         Acceptable
0.5 ≤ α < 0.6         Poor (weak, can't be used)
α < 0.5               Unacceptable
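The reliability bands in the table above can be turned into a small lookup. The sketch below assumes each band includes its lower bound (so 0.7 counts as Good and 0.9 as Excellent); the function name interpret_alpha is hypothetical.

```python
def interpret_alpha(alpha):
    """Map a Cronbach's alpha value to the reliability label from the table.

    Assumption: each band includes its lower edge.
    """
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.7:
        return "Good"
    if alpha >= 0.6:
        return "Acceptable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"


for value in (0.95, 0.80, 0.65, 0.55, 0.30):
    print(f"alpha = {value:.2f} -> {interpret_alpha(value)}")
```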

A test cannot have high validity unless it also has high reliability.

Screening Tests: Screening is a strategy used in a population to identify an unrecognized disease in individuals without signs or symptoms. Screening tests are somewhat unique in that they are performed on persons apparently in good health, such as mammography to detect breast cancer. They are done using machines such as an ultrasound scanner or an ophthalmoscope. Through screening tests people are classified as healthy or as falling into one or more disease categories. These tests are not 100% accurate and therefore misclassification is unavoidable. There are 2 proportions that are used to evaluate these types of diagnostic procedures.

Sensitivity and Specificity: These are used for machines (such as a sphygmomanometer or a glucometer) rather than for surveys, questionnaires, etc. Sensitivity and specificity are terms used to describe the effectiveness of screening tests. They describe how good a test is in two ways: how well it avoids false negatives (sensitivity) and how well it avoids false positives (specificity).

Sensitivity: the proportion of diseased people who screen positive for the disease >> the test's ability to identify a condition correctly. For example, if we screened 100 patients for high blood pressure (HBP) using a sphygmomanometer, and they had all already been diagnosed with HBP, and the test showed that 90 patients have HBP, then the sensitivity of this device is 0.9 = 90%. >> The 10 patients left are FALSE NEGATIVE results, because they have the disease (positive) but were screened negative!!

A negative result in a test with high sensitivity is useful for ruling out the disease in a patient, BUT a positive result in a test with high sensitivity is not useful for ruling in disease, because sensitivity does not take FALSE POSITIVE results into account (a false positive is a test result that is erroneously classified as positive).

Specificity: the proportion of healthy people who screen healthy >> the test's ability to exclude a condition correctly. A positive result in a test with high specificity is useful for ruling in disease, but a negative result in a test with high specificity is not useful for ruling out disease, because specificity does not take FALSE NEGATIVE results into account.

EXAMPLE:

                 Condition Present      Condition Absent
Test Positive    True Positive (TP)     False Positive (FP)
Test Negative    False Negative (FN)    True Negative (TN)

Test Sensitivity (Sn) is defined as the probability that the test is positive when given to a group of patients who have the disease. Sn = (TP / (TP+FN)) x 100. It can be viewed as Sn = 1 - the false negative rate.

The Specificity (Sp) of a screening test is defined as the probability that the test will be negative among patients who do not have the disease. Sp = (TN / (TN+FP)) x 100. It can be understood as Sp = 1 - the false positive rate.

It is a little bit confusing, right?! SIMPLY: Sn is the probability that a positive test is given when the patient is truly ill, and it is described as a proportion. If we took 100 diseased patients and the test showed 80 are positive (truly ill), then Sn is 0.8 (80/100). In other words, it is 1 - (proportion of diseased pts screened as healthy {false negatives}) = 1 - 0.2 (20/100) = 0.8. SO >>> Sn = 1 - FN rate. Also, (TP+FN) represents the truly diseased patients >> so Sn is the proportion of the truly diseased patients who were screened positive!!

Sp is the probability that a negative test is given when the patient is truly healthy. If 100 people with no disease are tested and 90 return a negative result, then the test has 0.9 specificity and the remaining 10 are false positive results. >> SO again it can be described as Sp = 1 - {proportion of healthy pts screened as diseased (false positives)}. Also, (TN+FP) represents the truly healthy >> so Sp is the proportion of the truly healthy people who were screened negative!!

Positive & Negative Predictive Values: The positive predictive value (PPV) of a test is the probability that a patient who tested positive for the disease actually has the disease. PPV = (TP / (TP+FP)) x 100. The negative predictive value (NPV) of a test is the probability that a patient who tested negative for a disease does not have the disease. NPV = (TN / (TN+FN)) x 100.
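The four measures defined so far (Sn, Sp, PPV, NPV) all come from the same four counts. The sketch below combines the two small examples from the notes into one hypothetical table: 100 diseased patients with 80 testing positive (TP = 80, FN = 20) and 100 healthy people with 90 testing negative (TN = 90, FP = 10). Putting them in a single table, and the function name screening_measures, are assumptions made for illustration.

```python
def screening_measures(tp, fp, fn, tn):
    """Return Sn, Sp, PPV and NPV as proportions between 0 and 1."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # healthy who test negative
        "ppv": tp / (tp + fp),          # positives who are truly diseased
        "npv": tn / (tn + fn),          # negatives who are truly healthy
    }


# Hypothetical table built from the two examples in the notes.
m = screening_measures(tp=80, fp=10, fn=20, tn=90)

for name, value in m.items():
    print(f"{name}: {value:.3f}")

# Sn = 1 - false negative rate, Sp = 1 - false positive rate.
fn_rate = 20 / (80 + 20)
fp_rate = 10 / (90 + 10)
print(1 - fn_rate, 1 - fp_rate)
```

Note that PPV and NPV depend on how many diseased and healthy people are in the table, so these values only hold for this made-up 100/100 split.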

The Efficiency: The efficiency (EFF) of a test is the probability that the test result and the diagnosis agree. It is calculated as: EFF = ((TP+TN) / (TP+TN+FP+FN)) x 100 {the proportion of correctly classified pts out of all tested pts}.

Let's take an example: A cytological test was undertaken to screen women for cervical cancer.

                 Actually Positive    Actually Negative    Total
Test Positive    154 (TP)             225 (FP)             379
Test Negative    362 (FN)             23,362 (TN)          23,724
Total            516 (TP+FN)          23,587 (FP+TN)       24,103

Sensitivity = (TP / (TP+FN)) x 100 = 154/516 = 0.298 x 100 = 29.8%
Specificity = (TN / (TN+FP)) x 100 = 23362/23587 = 0.990 x 100 = 99.0%
Efficiency = ((TP+TN) / (TP+TN+FP+FN)) x 100 = (154+23362) / (154+225+362+23362) = 0.976 x 100 = 97.6%

Relative Risk: First let's define the incidence: the number of new cases of a condition, symptom, death, or injury that develop during a specific time period, such as a year. Incidence shows the likelihood that a person in that population will be affected by the condition (a measure of the risk of developing some new condition within a specified period of time).

Relative risks are the ratio of risks for two different populations (ratio = a/b).

Relative Risk = (disease incidence in group 1) / (disease incidence in group 2)
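As a check on the cervical cancer screening example above, the three percentages can be recomputed from the four counts (TP = 154, FP = 225, FN = 362, TN = 23,362). This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Counts from the cytological screening example.
tp, fp, fn, tn = 154, 225, 362, 23362

sensitivity = tp / (tp + fn)                  # 154 / 516
specificity = tn / (tn + fp)                  # 23362 / 23587
efficiency = (tp + tn) / (tp + fp + fn + tn)  # agreement over all women tested

print(f"Sensitivity: {sensitivity:.1%}")
print(f"Specificity: {specificity:.1%}")
print(f"Efficiency:  {efficiency:.1%}")
```

The efficiency is high even though the sensitivity is low, because almost all of the women screened are truly negative and the test classifies them correctly.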

If the risk (or proportion) of having the outcome is 1/10 in one population and 2/10 in a second population, then the relative risk of the second population compared with the first is: (2/10) / (1/10) = 2.0

A relative risk > 1 indicates increased risk for the group in the numerator, and a relative risk < 1 indicates decreased risk for the group in the numerator.

RR is usually used for prospective (cohort) studies, which look forward: exposed and unexposed groups are followed over time to see who develops the disease.

Relative risk >> the chance that a member of a group receiving some exposure will develop a disease, relative to the chance that a member of an unexposed group will develop the same disease.

RR = P(disease | exposed) / P(disease | unexposed)

Recall:
An RR of 1.0 indicates that the probabilities of disease in the exposed and unexposed groups are identical, so there is no difference in risk between the two groups and no association between exposure and disease.
An RR of < 1 means the event is less likely to occur in the exposed group than in the unexposed group.
An RR of > 1 means the event is more likely to occur in the exposed group than in the unexposed group.

Consider an example where the incidence of developing lung cancer among smokers was 20% and among non-smokers 1%. Then the RR of cancer associated with smoking would be (20/100) / (1/100) = 20 >> smokers would be twenty times as likely as non-smokers to develop lung cancer.

Odds Ratio: The odds ratio (OR) is a way to quantify how strongly the presence or absence of property A is associated with the presence or absence of property B in a given population. It is a measure of association between an exposure and an outcome.

Odds: the ratio of the probability that the outcome will happen to the probability that the outcome will not happen.

The OR represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure.

OR is usually used for case-control (retrospective) studies, where cases and controls are selected and the exposure is determined by "looking back". This is in contrast to a prospective (longitudinal) study >> a cohort study that follows over time a group of similar individuals (cohorts) who differ with respect to certain factors under study, to determine how these factors affect rates of a certain outcome ("looking forward"). The odds ratio is the ratio of the odds of the outcome in the two groups.
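To make the difference between a probability and an odds concrete, the sketch below reuses the lung cancer figures from the notes (incidence 20% in smokers, 1% in non-smokers). The function name odds and the variable names are illustrative assumptions.

```python
def odds(p):
    """Odds of an outcome with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


p_smokers = 0.20      # incidence of lung cancer among smokers
p_nonsmokers = 0.01   # incidence among non-smokers

relative_risk = p_smokers / p_nonsmokers
odds_ratio = odds(p_smokers) / odds(p_nonsmokers)

print(f"Odds in smokers:     {odds(p_smokers):.4f}")
print(f"Odds in non-smokers: {odds(p_nonsmokers):.4f}")
print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
```

A probability of 0.20 corresponds to odds of 0.25 (1 to 4), so the odds ratio here (about 24.75) is larger than the relative risk of 20.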

OR = 1: Exposure does not affect the odds of the outcome
OR > 1: Exposure is associated with higher odds of the outcome
OR < 1: Exposure is associated with lower odds of the outcome

When is it used? Odds ratios are used to compare the relative odds of the occurrence of the outcome of interest (disease or disorder), given exposure to the variable of interest (health characteristic). The odds ratio can also be used to determine whether a particular exposure is a risk factor for a particular outcome, and to compare the magnitude of various risk factors for that outcome.

Odds Ratio = (a/b) divided by (c/d) = ad/bc

Where:
a = Number of exposed cases (yes/yes)
b = Number of exposed non-cases (yes/no)
c = Number of unexposed cases (no/yes)
d = Number of unexposed non-cases (no/no)
>> look at the table below!!

Odds Ratio and Relative Risk: Odds ratios are better to use in case-control studies (cases and controls are selected and the level of exposure is determined retrospectively, "looking back"). Relative risks are better for cohort studies (exposed and unexposed subjects are chosen and are followed to determine disease status prospectively, "looking forward"). When we have a two-way classification of exposure and disease, we can approximate the relative risk by the odds ratio; the approximation is close when the disease is rare.
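The claim that the odds ratio approximates the relative risk can be illustrated with the a, b, c, d counts defined above. The two tables in the sketch are made-up numbers chosen only to contrast a rare disease with a common one; the function names are also assumptions.

```python
def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR = (a / b) / (c / d) = ad / bc."""
    return (a * d) / (b * c)


# Hypothetical counts: a, b = exposed cases / non-cases; c, d = unexposed.
rare = dict(a=10, b=990, c=5, d=995)       # disease in about 1% or less
common = dict(a=400, b=600, c=200, d=800)  # disease in 20% to 40%

for label, t in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {relative_risk(**t):.2f}, OR = {odds_ratio(**t):.2f}")
```

In the rare-disease table both measures are close to 2, while in the common-disease table the relative risk is still 2 but the odds ratio is noticeably larger.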

                   Disease
Exposure     Yes      No       Total
Yes          A        B        A+B
No           C        D        C+D

Relative Risk = A/(A+B) divided by C/(C+D)
Odds Ratio = A/B divided by C/D = AD/BC

Let's take an example to illustrate: a case-control study to identify the association between cigarette smoking and developing pancreatic cancer.

Disease: Pancreatic Cancer
Exposure: Cigarette Smoking

                   Disease
Exposure     Yes      No       Total
Yes          38       81       119
No           2        56       58

First: Relative risk for exposed vs. non-exposed. ((The ratio of the probability of developing cancer in smokers to the probability of developing cancer in a comparison group of non-smokers.))

Numerator: proportion of exposed people that have the disease = 38/(38+81)
Denominator: proportion of non-exposed people that have the disease = 2/(2+56)

Relative Risk = (38/119) / (2/58) = 9.26 > 1 >> smokers have about 9 times the risk of developing pancreatic cancer compared with non-smokers!!

Second: Odds Ratio for exposed vs. non-exposed ((the odds that cancer will develop in smokers, compared to the odds of developing cancer in the absence of smoking)).

To calculate it we have to answer the following questions:
Q1: Who are the exposed cases (++ = a)? A1: Smokers who developed pancreatic cancer, a = 38
Q2: Who are the exposed non-cases (+- = b)? A2: Smokers who didn't develop pancreatic cancer, b = 81
Q3: Who are the unexposed cases (-+ = c)? A3: Non-smokers who developed pancreatic cancer, c = 2
Q4: Who are the unexposed non-cases (-- = d)? A4: Non-smokers who didn't develop pancreatic cancer, d = 56

Then we plug the values into the formula:
Numerator: ratio of diseased vs. non-diseased in the exposed group
Denominator: ratio of diseased vs. non-diseased in the non-exposed group

Odds Ratio = (38/81) / (2/56) = (38*56) / (2*81) = 13.14 > 1 >> smoking is associated with pancreatic cancer!!

I have tried my best to explain everything in the slides because the Dr just went through the slides most of the time, so I hope it was useful, and sorry for any mistakes!!

Done by: Rowa2 Lahaseh