Data Management
Data are derived through self-report, observation, or measurement.
Question-and-answer data: Are the questions structured or unstructured? Are the responses quantitative or qualitative? Numerical or categorical? Continuous or discrete? Used as-is or as a derivative?
Pre-Data Collection: Review Data Collection Forms/Methods
Consider the following criteria:
If QUANTITATIVE: validity (internal), validity (external), reliability (objectivity)
If QUALITATIVE: credibility, transferability, dependability, confirmability
Pre-Data Collection: Review Data Collection Forms
Internal validity: how well you are able to prove what you want to prove (often tied to the design of the study)
External validity: how applicable your findings are to subjects outside your sample (often tied to the sampling method)
Reliability: how precise your methods are
Reliability and Validity of Instruments
Reliability = consistency
Validity = accuracy
Reliability may mean three different things depending on how the data were derived:
Stability: measurements
Internal consistency: self-reports
Equivalence: observations
Stability: extent to which scores are similar on two separate administrations of an instrument.
Test-retest reliability: Pearson's r correlation coefficient; appropriate for relatively enduring attributes.
Alternate/equivalent-form reliability: change the wording of the questions into a functionally equivalent form, or simply change the order of the questions; also assessed with Pearson's r; helps to overcome the "practice effect".
[Scatterplots omitted] r = 0.95: positive correlation; r = -0.95: negative correlation
Strength of Linear Association (|r|):
1.00: perfect linear relationship
0.80-0.99: strong relationship
0.40-0.79: moderate relationship
0.01-0.39: weak relationship
0: no linear relationship
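The test-retest coefficient described above is an ordinary Pearson's r between two administrations of the same instrument. A minimal sketch, using hypothetical scores (the data and function name are illustrative, not from the source):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical test-retest data: same instrument, two administrations
first = [10, 12, 9, 15, 11]
second = [11, 13, 9, 14, 12]
r = pearson_r(first, second)   # r is about 0.93: a strong relationship
```

A value this close to 1 would fall in the "strong relationship" band of the scale above, suggesting stable scores across administrations.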
Internal consistency: extent to which all the instrument's items are measuring the same attribute. Appropriate for most multi-item instruments.
Split-half technique: Cronbach's alpha and Spearman-Brown coefficient.
Coefficient alpha: Cronbach's alpha only.
Cronbach's alpha:
<0.6: unacceptable reliability
0.6-0.7: acceptable reliability
0.7-0.8: good reliability
0.8-0.95: very good reliability
>0.95: items redundant
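Cronbach's alpha can be computed directly from a respondents-by-items score matrix: the ratio of summed item variances to the variance of the total scores, scaled by the number of items. A minimal sketch with hypothetical scores (the matrix is invented for illustration):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
    """
    k = len(items[0])                                  # number of items
    item_vars = [pvariance(col) for col in zip(*items)]
    total_var = pvariance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 4 respondents x 3 items on a 1-5 scale
scores = [[3, 4, 3],
          [5, 4, 5],
          [1, 2, 2],
          [4, 3, 4]]
alpha = cronbach_alpha(scores)   # about 0.91: very good reliability
```

An alpha of about 0.91 falls in the 0.8-0.95 "very good reliability" band of the scale above.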
Equivalence: degree of similarity between multiple raters/observers using an instrument.
Interobserver or interrater reliability: Cohen's kappa and multirater kappa (nominal data); most relevant for structured observations.
Intra-observer reliability: degree of similarity between alternate forms used by the same rater/observer; McNemar statistic (binary data); most relevant for structured observations.
Kappa:
0.93-1.00: excellent agreement
0.81-0.92: very good
0.61-0.80: good
0.41-0.60: fair
0.21-0.40: slight
0.01-0.20: poor
<0: no agreement
Validity: degree to which an instrument measures what it is supposed to measure.
Types of validity:
Face: measurements/questions appear related to the variable/concept; assessed by subjective judgement.
Content: degree to which items are representative of the characteristic; expert validation using the Content Validity Index.
Criterion-related: comparison against a gold standard; use criteria and calculate a validity coefficient.
Construct: related to instruments of the same characteristic and not others (measures what is supposed to be measured); use factor analysis.
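The Content Validity Index mentioned above is commonly computed at item level (I-CVI): the proportion of experts rating an item relevant (3 or 4 on a 4-point relevance scale), with a scale-level average (S-CVI/Ave) across items. A minimal sketch with hypothetical expert ratings (item names and ratings are invented):

```python
def item_cvi(ratings, relevant=(3, 4)):
    """I-CVI: proportion of experts rating the item relevant (3 or 4 on a 1-4 scale)."""
    return sum(r in relevant for r in ratings) / len(ratings)

# Hypothetical: 5 experts rate 3 questionnaire items on a 1-4 relevance scale
expert_ratings = {
    "item1": [4, 4, 3, 4, 3],   # all five experts find the item relevant
    "item2": [4, 3, 2, 4, 3],   # one expert rates it not relevant
    "item3": [2, 1, 3, 4, 2],   # only two experts find it relevant
}
icvis = {item: item_cvi(r) for item, r in expert_ratings.items()}
scvi_ave = sum(icvis.values()) / len(icvis)   # scale-level CVI, average method
```

Items with low I-CVI (like item3 here, at 0.4) are candidates for revision or removal before data collection.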
Sampling: how you obtained your subjects/respondents.
Probability (random) sampling: every member has an equal, or at least predictable, probability of selection.
Non-probability (non-random) sampling: not everyone has a chance of selection.
Probability Sampling:
Simple random sampling
Stratified random sampling
Cluster sampling
Systematic sampling
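Three of the probability methods above can be sketched in a few lines. This is an illustrative toy example: the sampling frame, seed, and strata are invented, and a real study would draw from an actual respondent list.

```python
import random

population = list(range(1, 101))   # hypothetical sampling frame of 100 subject IDs
random.seed(42)                    # fixed seed only to make the illustration reproducible

# Simple random sampling: every unit has an equal chance of selection
simple = random.sample(population, 10)

# Systematic sampling: every k-th unit after a random start
k = len(population) // 10          # sampling interval
start = random.randrange(k)
systematic = population[start::k]

# Stratified random sampling: draw proportionally from each stratum
strata = {"group_a": population[:40], "group_b": population[40:]}
stratified = [unit for group in strata.values()
              for unit in random.sample(group, len(group) // 10)]
```

Each method yields 10 subjects here, but stratified sampling guarantees proportional representation of each stratum, which simple random sampling does not.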
Multi-stage sampling: uses a combination of methods; mainly utilizes cluster sampling; used for large-area sampling.
Methods of Non-random Sampling:
Convenience (and/or volunteer) sampling: accessible samples
Snowball sampling: referral
Purposive sampling
Types of Purposive Sampling:
Maximum variation sampling: get the extremes
Homogeneous sampling: get similar cases
Extreme (deviant) case sampling: get one extreme
Intensity sampling: get a representative for each intensity
Typical case sampling
Critical case sampling
Criterion sampling
Theory-based sampling
Quota sampling
Sampling confirming/disconfirming cases
Terminology (not the same):
Random sampling: a method of selecting subjects
Random assignment: allocation of subjects to groups; required in experiments
Eligibility Criteria:
Inclusion criteria
Exclusion criteria
Considerations that Affect Sample Size in Quantitative Studies:
Homogeneity of the population: the more homogeneous, the smaller the sample needed
Effect size (strength of relationships): the smaller the effect to be detected, the larger the sample needed
Attrition (loss of subjects): expect losses, so recruit more subjects
Interest in subgroup analyses: more subjects needed
Sensitivity of the measures (ability to detect differences): the less sensitive the measure, the larger the sample needed
Sample Size Estimation: equations and methods exist to estimate the sample size needed to detect a given effect.
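One such equation, among many, is the standard formula for comparing two group means with a two-sided test. A minimal sketch (the function name is mine; the formula is the usual normal-approximation one):

```python
import math
from statistics import NormalDist

def n_per_group(alpha, power, sigma, delta):
    """Sample size per group for comparing two means (two-sided test).

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    where sigma is the SD and delta the difference to detect.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Detect a difference of 0.5 SD with 80% power at the 5% significance level
n = n_per_group(alpha=0.05, power=0.80, sigma=1.0, delta=0.5)   # 63 per group
```

Note how the formula encodes the considerations above: a smaller delta (effect size) or a larger sigma (less sensitive, noisier measure) both inflate n.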
Pre-Data Collection: Review Data Collection Forms
Presentable
Easy to fill out
Not too long, not too short
Proper format and instructions
Easy to understand
Correct language and vocabulary
Retraceable, with sufficient information
Data Collection Proper: Avoid bias
Design bias: internal/external validity, regression effect, attrition effect
Measurement bias
Sampling bias
Procedural bias
Type III error (answering the wrong problem)
Data Collection Proper: Check notes and questionnaires for lacking or incorrect information.
Data Collection Proper: Make data retraceable
Number each form or questionnaire
Respondent number
Control number
Lists / sign-up sheets
Ask for a name or unique identifier
Post-Data Collection: Coding Manual
Variable names
Variable labels
Post-Data Collection: Coding Manual
Qualitative: encode as-is
Nominal: encode each category as a separate variable, or as one variable with assigned codes
Ordinal: code according to rank
Numerical: code as measured
Post-Data Collection: Coding Manual
Encode as-is: words or phrases
Encode as assigned: binary responses; categorical answers and multiple choice; cafeteria questions
Encode as ranked/ordered (ordinal): Likert scale; semantic differential scale
Encode as measured: rating scale; visual analog scale; bio-physiologic measures; derived/computed variables
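The coding-manual entries above boil down to lookup tables applied during encoding. A minimal sketch with a hypothetical questionnaire (the variable names, code assignments, and responses are invented for illustration):

```python
# Hypothetical coding-manual entries
LIKERT = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
          "agree": 4, "strongly agree": 5}   # encode as ranked/ordered
BINARY = {"no": 0, "yes": 1}                 # encode as assigned

raw_responses = [
    {"q1_satisfied": "agree", "q2_smoker": "no"},
    {"q1_satisfied": "strongly agree", "q2_smoker": "yes"},
]

# Apply the coding manual to each respondent's raw answers
coded = [{"q1_satisfied": LIKERT[r["q1_satisfied"]],
          "q2_smoker": BINARY[r["q2_smoker"]]} for r in raw_responses]
```

Keeping the code assignments in one place, as above, is exactly what the coding manual documents: anyone re-encoding the data gets the same numbers.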
Post-Data Collection: Coding Data
Bio-physiologic measures: check units
Post-Data Collection: Coding Data
Use color to make encoded data easier to check
Post-Data Collection: Coding Data
Centralized encoding: fewer errors, but takes longer
Decentralized encoding: faster, but more prone to errors
Post-Data Collection: Coding Data
Vertical vs horizontal: most statistical software reads data horizontally (one case per row); if data were encoded vertically, you need to transpose them.
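Transposing a vertically encoded dataset is a one-liner. A minimal sketch with invented variables (id, age, score are hypothetical):

```python
# Data accidentally encoded vertically: each inner list is one VARIABLE
vertical = [
    ["id", 1, 2, 3],
    ["age", 34, 29, 41],
    ["score", 88, 75, 92],
]

# Transpose so each row becomes one CASE, as most statistical software expects
horizontal = [list(row) for row in zip(*vertical)]
# horizontal[0] is now the header row: ["id", "age", "score"]
```

Spreadsheet software offers the same operation (e.g., paste-special with transpose), but doing it programmatically leaves a retraceable record of the change.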
Post-Data Collection: Coding Data
Formatting vs no formatting: formatting helps to avoid mistakes during entry, but it is not recognized by statistical software and may cause errors on import.
Post-Data Collection: Coding Data
Order the encoded variables to match the questionnaire.
Post-Data Collection: Post-Encoding Checking
Count: check frequencies
Total/sum
Check for blank or zero information
Direction of questions
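The post-encoding checks above can be automated. A minimal sketch on a hypothetical encoded dataset (the variables and values are invented; None marks a blank entry):

```python
from collections import Counter

# Hypothetical encoded dataset: one dict per respondent
data = [
    {"id": 1, "sex": "F", "age": 34, "score": 88},
    {"id": 2, "sex": "M", "age": None, "score": 75},   # blank age
    {"id": 3, "sex": "F", "age": 41, "score": 0},      # suspicious zero score
]

# Count: frequencies of a categorical variable
sex_freq = Counter(row["sex"] for row in data)

# Total/sum: a quick plausibility check on a numeric variable
score_total = sum(row["score"] for row in data)

# Flag blank (None) or zero entries for review against the original forms
flagged = [row["id"] for row in data
           if row["age"] in (None, 0) or row["score"] in (None, 0)]
```

Because the forms were numbered (see the retraceability slide), the flagged IDs can be traced back to the original questionnaires to decide whether each blank or zero is real or an encoding error.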