Getting to know your variables. Jane E. Miller, PhD

Why is it important to get to know your variables?
Each variable measures
- A specific concept
  - Numeric values have particular meanings that differ depending on the nature of that concept
- In a particular context
  - Place, time, and group to whom the numbers pertain
- In specific units
- Collected with a particular study design
  - Affects prevalence of and reasons for missing values

Example of failing to get to know variables
- Nationally representative survey sample from a developing country, circa 2002.
- Data set downloaded from a research data web site; not cleaned or evaluated before use.
- Birth weight in grams: observed range up to 9999, with a mean of 8000.
- First red flag: implausible as an actual birth weight, given its meaning and units.
  - 9,999 grams ≈ 22 lbs.
  - 9999 was a code for a missing value.
- Lesson: Must become familiar with what a particular value means for that concept, context, and units.

Second red flag
- 2/3 of the sample had a birth weight value of 9999
  - Very high value for a substantial share of the sample
  - Unlikely to be explained solely by outliers or data entry errors
- Lesson: Look at study documentation and questionnaire to find out why this distribution was observed.
  - Occurred due to a skip pattern designed to minimize recall bias in birth weight reporting.
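A back-of-the-envelope sketch of how a missing-value code can distort a mean when it is treated as a real value. The code 9999 and the 2/3 share come from the example above; the mean of 3,200 grams among valid cases is a purely hypothetical figure chosen for illustration, not a value from the survey.

```python
MISSING_CODE = 9999             # missing-value code from the example
SHARE_MISSING = 2 / 3           # share of the sample carrying that code
HYPOTHETICAL_VALID_MEAN = 3200  # assumed mean (grams) among valid cases; illustration only


def contaminated_mean(valid_mean, missing_code, share_missing):
    """Mean obtained if the missing code is mistakenly averaged in as a real value."""
    return share_missing * missing_code + (1 - share_missing) * valid_mean


naive_mean = contaminated_mean(HYPOTHETICAL_VALID_MEAN, MISSING_CODE, SHARE_MISSING)
print(round(naive_mean))  # far above any plausible average birth weight
```

Under these assumptions the contaminated mean lands well above 7,000 grams, the same order of magnitude as the implausible mean reported in the example.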

Resources needed for this exercise
- Documentation on the data source
  - Description of study design
  - Questionnaire
  - Codebook for electronic data file
- Electronic file of database
- Statistical software
- Research question
- Articles, books, etc. on the topic
  - Dependent and key independent variables
- Getting to know variables is project-specific

Attributes of data and variables to become familiar with prior to analysis

Analytic sample
- Before becoming acquainted with variables in the analysis, impose any limits on the analytic sample related to the research question.
- Exclude cases
  - to whom the topic does not pertain
  - that are part of a group with too few cases
  - for whom a key variable was not collected

Context of measurement
- When, where, who
- E.g., family income will be
  - Higher now than it was 200 years ago in a given place and group
  - Higher in a currently developed country than in a developing country
  - Higher in a sample of all households than in a sample of low-income households

Unit of analysis
- Do data pertain to
  - Individual person?
  - Family?
  - Census tract?
  - Institution?
- Knowing the unit of analysis helps ascertain the plausible range of values
  - E.g., number of persons in a family will be much lower than the population of a census tract or a school

Labeling, coding, and missing value information for the variables
To help create a comprehensive record of information on each of the variables in the analysis, fill out a grid like this one, which is available online.

| Attribute | Example 1 | Example 2 |
|---|---|---|
| Variable name (e.g., acronym on the data set) | DOCLY | BWGRMS |
| Variable label (descriptive phrase) | Saw doctor last year | Birth weight |
| Type of variable (nominal, ordinal, interval or ratio) | Nominal | Ratio |
| Coding (for categorical variables) OR units (for continuous variables) | 1 = yes; 2 = no | Grams |
| Plausible range of values (excluding missing values) | 1, 2 | 0 to 6000 |
| Missing value codes (if any) | 7 = refused; 8 = don't know; 9 = missing | 9999 = missing |
| Skip pattern? (e.g., conditions under which variable not collected) | None for this variable | Asked only about children < age 5 years |
| Original or created variable? | Original | Original |
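One possible way to keep such a grid alongside the analysis is as a small data structure. The sketch below is only an illustration: the class name `VariableRecord` and its field names are hypothetical, while the two entries reproduce the DOCLY and BWGRMS examples from the grid.

```python
from dataclasses import dataclass, field


@dataclass
class VariableRecord:
    """One row of the variable grid (hypothetical structure, for illustration)."""
    name: str                 # acronym on the data set
    label: str                # descriptive phrase
    level: str                # nominal, ordinal, interval or ratio
    coding_or_units: str      # coding (categorical) or units (continuous)
    plausible_values: str     # excluding missing values
    missing_codes: dict = field(default_factory=dict)  # code -> reason
    skip_pattern: str = "None for this variable"
    original: bool = True     # False for created variables


grid = [
    VariableRecord(
        "DOCLY", "Saw doctor last year", "nominal", "1 = yes, 2 = no", "1, 2",
        missing_codes={7: "refused", 8: "don't know", 9: "missing"},
    ),
    VariableRecord(
        "BWGRMS", "Birth weight", "ratio", "grams", "0 to 6000",
        missing_codes={9999: "missing"},
        skip_pattern="Asked only about children < age 5 years",
    ),
]

for record in grid:
    print(record.name, record.level, sorted(record.missing_codes))
```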

Level of measurement
- Categorical variables are classified into categories or ranges.
  - Nominal, e.g., gender, race
  - Ordinal, e.g., age group, income range
- Continuous variables
  - Measured in numeric units, but not grouped.
  - Two types of continuous variables:
    - Interval: zero is not the lowest possible value, e.g., temperature Fahrenheit
    - Ratio: zero is the lowest possible value, e.g., temperature Kelvin, height, weight
- Helps to anticipate limits on range of values

Units of measurement
- System of measurement: Metric, British or other?
  - E.g., income in dollars or Euros or yen?
- Level of aggregation
  - E.g., income per hour or per week or per year?
- Scale
  - E.g., income in dollars or thousands of dollars or millions of dollars?
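A small sketch of why system, aggregation, and scale all need to be known before values can be compared. The function names are hypothetical, and the 52-weeks-per-year figure is a simplifying assumption; the grams-per-pound constant is the standard definition of the pound.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound
WEEKS_PER_YEAR = 52          # simplifying assumption: income received every week


def grams_to_pounds(grams):
    """Change system of measurement: metric grams to British pounds."""
    return grams / GRAMS_PER_POUND


def weekly_dollars_to_annual_thousands(weekly_dollars):
    """Change level of aggregation (week to year) and scale (dollars to thousands)."""
    return weekly_dollars * WEEKS_PER_YEAR / 1000


print(round(grams_to_pounds(9999), 1))           # birth-weight code from the earlier example
print(weekly_dollars_to_annual_thousands(1000))  # same income, very different number
```

The same underlying quantity yields numbers that differ by orders of magnitude depending on these three choices, which is why the units belong in the variable grid.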

Missing values
- Missing values on a variable can occur because they are
  - Not applicable for some respondents
  - Missing by design (e.g., modules given only to a subset of the overall sample)
  - Item non-response
- Identify missing values as such in the electronic database, so they are treated correctly during analysis.
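A minimal sketch of flagging missing-value codes before analysis, assuming the data are held in a plain Python list. The helper `recode_missing` and the sample responses are hypothetical; the codes 7, 8, and 9 follow the DOCLY example in the variable grid.

```python
def recode_missing(values, missing_codes):
    """Replace missing-value codes with None and count cases by reason."""
    cleaned = []
    reasons = {}
    for value in values:
        if value in missing_codes:
            cleaned.append(None)
            reason = missing_codes[value]
            reasons[reason] = reasons.get(reason, 0) + 1
        else:
            cleaned.append(value)
    return cleaned, reasons


# Hypothetical raw responses to "Saw doctor last year" (1 = yes, 2 = no).
docly_raw = [1, 2, 7, 1, 9, 8, 2, 1]
docly_codes = {7: "refused", 8: "don't know", 9: "missing"}

docly_clean, docly_reasons = recode_missing(docly_raw, docly_codes)
print(docly_clean)
print(docly_reasons)
```

Keeping the count by reason makes it possible to report how many cases are missing for each reason, rather than lumping them together.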

Plausible values for the concept being measured
A value of 10,000
- Makes sense in at least some contexts for
  - Annual family income in dollars
  - Population of a census tract
  - An annual death rate per 100,000 persons
- Does NOT make sense for
  - Hourly income in dollars
  - Height of a person, in inches
  - Number of persons in a family
  - A Likert scale item
  - A proportion
  - An annual death rate per 1,000 persons

Another example of plausible values
A value of 1
- Makes sense in at least some contexts for
  - Temperature in degrees Fahrenheit or Celsius
  - Change in rating on a 5-point scale
  - Change in death rate per 100,000 persons
  - Percentage change in annual family income
- Does NOT make sense for
  - Temperature in degrees Kelvin
  - Number of persons in a family
  - A Likert scale item
  - A proportion

Becoming acquainted with the concepts under study
- To identify plausible ranges of values for each of the dependent and key independent variables, read the literature.
  - Definitional limits
    - E.g., a proportion of a whole must fall between 0 and 1
  - Conceptually plausible range
    - E.g., birth weight must be positive but low enough that an infant of that size could conceivably be born!
  - Context of measurement (who, when, where)
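Once definitional limits and a conceptually plausible range are known, they can be checked mechanically. The sketch below is illustrative only: `out_of_range` is a hypothetical helper, the sample values are invented, and the 0 to 6000 gram range for birth weight is taken from the example variable grid.

```python
def out_of_range(values, low, high):
    """Return the values that fall outside [low, high], ignoring missing (None)."""
    return [v for v in values if v is not None and not (low <= v <= high)]


# Hypothetical data for illustration.
proportions = [0.12, 0.5, 1.3, 0.0]
birth_weights = [3100, 2800, 9999, 4200, None]

# Definitional limit: a proportion of a whole must fall between 0 and 1.
print(out_of_range(proportions, 0, 1))

# Conceptually plausible range for birth weight in grams (from the example grid).
print(out_of_range(birth_weights, 0, 6000))
```

Any value returned by such a check is a prompt to consult the codebook and questionnaire, not necessarily an error to delete.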

Descriptive statistics
- After
  - Imposing restrictions on analytic sample
  - Filling in missing value codes for each variable
- Complete a grid of descriptive statistics on each variable to compare across
  - Analytic data set
  - Codebook
  - Articles or books on the topic

Columns of the grid:
- Variable name
- # of valid cases for that variable (excl. missing values)
- Observed values from data set
  - For continuous variables: Min, Max, Mean
  - For categorical variables: Frequency distribution
- Values & range consistent w/ codebook?
- Reference values from external source
  - For continuous variables: Min, Max, Mean
  - For categorical variables: Frequency distribution

Check each distribution against the codebook for the original source
- Check the distribution of values observed in the analytic sample for each variable against the codebook for the data set.
  - Range and/or mean values for continuous variables
  - Frequency distribution of categorical variables
  - # cases with missing values, by reason for missing value
- If any distributions are inconsistent, do NOT analyze the data until discrepancies are resolved!
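The comparison against the codebook can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed procedure: the helper names and the sample data are hypothetical, missing values are assumed to have been recoded to `None` already, and the 0 to 6000 gram range comes from the example grid.

```python
from collections import Counter
from statistics import mean


def describe_continuous(values):
    """Valid and missing counts, min, max, and mean (assumes at least one valid case)."""
    valid = [v for v in values if v is not None]
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": mean(valid),
    }


def consistent_with_codebook(summary, low, high):
    """True if the observed range lies within the codebook's plausible range."""
    return low <= summary["min"] and summary["max"] <= high


# Hypothetical analytic sample.
birth_weights = [3100, 2800, None, 4200, None, 3500]
saw_doctor = [1, 2, 1, 1, 2]

bw_summary = describe_continuous(birth_weights)
print(bw_summary)
print(consistent_with_codebook(bw_summary, 0, 6000))

# Frequency distribution for a categorical variable.
print(Counter(saw_doctor))
```

If the consistency check fails, or the frequency distribution contains codes not listed in the codebook, the discrepancy should be resolved before any further analysis.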

Check each distribution against the literature on similar variables
- Track down information in the published literature on each of the main variables for a similar population.
- If the values in the data are substantially different from those used in other studies of the same concepts, do NOT analyze the data until discrepancies are resolved!

Identify reasons for inconsistencies
Explain possible reasons for discrepancies between your data and similar data sets, e.g.:
- Population studied, e.g., substantially different time, place, and/or subgroup
- Units of analysis, e.g., family instead of individual
- Units of measurement, e.g., metric instead of British units
- Scale, e.g., grams instead of kilograms
- Transformations of the variables, e.g., percentiles instead of original value

Reasons for getting to know your variables, redux
- These attributes of the analytic sample and variables are essential information for
  - Data preparation
    - Inclusion criteria for the analytic sample
    - Creation of new variables
  - Choice of pertinent descriptive and multivariate statistics
  - Design of correct charts and tables
  - Writing correct prose
- Even experienced researchers should complete this assignment when undertaking a project with a new topic or data set.

Exercise yields key information for a research paper on the topic
- Reading the literature on the topic yields information needed for the introduction, literature review, and discussion sections of a paper
- Detailed knowledge of study design and variables from documentation, questionnaire, and codebook provides information needed in the data and methods and results sections of a paper

Suggested readings
- Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  - Chapter 4 on levels of measurement, units, standards and cutoffs
  - Chapters 7 and 10 on choice of contrasts to suit the variable
  - Chapter 13 on data and methods
  - Chapters 4 and 13 on missing values and missing by design
- Chambliss, Daniel F., and Russell K. Schutt. 2012. Making Sense of the Social World: Methods of Investigation, 4th Edition. Thousand Oaks, CA: Sage Publications, or other research methods book for information on study design, conceptualization, and measurement

Suggested online resources
- Suggested podcasts:
  - Reporting one number (re: units)
  - Comparing two numbers or series of numbers (re: levels of measurement)
  - Defining the Goldilocks problem
- Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html

Contact information
Institute for Health, Health Care Policy and Aging Research
Rutgers University
112 Paterson Street
New Brunswick NJ 08901
jmiller@ifh.rutgers.edu
(848) 932-6730