Part 8 Logistic Regression

Quantitative Methods for Health Research
A Practical Interactive Guide to Epidemiology and Statistics
Practical Course in Quantitative Data Handling
SPSS (Statistical Package for the Social Sciences)
Part 8 Logistic Regression

Quantitative Methods for Health Research: A Practical Interactive Guide to Epidemiology and Statistics, Second Edition. Nigel Bruce, Daniel Pope, and Debbi Stanistreet. © 2018 John Wiley & Sons Ltd. Published 2018 by John Wiley & Sons Ltd.

Contents
1 OVERVIEW/REMINDER STUDY DATASET
1.1 Reminder Study Dataset
1.2 Coding Sheet for Low Back Pain Dataset
2 IMPORTING THE STUDY DATASET
3 PRACTICAL EXAMPLE OF MULTIPLE LOGISTIC REGRESSION
3.1 Background to the Analysis
3.2 Selection of Predictors for the Regression Model
3.3 Is there a Univariate Association Between the Predictors and Low Back Pain?
3.3.1 Relationship Between Hectic Work and Low Back Pain
3.3.1.1 Interpreting the SPSS Output
3.3.2 Relationship Between Monotonous Work and Low Back Pain
3.3.2.1 Interpreting the SPSS Output
3.3.3 Relationship Between Stressful Work and Low Back Pain
3.3.3.1 Interpreting the SPSS Output
3.4 Are There Univariate Associations Between the Potential Confounders and Low Back Pain?
3.4.1 Relationship Between Psychological Distress and Low Back Pain
3.4.1.1 Interpreting the SPSS Output
3.4.2 Relationship Between Age and Low Back Pain
3.4.2.1 Interpreting the SPSS Output
3.4.3 Relationship Between Sex and Low Back Pain
3.4.3.1 Interpreting the SPSS Output
3.5 What Are the Independent Effects of the Predictor Variables (Psychological Working Environment) on Low Back Pain?
3.5.1 Multivariate Logistic Regression in SPSS
3.5.1.1 Interpreting the SPSS Output

1 Overview/Reminder Study Dataset

This practical session is a continuation of multiple regression and describes the use of unconditional logistic regression to identify associations between a dichotomous categorical outcome variable and predictor variables that may be continuous or categorical.

1. Univariate and Multivariate Logistic Regression Using SPSS for Windows

Throughout the practical session there are questions relating to the SPSS output obtained from the logistic regression analysis. Questions relating to the practical exercises are included in boxes like the one shown below. The answers to these questions are located at the end of this workbook.

1.1 Reminder Study Dataset

The dataset for this exercise is a slightly different version of the low back pain dataset (note the change in name). It contains information collected from 765 employees selected randomly from manual occupational settings in the North West. The aim of the study was to identify which features of the occupational environment were associated with low back pain. The dataset includes information on demography (age, sex, height, weight, and social class), the physical working environment (working postures, manual handling activities, and repetitive upper limb movements; the duration of these activities was recorded over 60 minutes of one shift), the psychosocial working environment (psychological demands of work), and psychological distress (a score based on responses to a psychological questionnaire; a higher score indicates a higher level of psychological distress). The coding sheet for the low back pain dataset is shown below (note the type of variable, variable label, and value labels, where applicable).

1.2 Coding Sheet for Low Back Pain Dataset

| Name | Type | Width | Dec | Label | Values |
|---|---|---|---|---|---|
| Id | Numeric/continuous | 4 | 0 | Study number (unique identifier) | |
| compno | Numeric/categorical | 1 | 0 | Type of company | 1 = Post office, 2 = Supermarket, 3 = Store, 4 = Factory, 5 = Hospital |
| Age | Numeric/continuous | 3 | 0 | Age of subject | |
| Sex | Numeric/nominal | 1 | 0 | Sex of subject | 1 = Male, 2 = Female |
| height | Numeric/continuous | 3 | 0 | Height of subject (cm) | |
| weight | Numeric/continuous | 5 | 1 | Weight of subject (kg) | |
| class | Numeric/categorical | 1 | 0 | Social class | 1 = I, 2 = II, 3 = IIIN, 4 = IIIM, 5 = IV, 6 = V |
| backpain | Numeric/nominal | 1 | 0 | Reported back pain | 1 = Yes, 0 = No |
| Sit | Numeric/continuous | 4 | 1 | Minutes seated | |
| stand | Numeric/continuous | 4 | 1 | Minutes stood | |
| liftone | Numeric/continuous | 4 | 1 | Minutes lifting with one hand | |
| liftboth | Numeric/continuous | 4 | 1 | Minutes lifting with both hands | |
| onesho | Numeric/continuous | 4 | 1 | Minutes carrying on one shoulder | |
| liftsho | Numeric/continuous | 4 | 1 | Minutes lifting above shoulder level | |
| push | Numeric/continuous | 4 | 1 | Minutes pushing weights | |
| Pull | Numeric/continuous | 4 | 1 | Minutes pulling weights | |
| repwrist | Numeric/continuous | 4 | 1 | Minutes with repetitive wrists | |
| reparm | Numeric/continuous | 4 | 1 | Minutes with repetitive arms | |
| hectic | Numeric/categorical | 1 | 0 | Finds work hectic/too fast | 0 = Never, 1 = Occasionally, 2 = Half the time, 3 = Always |
| monot | Numeric/categorical | 1 | 0 | Finds work monotonous/boring | 0 = Never, 1 = Occasionally, 2 = Half the time, 3 = Always |
| stress | Numeric/categorical | 1 | 0 | Finds work too stressful/anxious | 0 = Never, 1 = Occasionally, 2 = Half the time, 3 = Always |
| psycho | Numeric/continuous | 3 | 0 | Psychological distress score | |

2 Importing the Study Dataset

1. Double-click on the SPSS Statistics 24 icon. When asked "What would you like to do?":
2. Click on Open an existing file
3. Click on OK

Note: The database for the following exercises is called backpain(logreg).sav:

4. Click on the drop-down arrow in the Look in box to generate a list of the drives
5. Click on backpain(log-reg).sav
6. Click on Open

3 Practical Example of Multiple Logistic Regression

3.1 Background to the Analysis

To demonstrate the practical application of multiple logistic regression in SPSS, we will investigate the relationship between the psychosocial working environment and low back pain. Although the back pain dataset did not come from a case-control study, it will be treated as case-control data in this practical, with cases being those employees reporting back pain and controls being those employees not reporting back pain. In total there were 765 employees, of whom 198 (25.9%) were cases and 567 (74.1%) were controls. There are therefore approximately three controls for each case, but they are not matched.

We are interested in identifying the relationship between the psychosocial working environment (in terms of self-reported psychological demands associated with manual work) and low back pain. In particular, we would like to measure the independent effects of work speed (whether workers find their work hectic or too fast), work monotony (whether workers find their work unstimulating), and work stress (whether the work carried out causes employees anxiety or stress). The study hypothesis is that a poor psychosocial working environment, in terms of these psychological demands, may increase muscular tension through psychological stress, which in turn increases the risk of low back pain.

3.2 Selection of Predictors for the Regression Model

The variables in the database that are of interest for this investigation include:

Dependent (outcome) variable
Low back pain: categorical (dichotomous)

Independent (explanatory) variables (including potential confounders)

Psychosocial Working Environment
Hectic work
Monotonous work
Stressful work

All variables relating to the psychosocial working environment are categorical variables with four-point response scales (0 = never, 1 = occasionally, 2 = half the time, 3 = always). As we shall see, these categorical independent variables can be related to the dependent variable (back pain) by creating dummy variables based on the response categories.

Psychological Distress
Psycho: a continuous variable relating to a score on a questionnaire measuring psychological distress. The score ranges from 12 (no distress) to 48 (severely distressed). This variable will be included in the logistic regression as a possible confounder of the relationship between the psychosocial working environment and low back pain.

Demographic Characteristics
Age (years): continuous variable
Sex: dichotomous (categorical) variable

3.3 Is there a Univariate Association Between the Predictors and Low Back Pain?

Before constructing a multivariate model, we need to examine the association between each independent variable of interest (representing the psychosocial working environment) and the dependent variable (low back pain) by carrying out univariate analysis using logistic regression.

3.3.1 Relationship Between Hectic Work and Low Back Pain

You will find the logistic regression command in the Regression menu:

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic

The main dialog box is similar to the standard regression option box. There is a space to place a dependent variable (outcome).

4. Enter back pain into the Dependent variable box

There is also a box for specifying the covariates (predictor variables).

5. Enter Is work hectic? into the Covariates box

The next step is to convert the four categories of hectic work (never, occasionally, half the time, and always) into dummy variables.

6. Click on Categorical in the dialogue box
7. Enter Is work hectic? into the Categorical Covariates box

We also need to specify which category of hectic work will be used as the reference group. The reference category for this variable is never, so employees who find their work too hectic or too fast for varying lengths of their shift will be compared with those who never find their work hectic or too fast. Because never is the first category of hectic work, we need to tell SPSS this:

8. Check the First box under Reference Category
9. Click on the Change button
10. Click on Continue

Before carrying out the logistic regression, there are various options that can be selected to further describe how well the selected independent variable(s) predict the dependent variable. For the purposes of this practical we will select only one option (confidence intervals).

11. Click on Options

We need to specify that we want confidence intervals around the odds ratios (the exponentials of the B coefficients):

12. Check the CI for exp(B) box
13. Click on Continue

The logistic regression for hectic work and back pain can now be carried out.

14. Click on OK to run the analysis

3.3.1.1 Interpreting the SPSS Output

The first three tables provide general information about the analysis that has just been carried out. We can see that only one person was excluded from the analysis; the reason is that data on that employee's exposure to hectic work were missing. We can also see that there was no need to change the coding of back pain into 0 and 1 for the logistic regression (a 0/1 coding was already used for back pain). If other values had been used (e.g., 1 and 2), SPSS would automatically convert them into 0 and 1 for the analysis (0 representing the lower value).
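The two recodings described above, the internal 0/1 coding of the outcome and the indicator (dummy) coding of Is work hectic? with never as the First reference category, can be sketched in Python as follows. This is a minimal illustration assuming SPSS behaves as this workbook describes; the function names are hypothetical and are not part of SPSS.

```python
# Hypothetical sketch of the recodings described in the workbook; not SPSS code.

LEVELS = ["never", "occasionally", "half the time", "always"]  # codes 0, 1, 2, 3


def recode_outcome(values):
    """Map a two-valued outcome onto 0/1, with the lower original value -> 0."""
    distinct = sorted(set(values))
    if len(distinct) != 2:
        raise ValueError("the outcome must have exactly two distinct values")
    lower = distinct[0]
    return [0 if v == lower else 1 for v in values]


def indicator_coding(code, n_levels=4):
    """Indicator (dummy) coding with the FIRST category (code 0) as reference.

    Returns n_levels - 1 dummies; the reference category is coded all zeros.
    """
    return [1 if code == k else 0 for k in range(1, n_levels)]


if __name__ == "__main__":
    print(recode_outcome([1, 2, 2, 1]))  # a 1/2 coding becomes 0/1
    for code, label in enumerate(LEVELS):
        print(f"{label:>13}: {indicator_coding(code)}")
```

Running it prints [0, 1, 1, 0], then never: [0, 0, 0], occasionally: [1, 0, 0], half the time: [0, 1, 0], and always: [0, 0, 1], which is the pattern SPSS reports in its Categorical Variables Coding table: four categories produce three dummy variables, and the reference category is zero on all of them.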

Finally, we can see that, because there are four categories for Hectic, three dummy variables have been created based on the response categories 1 (occasionally), 2 (half the time), and 3 (always). Thus, the reference category (never) has values of zero (.000 in the table), zero, and zero for the three dummy variables; the occasionally category has values of one (1.000 in the table), zero, and zero; and so on.

The next three tables relate to the model when only the constant is included (i.e., no predictor variables are included). Because these tables do not directly tell us about the association between the dependent variable and the independent variable(s), we will not consider them further. For additional information about the interpretation of these tables, refer to Discovering Statistics using SPSS for Windows by Andy Field (Sage Publications, 2001, Chapter 5).

The remaining output shows the results from the new model (including hectic as an independent variable). The Omnibus Tests of Model Coefficients (OTMC) Table assesses how well the model fits, in a similar way to the ANOVA Table in linear regression. The Chi-square value for the model plays a similar role to the F-ratio: it is the reduction in the -2 Log likelihood when the predictor(s) are added to the constant-only model, and so measures how much the model has improved the prediction of the outcome. If the model is a good one, we expect this improvement to be large and the remaining difference between the model and the observed data to be small. In short, a good model should have a large Chi-square. The significance of the Chi-square is obtained from the chi-square distribution with the corresponding degrees of freedom (here 3, one for each dummy variable).

The Chi-square value for this univariate model is statistically significant (p < 0.05): if hectic work were unrelated to low back pain, a Chi-square value this large would arise by chance less than 5% of the time. In short, the logistic regression model with hectic work predicts low back pain significantly better than the constant-only model.

The next (Model Summary) table gives additional information regarding the predictive properties of the model. The interpretation of this table will not be described in detail, except to point out that its R Square values are pseudo-R² measures that can be interpreted, loosely, in a similar way to R Square in multiple linear regression. Taking the Nagelkerke R Square value, we can see that the model (including hectic work as a predictor) accounts for roughly 1.6% of the variation in the probability of having low back pain. For additional information about the interpretation of the Model Summary table, refer to Discovering Statistics using SPSS for Windows by Andy Field (Sage Publications, 2001, Chapter 5).

The third piece of SPSS output is the Classification Table. Again, this adds little to the logistic regression analysis here and will not be considered further; see Discovering Statistics using SPSS for Windows by Andy Field (Sage Publications, 2001, Chapter 5).
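The Omnibus Chi-square and the Nagelkerke R Square discussed above can both be computed from the -2 Log likelihood (-2LL) of the constant-only model and of the fitted model. The sketch below assumes the usual definitions (likelihood-ratio Chi-square = drop in -2LL; Nagelkerke R² = Cox & Snell R² divided by its maximum possible value). The function names are hypothetical, and the chi-square tail probability uses closed-form expressions valid for integer degrees of freedom, so no statistics library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(n, y) = exp(-y) * sum_{j<n} y**j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j<n} y**(j + 1/2) / Gamma(j + 3/2)
    n = (df - 1) // 2
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(n))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def omnibus_test(neg2ll_constant, neg2ll_model, df):
    """Likelihood-ratio Chi-square (drop in -2LL) and its p-value."""
    chi_square = neg2ll_constant - neg2ll_model
    return chi_square, chi2_sf(chi_square, df)


def nagelkerke_r2(neg2ll_constant, neg2ll_model, n):
    """Cox & Snell R^2 rescaled so that its maximum possible value is 1."""
    cox_snell = 1.0 - math.exp(-(neg2ll_constant - neg2ll_model) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_constant / n)
    return cox_snell / max_cox_snell


if __name__ == "__main__":
    # One degree of freedom, as for a single continuous predictor.
    print(chi2_sf(27.092, 1))
```

For example, chi2_sf(27.092, 1) is roughly 2e-7, consistent with the p < 0.0005 the workbook reports for the psychological distress model in Section 3.4.1.1; the hectic work model would be tested on 3 degrees of freedom, one per dummy variable.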

The final table to be considered is the Variables in the Equation Table (interpreted in the same way as the Coefficients Table when carrying out linear regression in SPSS). It is important to note that the B values for the logistic regression are log odds ratios and, as such, are difficult to interpret as they stand. To obtain the odds ratio describing the association between the dependent variable (low back pain) and the independent variables (psychosocial working environment), it is therefore necessary to take the exponential of B. This is given as Exp(B) in SPSS. The SPSS table providing the output for the univariate logistic regression analysis of the hectic work variable is shown below:

We can see from the model investigating the association between hectic work and low back pain that the odds ratio increases with the amount of time employees spend carrying out work they believe to be too hectic or fast. For employees who report their work to be occasionally too hectic or fast, the odds ratio (Exp(B)) is 1.97. This means that the odds of experiencing low back pain are roughly twice as high for employees who occasionally carry out work that is too hectic or fast as for those who never carry out such work (the reference category). The odds ratio for low back pain increases to 2.6 for employees carrying out hectic work half the time and to 3.09 for employees who always report their work as being too hectic or fast, relative to those who never report such work.

However, we can see from the table that only the association between low back pain and the third dummy variable (work is hectic always compared with never) has achieved statistical significance (p < 0.05). This is confirmed by the 95% confidence intervals. For employees reporting hectic work occasionally or half the time, the confidence intervals span unity: the data are compatible with hectic work (occasionally and half the time) being associated with either a decrease or an increase in the odds of back pain relative to never carrying out hectic work.

We can therefore conclude that hectic work is univariately associated with low back pain, with the odds of low back pain appearing to increase with the length of time spent carrying out hectic work. The association between low back pain and work that is always hectic, relative to never hectic, is statistically significant.

3.3.2 Relationship Between Monotonous Work and Low Back Pain

Now repeat the univariate logistic regression for the variable relating to monotonous work:

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Click on Reset to clear the previous analysis
5. Enter back pain into the Dependent variable box
6. Enter Is work monotonous? into the Covariates box

The next step is to convert the four categories of monotonous work (never, occasionally, half the time, and always) into dummy variables.

7. Click on Categorical in the dialogue box
8. Enter Is work monotonous? into the Categorical Covariates box
9. Check the First box under Reference Category
10. Click on the Change button
11. Click on Continue
12. Click on Options
13. Ensure the CI for exp(B) box has been checked
14. Click on Continue

15. Click on OK to run the analysis

3.3.2.1 Interpreting the SPSS Output

From the Case Processing Summary Table:
How many people were excluded from the analysis (number of individuals without information about monotonous work)?

From the Categorical Variables Coding Table:
What was the number of individuals in each category of monotonous work?
Never:
Occasionally:
Half the time:
Always:

Looking at the OTMC Table:
What is the Chi-square value?
What does this value tell us about the predictive properties of the model (including the predictor monotonous work)? How does this compare with the model incorporating hectic work?

Looking at the Variables in the Equation Table:
What are the odds ratios (and 95% confidence intervals) for low back pain according to the amount of time spent carrying out monotonous work?
Never: OR = 1.0
Occasionally: OR = ______ 95% CI = ______
Half the time: OR = ______ 95% CI = ______
Always: OR = ______ 95% CI = ______
What does this tell us about the association between monotonous work and low back pain?

Conclusion: Is monotonous work univariately significantly associated with low back pain?

3.3.3 Relationship Between Stressful Work and Low Back Pain

Now repeat the univariate logistic regression for the variable relating to stressful work:

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Click on Reset to clear the previous analysis
5. Enter back pain into the Dependent variable box
6. Enter Is work stressful? into the Covariates box
7. Click on Categorical in the dialogue box
8. Enter Is work stressful? into the Categorical Covariates box
9. Check the First box under Reference Category
10. Click on the Change button
11. Click on Continue
12. Click on Options
13. Ensure the CI for exp(B) box has been checked
14. Click on Continue

15. Click on OK to run the analysis

3.3.3.1 Interpreting the SPSS Output

From the Categorical Variables Coding Table:
What was the number of individuals in each category of stressful work?
Never:
Occasionally:
Half the time:
Always:

Looking at the OTMC Table:
What is the Chi-square value?
What does this value tell us about the predictive properties of the model (including the predictor stressful work)? How does this compare with the other models (hectic and monotonous work)?

Looking at the Variables in the Equation Table:
What are the odds ratios (and 95% confidence intervals) for low back pain according to the amount of time spent carrying out stressful work?
Never: OR = 1.0
Occasionally: OR = ______ 95% CI = ______
Half the time: OR = ______ 95% CI = ______
Always: OR = ______ 95% CI = ______
What does this tell us about the association between stressful work and low back pain?

Conclusion: Is stressful work univariately significantly associated with low back pain?

3.4 Are There Univariate Associations Between the Potential Confounders and Low Back Pain?

We also need to identify whether the potential confounders of interest are associated with low back pain in our dataset before adjusting for them in the multivariate logistic regression. The potential confounders include:

Psychological distress: Employees who have high levels of psychological distress might be more likely to report a poor psychosocial working environment, and such distress has been observed to be related to the experience of pain.
Sex: Females have been found to report a greater amount of pain than males and might be more likely to report working in a poor psychosocial environment.
Age: As people get older they experience a greater amount of musculoskeletal pain. In addition, older people might be more likely than younger people to report working in a poor psychosocial environment.

3.4.1 Relationship Between Psychological Distress and Low Back Pain

The commands for univariate logistic regression are the same, except that psychological distress is represented by a continuous variable (there is no need to specify that dummy variables are to be created).

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Enter back pain into the Dependent variable box
5. Enter Psychological distress score into the Covariates box
6. Click on Options
7. Check the CI for exp(B) box
8. Click on Continue
9. Click on OK to run the analysis

3.4.1.1 Interpreting the SPSS Output

The tables are interpreted in the same way as shown in Section 3.3.1.1. You will notice that no table representing dummy variables (Categorical Variables Coding) has been created. From the Case Processing Summary Table we can see that four people were excluded from the analysis due to missing psychological distress data.

The OTMC Table shows us that the Chi-square value for this univariate model is statistically significant (p < 0.0005): if psychological distress were unrelated to low back pain, a Chi-square value this large would arise by chance less than 0.05% of the time. The logistic regression model with psychological distress predicts low back pain significantly well. We can also see that the model with psychological distress has a larger Chi-square value (27.092) than the three models containing the independent variables representing the psychosocial working environment (more than three times greater than that obtained for hectic work).

If we consider the final table (Variables in the Equation Table), we can see that the association between psychological distress and low back pain is significant (p < 0.0005). The Exp(B), or odds ratio, for this association is 1.098: for each one-unit increase in psychological distress score, the odds of having low back pain increase by 9.8%. The 95% confidence interval for this odds ratio does not span unity, consistent with a statistically significant association between psychological distress and low back pain. We can therefore conclude that psychological distress is univariately associated with low back pain.
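The arithmetic behind Exp(B), its 95% confidence interval, and the "9.8% per unit" interpretation can be sketched as follows, assuming the usual Wald form of the interval, exp(B ± 1.96 × SE). The function names are hypothetical; in practice B and its standard error would be read from the Variables in the Equation Table.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(b, se, z=Z_95):
    """Return Exp(B) and its Wald confidence limits exp(B - z*SE), exp(B + z*SE)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def percent_change_in_odds(odds_ratio, units=1.0):
    """Percentage change in the odds for a `units`-unit increase in a continuous predictor."""
    return (odds_ratio ** units - 1.0) * 100.0


if __name__ == "__main__":
    print(percent_change_in_odds(1.098))      # per one-point increase in distress score
    print(percent_change_in_odds(1.098, 10))  # per ten-point increase
```

With the reported odds ratio of 1.098, the first call gives 9.8 (up to floating-point rounding) and the second about 155, i.e., a distress score 10 points higher multiplies the odds of low back pain by roughly 2.5. Because the interval is symmetric on the log scale, it excludes 1 exactly when B ± 1.96 × SE excludes 0, which is why the confidence interval and the Wald significance test agree.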

3.4.2 Relationship Between Age and Low Back Pain

Now repeat the univariate logistic regression for the variable relating to age:

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Click on Reset to clear the previous analysis
5. Enter back pain into the Dependent variable box
6. Enter Age into the Covariates box
7. Click on Options
8. Ensure the CI for exp(B) box has been checked
9. Click on Continue
10. Click on OK to run the analysis

3.4.2.1 Interpreting the SPSS Output

How many people were excluded from the analysis due to missing information about age?
What is the Chi-square value in the OTMC Table for the model including age?
What does this value tell us about the predictive properties of the model (including age as an independent variable)?

What are the odds ratio and 95% confidence interval for the association between age and low back pain?
What does this tell us about the association between age and low back pain?

Conclusion: Is age univariately significantly associated with low back pain?

3.4.3 Relationship Between Sex and Low Back Pain

Now repeat the univariate logistic regression for the variable relating to sex:

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Click on Reset to clear the previous analysis
5. Enter back pain into the Dependent variable box
6. Enter Sex into the Covariates box
7. Click on Options
8. Ensure the CI for exp(B) box has been checked
9. Click on Continue
10. Click on OK to run the analysis

Note: Because sex is a dichotomous categorical variable, it is not necessary to specify dummy variables in SPSS.

3.4.3.1 Interpreting the SPSS Output

How many people were excluded from the analysis due to missing information about sex?
What is the Chi-square value in the OTMC Table for the model including sex?
What does this value tell us about the predictive properties of the model (including sex as an independent variable)?
What are the odds ratio and 95% confidence interval for the association between sex and low back pain? (Note that, because sex is entered without dummy coding, SPSS takes the lower-coded category as the referent. Hence males (1) form the referent group and the odds ratio compares females (2) with males.)
What does this tell us about the association between sex and low back pain?

Conclusion: Is sex univariately significantly associated with low back pain?
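SPSS estimates the B coefficients by maximum likelihood. A minimal, generic Newton-Raphson sketch of that estimation is shown below; it is not SPSS's own code, and the function names and the small 2 × 2 example data are hypothetical. With a single two-valued predictor such as sex, the fitted Exp(B) equals the cross-product odds ratio of the 2 × 2 table, which makes a convenient check.

```python
import math


def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [list(A[i]) + [v[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: list of rows, each starting with 1.0 for the constant.
    y: list of 0/1 outcomes. Returns the coefficients [constant, B1, ...].
    """
    p = len(X[0])
    b = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for row, yi in zip(X, y):
            eta = sum(bj * xj for bj, xj in zip(b, row))
            mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        step = solve(info, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


if __name__ == "__main__":
    # Hypothetical 2 x 2 data: x = 0 has 10 cases / 30 non-cases,
    # x = 1 has 20 cases / 20 non-cases.
    X = [[1.0, 0.0]] * 40 + [[1.0, 1.0]] * 40
    y = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
    b0, b1 = fit_logistic(X, y)
    print(f"constant = {b0:.4f}, B = {b1:.4f}, Exp(B) = {math.exp(b1):.4f}")
```

With these hypothetical data the fitted constant is ln(10/30), about -1.10, and Exp(B) is (20 × 30)/(10 × 20) = 3. Coding the predictor 1/2 instead of 0/1, as sex is coded in this dataset, shifts only the constant; B, and hence the odds ratio, is unchanged.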

3.5 What Are the Independent Effects of the Predictor Variables (Psychological Working Environment) on Low Back Pain?

The full model will measure the independent effects of each of the predictor variables representing the psychosocial working environment (hectic work, monotonous work, and stressful work) on the outcome (low back pain) after adjusting for the potential confounders (psychological distress and age). The model will be constructed using unconditional multivariate logistic regression in SPSS.

3.5.1 Multivariate Logistic Regression in SPSS

1. Click on Analyze
2. Select Regression
3. Select Binary Logistic
4. Click on Reset to clear the previous analysis
5. Enter back pain into the Dependent variable box

Now we must enter all the independent variables into the Covariates box.

6. Enter Is work hectic?, Is work monotonous?, Is work stressful?, Psychological distress score, and Age into the Covariates box

It will be necessary to specify that hectic work, monotonous work, and stressful work are all categorical variables requiring dummy variables:

7. Click on Categorical
8. Enter Is work hectic?, Is work monotonous?, and Is work stressful? into the Categorical Covariates box
9. For each variable: Check the First box under Reference Category and press Change

You should now have the following dialogue box:
[Figure: dialogue box]

10. Click on Continue to return to the main dialogue box

We will also want to display the confidence limits for the Exp(B) coefficients (odds ratios):

11. Click on Options
12. Ensure the CI for exp(B) box has been checked
13. Click on Continue

Before running the multivariate logistic regression, it is important to discuss the method of carrying out the regression. The default method is the Forced Entry method (labelled Enter in the Method list), which places all the covariates (predictors) into the regression model in one block. An alternative (this is also true for multiple linear regression) is to use stepwise procedures, either forward or backward. These procedures are described in detail in Discovering Statistics using SPSS for Windows by Andy Field (Sage Publications, 2001, Chapter 5).
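With the Forced Entry method, every covariate enters the model in a single block, including each dummy variable created for the categorical predictors. Choosing First as the Reference Category corresponds to indicator coding: a variable with k categories becomes k - 1 columns of 0/1, and the first category is coded 0 on all of them, so each dummy's Exp(B) compares that category with the first one. The Python sketch below builds one row of that single-block design. It assumes categories are listed in code order (lowest first); the variable names, category labels, case values, and helper functions are all hypothetical.

```python
def dummy_code(value, categories, reference="first"):
    """Indicator coding for one case: k categories -> k - 1 columns of 0/1."""
    if reference not in ("first", "last"):
        raise ValueError("reference must be 'first' or 'last'")
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    ref = categories[0] if reference == "first" else categories[-1]
    # The reference category is coded 0 on every remaining column.
    return [1 if value == c else 0 for c in categories if c != ref]


def design_row(case, categorical, continuous):
    """One row of the forced-entry design: dummies first, then continuous
    covariates entered as they are."""
    row = []
    for name, levels in categorical.items():
        row.extend(dummy_code(case[name], levels))
    row.extend(case[name] for name in continuous)
    return row


# Hypothetical category labels (in code order) and variable names.
LEVELS = ["never", "sometimes", "all the time"]
categorical = {"hectic": LEVELS, "monotonous": LEVELS, "stressful": LEVELS}
continuous = ["distress", "age"]

case = {"hectic": "all the time", "monotonous": "never",
        "stressful": "sometimes", "distress": 12, "age": 41}
print(design_row(case, categorical, continuous))
# [0, 1, 0, 0, 1, 0, 12, 41]
```

Under these assumed three-category variables, the block contains 3 × 2 dummies plus 2 continuous covariates, so SPSS would estimate eight coefficients besides the constant.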

When forward stepwise regression is applied, the computer begins with a model that includes only the constant and then adds single predictors to the model based on a specified criterion. This criterion is the value of a score statistic: the variable with the most significant score statistic is added to the model. The computer proceeds until none of the remaining predictors has a significant score statistic (the default entry criterion is p < 0.05). At each step, the computer also examines the variables already in the model to see whether any should be removed (SPSS offers three removal statistics: conditional, likelihood-ratio, and Wald). Predictors in the model with significance values above the default removal criterion of 0.1 will be removed.

When backward stepwise regression is applied, the same removal criteria are used, but instead of starting with only the constant, the model begins with all the predictors included. The computer then tests whether any of these predictors can be removed from the model without substantially affecting how well the model fits the observed data. The first predictor to be removed will be the one that has the least impact on how the model fits the data.

The main consideration when choosing the method of regression is whether you are testing a theory or merely carrying out exploratory work. Stepwise procedures are most appropriate for exploratory work: where no previous research exists on which to base hypotheses for testing, and where causality is not of interest and you merely wish to find a model that fits your data. Because the current example has a clear hypothesis (a poor psychosocial working environment is associated with back pain) that is supported by previous research, we will use the Forced Entry method of logistic regression.

14. Click on OK to run the analysis
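The forward and backward procedures described above can be sketched as a selection loop. This is a simplified Python illustration of the control flow only, not SPSS's implementation: the test statistics are supplied as hypothetical callables that return p-values (rather than being computed from data), the defaults mirror the 0.05 entry and 0.10 removal criteria, backward elimination here never re-enters a removed term, and a `max_steps` guard stops add/remove cycles. All function names and the toy p-values are hypothetical.

```python
def _drop_weak_terms(model, removal_p, p_remove):
    """Remove, one at a time, the term with the largest p-value above p_remove."""
    while model:
        pvals = {v: removal_p(model, v) for v in model}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= p_remove:
            break
        model.remove(worst)  # the term contributing least to model fit


def forward_stepwise(candidates, entry_p, removal_p,
                     p_enter=0.05, p_remove=0.10, max_steps=50):
    """Forward selection with a removal check after every entry.

    entry_p(model, var)   -> p-value for adding var (SPSS uses a score statistic)
    removal_p(model, var) -> p-value for a var already in the model
    """
    model = []  # start from the constant-only model
    for _ in range(max_steps):  # guard against add/remove cycles
        remaining = [v for v in candidates if v not in model]
        if not remaining:
            break
        pvals = {v: entry_p(model, v) for v in remaining}
        best = min(pvals, key=pvals.get)  # most significant candidate
        if pvals[best] >= p_enter:
            break  # no remaining predictor meets the entry criterion
        model.append(best)
        _drop_weak_terms(model, removal_p, p_remove)
    return model


def backward_stepwise(candidates, removal_p, p_remove=0.10):
    """Backward elimination, simplified: removed terms are never re-entered."""
    model = list(candidates)  # start with every predictor in the model
    _drop_weak_terms(model, removal_p, p_remove)
    return model


# Toy p-values (hypothetical): once "stress" is in, "hectic" adds nothing.
def toy_entry(model, v):
    if v == "hectic" and "stress" in model:
        return 0.30
    return {"hectic": 0.01, "stress": 0.03, "mono": 0.20}[v]


def toy_removal(model, v):
    if v == "hectic" and "stress" in model:
        return 0.50
    return {"hectic": 0.01, "stress": 0.03, "mono": 0.40}[v]


predictors = ["hectic", "stress", "mono"]
print(forward_stepwise(predictors, toy_entry, toy_removal))
print(backward_stepwise(predictors, toy_removal))
```

In the toy run, forward selection first enters "hectic", then removes it again once "stress" makes it redundant. This illustrates why stepwise results depend on the order in which terms are tried, which is one reason the tutorial prefers Forced Entry for hypothesis testing.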

3.5.1.1 Interpreting the SPSS Output

The tables are interpreted in the same way as shown in Section 3.3.1.1.

From the Case Processing Summary table: How many people were excluded from the analysis due to missing information about one or more of the predictor variables?

From the Omnibus Tests of Model Coefficients (OTMC) table: Does the model including the five explanatory variables significantly predict the variation in low back pain? Is the model any better at predicting the variation in low back pain than any single predictor (from the univariate analysis)?
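When all predictors are entered in one block, the model chi-square in the Omnibus Tests of Model Coefficients table is the drop in -2 log-likelihood from the constant-only model to the fitted model. Its degrees of freedom equal the number of estimated coefficients excluding the constant, with each dummy variable counting separately. The Python sketch below recomputes the statistic and its p-value from the two -2LL values, using closed forms for the chi-square upper tail with integer df. The -2LL values and df = 8 are hypothetical; read the real ones from your output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add the recurrence terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def omnibus_test(neg2ll_null, neg2ll_model, df):
    """Likelihood-ratio chi-square of the model against the constant-only model."""
    chi2 = neg2ll_null - neg2ll_model
    return chi2, chi2_sf(chi2, df)


# Hypothetical -2LL values; df = 8 assumes the eight coefficients of a
# three-category coding for each work variable plus distress and age.
chi2, p = omnibus_test(neg2ll_null=700.0, neg2ll_model=660.0, df=8)
print(f"chi-square = {chi2:.2f}, df = 8, p = {p:.3g}")
```

A small p-value indicates that the model with the explanatory variables predicts low back pain significantly better than the constant alone.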

From the Variables in the Equation table:
What effect has adjusting for the confounders and the other predictors had on the association between hectic work and low back pain? What does this tell us about the relationship between hectic work and low back pain?
How is finding work stressful all the time associated with low back pain?
How is finding work monotonous all the time associated with low back pain?

Are the two confounders still significantly associated with low back pain?
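Each single-coefficient row of the Variables in the Equation table can be reproduced from its B and S.E. columns. The Wald statistic is (B / S.E.)², its Sig. comes from a chi-square with 1 df (equivalently, a two-sided normal test), Exp(B) is the adjusted odds ratio, and the 95% limits are exp(B ± 1.96 × S.E.). For a dummy variable, Exp(B) compares that category with the reference (first) category while holding the other covariates constant. The Python sketch below is a minimal illustration under these assumptions; the function name and the B and S.E. values are hypothetical.

```python
import math


def equation_row(b, se, z=1.959964):
    """Recompute one df = 1 row of the Variables in the Equation table."""
    wald = (b / se) ** 2
    # For df = 1 the Wald chi-square p-value equals a two-sided normal p-value.
    sig = math.erfc(math.sqrt(wald / 2.0))
    return {
        "B": b,
        "S.E.": se,
        "Wald": wald,
        "Sig.": sig,
        "Exp(B)": math.exp(b),
        "95% CI for Exp(B)": (math.exp(b - z * se), math.exp(b + z * se)),
    }


# Hypothetical B and S.E. for one work-stress dummy; not from the tutorial.
row = equation_row(b=0.45, se=0.20)
for key, value in row.items():
    print(key, value)
```

Because the interval and Sig. are built from the same B and S.E., a Sig. below 0.05 goes together with a 95% interval that excludes 1. This gives a quick consistency check when answering whether each predictor or confounder remains significantly associated with low back pain.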