FACTOR ANALYSIS Factor Analysis 2006

Lecturer: Timothy Bates (Tim.bates@ed.ac.uk)
Lecture notes based on Austin 2005
Bring your handout to the tutorial. Please read prior to the tutorial.

Factor analysis
- A statistical tool to account for variability in observed traits in terms of a smaller number of factors
- Factor = "unobserved random variable"
- Measured item = observed random variable
- Values for an observation are recovered (with some error) from a linear combination of a (usually much smaller) set of extracted factors.

Visually
[Figure: factor analysis shown visually]

FA as a Data reduction technique
- Simplify complex multivariate datasets by finding natural groupings within the data
- Groupings may correspond to underlying dimensions.
- Groupings are subsets of variables that correlate strongly with each other and weakly with other variables in the dataset.
- Natural groupings (factors) can assist the theoretical interpretation of complex datasets
- Theoretical linkage of factors to underlying (latent) constructs, e.g. extraversion, liberal attitudes, interest in ideas, ability

EXAMPLE DATASET
210 students produced self-ratings on a list of trait adjectives. (Correlations above 0.2 were marked in bold on the slide.)

          1      2      3      4      5      6      7      8      9     10     11
 2     0.27
 3     0.37   0.53
 4     0.40   0.30   0.38
 5     0.17  -0.07  -0.09  -0.08
 6     0.17  -0.05  -0.06   0.10   0.59
 7     0.19   0.01  -0.05   0.05   0.38   0.42
 8     0.06  -0.02  -0.02   0.02   0.51   0.54   0.48
 9    -0.25  -0.05  -0.15  -0.20  -0.06  -0.11  -0.14  -0.07
10    -0.24  -0.10  -0.09  -0.10  -0.03  -0.02  -0.13   0.08   0.38
11    -0.21  -0.08  -0.22  -0.12   0.00  -0.03  -0.07   0.03   0.49   0.38
12    -0.01   0.02  -0.10  -0.04   0.07   0.09   0.06   0.04   0.34   0.40   0.40

1. ASSERTIVE, 2. TALKATIVE, 3. EXTRAVERTED, 4. BOLD, 5. ORGANIZED, 6. EFFICIENT, 7. THOROUGH, 8. SYSTEMATIC, 9. INSECURE, 10. SELF-PITYING, 11. NERVOUS, 12. IRRITABLE
- Clear structure in this sorted matrix
- How easy would this be to see in a larger matrix?

THE THREE FACTORS FROM THE EXAMPLE DATA

                I (C)   II (N)  III (E)
EFFICIENT        0.82
ORGANIZED        0.80
SYSTEMATIC       0.79
THOROUGH         0.71
NERVOUS                  0.75    -0.15
IRRITABLE        0.14    0.73
INSECURE        -0.14    0.73    -0.16
SELF-PITYING             0.72
EXTRAVERTED     -0.12   -0.10     0.79
TALKATIVE                         0.75
BOLD                              0.69
ASSERTIVE        0.24   -0.21     0.65

- The numbers are factor loadings = correlation of each variable with the underlying factor. (Loadings less than 0.1 omitted.)
- Can construct a factor score (item scores multiplied by factor loadings):
  N = (0.75*Nervous) + (0.73*Irritable) + (0.73*Insecure) + (0.72*Self-pity) - (0.10*Extraverted) - (0.21*Assertive)
- Main loadings are large and highly significant.
- Smaller (cross-)loadings may be informative.
- Factors are close to simple structure.

OBJECTIVES AND OUTCOMES OF FACTOR ANALYSIS
- Aim of factor analysis is to objectively detect natural groupings of variables (factors)
- Can deal with large matrices; uses (reasonably) objective statistical criteria.
- Can obtain quantitative information, e.g. factor scores.
- Factors are (should be) of theoretical interest.
- In the example the factors correspond to the personality traits of Extraversion, Neuroticism and Conscientiousness
- Exploratory method, uncovering structure in data
- Confirmatory factor analysis (model testing) is also possible.
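The N score formula given with the loadings table can be sketched as a loading-weighted sum. This is only an illustration: the `factor_score` helper and the sample ratings are hypothetical, and the sketch assumes the item ratings have already been standardised.

```python
# Loadings on factor II (N) taken from the rotated solution in the lecture.
# Items whose loading was omitted on the slide (below 0.1) are left out.
N_LOADINGS = {
    "nervous": 0.75,
    "irritable": 0.73,
    "insecure": 0.73,
    "self_pitying": 0.72,
    "extraverted": -0.10,
    "assertive": -0.21,
}


def factor_score(ratings, loadings):
    """Return the loading-weighted sum of one subject's item ratings."""
    return sum(weight * ratings[item] for item, weight in loadings.items())


# Hypothetical standardised ratings for one subject (not from the lecture data).
subject = {
    "nervous": 1.0, "irritable": 0.5, "insecure": 1.2,
    "self_pitying": 0.8, "extraverted": -0.4, "assertive": -1.0,
}
n_score = factor_score(subject, N_LOADINGS)
```

Replacing every weight with +1 or -1 gives the unit-weighted score that the lecture later recommends for comparisons between samples.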
SOME TECHNICAL REQUIREMENTS FOR A FACTOR ANALYSIS TO BE VALID AND USEFUL
- Simple structure
  - Each item loads highly on one factor and close to zero on all others
- Factors have a meaningful theoretical interpretation
  - Rotation
- Factors retain most of the variance in the raw data
  - Parsimony compared to starting variables achieved without loss of explanatory power
- Factors are replicable

Assumptions
DATA QUALITY
- Large enough sample, so that the correlations are reliable
- Somewhat normal variables, no outliers
- No variables uncorrelated with any other
- No variables correlated 1.0 with each other
  - Remove one of each problematic pair, or use sum if appropriate.

Sample Size
- Rough rule is that 300 is OK; smaller numbers may be OK.
- Subjects/variables ratio
  - Much discussion (less agreement)
  - Values between 2:1 and 10:1 have been proposed as a minimum.
  - Simulations suggest that overall sample size is more important.
- Well-defined factors (large loadings) will replicate in smaller samples than poorly-defined ones (small loadings)

STAGES OF ANALYSIS
- Examine data for outliers and correlations
- Choose number of factors
  - Scree plot
- Rotate factors if necessary
- Interpret factors
- Obtain scores
- Check reliability of scales defining factors
- Further experiments to validate factors

Partitioning item variance
Variance of each item can be thought of in three partitions:
1. Shared variance: common variance, explained by factors
plus unique variance (not explained by the factors), which splits into:
2. Specific variance
3. Error variance

Communality
- The proportion of common variance for a given variable
- Sum of squares of item factor loadings
- Large communalities are required for a valid and useful factor solution
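The communality definition above (sum of squared loadings) can be checked by hand on one item from the example data. The sketch assumes standardised items and orthogonal factors, so that the remainder 1 - communality is the unique (specific plus error) variance; the `communality` helper is a hypothetical name.

```python
def communality(loadings):
    """Sum of squared loadings of one item across all extracted factors."""
    return sum(value ** 2 for value in loadings)


# ASSERTIVE in the rotated three-factor example solution: I (C), II (N), III (E).
assertive = [0.24, -0.21, 0.65]

h2 = communality(assertive)   # share of the item's variance explained by the factors
uniqueness = 1.0 - h2         # specific + error variance (assumes a standardised item)
```

Here roughly half of the variance in ASSERTIVE is common variance, almost all of it coming from the loading on factor III.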

Computing a Factor Analysis
- Two main approaches, which differ in estimating communalities
- Principal components
  - Simplest computationally
  - Assumes all variance is common variance (implausible) but gives similar results to more sophisticated methods.
  - SPSS default.
- Principal factor analysis
  - Estimates communalities first

How many Factors?
- Initially unknown
- Needs to be specified by the investigator on the basis of preliminary analysis
- No 100% foolproof statistical test for number of factors
- Similar problems with other multivariate methods

How many factors?
- There are potentially as many factors as items
- We don't want to retain factors which account for little variance.
- Most commonly-used method to decide the number of factors is the scree plot of the eigenvalues
  - Variance explained by each factor.
- A point of inflection or kink in the scree plot is a good place to make a cut-off

EQ Scree
[Figure: scree plot for emotional intelligence items]
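As a rough illustration of where the values in a scree plot come from, the sketch below computes the eigenvalues of the 12-item example correlation matrix given earlier in the lecture. It assumes NumPy is available; `full_correlation_matrix` is a hypothetical helper, and the result is not claimed to reproduce any plot from the slides.

```python
import numpy as np

# Lower triangle of the example correlation matrix (rows 2..12, diagonal omitted).
LOWER = [
    [0.27],
    [0.37, 0.53],
    [0.40, 0.30, 0.38],
    [0.17, -0.07, -0.09, -0.08],
    [0.17, -0.05, -0.06, 0.10, 0.59],
    [0.19, 0.01, -0.05, 0.05, 0.38, 0.42],
    [0.06, -0.02, -0.02, 0.02, 0.51, 0.54, 0.48],
    [-0.25, -0.05, -0.15, -0.20, -0.06, -0.11, -0.14, -0.07],
    [-0.24, -0.10, -0.09, -0.10, -0.03, -0.02, -0.13, 0.08, 0.38],
    [-0.21, -0.08, -0.22, -0.12, 0.00, -0.03, -0.07, 0.03, 0.49, 0.38],
    [-0.01, 0.02, -0.10, -0.04, 0.07, 0.09, 0.06, 0.04, 0.34, 0.40, 0.40],
]


def full_correlation_matrix(lower):
    """Build the symmetric matrix (1s on the diagonal) from its lower triangle."""
    size = len(lower) + 1
    matrix = np.eye(size)
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            matrix[i, j] = matrix[j, i] = value
    return matrix


R = full_correlation_matrix(LOWER)
# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse for a scree plot.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
proportion_of_variance = eigenvalues / eigenvalues.sum()
```

Plotting `eigenvalues` against their rank (1 to 12) gives the scree plot; the cut-off is then read off by eye at the kink.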

Goldberg Scree
[Figure: scree diagram for Goldberg trait adjectives]

Food and health Scree
[Figure: scree diagram for food and health behaviour items]

IQ Scree
[Figure: scree plot for ability test scores, Swedish Twin Study]

OTHER METHODS FOR FACTOR NUMBERS
- Eigenvalues > 1
  - Eigenvalues sum to the number of items, so an eigenvalue > 1 = more informative than a single average item
  - Not a useful guide in practice
- Parallel Analysis
  - Repeatedly randomise the correlation matrix and determine how large an eigenvalue appears by chance in many thousands of trials.
  - Excellent method
- Theory-driven
  - Extract a number of factors based on theoretical considerations
  - Hard to justify
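Parallel analysis, as described above, can be sketched along the following lines. Note the substitution: rather than literally shuffling a correlation matrix, this version simulates uncorrelated normal data of the same size, which is a common way of doing it but is an assumption here, as are the function names, the 95th percentile and the number of trials. NumPy is assumed to be available.

```python
import numpy as np


def parallel_analysis_thresholds(n_subjects, n_items, n_trials=1000,
                                 percentile=95, seed=0):
    """Eigenvalues expected by chance from uncorrelated data of the same size."""
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_trials, n_items))
    for trial in range(n_trials):
        data = rng.standard_normal((n_subjects, n_items))
        corr = np.corrcoef(data, rowvar=False)
        # Store each trial's eigenvalues from largest to smallest.
        simulated[trial] = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return np.percentile(simulated, percentile, axis=0)


def n_factors_to_retain(observed_eigenvalues, thresholds):
    """Count leading observed eigenvalues that exceed the chance thresholds."""
    count = 0
    for observed, chance in zip(observed_eigenvalues, thresholds):
        if observed <= chance:
            break
        count += 1
    return count


# Same dimensions as the lecture's example dataset: 210 subjects, 12 items.
thresholds = parallel_analysis_thresholds(n_subjects=210, n_items=12, n_trials=200)
```

The observed eigenvalues from the real correlation matrix are then passed to `n_factors_to_retain` together with `thresholds`; only factors that beat chance are kept.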

How to align the factors?
- The initial solution is unrotated
- Two undesirable features make it hard to interpret:
  - Designed to maximise the loadings of all items on the first factor
  - Most items have large loadings on more than one factor
- Hides groupings in the data

UNROTATED FACTORS FOR THE EXAMPLE DATA

                    I      II     III
EFFICIENT        0.45    0.69    0.02
ORGANIZED        0.37    0.71   -0.04
SYSTEMATIC       0.37    0.70    0.04
THOROUGH         0.45    0.55   -0.02
NERVOUS         -0.56    0.33    0.40
IRRITABLE       -0.34    0.37    0.56
INSECURE        -0.62    0.21    0.38
SELF-PITYING    -0.52    0.28    0.42
EXTRAVERTED      0.46   -0.41    0.51
TALKATIVE        0.36   -0.31    0.58
BOLD             0.48   -0.24    0.45
ASSERTIVE        0.64   -0.10    0.33

ROTATION DETAIL (1)
- Rotation shows up the groups of items in the data.
- Orthogonal rotation
  - Factors remain independent
- Oblique rotation
  - Factors allowed to correlate
- Theoretical reasons to choose a type of rotation (e.g. for intelligence test scores)
- May explore both types
- Choose oblique if there are large correlations between factors, orthogonal otherwise.

Item loadings on the first 2 factors
[Figure: item loadings plotted on axes N and C, each running from -1 to +1]

Lack of Simple Structure
[Figure: items plotted on axes N and C, each running from -1 to +1]

Rotation Defines New Axes Which Reveal the Item Groups
[Figure]

Oblique Rotation
[Figure]

ROTATION DETAIL (2)
- Rotated and un-rotated solutions are mathematically equivalent
- Rotation is performed for purposes of interpretation.
- Most common types:
  - Orthogonal: Varimax (maximizes the variance of the squared loadings within each column). Most common.
  - Oblique: Direct oblimin
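The claim that rotated and unrotated solutions are mathematically equivalent can be illustrated with a small varimax sketch applied to the unrotated example loadings. This is a generic textbook-style raw varimax iteration written with NumPy, not the routine SPSS uses; the function name, iteration limit and tolerance are assumptions, and the column order and signs of the rotated factors are arbitrary.

```python
import numpy as np

# Unrotated loadings for the 12 example items (factors I, II, III).
UNROTATED = np.array([
    [0.45, 0.69, 0.02], [0.37, 0.71, -0.04], [0.37, 0.70, 0.04],
    [0.45, 0.55, -0.02], [-0.56, 0.33, 0.40], [-0.34, 0.37, 0.56],
    [-0.62, 0.21, 0.38], [-0.52, 0.28, 0.42], [0.46, -0.41, 0.51],
    [0.36, -0.31, 0.58], [0.48, -0.24, 0.45], [0.64, -0.10, 0.33],
])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate loadings to increase the varimax criterion."""
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target for the raw varimax criterion.
        column_ss = (rotated ** 2).sum(axis=0)
        target = rotated ** 3 - rotated @ np.diag(column_ss) / n_items
        u, singular_values, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the target direction
        new_criterion = singular_values.sum()
        if criterion != 0.0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


rotated, rotation = varimax(UNROTATED)
# Communalities (row sums of squared loadings) are unchanged by the rotation.
communalities_before = (UNROTATED ** 2).sum(axis=1)
communalities_after = (rotated ** 2).sum(axis=1)
```

Because `rotation` is orthogonal, each item's communality is identical before and after; only the way that variance is shared out among the factors changes.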

INTERPRETING FACTORS
- Done on the basis of large loadings
  - Often taken to be above 0.3.
- Size of loading which should be considered substantive is sample-size dependent.
- For large samples loadings of 0.1 or below may be significant but do not explain much variance.
- A well-defined factor should have at least three high-loading variables
- Existence of factors with only one or two large loadings indicates factors over-extracted, or multicollinearity problems.
- Assigning meaning to factors.

FACTOR SCORES
- Factor scores
  - Estimate of each subject's score on the underlying latent variable
  - Calculated from the factor loadings of each item.
- Simple scoring methods
  - A method often used for, e.g., personality questionnaires is to sum the individual item scores (reverse-keying where necessary).
  - This method is reasonable when all variables are measured on the same scale.
  - What if you have a mix of items measured on different scales? (e.g. farmer's extraversion score, farm annual profit, farm area).

EXAMPLE: FACTOR STRUCTURE OF DIETARY BEHAVIOUR
- Research question: Is there a dimension of healthy vs. unhealthy diet preferences? (Mac Nicol et al. 2003)
- 451 schoolchildren completed a 35-item questionnaire mainly on food items regularly consumed (also some general health behaviour items)
- Subjects:variables = 12.9.
- Population not representative for SES.
- Scree suggested three factors, two diet related
  - F1: Unhealthy foods (chips, fizzy drinks etc)
  - F2: Healthy foods (fruit, veg etc)
- Validation: higher SES and better nutrition knowledge associated with healthier eating patterns.
- Factor reliabilities low
  - Problem of yes/no items
  - Sample inhomogeneity.

EXAMPLE: FACTOR STRUCTURE OF AN EI SCALE
- How many factors in a published emotional intelligence scale, and can it be improved by adding more items? (Saklofske et al., 2003; Austin et al., 2004).
- 354 undergraduates completed a 33-item EI scale for which previous findings on the factor structure had given contradictory results.
- Scree plot (and some confirmatory modelling) suggested four factors, one with poor reliability.
- The factor structure has been replicated although other factor structures have been reported.
- A longer 41-item version of the same scale was constructed with more reverse-keyed items than the original scale, and also with additional items targeted on the low-reliability factor (utilisation of emotions).
- Completed by 500 students and was found to have a three-factor structure.
- Reliability of utilisation subscale increased, but still below 0.7.

EXAMPLE 4: ABNORMAL PERSONALITY
- How does personality disorder relate to normal personality? Deary et al. (1998).
- Scale-level analysis of DSM-III-R personality disorders & EPQ-R
- Sample = 400 students
- Joint analysis gives four factors:
  - N+: Borderline, Self-defeating, Paranoid
  - P+: Antisocial, Passive-aggressive, Narcissistic
  - E+: Avoidant (-), Histrionic
  - P(-): Obsessive-compulsive, Narcissistic

EXAMPLE: THE ATTITUDES TO CHOCOLATE QUESTIONNAIRE
- 80 items on attitudes to chocolate were constructed using interviews and related literature.
- Aspects assessed included difficulty controlling consumption, positive attitudes, negative attitudes, craving.
- Self-report chocolate consumption was obtained; participants also performed a bar-pressing task with chocolate button reinforcements delivered on a progressive ratio schedule.
- Factor analysis gave three factors (eigenvalue 1 criterion) accounting for 33.2%, 14.1% & 6.1% of the variance.
- Third scale had low reliability
  - Probably over-factored.
- Follow-up paper (Cramer & Hartleib, 2001) has confirmed the first two factors.

Factors Found
1. Craving: "I like to indulge in chocolate"; "I often go into a shop for something else and end up buying chocolate"
2. Guilt: "I feel guilty after eating chocolate"
3. Functional approach: "I eat chocolate to keep my energy levels up when doing physical exercise."
- High-craving individuals
  - Reported consuming more bars per month
  - Were prepared to work harder to get chocolate buttons

End of Lecture I
- Methodology: 16th November
- Change of location: Lecture Theatre 5, Appleton Tower

USING FACTORS
- Naming: use content of high-loading items as a guide
- Assess internal reliability for each factor
- Scores: unit weighting best for comparison between samples
- Validation: do factor scores correlate as expected with other variables?
- Issues of convergent/divergent validity with other tests if relevant.

Scale Reliability
- Factor-derived scales can be assessed as with any other scale
- For instance using Cronbach's alpha
- Check "alpha if item deleted" to identify poorly-functioning items
- Adequate reliability is defined as 0.7 or above
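The reliability check can be sketched with the usual formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), together with an "alpha if item deleted" helper. The function names and the small data set are hypothetical and only show the mechanics; only the Python standard library is used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """item_scores: one row per subject, each row holding k item scores."""
    k = len(item_scores[0])
    item_variances = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(item_scores):
    """Alpha recomputed with each item left out in turn (needs k >= 3)."""
    k = len(item_scores[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != dropped]
                        for row in item_scores])
        for dropped in range(k)
    ]


# Hypothetical ratings: 5 subjects x 4 items on the same response scale.
ratings = [
    [4.0, 5.0, 4.0, 2.0],
    [2.0, 3.0, 2.0, 4.0],
    [5.0, 5.0, 4.0, 3.0],
    [1.0, 2.0, 1.0, 3.0],
    [3.0, 3.0, 3.0, 1.0],
]
alpha = cronbach_alpha(ratings)
alphas_without_each_item = alpha_if_item_deleted(ratings)
```

An item whose removal raises alpha noticeably is a candidate poorly-functioning item; the resulting scale value is then compared with the 0.7 benchmark above.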

STATISTICAL TESTS FOR DATA QUALITY
- Bartlett's test of sphericity
- Kaiser-Meyer-Olkin test of sampling adequacy
  - Range = 0.0-1.0

Bartlett's test of sphericity
- Tests that the correlations between variables are greater than would be expected by chance
- p-value should be significant, i.e. the null hypothesis that all off-diagonal correlations are zero is rejected

Matrices
- Identity matrix
  - 1s on the diagonal and zeros elsewhere.
  - Each item correlates only with itself
  - Bartlett's test tests that the matrix is significantly different from an identity matrix.
- Singular matrix
  - A correlation matrix in which one or more off-diagonal elements = 1
  - Cannot be factor analysed
  - Solution = remove duplicate items.

KMO Sampling Adequacy
- Range = 0.0-1.0
- Should be > 0.5
- Low values indicate diffuse correlations with no substantive groupings.
- KMO statistics for each item
  - Item values below 0.5 indicate item does not belong to a group and may be removed
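Bartlett's test can be sketched from its usual chi-square approximation, chi2 = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the p x p correlation matrix and n the number of subjects. The helper names are hypothetical and the code uses only the standard library; the p-value would come from a chi-square survival function (for example `scipy.stats.chi2.sf`), which is left out here.

```python
import math


def log_determinant(matrix):
    """Log-determinant of a positive-definite matrix via Cholesky factorisation."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    # det(R) is the product of the squared diagonal entries of the Cholesky factor.
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def bartlett_sphericity(corr, n_subjects):
    """Return (chi-square statistic, degrees of freedom) for H0: R is an identity matrix."""
    p = len(corr)
    statistic = -(n_subjects - 1 - (2 * p + 5) / 6) * log_determinant(corr)
    degrees_of_freedom = p * (p - 1) // 2
    return statistic, degrees_of_freedom


# Items 1-4 of the lecture's example correlation matrix (210 subjects).
corr_items_1_to_4 = [
    [1.00, 0.27, 0.37, 0.40],
    [0.27, 1.00, 0.53, 0.30],
    [0.37, 0.53, 1.00, 0.38],
    [0.40, 0.30, 0.38, 1.00],
]
chi_square, df = bartlett_sphericity(corr_items_1_to_4, n_subjects=210)
```

An identity matrix gives a statistic of zero, and a singular matrix makes the factorisation fail, which mirrors the two matrix cases described above.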

SPSS
- Path to follow is Analyse, Data Reduction, Factor.
- EXTRACTION
  - Select scree plot for initial run.
  - Choose number of factors.
- ROTATION
  - Select rotation method
  - Increase number of iterations for rotation if necessary (default 25)
- DESCRIPTIVES
  - KMO and Bartlett tests
  - Reproduced correlations and residuals
- SCORES
  - Save as variables

Example raw data
[Figure]

Example correlations
[Figure]

Example: KMO Bartlett
[Figure]

Example: Eigenvalues
[Figure]

Example: Scree plot
[Figure]

Example components
[Figure]

External Validity
- Factor scores can be used in further analyses, e.g. are there M/F differences in scores on N, E, C?
- Do the factor scores correlate with other measures?
  - Exam anxiety, subjective reports of life quality, number of friends, exam success
- Biological validity
  - Map onto brain structures, neurotransmitters, genes

How to assess FA
- Sample size. Can get away with smaller numbers when communalities are high (i.e. factors well-defined). Restriction of range (subjects too similar) reduces correlations.
- Items per factor. Need at least three per factor, four is better. Some published analyses discuss factors with only one item loading!
- Use of eigenvalue > 1. Often seen in papers where factor number comes out implausibly high.
- Rotation. Orthogonal forced when oblique should have been tried.
- Scores. SPSS and other packages give scores which are sample-dependent. Use of unit weighting of items is better practice.

Adequacy of sample size
- Two things matter: ratio of subjects to items, and total sample size
- Item to subject ratio is important
- Total sample size (Comrey and Lee, 1992, p. 217):

  Sample size   Rating
  50            very poor
  100           poor
  200           fair
  300           good
  500           very good
  >1000         excellent

Item-subject ratios
- With too many items and too few subjects, the data are over-fitted
  - Unreplicable results (Bobko & Schemmer, 1984)
- Subjects to items
  - 5:1 (Gorsuch, 1983, p. 332; Hatcher, 1994, p. 73)
  - 10:1 (Nunnally, 1978, p. 421)
- Subjects to parameters measures: MacCallum, Widaman, Preacher, & Hong (2001)
  - Subject:factor ratio
  - Item communalities
  - Item loadings

Nested Analysis
[Figure: nested factor diagram; surviving labels are g, f, d, c, r, s and "Specific tests"]
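The sample-size benchmarks and the subjects-to-items ratio can be turned into a quick check. This sketch treats each benchmark as the lower bound of its band and counts 1000 itself as "excellent"; both are assumptions about how to read the table, and the function names are hypothetical.

```python
# Benchmarks quoted in the lecture (Comrey and Lee, 1992), largest first.
BENCHMARKS = [
    (1000, "excellent"),
    (500, "very good"),
    (300, "good"),
    (200, "fair"),
    (100, "poor"),
    (50, "very poor"),
]


def rate_sample_size(n_subjects):
    """Label a total sample size using the highest benchmark it reaches."""
    for threshold, label in BENCHMARKS:
        if n_subjects >= threshold:
            return label
    return "below the lowest benchmark"


def subjects_per_item(n_subjects, n_items):
    """Subjects-to-items ratio, e.g. 5.0 means 5:1."""
    return n_subjects / n_items


# The two data sets described in the lecture.
trait_rating = rate_sample_size(210)          # 210 students, 12 adjectives
diet_ratio = subjects_per_item(451, 35)       # 451 schoolchildren, 35 items
```

For the dietary study this reproduces the 12.9 subjects per variable quoted on its slide.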

Structural Equation Modeling & Factor Analysis
- SEM incorporates path analysis and factor analysis
- A confirmatory factor analysis is an SEM model in which each factor (latent variable) has multiple indicators but there are no direct effects (straight arrows) connecting the variables

Factor Analysis & Path Analysis
- SEM can be extended to models where each latent variable has several indicators, and there are paths specified connecting the latent variables.

Example 1
[Figure]

Example 2
[Figure]

Example 3
[Figure]

Summary
- What is factor analysis?
  - Statistical method
  - Accounting for variability in observed traits ("observed random variables")
  - In terms of a smaller number of factors ("unobserved random variables")
  - Allows recovery of values for a subject from a linear combination of the extracted factors (with some error)
  - Can think of the factors as independent and items as dependent variables

Summary cont.
- What is a scree plot?
- What is an identity matrix?
- What are communalities?
- What is a factor loading?
- What is a factor score?
- Bartlett's test of sphericity?
- KMO?
- What is a good number of subjects?
- Why do we rotate factors?
- Does FA test causes?
- How can we model and test causes (and model latent structure)?