Making a psychometric. Dr Benjamin Cowan - Lecture 9

What this lecture will cover
- What is a questionnaire?
- Development of questionnaires
- Item development
- Scale options
- Scale reliability & validity
- Factor analysis

What is a questionnaire?
- Some concepts are difficult to measure directly using measurements like time, accuracy, etc.
  - Attitudes, emotions, opinions
- We need to design psychometrics for these if we are to research them

Why would we want to make a psychometric?
- If we are looking at a new concept that hasn't been measured before
  - Happens a lot in HCI with the development of new technologies
- Because a metric needs to measure something specific for it to have value, we need to design new measures or tweak existing ones for new technologies
  - Need to add items and re-test

Example: Anxiety towards Facebook posting
- Let's say we wanted to make a measure of how anxious people were about posting to Facebook
- This measure (our questionnaire) is made of attitude phrases (or items)

Stages of item development
- Literature review
  - What are the key concepts in studying anxiety?
- Measure review
  - What is available? How is anxiety currently measured?
- Focus groups/interviews
  - What is important in Facebook anxiety?
  - Questions about Facebook and negative emotions
  - Gives an indication of how people describe the concepts, thus improving item wording

Generating items: Interviews
- Conversation with a purpose
- 4 main types
  - Unstructured
  - Semi-structured
  - Structured
  - Group

Unstructured interviews
- Exploratory
- Talk around an area
- Planning the areas for discussion rather than specific questions
- Can explore topics as they come up

Structured interviews
- Predetermined questions
- Standardised for all interviewees

Semi-structured interviews
- Basic script used with all participants
- Mix of structured and unstructured interview
- Some questions are covered with all participants and the rest is a free-flowing conversation

What interview type to use?
- Depends on:
  - How specific you need to get
  - Purpose of the interview

Stages of item development
- This will allow you to get an idea of:
  - Potential items
  - Potential categories that need to be covered (factors)
- Pilot study
  - Large number of items
  - Participants rate:
    - Clarity of wording
    - Clarity of concept in the item
- Experts in the area to review items

The good, the bad, the ugly
- Good item
  - Clear, well worded, one concept, to the point
  - "I feel stressed when using Facebook"
- Bad item
  - Can be clearly worded but does not cover one concept
  - "I feel stressed because of so many people on Facebook and it is hard to use"
- Ugly item
  - Poorly worded and doesn't cover one concept
  - "Stress is something I feel all of the time when using Facebook because people on it are plentiful and it's difficult"
  - This can happen when questionnaires are mistranslated

Common scales used
- Likert scales (Likert, 1932)
  - 3 point, 5 point, 7 point, 9 point
  - The more points, the larger the variance of responses on an item
  - Arguments over which is best, but 5 point is most common
  - The use of a neutral point is also debated
- Semantic differential
  - Uses two polar opposite adjectives at the ends of a scale
  - Which to use?
    - Strong - Not Strong (bad)
    - Strong - Weak (good)

Important concepts in item response
- Response acquiescence set
  - A propensity for participants to answer positively to items
  - Balance the psychometric as much as possible (positively and negatively worded items)
  - Item randomisation
- Social desirability
  - Responding with what you feel is socially appropriate
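
A balanced questionnaire only works if the negatively worded items are reverse-scored before totalling, and item randomisation needs a per-participant shuffle. The following is a minimal sketch of both, not something taken from the lecture: the 5-point scale, the item ids (`q1` to `q3`) and the function names are all illustrative assumptions.

```python
import random

# Assumed 5-point Likert scale (1 = strongly disagree, 5 = strongly agree)
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response, scale_min=SCALE_MIN, scale_max=SCALE_MAX):
    """Flip a response so a high score always means more of the construct."""
    return scale_max + scale_min - response


def score_participant(responses, negatively_worded):
    """Sum the item scores after reverse-scoring negatively worded items."""
    return sum(
        reverse_score(value) if item in negatively_worded else value
        for item, value in responses.items()
    )


def randomised_order(item_ids, seed=None):
    """Return a shuffled copy of the item order for one participant."""
    rng = random.Random(seed)
    order = list(item_ids)
    rng.shuffle(order)
    return order


# Hypothetical responses; q2 is worded in the opposite direction to q1 and q3
responses = {"q1": 4, "q2": 2, "q3": 5}
total = score_participant(responses, negatively_worded={"q2"})
order = randomised_order(responses.keys(), seed=1)
print(total, order)
```

Reverse-scoring keeps the total interpretable: a participant who simply agrees with everything ends up near the middle of the scale rather than at the top.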

So
- We have our items
- We have piloted them with participants
- We now need to assess how good our questionnaire (or psychometric) is
- Good psychometrics have:
  - High reliability
  - High validity
  - A set of norms (baselines/guides)

Reliability
- Stability of the test score over time
  - Test-retest reliability
- Internal consistency of the test
  - Internal consistency reliability
  - The extent to which the items are measuring the same underlying concept

Test-retest reliability
- Test at Time 1, 6 month gap, test at Time 2
- Testing the same participants on the measure on two occasions
- Scores are then correlated to see the strength of the relationship
- A correlation over 0.7 indicates good test-retest reliability
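
As a rough illustration of the correlation step, the sketch below computes a Pearson correlation between two administrations of the same measure using NumPy. The scores are made up for six hypothetical participants, and `retest_correlation` is an illustrative name rather than a standard function.

```python
import numpy as np

# Made-up total scores for the same six participants on two occasions
time1 = np.array([12, 18, 25, 30, 22, 15])
time2 = np.array([14, 17, 27, 28, 20, 16])


def retest_correlation(scores_t1, scores_t2):
    """Pearson correlation between Time 1 and Time 2 total scores."""
    return float(np.corrcoef(scores_t1, scores_t2)[0, 1])


r = retest_correlation(time1, time2)
verdict = "good" if r > 0.7 else "questionable"
print(f"test-retest r = {r:.2f} ({verdict} by the 0.7 rule of thumb)")
```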

Why would the correlation not be perfect?
- Between times there may be changes on the variables
  - Some people may have become less anxious over time
- Test error
  - Participants feeling ill, bored, tired

Internal consistency reliability
- The extent to which each item measures the same underlying concept
- In our Facebook posting anxiety scale we would expect all the items to be measuring elements of anxiety, not the usability of Facebook

Internal consistency measures
- Split-half method
  - Divide the measure in two randomly and correlate the scores on the two halves
- Cronbach's alpha (most commonly used)
  - Average of all possible split-half reliabilities
  - 0.7 is seen as a good alpha
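
Both measures can be sketched in a few lines of NumPy. The block below is only an illustration under stated assumptions: the response matrix is made up (rows are participants, columns are items), the function names are hypothetical, and alpha is computed with the usual variance formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total score).

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))


def split_half_correlation(items, seed=0):
    """Correlate the total scores of two randomly chosen halves of the items."""
    items = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    columns = rng.permutation(items.shape[1])
    half = items.shape[1] // 2
    first = items[:, columns[:half]].sum(axis=1)
    second = items[:, columns[half:]].sum(axis=1)
    return float(np.corrcoef(first, second)[0, 1])


# Made-up responses: 6 participants x 4 items on a 5-point scale
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(scores)
half_r = split_half_correlation(scores)
print(f"alpha = {alpha:.2f}, split-half r = {half_r:.2f}")
```

A single split-half correlation depends on which split happens to be drawn, which is one reason alpha is usually reported instead.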

What can impact on this reliability?
- The number of items
  - More items mean more of the concept can be covered
  - Weigh up the number of items against boredom
  - 10 items considered the minimum for a reliable test
- Can a measure be too internally consistent? (Cattell, 1957)
  - Using items which effectively measure the same thing
  - E.g. "I like Facebook" and "Facebook is something I like"
  - They are the same item, just different wording
  - Leads to a "bloated specific"

Cronbach's alpha analysis
- The analysis looks at the correlation of each item's scores with the total questionnaire score (item-total correlations)
- Items with item-total correlations lower than 0.3 should be removed, as they do not correlate well with the rest of the scale
- The test output also gives us an idea of what alpha would be without each item, which is great for item removal
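
The item analysis can be sketched as follows. This is a hedged illustration rather than the output of any particular statistics package: the data are made up, the function names are hypothetical, and the item-total correlation is assumed to be the "corrected" version (each item correlated with the total of the other items), which is what statistics packages commonly report.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a participants x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))


def item_analysis(items, cutoff=0.3):
    """Corrected item-total correlations, alpha-if-deleted and flagged items."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    item_total, alpha_without = [], []
    for i in range(n_items):
        rest = np.delete(items, i, axis=1)  # all items except item i
        item_total.append(float(np.corrcoef(items[:, i], rest.sum(axis=1))[0, 1]))
        alpha_without.append(cronbach_alpha(rest))
    flagged = [i for i, r in enumerate(item_total) if r < cutoff]
    return item_total, alpha_without, flagged


# Made-up responses: 6 participants x 5 items; the last item is deliberately poor
scores = [
    [4, 5, 4, 5, 3],
    [2, 2, 3, 2, 4],
    [5, 4, 5, 5, 2],
    [1, 2, 1, 2, 3],
    [3, 3, 4, 3, 1],
    [4, 4, 4, 5, 2],
]
alpha_all = cronbach_alpha(scores)
item_total, alpha_without, flagged = item_analysis(scores)
print(f"alpha with all items = {alpha_all:.2f}")
for i, (r, a) in enumerate(zip(item_total, alpha_without), start=1):
    print(f"item {i}: item-total r = {r:.2f}, alpha if deleted = {a:.2f}")
```

In this made-up data the flagged item is also the one whose removal raises alpha, which is the pattern the "alpha without each item" output is meant to reveal.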

Validity of a test
- A test can be reliable but not valid
  - It could be high in reliability but not measuring what it claims to measure
- It is not as simple as looking at the item wordings to deduce this
- We need to identify whether our measure behaves as predicted

Validity assessment
- Face validity
  - The items seem to be worded right for the concept being measured
  - This is a poor test of validity
  - E.g. "I am quite easily distracted" looks fine but can be interpreted differently by participants
- Concurrent validity
  - Correlation of the test with another benchmark test given at the same time
  - Dubious when there is no clear benchmark

Validity assessment
- Predictive validity
  - The measure is able to predict some criterion
  - E.g. Facebook anxiety relates to posting behaviour
  - Need to be aware that modest relationships are likely
    - Many other factors are important to posting behaviour: closeness of Facebook friends, drunken messaging?
  - Sometimes clear criteria are not available
  - Beware of the difference between statistical significance and psychological significance

Construct validity (Cronbach & Meehl, 1955)
- Allows a collection of results to lead us to validity conclusions rather than just one
- It is usually the case that not all hypotheses are confirmed
- Validity is therefore not as unequivocal as reliability
  - Interpretive and subjective

Construct validity (Cronbach & Meehl, 1955)
- Construct validity
  - A bank of hypotheses based on the knowledge of our concept
- Our hypotheses for Facebook anxiety
  - Should correlate positively and highly with other measures of anxiety (concurrent validity)
  - Should correlate positively with someone's fear of negative evaluation (concurrent validity)
  - Should not correlate with personality tests that don't measure anxiety
  - High scorers, compared with low scorers, should show less activity on Facebook and be more likely to leave Facebook (predictive validity)

Norms
- We need to test our measures on:
  - A significant, representative proportion of the population (1000s of respondents)
  - A sample of people we'd expect to be high or low on the measure (for discriminatory markers)
- This is built up over years of use

Now we have
- Gathered our items
- Assessed their reliability
- Assessed their validity
- We are assuming at present that Facebook anxiety is uni-dimensional
  - This might not be true: there may be many factors to it, which we have picked up in our measure

What are factors?
- Each questionnaire item gives a score
- There will be items that correlate heavily together
- Factor analysis is fundamentally used to reduce the data into the smallest number of explanatory concepts
- A factor is a combination of variables, the grouping of which indicates a relationship

What are factors?
- Each item has a factor loading: the correlation of that item with the factor
- Some items will have high loadings, some low or no loading at all, on a specific factor
- Loadings of 0.4 or above are seen as helpful in defining a factor
- Items should only load heavily on one factor
  - If they don't, they are candidates for rewording
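
The loading rules above can be turned into a simple screening step. The sketch below is illustrative only: the loading matrix is made up (rows are items, columns are factors), `classify_items` is a hypothetical helper, and the 0.4 threshold is the rule of thumb from the slide.

```python
import numpy as np


def classify_items(loadings, threshold=0.4):
    """Label each item by the number of factors it loads on at or above threshold."""
    loadings = np.asarray(loadings, dtype=float)
    salient = np.abs(loadings) >= threshold
    labels = []
    for mask in salient:
        n_salient = int(mask.sum())
        if n_salient == 0:
            labels.append("no salient loading")
        elif n_salient == 1:
            labels.append(f"factor {int(np.argmax(mask)) + 1}")
        else:
            labels.append("cross-loading")  # candidate for rewording
    return labels


# Made-up loadings for five items on two factors
loadings = [
    [0.72, 0.10],
    [0.65, 0.05],
    [0.08, 0.81],
    [0.45, 0.52],
    [0.12, 0.20],
]
labels = classify_items(loadings)
for i, label in enumerate(labels, start=1):
    print(f"item {i}: {label}")
```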

Shared variance
- The correlation coefficient represents the amount of agreement (or shared variance) between two sets of scores
- Square the correlation coefficient to get the % agreement

[Figure: variable x variance and variable y variance, overlapping as shared (common) variance]

Shared variance & communality
- By squaring the factor loading we can identify how much shared variance there is between the item and the factor
  - This can be thought of as the contribution that the item makes to the factor
- If we do this for each factor loading an item has and add up the results, we get the item's communality: the amount of variance shared between the item and all the factors
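
A short worked example of the squaring step, using made-up numbers: the correlation of 0.7 and the loading matrix are illustrative, and the function names are hypothetical.

```python
import numpy as np


def shared_variance(r):
    """Proportion of variance shared by two sets of scores with correlation r."""
    return r ** 2


def communalities(loadings):
    """Sum of squared loadings across factors, one value per item."""
    loadings = np.asarray(loadings, dtype=float)
    return (loadings ** 2).sum(axis=1)


# A correlation of 0.7 corresponds to 49% shared variance
percent_shared = 100 * shared_variance(0.7)

# Made-up loadings for two items on two factors
loadings = [
    [0.8, 0.1],  # 0.64 + 0.01
    [0.3, 0.6],  # 0.09 + 0.36
]
h2 = communalities(loadings)
print(f"{percent_shared:.0f}% shared variance; communalities = {h2}")
```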

Factor extraction
- Eigenvalues
  - Indicate the importance of the factor extracted in explaining the variance in the data
  - There will be a few with high eigenvalues and lots with low eigenvalues
  - Makes sense to keep the most important factors
  - Rule of thumb is to keep factors with eigenvalues > 1 (as an eigenvalue of 1 represents a significant amount of variation)
- The number to extract is identified using a scree plot (Cattell, 1966)
  - Y axis is eigenvalues
  - X axis is the number of factors
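
To make the eigenvalue rule concrete, the sketch below takes the eigenvalues of an item correlation matrix and counts how many exceed 1. The correlation matrix is made up (two pairs of related items), and the eigenvalues are computed directly from the correlation matrix, which is an assumption about the extraction method. Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue of 1 corresponds to the variance of a single item.

```python
import numpy as np

# Made-up correlation matrix: items 1-2 go together, items 3-4 go together
corr = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

# eigvalsh is meant for symmetric matrices; sort from largest to smallest
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_retained = int((eigenvalues > 1.0).sum())

# These values, plotted against factor number, are what a scree plot shows
for number, value in enumerate(eigenvalues, start=1):
    print(f"factor {number}: eigenvalue = {value:.3f}")
print(f"factors with eigenvalue > 1: {n_retained}")
```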

[Figure: scree plot of eigenvalues (y axis) against number of factors (x axis), with the point of inflexion marked]

Factor rotation
- Looking for best fit: the factor structure with the clearest interpretation
- Sometimes this involves rotation to get the clearest, simplest factor structure
- A simple factor structure is one that has a few high-loading items, with the rest being near 0 (Cattell, 1978)

Methods of rotation
- The method you choose depends on how correlated you feel the factor scores should be
  - Based on theoretical reasoning
- We would expect our questionnaire:
  - To have three factors: 1) anxiety about social posting, 2) anxiety about interface interaction, 3) social confidence
  - For the scores on these factors to be correlated

Methods of rotation
- We would therefore use a method that takes this correlation into consideration: Direct Oblimin
- This is an oblique method of rotation (allows the factors to correlate)

Methods of rotation
- If we felt the factors should not be correlated then we could have used the Varimax method
- This is an example of orthogonal rotation, which ensures the extracted factors are not correlated
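
For readers who want to see what an orthogonal rotation does numerically, here is a minimal sketch of Varimax using a widely circulated SVD-based iteration. It is not the routine used by any particular statistics package, the unrotated loadings are made up, and Direct Oblimin (the oblique method) is not covered here.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loadings matrix (items x factors) with an orthogonal Varimax rotation."""
    loadings = np.asarray(loadings, dtype=float)
    n_items, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target whose SVD gives the next orthogonal rotation
        target = rotated ** 3 - (gamma / n_items) * rotated @ np.diag(
            (rotated ** 2).sum(axis=0)
        )
        u, s, vh = np.linalg.svd(loadings.T @ target)
        rotation = u @ vh
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break  # improvement has become negligible
        criterion = new_criterion
    return loadings @ rotation


# Made-up unrotated loadings: every item loads on both factors
unrotated = np.array([
    [0.7, 0.5],
    [0.6, 0.5],
    [0.7, -0.4],
    [0.6, -0.5],
])
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each item's communality (its row sum of squared loadings) is unchanged; only the way that variance is split between the factors differs.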

Considerations
- Sample size
  - The number of people needed in the sample is debated
  - 100 for stable factors (Kline, 1999)

Using factor analysis in questionnaire construction
- Give participants the questionnaire
- Conduct factor analysis
- Check any items that load highly on more than one factor for concept clarity
- Check that the items with loadings > 0.3 cover most of what we need in the scale; if not, write more items
- Replicate this on each new sample
- Validate the scale factors and calculate their reliability

Making a psychometric
- Takes a lot of time
  - To develop the items
  - To test on a wide range of samples
  - To test a large bank of hypotheses on relationships to ensure its validity
- Sometimes it cannot be avoided

Readings
- Kline, P. (2000). A Psychometrics Primer, Chapter 3. Free Association Books (14.95 from Amazon)
- Kline, P. (1994). An Easy Guide to Factor Analysis (available in library)
- Field, A. (2007). Chapter 15: Exploratory Factor Analysis