THE COURSE EXPERIENCE QUESTIONNAIRE: A RASCH MEASUREMENT MODEL ANALYSIS

Russell F. Waugh
Edith Cowan University

Key words: attitudes, graduates, university, measurement
Running head: COURSE EXPERIENCE QUESTIONNAIRE

October 1997

Address correspondence to Dr Russell F. Waugh at Edith Cowan University, Pearson Street, Churchlands, Western Australia, 6018.

Abstract

Course Experience Questionnaire data from University X in Australia for 1994 graduates (1635, response rate 44%), 1995 graduates (2430, response rate 66%) and 1996 graduates (2702, response rate 67%) were analysed using a Rasch measurement model. The whole scale (25 items) and each of the five sub-scales were analysed for each year separately, to investigate the questionnaire's conceptual design and validity. The results show that, taken together, at least 17 of the 25 items can form a valid scale measuring graduate perceptions of their courses for each of the three data groups. Of the five sub-scales, only Good Teaching and Generic Skills are moderately valid and reliable enough for use and interpretation separately from the main scale.

Introduction

The Course Experience Questionnaire consists of 25 items in a Likert format with five response categories. The questionnaire is used by most of the 37 universities in Australia to gather data about teaching and course quality, as perceived by graduates about four months after graduation. The questionnaire is given out annually to all graduates, by individual universities, along with the Graduate Destination Survey, and the results are sent to the Graduate Careers Council of Australia, which produces reports covering all the universities (Johnson, 1997; Johnson, Ainley & Long, 1996). It is used to measure graduates' perceptions of the quality of their completed courses (see the questionnaire in the appendix, and Johnson, 1997, p. 3). The items are conceptualised from five aspects relating to course experiences and the learning environment: good teaching (6 items, 1994; 7 items, 1995 and 1996), clear goals and standards (5 items, 1994; 4 items, 1995 and 1996), appropriate assessment (3 items), appropriate workload (4 items) and generic skills (6 items), together with a single item on overall satisfaction.

The development of the Course Experience Questionnaire is described in Ainley and Long (1994, 1995), Johnson (1997), Johnson, Ainley and Long (1996), and Ramsden (1991a, b). It evolved from work carried out at Lancaster University in the 1970s. The original questionnaire was based on a model of university teaching involving curriculum, instruction, assessment and learning outcomes, and it contained more items than are currently used (see Ramsden, 1991a, b). It was intended that students would evaluate descriptions of distinct aspects of their learning environment. The questionnaire has since been revised for use in Australia, and the Graduate Careers Council of Australia added items relating to generic skills. The questionnaire now focuses on student perceptions of five aspects relating to courses and the learning environment. For recent commentary on the questionnaire, see Johnson (1997), Johnson, Ainley and Long (1996), and Wilson, Lizzio and Ramsden (1996); for earlier development work, see Ramsden (1991a, b), Linke (1991), Entwistle and Ramsden (1983), and Marton and Saljo (1976).

The Course Experience Questionnaire has been analysed by traditional measurement techniques. Using large multi-disciplinary samples of students and graduates from 1992, 1993 and 1994 (N = 2130, N = 1362 and N = 7370 respectively), the questionnaire was found to have reasonable internal reliability (Cronbach alphas between 0.67 and 0.88) and good construct validity, as judged by appropriate factor loadings on each of the sub-scales (Wilson, Lizzio & Ramsden, 1996). In another study providing different results and using different techniques, Sheridan (1995) used graduate samples of about 400 from three universities and analysed the Course Experience Questionnaire using the Extended Logistic Model of Rasch (Andrich, 1988a, b; Rasch, 1980/1960). Sheridan (1995, p. 21) suggested that the Course Experience Questionnaire should continue to be treated as five separate measures of an overarching construct called course experience, but indicated that 'doubt exists regarding the measurement quality of the CEQ overall and of the continued use of this instrument in its present form'. He was critical of the sub-scales, the lack of labels on the five sub-scales, the mixing of positively and negatively worded items (which can cause a respondent interaction effect), the use of the neutral response category (which attracts many different types of response) and the use of the Likert format (disagree to neutral to agree order). A more favoured format from a measurement perspective is a clearly ordered one such as never to all-the-time (Sheridan, 1995; Treolar, 1994).

The analysis by Sheridan (1995) showed that the Course Experience Questionnaire sub-scales are suspect, except for the Good Teaching Sub-Scale (reliabilities greater than 0.8). The sub-scales lack reliability, and the thresholds, which check on the consistency of the response categories, show that graduates from the three universities reacted differently to the items. This means that a degree of interaction between the items and the three university groups existed, making comparisons between university groups invalid or suspect (Sheridan, 1995, p. 15). Furthermore, items 25, 16, 7 and 20 exhibited misfit to the model and should be discarded or reworded.

Aims of the Study

The present study investigates the psychometric properties and the conceptual design of the Course Experience Questionnaire as an instrument to measure the perceptions of course experiences of university graduates after they have graduated. It does this by analysing the psychometric properties of the whole scale and of the five sub-scales separately, using the Extended Logistic Model of Rasch (Andrich, 1978, 1988a, 1988b; Rasch, 1980/1960). This model creates an interval-level scale from the data, in which equal differences between numbers on the scale represent equal differences in graduate perception measures and item difficulties, as appropriate; it does this by calibrating both item difficulties and graduate perceptions on the same scale.

The conceptual and theoretical design of the Course Experience Questionnaire is based on five aspects of the graduates' course experiences. In order to investigate the meaning of the questionnaire, it is necessary to investigate the sub-scales separately, as well as the whole questionnaire. Theoretically, in an ideal conception of the questionnaire, the items should be related well enough to fit the Rasch model both as separate sub-scales and together as a whole scale. This is a major benefit of using a Rasch model analysis, because it helps in the theoretical development of the variable and its meaning.

METHOD

Data

Data for the present study were analysed in three groups. The first group comprised 1635 graduates of 1994 from University X in Australia (44% response rate). The second group comprised 2430 graduates of 1995 from University X, who responded to the same survey (except that question 16 was changed; see the appendix); this represented a 66% response rate. The third group comprised 2696 graduates of 1996 from University X (67% response rate). The graduates came from all six Faculties of the university, namely the Academy of Performing Arts, Arts, Business, Education, Health and Human Sciences, and Science, Engineering and Technology.

Measurement

Taken individually, the 25 items of the Course Experience Questionnaire can be used to interpret the responses of graduates concerning their perceptions of the university courses that they undertook. This provides a qualitative view of their perceptions on each item. However, if data on the 25 items are aggregated or used to create a scale which is then interpreted, seven criteria have to be met before it can be said that the items form a valid and reliable scale. The seven measurement criteria have been set out by Wright and Masters (1981). They involve, first, an evaluation of whether each item functions as intended; second, an estimation of the relative position (difficulty) of each valid item along the scale; third, an evaluation of whether each person's responses form a valid response pattern; fourth, an estimation of each person's relative score (perception) on the scale; fifth, that the person scores and the item scores fit together on a common scale defined by the items and share a constant interval from one end of the scale to the other, so that their numerical values mark off the scale in a linear way; sixth, that the numerical values are accompanied by standard errors which indicate the precision of the measurements on the scale; and seventh, that the items remain similar in their function and meaning from person to person and group to group, so that they are seen as stable and useful measures. The present study used these seven criteria to analyse the 25 items of the Course Experience Questionnaire and its five sub-scales.

Measurement Model

The Extended Logistic Model of Rasch (Andrich, 1978, 1988a, b; Rasch, 1980; Wright, 1985) was used with the computer program Quest (Adams & Khoo, 1994) to create a scale satisfying the seven measurement criteria of Wright and Masters (1981). The scale is based on the log odds (called logits) of graduates agreeing with the items. The items are ordered along the scale at interval measurement level, from easiest to agree with to hardest to agree with. Items at the easy end of the scale (those with negative logit values) are answered in agreement by most graduates, and items at the hard end of the scale (those with positive logit values) are most likely to be answered in agreement only by graduates whose perceptions are strongly positive. The Rasch method produces scale-free graduate perception measures and sample-free item difficulties (Andrich, 1988b; Wright & Masters, 1982); that is, the differences between pairs of graduate perception measures and item difficulties are expected to be sample independent.

The program checks on the consistency of the graduate responses and calculates the scale score needed for a 50 per cent chance of passing from one response category to the next: for example, from strongly disagree to disagree, from disagree to neutral, from neutral to agree, and from agree to strongly agree for each item. These scale scores are called threshold values; they are calculated in logits and they must be ordered to represent the increasing perception needed to answer from strongly disagree, to disagree, to neutral, to agree, to strongly agree. Items whose thresholds are not ordered, that is, items for which the graduates do not use the categories consistently, are not considered to fit the model and are discarded.
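As an illustrative sketch only (the exact parameterisation used by the Quest program may differ), the extended logistic model for an item with five ordered categories can be written in the rating-scale form below, where beta_n is the perception measure of graduate n, delta_i is the difficulty of item i, and tau_1 to tau_4 are the category thresholds; the categories are assumed here to be scored 0 (strongly disagree) to 4 (strongly agree).

% Rating-scale form of the extended logistic model (sketch, not the exact
% Quest parameterisation); the empty sum for x = 0 is defined as zero.
\[
P(X_{ni} = x) \;=\;
\frac{\exp\!\Big(\sum_{j=1}^{x}\big(\beta_n - \delta_i - \tau_j\big)\Big)}
     {\sum_{m=0}^{4}\exp\!\Big(\sum_{j=1}^{m}\big(\beta_n - \delta_i - \tau_j\big)\Big)},
\qquad
\ln\frac{P(X_{ni}=x)}{P(X_{ni}=x-1)} \;=\; \beta_n - \delta_i - \tau_x .
\]

In this form the threshold tau_x is the point on the logit scale at which adjacent categories x-1 and x are equally probable, so ordered thresholds (tau_1 < tau_2 < tau_3 < tau_4) correspond to graduates using the five response categories consistently.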

The program checks that the graduate responses fit the measurement model according to strict criteria. The criteria are described by Adams and Khoo (1994), Wright and Masters (1982) and Wright (1985). The fit statistics are weighted and unweighted mean squares that can be approximately normalised using the Wilson-Hilferty transformation. The normalised statistics are called infit t and outfit t, and they have a mean near zero and a standard deviation near one when the data conform to the measurement model. A fit mean square of 1 plus x indicates 100x per cent more variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. Similarly, a fit mean square of 1 minus x indicates 100x per cent less variation between the observed and predicted response patterns than would be expected if the data and the model were compatible. In this study, each item had to fit the model within a 30 per cent variation between the observed and expected response patterns or it was discarded. For such discarded items, the graduate responses are not consistent with the responses on the other items in the scale, and there is not sufficient agreement amongst graduates as to the position (difficulty) of the items on the scale.

Reliability is calculated by the Item Separation Index and the Graduate Separation Index. Separation indices represent the proportion of observed variance considered to be true; a value of 1 represents high reliability and a value of zero represents low reliability (Wright & Masters, 1982). A combination of evidence is required to support the construct validity of the Course Experience Questionnaire: the Item and Graduate Separation Indices need to be high; the observed and expected item response patterns need to fit the measurement model according to strict criteria; the thresholds relating to passing from one category response to the next need to be ordered; and there needs to be a conceptual framework (theoretical or practical) linking the items of the scale together.

Data Analysis

The 1996 data were analysed with all 25 items together and with each of the five sub-scales of the Course Experience Questionnaire separately, creating six interval-level scales: one for all 25 items and one for each of the five sub-scales. These analyses were repeated for the 1995 and 1994 data.

RESULTS

In the interest of brevity, not all the results are presented here, only those considered the most important. For example, the values of the standard errors of measurement for each item difficulty and graduate perception measure are not presented; the threshold values for each response category of each item for each of the scales created are not included; and the derived scales showing the positions of the items and the graduate perceptions for each of the five sub-scales with each of the three data groups are not presented.

Items 21 and 25 did not fit the model for any of the three data groups and so were discarded. Of the remainder, item 9 did not fit the model for the 1994 data, items 3 and 8 did not fit the model for the 1995 data, and items 8, 9 and 17 did not fit the model for the 1996 data. A good scale was created with 17 items for the 1996 data, with 21 items for the 1995 data and with 22 items for the 1994 data. The main results are set out in thirteen Tables and two Figures.

Table 1 shows the summary statistics relating to the items fitting the model for each of the three data groups. Table 2 shows the summary statistics relating to graduate perceptions for each of the three data groups. Tables 3 and 4 show similar data for the Good Teaching Sub-Scale, Tables 5 and 6 for the Clear Goals and Standards Sub-Scale, Tables 7 and 8 for the Appropriate Assessment Sub-Scale, Tables 9 and 10 for the Appropriate Workload Sub-Scale, and Tables 11 and 12 for the Generic Skills Sub-Scale. Table 13 shows the item difficulties for the Course Experience Questionnaire items fitting the model for each of the three data groups. Figure 1 shows the Course Experience Questionnaire Scale with graduate perception measures and item difficulties calibrated on the same scale for the 1996 data. Figure 2 shows the items fitting the model for the 1996 data.

Table 1
Item statistics for the Course Experience Questionnaire (1994, 1995 and 1996 data)
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: 1994: 9, 21, 25; 1995: 3, 8, 21, 25; 1996: 4, 8, 9, 16, 17, 21, 23,

Table 2
Graduate statistics for the Course Experience Questionnaire (1994, 1995 and 1996 data)
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates, % graduate responses
% graduate responses: 44% (1994); 66% (1995); 67% (1996)

Notes for Tables
1. Items 4, 8, 12, 13, 16, 19, 21 and 23 are reverse scored.
2. Mean and SD are the mean and standard deviation of the item thresholds or of the graduate attitude scores for the scale and sub-scales, as appropriate. A threshold for an item step between two categories of the same item is the attitude score required for a graduate to have a 50% chance of passing that step (such as passing from agree to strongly agree on a Likert item).
3. Separation indices represent the proportion of observed variance considered to be true. A value of 1 represents high separability and a value of zero represents low separability (Wright & Masters, 1982; Wright, 1985). A separability value of 0.9 or more is sought for a good scale.

4. Infit mean refers to the information-weighted mean square fit statistic.
5. Outfit mean refers to the unweighted mean square fit statistic.
6. Infit t and outfit t refer to the normalised t values obtained using the Wilson-Hilferty transformation.
7. When the data are compatible with the model, the expected values of the mean squares are approximately one and the expected values of the t-scores are approximately zero (see the illustrative sketch following these notes).
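To make the calculations behind Notes 2 to 7 concrete, the following sketch shows how infit and outfit mean squares and a separation (reliability) index are conventionally computed in Rasch analysis. It is illustrative only: it is not the code used by the Quest program, the function names and toy numbers are hypothetical, and the conversion of mean squares to infit t and outfit t via the Wilson-Hilferty transformation is not shown.

import numpy as np

def mean_square_fit(observed, expected, variance):
    """Return (infit, outfit) mean square fit statistics for one item.

    observed, expected and variance are arrays over persons: the observed
    rating, the model-expected rating and the model variance of the rating.
    """
    z2 = (observed - expected) ** 2 / variance       # squared standardised residuals
    outfit = z2.mean()                               # unweighted mean square
    infit = ((observed - expected) ** 2).sum() / variance.sum()   # weighted mean square
    return infit, outfit

def separation_reliability(measures, standard_errors):
    """Proportion of observed variance in the measures considered to be true."""
    observed_var = measures.var(ddof=1)
    error_var = (standard_errors ** 2).mean()
    return (observed_var - error_var) / observed_var

# Toy numbers (hypothetical, not the University X data): six graduates on one item.
obs = np.array([4.0, 3.0, 3.0, 2.0, 4.0, 1.0])       # observed ratings, scored 0-4
exp = np.array([3.0, 2.2, 3.8, 1.4, 3.1, 2.0])       # model-expected ratings
var = np.array([0.8, 0.9, 1.0, 1.1, 0.7, 1.0])       # model variances of the ratings
infit, outfit = mean_square_fit(obs, exp, var)
print(f"infit mean square = {infit:.2f}, outfit mean square = {outfit:.2f}")
# A mean square of 1.30 would indicate about 30% more variation between observed
# and expected responses than the model predicts -- the cut-off used in this study.

measures = np.array([0.5, 1.2, -0.3, 0.8, 2.0, -1.1])    # perception measures (logits)
ses = np.array([0.35, 0.40, 0.38, 0.36, 0.45, 0.42])     # their standard errors
print(f"separation (reliability) index = {separation_reliability(measures, ses):.2f}")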

Figure 2. Fit mean square data (infit mean squares) for the 17 Course Experience Questionnaire items that fit the model for the 1996 data (items 1, 2, 3, 5, 6, 7, 10, 11, 12, 13, 14, 15, 18, 19, 20, 22 and 24).

Table 13
Difficulties of the items in logits for the three CEQ scales created from the 1994, 1995 and 1996 data
Columns: Item No., difficulty (1994), difficulty (1995), difficulty (1996); items that did not fit the measurement model for a given year are marked 'No fit'

Note on Table 13

The items have similar difficulties across all three data groups. This supports one of the aspects of a good scale, the invariance of item difficulties.

Figure 1. Scale for the Course Experience Questionnaire using the 1996 data: graduate perception measures and item difficulties calibrated on the same logit scale, with positive perceptions and difficult items at the top of the map and negative perceptions and easy items at the bottom.

Notes on Figure 1
1. Each X represents 12 graduates.
2. The item difficulties and the graduate perceptions are calibrated on the same scale. The scale is measured in logits, the log odds of graduates agreeing with the items.
3. N = 2702 (1996 graduates).
4. L = 17, as 8 items (4, 8, 9, 17, 21, 23, 24 and 25) did not fit the model and were discarded.
5. The graduate perception scores range from -1.5 logits to +3.7 logits and the item difficulties range from -0.5 logits to +0.5 logits. This means that the difficulties of the 17 items are not targeted appropriately for the graduates; the items are too easy and more difficult items need to be added.
6. The difficult items are at the top right-hand side of the scale; only graduates with strongly positive perceptions agree with these items. The easy items are at the bottom right-hand side of the scale; most graduates agree with these.

DISCUSSION

Psychometric Characteristics: Course Experience Questionnaire

The values of the infit mean squares and outfit mean squares are approximately 1, and the values of the infit t-scores and outfit t-scores are approximately zero (see Tables 1 and 2). For each item, the mean squares are within 30% of the expected values calculated according to the model (see Figure 2). These results indicate that the final sets of items of the Course Experience Questionnaire for each data group have a strong fit to the measurement model. This means that there is strong agreement among the graduates as to the difficulties of the items located at different positions on the scale. However, the items are not as well targeted as they could be (see Figure 1) and some easier and more difficult items are needed. The threshold values are ordered from low to high, indicating that the graduates have answered consistently with an ordered response format from strongly disagree to disagree, neutral, agree and strongly agree. The Indices of Graduate Perception and Item Separation range from 0.87 to 0.90 (see Tables 1 and 2), indicating that the errors are low and that the power of the tests of fit to the measurement model is good. The item difficulties and the graduate perception measures are calibrated on the same scale, and each item has a similar difficulty value on the scale for each of the three data groups (see Table 13). This supports the view that sample-free item measures have been created. It can therefore be claimed that the items of the Course Experience Questionnaire which fit the model have sound psychometric properties and that a good scale has been created.

Meaning of the Scale

The items fitting the model, which make up the variable graduate perceptions about their courses of study at universities, relate to graduates' experiences of teaching quality, goals and standards, assessment, workload and the generic skills learned. These items define the variable. They have good content validity and they are derived from a conceptual framework based on previous research. This, together with the data relating to reliability and fit to the measurement model reported above, is strong evidence for the construct validity of the variable. This means that the graduate responses to the items 'hang together' sufficiently well to represent the unobservable trait, graduate perceptions about their courses of study at universities. This trait involves, and is related to, the five aspects of the learning environment.

Items 21 and 25 do not fit the model because graduates cannot agree on their positions (difficulties) on the scale. Item 25, on course satisfaction, does not relate to any of the five aspects of the learning environment and so is measuring something different about courses. Item 21, on pressure in the course, does not relate to appropriate workload and is also measuring something different. It is suggested that this item be reworded in a positive sense to focus on time pressure to learn what is required, due to the workload in the course.

The Sub-Scales of the Course Experience Questionnaire

The Good Teaching Sub-Scale

The five items (3, 7, 15, 18 and 20, see the appendix; items 16 and 17 do not fit the model for the 1995 and 1996 data) that make up the sub-variable Good Teaching relate to graduate perceptions of how staff motivate students, comment on work, help, explain and make their subjects interesting. The five items have a good fit to the measurement model; they have ordered thresholds, indicating that the response categories are used consistently; good graduate separability indices ranging from 0.81 to 0.83; and reasonable item separability indices of up to +0.80, except for a lower item separability index with the 1994 data (see Tables 3 and 4). The difficulties of the items are not as well targeted against the graduate perceptions as they could be (scale not included here). It would seem that this sub-scale could be used separately from the full scale, if needed. Overall, while the Good Teaching Sub-Scale has some satisfactory psychometric properties, there is room for improvement. The sub-scale could be improved by adding more easy and hard items to better target the graduates and by rewording items 16 and 17. Items 16 (1995 data) and 16 and 17 (1996 data) do not fit the model because graduates cannot agree on their positions (difficulties) on the scale. It is suggested that item 17 be reworded along the lines of 'The teaching staff gave me helpful feedback on my set work/assignments'. Item 16 probably has to be discarded.

Table 3
Item statistics for the Good Teaching Sub-Scale

Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: none (1994); 16 (1995); 16, 17 (1996)

Table 4
Graduate statistics for the Good Teaching Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates

The Clear Goals and Standards Sub-Scale

The four items (1, 6, 13 and 24, see the appendix) that make up the sub-variable Clear Goals and Standards relate to graduate perceptions of the standards and goals expected in the course. (Item 16 did not fit the measurement model for the 1994 data, and it was included with the Good Teaching Sub-Scale for the 1995 and 1996 data groups, where it also did not fit the model.) The four items have a good fit to the measurement model; they have ordered thresholds, indicating that the response categories are used consistently, and moderate graduate and item separability indices for the three data groups, except for the 1996 data group, where the index is 0.28 (see Tables 5 and 6). The difficulties of the items are not as well targeted against the graduate perceptions as they could be, because there are too few items (scale not included here). While the four items of the Clear Goals and Standards Sub-Scale have moderately good psychometric properties, they should not be used as a separate scale without major modification. It is suggested that some more easy and hard items be added.

Table 5
Item statistics for the Clear Goals and Standards Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: 16 (1994); none (1995); none (1996)

Table 6
Graduate statistics for the Clear Goals and Standards Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates

The Appropriate Assessment Sub-Scale

The three items (8, 12 and 19, see the appendix) that make up the sub-variable Appropriate Assessment relate to graduate perceptions of memorisation and the learning of facts in the course. The three items have a satisfactory fit to the measurement model. However, while they have ordered thresholds, indicating that the response categories are used consistently, they also have low graduate and item separability indices for the three data groups (see Tables 7 and 8). The low reliability is directly attributable to the low number of items and hence to the poor targeting of the items. The current three items of the Appropriate Assessment Sub-Scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. It is suggested that more easy and hard items be added to better target the graduates.

Table 7
Item statistics for the Appropriate Assessment Sub-Scale

Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: none (1994); none (1995); none (1996)

Table 8
Graduate statistics for the Appropriate Assessment Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates

The Appropriate Workload Sub-Scale

The four items (4, 14, 21 and 23, see the appendix) that make up the sub-variable Appropriate Workload relate to graduate perceptions of the amount of work, the pressure, and the lack of time to comprehend everything in the course. While item 21 fits the sub-variable, it does not fit the full Course Experience Questionnaire Scale, indicating that graduates can agree on its common difficulty in the sub-scale but not in the full scale. The four items have a reasonable fit to the measurement model in the sub-scale. However, while they have ordered thresholds, indicating that the response categories are used consistently, they have only moderate graduate and item separability indices for the three data groups (see Tables 9 and 10). The moderate separability is directly attributable to the low number of items and to the poor targeting of the items (not shown here). The current four items of the Appropriate Workload Sub-Scale do not have sound psychometric properties and cannot be used as a separate scale without major modification. Similar modifications should be made to this sub-scale as to the previous sub-scale.

Table 9
Item statistics for the Appropriate Workload Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: none (1994); none (1995); none (1996)

Table 10
Graduate statistics for the Appropriate Workload Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates

The Generic Skills Sub-Scale

The six items (2, 5, 9, 10, 11 and 22, see the appendix; item 9 did not fit the model for any group and item 10 did not fit the model for the 1994 group) that make up the sub-variable Generic Skills relate to graduate perceptions of their problem-solving ability, analytical skills, communication skills, ability to plan ahead and work as a team member, and their confidence in tackling unfamiliar problems, as developed in the course. The five items have a good fit to the measurement model. They have ordered thresholds, indicating that the response categories are used consistently, and moderate graduate and item separability indices (see Tables 11 and 12). This sub-scale could be used separately from the full 23-item scale, if needed. Overall, while the Generic Skills Sub-Scale has some satisfactory psychometric properties, it could be improved by adding more easy and hard items to better target the graduates. Item 9 seems out of place because many courses do not aim to develop team work; the item could be changed to read 'The course helped to develop my general ability to see other points of view'. Item 10 could be modified to read 'As a result of my course, I can try to solve unfamiliar problems'.

Table 11
Item statistics for the Generic Skills Sub-Scale

Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of items, Non-fit items
Non-fit items: 9, 10 (1994); 9 (1995); 9 (1996)

Table 12
Graduate statistics for the Generic Skills Sub-Scale
Columns: Mean, SD, Separability, Infit mean square, Outfit mean square, Infit t, Outfit t, No. of graduates

Problems with the Likert format

Although the Likert format is commonly used in attitude measures, its use has been called into question in modern attitudinal measurement (Andrich, 1982; Andrich, de Jong & Sheridan, 1994; Dubois & Burns, 1975; Sheridan, 1993, 1995). Three issues have been questioned. The first relates to the middle or neutral category and its interpretation; the second to response categories that are not considered to represent a true ordering from low to high; and the third to the mixing of negatively and positively worded items to avoid the fixed response category syndrome.

The middle category between agree and disagree attracts a variety of responses such as 'don't know', 'unsure', 'neutral' and 'don't want to answer', and these are at odds with the implied ordering of responses from strongly disagree to strongly agree. In a Rasch measurement analysis, this difficulty of interpretation would be indicated by reversed thresholds and misfit to the measurement model. In the present study, all the thresholds were ordered and this problem did not appear to be present, although the problem of interpreting the results remains. That does not mean the problem will not arise in other administrations of the Course Experience Questionnaire; it simply did not show as a problem in the present study.

From a measurement perspective, and for some graduates, the range from strongly disagree to strongly agree is not ordered from low to high. Again, in a Rasch measurement analysis this would be indicated by reversed thresholds and misfit to the model. While the present study did not show any reversed thresholds, the problem of interpreting the results is still present. It is suggested that the items of the Course Experience Questionnaire be modified to overcome these two problems of interpretation. One way to do this is to change the response format to one with a clearly increasing order, such as never, sometimes, a great deal, all the time. Another is to use numbers, or a range of numbers, starting from zero.

Although it has been common practice in attitude measures to mix negatively and positively worded items to avoid the fixed category response syndrome, the practice has been called into question in Rasch attitude measurement analysis (Andrich & van Schoubroeck, 1989; Sheridan, 1995). It is claimed that mixing negatively and positively worded items causes many respondents to link answers between items; this produces an interaction effect between items and different groups of respondents, resulting in the loss of invariance of the items. In the present study, this could explain why items 16, 21 and 25 did not fit the model. It could also partially explain the variation in the extent of fit of items to the model, and it may be related to the lower separability (reliability) of the items in the sub-scales.

Conclusions

Taken separately, each of the 25 items of the Course Experience Questionnaire can be used to provide qualitative data about graduate perceptions of the courses they have completed at university. Taken together, 17 items for the 1996 data, 21 items for the 1995 data and 22 items for the 1994 data form valid and reliable scales. Some improvements could be made to the scales by adding more easy and hard items to better target the graduates.

The conceptual design of the Course Experience Questionnaire around the five aspects of good teaching, clear goals and standards, appropriate assessment, appropriate workload and generic skills is confirmed, and the Rasch measurement model has been useful in examining its meaning and conceptual design. Of the five aspects of the Course Experience Questionnaire, only Good Teaching and Generic Skills form moderately valid and reliable sub-scales that could be used and interpreted separately. Both the valid and the not-so-valid sub-scales could be improved by increasing the number of easy and hard items to provide better targeting.

References

Adams, R. J. & Khoo, S. T. (1994). Quest: the interactive test analysis system. Camberwell, Victoria: ACER.

Ainley, J. & Long, M. (1995). The 1995 Course Experience Questionnaire: an interim report. Camberwell, Victoria: ACER.

Ajzen, I. (1989). Attitude structure and behaviour. In A. Pratkanis, A. Breckler & A. Greenwald (Eds.), Attitude structure and function. New Jersey: Lawrence Erlbaum and Associates.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43.

Andrich, D. (1982). Using latent trait measurement to analyse attitudinal data: a synthesis of viewpoints. In D. Spearitt (Ed.), The improvement of measurement in education and psychology. Melbourne: ACER.

Andrich, D. (1988a). A general form of Rasch's Extended Logistic Model for partial credit scoring. Applied Measurement in Education, 1(4).

Andrich, D. (1988b). Rasch models for measurement. Sage university paper on quantitative applications in the social sciences, series number 07/068. Newbury Park, California: Sage Publications.

Andrich, D., de Jong, J. H. & Sheridan, B. (1994). Diagnostic opportunities with the Rasch model for ordered response categories. Paper presented at the IPN Symposium on Applications of Latent Trait and Latent Class Models in the Social Sciences, Akademie Sankelmark, Germany, May.

Andrich, D. & van Schoubroeck, L. (1989). The General Health Questionnaire: a psychometric analysis using latent trait theory. Psychological Medicine, 19.

Dubois, B. & Burns, J. A. (1975). An analysis of the question mark response category in attitudinal scales. Educational and Psychological Measurement, 35.

Entwistle, N. J. & Ramsden, P. (1983). Understanding student learning. London: Croom Helm.

Johnson, T. (1997). The 1996 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.

Johnson, T., Ainley, J. & Long, M. (1996). The 1995 Course Experience Questionnaire. Parkville, Victoria: Graduate Careers Council of Australia.

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140.

Linke, R. D. (1991). Report of the research group on performance indicators in higher education. Canberra: AGPS.

Marton, F. & Saljo, R. (1976). On qualitative differences in learning II: outcome as a function of the learner's conception of the task. British Journal of Educational Psychology, 46.

Ramsden, P. (1991a). Report on the Course Experience Questionnaire trial. In R. Linke, Performance indicators in higher education, Vol. 2. Canberra: AGPS.

Ramsden, P. (1991b). A performance indicator of teaching quality in higher education: the Course Experience Questionnaire. Studies in Higher Education, 16.

Ramsden, P. (1992). Learning to teach in higher education. London: Routledge.

Ramsden, P. (1996). The validity and future of the Course Experience Questionnaire. Paper delivered at the Australian Vice-Chancellors' Committee Course Experience Symposium, 3-4 October 1996, Griffith University, Queensland.

Rasch, G. (1980/1960). Probabilistic models for intelligence and attainment tests (expanded edition). Chicago: The University of Chicago Press (original work published 1960).

Sheridan, B. (1993). Threshold location and Likert-style questionnaires. Paper presented at the Seventh International Objective Measurement Workshop, American Educational Research Association Annual Meeting, Atlanta, USA, April.

Sheridan, B. (1995). The Course Experience Questionnaire as a measure for evaluating courses in higher education. Perth: Measurement, Assessment and Evaluation Laboratory, Edith Cowan University.

Treolar, D. (1994). Course Experience Questionnaire: reaction. Paper presented at the Second Graduate Careers Council of Australia Symposium, Sydney, NSW.

Wilson, K. L., Lizzio, A. & Ramsden, P. (1996). The use and validation of the Course Experience Questionnaire (Occasional Paper No. 6). Brisbane: Griffith Institute for Higher Education, Griffith University.

Wright, B. D. (1985). Additivity in psychological measurement. In E. E. Roskam (Ed.), Measurement and personality assessment. Amsterdam: North-Holland, Elsevier Science Publishers B.V.

Wright, B. D. & Masters, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press.

Wright, B. D. & Masters, G. (1981). The measurement of knowledge and attitude (Research Memorandum No. 30). Chicago: Statistical Laboratory, Department of Education, University of Chicago.

Appendix

THE COURSE EXPERIENCE QUESTIONNAIRE

The Good Teaching Sub-Scale (6 items, 1994; 7 items, 1995 and 1996)
3. The teaching staff of this course motivated me to do my best work.
7. The staff put a lot of time into commenting on my work.
15. The staff made a real effort to understand difficulties I might be having with my work.
16*. Feedback on my work was usually provided only in the form of marks or grades (1995, 1996 version).
17. The teaching staff normally gave me helpful feedback on how I was going.
18. My lecturers were extremely good at explaining things.
20. The teaching staff worked hard to make their subjects interesting.

The Clear Goals and Standards Sub-Scale (5 items, 1994; 4 items, 1995 and 1996)
1. It was always easy to know the standard of work expected.
6. I usually had a clear idea of where I was going and what was expected of me in this course.
13*. It was often hard to discover what was expected of me in this course.
16*. The course was overly theoretical and abstract (1994 version).
24. The staff made it clear right from the start what they expected of students.

The Appropriate Assessment Sub-Scale (3 items)
8*. To do well in this course all you really needed was a good memory.

12*. The staff seemed more interested in testing what I had memorised than what I understood.
19*. Too many staff asked me questions just about facts.

The Appropriate Workload Sub-Scale (4 items)
4*. The workload was too heavy.
14. I was generally given enough time to understand the things I had to learn.
21*. There was a lot of pressure on me as a student in this course.
23*. The sheer volume of work to be got through in this course meant that it couldn't all be thoroughly comprehended.

The Generic Skills Sub-Scale (6 items)
2. The course developed my problem-solving skills.
5. The course sharpened my analytic skills.
9. The course helped me develop my ability to work as a team member.
10. As a result of my course, I feel confident about tackling unfamiliar problems.
11. The course improved my skills in written communication.
22. My course helped me to develop the ability to plan my own work.

Overall Satisfaction (1 item)
25. Overall, I was satisfied with the quality of this course.

Note: Items marked * are reverse scored.


More information

MEASURING CHILDREN S PERCEPTIONS OF THEIR USE OF THE INTERNET: A RASCH ANALYSIS

MEASURING CHILDREN S PERCEPTIONS OF THEIR USE OF THE INTERNET: A RASCH ANALYSIS Measuring Children s Perceptions of their Use of the Internet: A Rasch Analysis Genevieve Johnson g.johnson@curtin.edu.au MEASURING CHILDREN S PERCEPTIONS OF THEIR USE OF THE INTERNET: A RASCH ANALYSIS

More information

Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture

Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture Hierarchical Linear Models: Applications to cross-cultural comparisons of school culture Magdalena M.C. Mok, Macquarie University & Teresa W.C. Ling, City Polytechnic of Hong Kong Paper presented at the

More information

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan

Connexion of Item Response Theory to Decision Making in Chess. Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Connexion of Item Response Theory to Decision Making in Chess Presented by Tamal Biswas Research Advised by Dr. Kenneth Regan Acknowledgement A few Slides have been taken from the following presentation

More information

Techniques for Explaining Item Response Theory to Stakeholder

Techniques for Explaining Item Response Theory to Stakeholder Techniques for Explaining Item Response Theory to Stakeholder Kate DeRoche Antonio Olmos C.J. Mckinney Mental Health Center of Denver Presented on March 23, 2007 at the Eastern Evaluation Research Society

More information

IMPACT ON PARTICIPATION AND AUTONOMY QUESTIONNAIRE: INTERNAL SCALE VALIDITY OF THE SWEDISH VERSION FOR USE IN PEOPLE WITH SPINAL CORD INJURY

IMPACT ON PARTICIPATION AND AUTONOMY QUESTIONNAIRE: INTERNAL SCALE VALIDITY OF THE SWEDISH VERSION FOR USE IN PEOPLE WITH SPINAL CORD INJURY J Rehabil Med 2007; 39: 156 162 ORIGINAL REPORT IMPACT ON PARTICIPATION AND AUTONOMY QUESTIONNAIRE: INTERNAL SCALE VALIDITY OF THE SWEDISH VERSION FOR USE IN PEOPLE WITH SPINAL CORD INJURY Maria Larsson

More information

Psychometric Instrument Development

Psychometric Instrument Development Psychometric Instrument Development Lecture 6 Survey Research & Design in Psychology James Neill, 2012 Readings: Psychometrics 1. Bryman & Cramer (1997). Concepts and their measurement. [chapter - ereserve]

More information

MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument

MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument CURT HAGQUIST and DAVID ANDRICH MEASURING SUBJECTIVE HEALTH AMONG ADOLESCENTS IN SWEDEN A Rasch-analysis of the HBSC Instrument (Accepted 27 June 2003) ABSTRACT. The cross-national WHO-study Health Behaviour

More information

Reliability and Validity checks S-005

Reliability and Validity checks S-005 Reliability and Validity checks S-005 Checking on reliability of the data we collect Compare over time (test-retest) Item analysis Internal consistency Inter-rater agreement Compare over time Test-Retest

More information

Analysis of Confidence Rating Pilot Data: Executive Summary for the UKCAT Board

Analysis of Confidence Rating Pilot Data: Executive Summary for the UKCAT Board Analysis of Confidence Rating Pilot Data: Executive Summary for the UKCAT Board Paul Tiffin & Lewis Paton University of York Background Self-confidence may be the best non-cognitive predictor of future

More information

Improving clinicians' attitudes toward providing feedback on routine outcome assessments

Improving clinicians' attitudes toward providing feedback on routine outcome assessments University of Wollongong Research Online Faculty of Health and Behavioural Sciences - Papers (Archive) Faculty of Science, Medicine and Health 2009 Improving clinicians' attitudes toward providing feedback

More information

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD Psy 427 Cal State Northridge Andrew Ainsworth, PhD Contents Item Analysis in General Classical Test Theory Item Response Theory Basics Item Response Functions Item Information Functions Invariance IRT

More information

Modeling DIF with the Rasch Model: The Unfortunate Combination of Mean Ability Differences and Guessing

Modeling DIF with the Rasch Model: The Unfortunate Combination of Mean Ability Differences and Guessing James Madison University JMU Scholarly Commons Department of Graduate Psychology - Faculty Scholarship Department of Graduate Psychology 4-2014 Modeling DIF with the Rasch Model: The Unfortunate Combination

More information

THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.

THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA. Africa Journal of Teacher Education ISSN 1916-7822. A Journal of Spread Corporation Vol. 6 No. 1 2017 Pages 56-64 THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES

More information

College Student Self-Assessment Survey (CSSAS)

College Student Self-Assessment Survey (CSSAS) 13 College Student Self-Assessment Survey (CSSAS) Development of College Student Self Assessment Survey (CSSAS) The collection and analysis of student achievement indicator data are of primary importance

More information

Analyzing the Relationship between the Personnel s Achievement Motivation and their Performance at the Islamic Azad University, Shoushtar branch

Analyzing the Relationship between the Personnel s Achievement Motivation and their Performance at the Islamic Azad University, Shoushtar branch Analyzing the Relationship between the Personnel s Achievement Motivation and their Performance at the Islamic Azad University, Shoushtar branch Masoud Ahmadinejad *, Omme Kolsom Gholamhosseinzadeh **,

More information

Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates. Kent Rittschof Department of Curriculum, Foundations, & Reading

Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates. Kent Rittschof Department of Curriculum, Foundations, & Reading Improving Measurement of Ambiguity Tolerance (AT) Among Teacher Candidates Kent Rittschof Department of Curriculum, Foundations, & Reading What is Ambiguity Tolerance (AT) and why should it be measured?

More information

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments

Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Using Analytical and Psychometric Tools in Medium- and High-Stakes Environments Greg Pope, Analytics and Psychometrics Manager 2008 Users Conference San Antonio Introduction and purpose of this session

More information

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz This study presents the steps Edgenuity uses to evaluate the reliability and validity of its quizzes, topic tests, and cumulative

More information

AU TQF 2 Doctoral Degree. Course Description

AU TQF 2 Doctoral Degree. Course Description Course Description 1. Foundation Courses CP 5000 General Psychology Non-credit Basic psychological concepts and to introduce students to the scientific study of behavior. Learning and Behavior, Altered

More information

CHAPTER FOUR. Any scientific research involves the application of various methods. (also referred to as strategies or approaches) and procedures to

CHAPTER FOUR. Any scientific research involves the application of various methods. (also referred to as strategies or approaches) and procedures to CHAPTER FOUR 4. RESEARCH METHODOLOGY Any scientific research involves the application of various methods (also referred to as strategies or approaches) and procedures to create scientific knowledge (welman

More information

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English

Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Reliability and Validity of a Task-based Writing Performance Assessment for Japanese Learners of English Yoshihito SUGITA Yamanashi Prefectural University Abstract This article examines the main data of

More information

AND ITS VARIOUS DEVICES. Attitude is such an abstract, complex mental set. up that its measurement has remained controversial.

AND ITS VARIOUS DEVICES. Attitude is such an abstract, complex mental set. up that its measurement has remained controversial. CHAPTER III attitude measurement AND ITS VARIOUS DEVICES Attitude is such an abstract, complex mental set up that its measurement has remained controversial. Psychologists studied attitudes of individuals

More information

Laxshmi Sachathep 1. Richard Lynch 2

Laxshmi Sachathep 1. Richard Lynch 2 53 A COMPARATIVE - CORRELATIONAL STUDY OF EMOTIONAL INTELLIGENCE AND MUSICAL INTELLIGENCE AMONG STUDENTS FROM YEARS EIGHT TO ELEVEN AT MODERN INTERNATIONAL SCHOOL BANGKOK, THAILAND Laxshmi Sachathep 1

More information

Near-School Leavers Perception of Their Vocational and Labour Market Information Needs

Near-School Leavers Perception of Their Vocational and Labour Market Information Needs Kamla-Raj 2009 Stud Home Comm Sci, 3(2): 135-142 (2009) Near-School Leavers Perception of Their Vocational and Labour Market Information Needs Amininiye M. Manuel and Patrick N. Asuquo Educational Foundations,

More information

Constructing a described achievement scale for the International Civic and Citizenship Education Study

Constructing a described achievement scale for the International Civic and Citizenship Education Study Australian Council for Educational Research ACEReSearch Civics and Citizenship Assessment National and International Surveys 9-2008 Constructing a described achievement scale for the International Civic

More information

A framework for predicting item difficulty in reading tests

A framework for predicting item difficulty in reading tests Australian Council for Educational Research ACEReSearch OECD Programme for International Student Assessment (PISA) National and International Surveys 4-2012 A framework for predicting item difficulty in

More information

Developing the First Validity of Shared Medical Decision- Making Questionnaire in Taiwan

Developing the First Validity of Shared Medical Decision- Making Questionnaire in Taiwan Global Journal of Medical research: k Interdisciplinary Volume 14 Issue 2 Version 1.0 Year 2014 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals Inc. (USA) Online

More information

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories Kamla-Raj 010 Int J Edu Sci, (): 107-113 (010) Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories O.O. Adedoyin Department of Educational Foundations,

More information

Survey Sampling Weights and Item Response Parameter Estimation

Survey Sampling Weights and Item Response Parameter Estimation Survey Sampling Weights and Item Response Parameter Estimation Spring 2014 Survey Methodology Simmons School of Education and Human Development Center on Research & Evaluation Paul Yovanoff, Ph.D. Department

More information

Clinical psychology trainees experiences of supervision

Clinical psychology trainees experiences of supervision Clinical psychology trainees experiences of supervision Item Type Article Authors Waldron, Michelle;Byrne, Michael Citation Waldron, M, & Byrne, M. (2014). Clinical psychology trainees' experiences of

More information

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests

The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests Mary E. Lunz and Betty A. Bergstrom, American Society of Clinical Pathologists Benjamin D. Wright, University

More information

Scaling the quality of clinical audit projects: a pilot study

Scaling the quality of clinical audit projects: a pilot study International Journal for Quality in Health Care 1999; Volume 11, Number 3: pp. 241 249 Scaling the quality of clinical audit projects: a pilot study ANDREW D. MILLARD Scottish Clinical Audit Resource

More information

The measurement of media literacy in eating disorder risk factor research: psychometric properties of six measures

The measurement of media literacy in eating disorder risk factor research: psychometric properties of six measures McLean et al. Journal of Eating Disorders (2016) 4:30 DOI 10.1186/s40337-016-0116-0 RESEARCH ARTICLE Open Access The measurement of media literacy in eating disorder risk factor research: psychometric

More information

International Conference on Humanities and Social Science (HSS 2016)

International Conference on Humanities and Social Science (HSS 2016) International Conference on Humanities and Social Science (HSS 2016) The Chinese Version of WOrk-reLated Flow Inventory (WOLF): An Examination of Reliability and Validity Yi-yu CHEN1, a, Xiao-tong YU2,

More information

Interpersonal Citizenship Motivation: A Rating Scale Validity of Rasch Model Measurement

Interpersonal Citizenship Motivation: A Rating Scale Validity of Rasch Model Measurement Interpersonal Citizenship Motivation: A Rating Scale Validity of Rasch Model Measurement Shereen Noranee, Noormala Amir Ishak, Raja Munirah Raja Mustapha, Rozilah Abdul Aziz, and Rohana Mat Som Abstract

More information

Critical Thinking Assessment at MCC. How are we doing?

Critical Thinking Assessment at MCC. How are we doing? Critical Thinking Assessment at MCC How are we doing? Prepared by Maura McCool, M.S. Office of Research, Evaluation and Assessment Metropolitan Community Colleges Fall 2003 1 General Education Assessment

More information

Attitudes towards orthodontic treatment: a comparison of treated and untreated subjects

Attitudes towards orthodontic treatment: a comparison of treated and untreated subjects European Journal of Orthodontics 27 (2005) 148 154 doi: 10.1093/ejo/cjh071 The Author 2005. Published by Oxford University Press on behalf of the European Orthodontic Society. Αll rights reserved. For

More information

Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT)

Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT) Assessment Information Brief: REVELIAN EMOTIONAL INTELLIGENCE ASSESSMENT (MSCEIT) Prepared by: Revelian Psychology Team E: psych@revelian.com P: (AU) or +61 7 3552 www.revelian.com 1 www.revelian.com 2

More information

The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity

The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity Pertanika J. Soc. Sci. & Hum. 25 (S): 81-88 (2017) SOCIAL SCIENCES & HUMANITIES Journal homepage: http://www.pertanika.upm.edu.my/ The Use of Rasch Wright Map in Assessing Conceptual Understanding of Electricity

More information

Everything DiSC 363 for Leaders. Research Report. by Inscape Publishing

Everything DiSC 363 for Leaders. Research Report. by Inscape Publishing Everything DiSC 363 for Leaders Research Report by Inscape Publishing Introduction Everything DiSC 363 for Leaders is a multi-rater assessment and profile that is designed to give participants feedback

More information

A Short Form of Sweeney, Hausknecht and Soutar s Cognitive Dissonance Scale

A Short Form of Sweeney, Hausknecht and Soutar s Cognitive Dissonance Scale A Short Form of Sweeney, Hausknecht and Soutar s Cognitive Dissonance Scale Associate Professor Jillian C. Sweeney University of Western Australia Business School, Crawley, Australia Email: jill.sweeney@uwa.edu.au

More information

Wellbeing Measurement Framework for Colleges

Wellbeing Measurement Framework for Colleges Funded by Wellbeing Measurement Framework for Colleges In partnership with Child Outcomes Research Consortium CONTENTS About the wellbeing measurement framework for colleges 3 General Population Clinical

More information

Figure: Presentation slides:

Figure:  Presentation slides: Joni Lakin David Shannon Margaret Ross Abbot Packard Auburn University Auburn University Auburn University University of West Georgia Figure: http://www.auburn.edu/~jml0035/eera_chart.pdf Presentation

More information

Development and Validation of an Improved, COPD-Specific Version of the St. George Respiratory Questionnaire*

Development and Validation of an Improved, COPD-Specific Version of the St. George Respiratory Questionnaire* Original Research COPD Development and Validation of an Improved, COPD-Specific Version of the St. George Respiratory Questionnaire* Makiko Meguro, Mphil; Elizabeth A. Barley, PhD, CPsychol; Sally Spencer,

More information

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings.

DATA GATHERING. Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. DATA GATHERING Define : Is a process of collecting data from sample, so as for testing & analyzing before reporting research findings. 2012 John Wiley & Sons Ltd. Measurement Measurement: the assignment

More information

COMPUTING READER AGREEMENT FOR THE GRE

COMPUTING READER AGREEMENT FOR THE GRE RM-00-8 R E S E A R C H M E M O R A N D U M COMPUTING READER AGREEMENT FOR THE GRE WRITING ASSESSMENT Donald E. Powers Princeton, New Jersey 08541 October 2000 Computing Reader Agreement for the GRE Writing

More information

Horizon Research. Public Trust and Confidence in Charities

Horizon Research. Public Trust and Confidence in Charities Horizon Research Public Trust and Confidence in Charities Conducted for Charities Services New Zealand Department of Internal Affairs May 2014 Contents EXECUTIVE SUMMARY... 3 Terminology... 8 1. Overall

More information