A New Approach to Examining Validity

US-China Foreign Language, ISSN 1539-8080, USA
Nov. 2006, Volume 4, No. 11 (Serial No. 38)

A New Approach to Examining Validity: Test-taking Strategy Investigation

HE Xue-chun *
(Foreign Languages Department, University of Electronic Science and Technology of China Zhongshan Institute, Zhongshan, Guangdong 528402, China)

Abstract: A new type of approach, the analysis of the process and strategies of test taking, provides new insights into the factors that affect test performance and construct validity. When test-taking strategies are taken into account, what test constructors think is being tested is not always what the test actually measures. This paper briefly reviews the approach, focusing on test-taking strategies and on the most common way of investigating them, the verbal report approach.

Key words: construct validity; test-taking strategies; verbal report approach

1. Introduction

The purpose of modern language tests is to observe and measure language proficiency and communicative competence. But how can we know whether the abilities we wish to measure in a language test are appropriately tested? Usually, the question is answered in terms of validity. Messick defines validity as the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores (Messick, 1989: 13). A test is said to be valid if it measures accurately what it is intended to measure (Hughes, 2000: 22). Traditionally, three main kinds of validity (content validity, criterion-related validity, and construct validity) are examined to assess how far a test supports its intended use. However, are these kinds of validity enough to show that test scores really indicate the test takers' actual language competence?
These validations are of great importance and value where test methods are concerned (Bachman, 1999: 111), but they are less sufficient where test takers are concerned. Test performance is affected not only by the characteristics of the test method, but also by test takers' language ability; by individual attributes, including test takers' cognitive and affective characteristics, their real-world knowledge, and factors such as their age, sex, native language, and educational and socio-economic background; and by test takers' preparation for or prior experience with a given test. The last of these is sometimes referred to as test-wiseness and includes a variety of general test-taking strategies, such as reading the questions before the passages on which they are based (Bachman, 1999: 113-114). Since the characteristics of test takers affect their test performance, test results may give false information about their language proficiency and communicative competence. If so, the validity of the test would be questionable. Test takers play as important a role in the validation of a test as learners play in language teaching and learning. As a result, test developers and users have shifted their research focus from test methods alone to test takers and test-taking strategies.

* HE Xue-chun (1977- ), female, M.A., lecturer of Foreign Languages Department, University of Electronic Science and Technology of China Zhongshan Institute; research field: English language and literature.

2. A Brief History of Research on Test-taking Strategies

One of the first requests that more attention be paid to the processes test takers use in answering language test items was issued by Bormuth in 1979 (Cohen, 2002: 90):

There are no studies, known to the author, which have attempted to analyze the strategies used to derive correct answers to a class of items. The custom has been to accept the author's claims about what underlying processes were tested by an item. And, since there were no operational methods for defining classes of items, it was not scientifically very useful to present empirical challenges to the test author's claims.

After this, studies emerged to observe and describe how learners of different ages actually accomplish first language (L1) testing. Based on his own research efforts, Mehan indicated that it may be misleading to conclude that a wrong answer is due to a lack of understanding, for the answer may come from an alternative, equally valid interpretation (1974: 44).

Andrew D. Cohen (2002), one of the first and most important researchers on the strategies employed by language test takers, briefly summarized the history of research into the process of second language (L2) test performance. He said (2002: 90) that since the late 1970s, interest has slowly begun to grow in approaching second language testing from the point of view of the strategies used by respondents going through the process of taking the test. By the 1990s, L2 testing books acknowledged this concern as a possible source of insights concerning test reliability and validity. According to Cohen and Bachman (1999), other researchers interested in test-taking strategies include Homburg and Spaan (1981), MacLean and d'Anglejan (1986), Grotjahn (1986), Gordon (1989), Anderson (1989), Nevo (1989), S. A. Messick (1989), and Lyle F. Bachman (1999).
Among them, Grotjahn is also known for his convincing argument (1986) that the validation of language tests must include, in addition to the quantitative analysis of test takers' responses, the qualitative analysis of the test-taking process and of the test tasks themselves (Bachman, 1999: 270). At the same time, other relevant studies began to appear, such as studies linking the selection of a given strategy with success or failure on foreign language tests (Cohen, Weaver & LI, 1996), studies investigating the relationships between the cognitive and metacognitive strategies reported by test takers and their scores (Purpura, 1996), and studies examining the research methods themselves and their reliability and validity (Cohen, a study that was still in press when his Strategies and Processes in Test Taking and SLA was published in 1999).

3. Test-taking Strategies and Their Significance in Examining Validity

3.1 Test-wiseness and test-taking strategies

Test takers employ various strategies according to context. Cohen (2002: 92) viewed test-taking strategies as those test-taking processes that the respondents have selected and of which they are conscious, at least to some degree. In other words, the notion of strategy implies an element of selection. He said that one test-taking strategy is to opt out (2002: 92) of the language task at hand, for example, through surface matching of identical information in the passage and in one of the response choices. Another is to use shortcuts to arrive at answers (e.g. not reading the text as instructed but simply looking immediately for the answers to the given reading comprehension questions). In such cases, the test takers may be using test-wiseness to get around the need to tap their actual language knowledge, or the lack of it. Fransson (1984, cited in Cohen, 2002: 92) holds a similar view, claiming that test takers may not proceed by way of the text but rather around it.
However, in some cases, strategies other than test-wiseness are employed. For example, the respondent may produce a written translation of a text before responding to questions dealing with that text (Cohen & Aphek, 1979). Some of these strategies are thought to be efficient and good for test taking. Therefore, while trying to examine test takers' language competence, test constructors also want to examine their ability to use certain kinds of strategies, that is, their strategic competence (Canale & Swain, 1980). This competence emphasizes compensatory strategies, that is, strategies used to compensate for or remediate a lack in some language area. Bachman and Palmer's model of strategic competence (1996) comprises three components: an assessment component (to assess which communicative goals are achievable and what linguistic resources are needed), a goal-setting component (to identify the specific tasks to be performed), and a planning component (to retrieve the relevant items from language knowledge and plan their use in a response).

It is believed that a test is valid if it elicits efficient or good strategies, or the strategies that test constructors assume test takers will employ to arrive at correct answers. At times, when a limited number of strategies are well chosen and used effectively in response to an item, their use may indicate genuine control over the item. At other times, true control requires the use of a number of strategies, for example when responding to reading comprehension questions, cloze tests and compositions (Cohen, 2002: 93-108). Generally, in response to reading comprehension items in foreign language tests in China, some strategies are regarded as contributory reading strategies (YANG Hui-zhong & C. Weir, 1998: 82).
For example, test takers may read the questions first so that the reading of the text is directed at finding answers to those questions; read the text passage, sometimes translating into the mother tongue, to obtain a better understanding of the passage; look for topic sentences; read some sentences or paragraphs a second time if necessary; and paraphrase the sentences. However, it is hard to judge whether a test-taking strategy is good or not for a given task. Cohen claims (2002: 93) that the evaluation of any test-taking strategy depends on how individual test takers employ the strategy at a given moment on a given task. They draw on their particular cognitive style, their degree of cognitive flexibility, and their language knowledge while employing a strategy. Furthermore, the strategies that test takers think of or actually use are not always those that test constructors regard as effective and expect respondents to use.

3.2 Studies on test-taking strategies and validation

Messick (1989: 54) has stated the importance of investigating the processes of test taking themselves. Cohen (2002) makes many claims about the significance of investigating test-taking strategies to learn which testing methods are valid, considering both more indirect testing formats, including multiple choice and cloze tests, and more direct formats, such as summarization tasks, open-ended questions and compositions. More indirect formats are those that do not reflect real-world tasks. They may prompt strategies used solely to cope with the test format, that is, test-wiseness. More direct formats elicit fewer test-taking strategies that take the place of actual language use strategies, but responses to such measures are still influenced by test-wiseness.
It would appear that test-taking strategy research can be used to support or refute ideas about multiple choice items, at least with respect to a given test in a given test situation with given respondents (Cohen, 2002: 108). In a study (Gordon, 1987) of 30 tenth-grade EFL students, the researcher found that answers to test questions did not necessarily reflect comprehension of the text. The multiple choice format has been used for a long time. The problems of the format, however, were not originally discovered through qualitative investigations. With qualitative investigations, it is suggested (Cohen, 2002), more sensible judgments can be made about whether to use multiple choice items. For example, a study could be designed that requires respondents to indicate, through retrospective verbal reports, the process by which they arrived at answers to multiple choice grammar items. This can help determine whether it was actually grammatical knowledge that was being tapped in each item, or whether the deciding element was control of one or another vocabulary word in the stem or in the distractors.

Cohen (2002) also claims that the results of test-taking strategy studies on cloze tests and on more open-ended formats provide crucial information about what those tests actually measure and how valid they are. He says that while the reliability of a given cloze test may be high because the individual items are interrelated, its validity as a measure of global reading ability could be questioned if the respondents indicate that they answered most of the items by means of local micro-level strategies (Cohen, 2002: 100). Global-level reading ability refers to predicting information accurately in context and using lexical and structural knowledge to cope with linguistic difficulties. Local micro-level reading strategies are individual word-centered strategies, such as matching words in the alternatives to the text, copying words from the text, translating word for word, or forming global impressions of text content on the basis of key words or isolated lexical items in the text or test question.

The validity of more direct testing formats may also be called into question when responses to them are affected by test-wiseness. For example, in producing summaries, respondents may use various shortcuts, such as lifting material directly from the passage rather than restating it at a higher level of generality, or including material whenever they are in doubt about whether to include or exclude it, on the assumption that raters would probably prefer a longer summary.
Test takers have developed numerous test-wise techniques for obtaining correct answers without fully or even partially understanding the text. Students may get an item right for the wrong reasons or wrong for the right reasons. It would be interesting to know whether this is only an individual case or whether the faulty logic is shared by other respondents as well. If the poor logic used to answer a reading comprehension item was caused by an overly ambiguous text passage or an ambiguous question, and was shared by numerous test takers, this would suggest the need to improve the test or revise the question.

4. Approaches to Investigating Test-taking Strategies

This section discusses some approaches to investigating test-taking strategies. It also reviews discussions in the literature of the reliability and validity of the verbal report approach.

4.1 Three main approaches

Several approaches to examining the test-taking process have been proposed in the literature on testing research methods. According to Cohen (2002), there are three main approaches. One is the observation of what respondents do during tests. Another is designing items that are assumed to require the use of certain strategies, and adding up the correct responses as an indicator of strategy use. A third approach is the use of verbal report, while the items are being answered, immediately afterward, or some time later. Messick (Bachman, 1999: 269) has discussed a number of approaches to the analysis of the test-taking process, including protocol analysis and computer modeling, the analysis of response times and mathematical modeling of these, the analysis of reasons given by test takers for choosing a particular answer, and the analysis of systematic errors. Recently, the third of Cohen's approaches, verbal report, has been widely applied in studies on test-taking strategies.
Qualitative empirical research procedures have been used by language testing researchers to better understand the processes involved in taking language tests. The prevalent approach is to examine test-taking strategies and respondents' reactions to different item and test types through their verbal report or self-report data, including retrospective interviews and thinking aloud (Cohen, 1984, 1987; Grotjahn, 1986; MacKay, 1974; Haney & Scott, 1987; Larson, 1981; Gordon, 1984; Nevo, 1989; Anderson et al., 1991). In addition, introspective data can help to examine respondents' test-taking strategies efficiently. Such data have been used to investigate the validity of the reading comprehension tests in the Chinese National College English Test, Band 4 and Band 6 (YANG Hui-zhong & C. Weir, 1998).

Verbal report has helped determine how respondents actually take tests of various kinds (Cohen, 2002: 95). Verbal report techniques have provided a major tool for gathering data on test-taking strategies. Such techniques were originally developed in first language and then second language acquisition research to study the processes of reading and writing. Research on test taking has in turn helped to refine the methodology for tapping test-taking strategies. A number of studies have found that it is possible to collect introspective and retrospective data from respondents just after they have answered each item on a multiple choice reading test. Several such studies, such as MacKay's (1974), Larson's (1981) and Nevo's (1989), are described in Cohen's Strategies and Processes in Test Taking and SLA (2002). Verbal report, as a qualitative methodology, provides a valuable source of information, perhaps the most focused possible, on the strategies that test takers used in their responses and why they used them (Cohen, 2002: 107). It can help to show what items are actually testing, aiding decisions about which items to keep and which to throw out. Cohen (2002) suggests that the use of verbal report be a part of pretesting. If a test taker has sound reasons for marking an item wrong, then the item needs to be rewritten.
4.2 Discussions of the reliability and validity of verbal report approach

Verbal report techniques can help us better understand the processes and the test-taking strategies that test takers use. Cohen stated, however, that just as there is a keen interest in using verbal report methods to improve the reliability and validity of assessment instruments, there needs to be an ongoing concern for assuring the reliability and validity of the very verbal report methods that are being used to collect test-taking strategy data. The reliability and validity of verbal reports have been discussed since the 1980s, for example in Ericsson and Simon (1984), Grotjahn (1987) and Haastrup (1987). Extensive discussions of reliability and validity in qualitative research can be found in Kirk and Miller (1986), Miles and Huberman (1994), and Denzin and Lincoln (1994). A recent article of Cohen's is said to deal extensively with the issue of improving the reliability and validity of verbal reports, especially with concerns about the appropriate use of such measures. The issues discussed include the effects of the immediacy of the verbal report, the benefits of prompting for specifics in verbal report, the advantages of providing guidance on how to produce verbal report, and the effects of verbal report on task performance. Because the use of verbal report techniques is becoming more prevalent in investigating test-taking strategies, he calls for greater systematicity both in the collection of such data and in the reporting of such studies in the research literature.

5. Conclusion

Though the field of test-taking strategy research is young, and though these techniques are still in need of improvement, researchers can find useful descriptions in the literature of techniques for identifying the strategies used by test takers.
These research approaches have great potential for providing evidence for construct validation by complementing the quantitative analysis of test and item scores. The findings from the growing test-taking strategy research area will prove beneficial in constructing, administering, and interpreting language tests.

References:
[1] Anderson et al. (1991). An Exploratory Study into the Construct Validity of a Reading Comprehension Test: Triangulation of Data Sources. Language Testing, 8(1): 41-66.
[2] Bachman, Lyle F. (1999). Fundamental Considerations in Language Testing. Shanghai: Shanghai Foreign Language Education Press.
[3] Cohen, A. D. (1984). On Taking Language Tests: What the Students Report. Language Testing, 1(1): 70-80.
[4] Cohen, A. D. (2002). Strategies and Processes in Test Taking and SLA. In: Lyle F. Bachman and Andrew D. Cohen, eds. Interfaces Between Second Language Acquisition and Language Testing Research. Beijing: Foreign Language Teaching and Research Press.
[5] Hughes, A. (2000). Testing for Language Teachers. Beijing: Foreign Language Teaching and Research Press.
[6] LIU Run-qing & HAN Bao-cheng. (2000). Language Testing and Its Methods. Beijing: Foreign Language Teaching and Research Press.
[7] Messick, S. A. (1989). Educational Measurement. 3rd ed. Linn, R. L., ed. New York: American Council on Education/Macmillan.
[8] YANG Hui-zhong & Weir, C. (1998). Validation Study of the National College English Test. Shanghai: Shanghai Foreign Language Education Press.

(Edited by Flora, Doris and Jessica)