Abstract. Background and significance. Introduction. Valmi D. Sousa PhD RN 1 and Wilaiporn Rojjanasrirat PhD RNC IBCLC 2

Similar documents
Questionnaire Translation

Cross-Cultural Equivalence and Psychometric Properties of the Portuguese Version of the Depressive Cognition Scale

Translation of research instruments: research processes, pitfalls and challenges

Testing the Multiple Intelligences Theory in Oman

Associate Prof. Dr Anne Yee. Dr Mahmoud Danaee

Adaptation and evaluation of early intervention for older people in Iranian.

DATA GATHERING METHOD

The Bengali Adaptation of Edinburgh Postnatal Depression Scale

ADMS Sampling Technique and Survey Studies

Reliability. Internal Reliability

1. Evaluate the methodological quality of a study with the COSMIN checklist

Examining the efficacy of the Theory of Planned Behavior (TPB) to understand pre-service teachers intention to use technology*

DESIGNING AND VALIDATION OF SOCIO- PSYCHOLOGICAL AND SEXUAL CONSTRAINTS QUESTIONNAIRE FOR FEMALE ATHLETES

Original Article. Relationship between sport participation behavior and the two types of sport commitment of Japanese student athletes


Critical Evaluation of the Beach Center Family Quality of Life Scale (FQOL-Scale)

By Hui Bian Office for Faculty Excellence

11-3. Learning Objectives

Validity and reliability of measurements

Rosenberg Self-Esteem Scale (RSES)

Editorial: An Author s Checklist for Measure Development and Validation Manuscripts

Social Support as a Mediator of the Relationship Between Self-esteem and Positive Health Practices: Implications for Practice

ASSESSING THE UNIDIMENSIONALITY, RELIABILITY, VALIDITY AND FITNESS OF INFLUENTIAL FACTORS OF 8 TH GRADES STUDENT S MATHEMATICS ACHIEVEMENT IN MALAYSIA

THE USE OF CRONBACH ALPHA RELIABILITY ESTIMATE IN RESEARCH AMONG STUDENTS IN PUBLIC UNIVERSITIES IN GHANA.

Pædiatric Rheumatology InterNational Trials Organisation

a, Emre Sezgin a, Sevgi Özkan a, * Systems Ankara, Turkey

Reliability and Validity checks S-005

PTHP 7101 Research 1 Chapter Assignments

Volume 10 Number 4

Importance of Good Measurement

International Trade and Finance Association. Cultural Materialism: Where East Meets West

Text-based Document. Psychometric Evaluation of the Diabetes Self-Management Instrument-Short Form (DSMI-20) Lin, Chiu-Chu; Lee, Chia-Lun

Patient Reported Outcomes (PROs) Tools for Measurement of Health Related Quality of Life

ʻThe concept of Deaf identity in Sloveniaʼ

how good is the Instrument? Dr Dean McKenzie

The Assessment in Advanced Dementia (PAINAD) Tool developer: Warden V., Hurley, A.C., Volicer, L. Country of origin: USA

Validity and reliability of measurements

Sample size for pre-tests of questionnaires. PERNEGER, Thomas, et al. Abstract

Analysis of the Reliability and Validity of an Edgenuity Algebra I Quiz

CHAPTER VI RESEARCH METHODOLOGY

Factor structure of the Schalock and Keith Quality of Life Questionnaire (QOL-Q): validation on Mexican and Spanish samples

Bangla translation, adaptation and piloting of Edinburgh Postnatal Depression Scale

The Concept of Validity

Examining of adaptability of academic optimism scale into Turkish language

Table S1. Search terms applied to electronic databases. The African Journal Archive African Journals Online. depression OR distress

Translation, Cross-Cultural Adaptation, Validation and Reliability of the Northwestern Dysphagia Patient Check Sheet (NDPCS) in Iran

PDFlib PLOP: PDF Linearization, Optimization, Protection. Page inserted by evaluation version

Test Reactivity: Does the Measurement of Identity Serve as an Impetus for Identity Exploration?

The Influence of Psychological Empowerment on Innovative Work Behavior among Academia in Malaysian Research Universities

Freire EAM *,**,***, Bruscato A ***,****, Leite DRC *, Sousa TTS *, Ciconelli RM **,*** Abstract. Introduction

Impacts of Instant Messaging on Communications and Relationships among Youths in Malaysia

Assessing Cultural Competency from the Patient s Perspective: The CAHPS Cultural Competency (CC) Item Set

2013 Supervisor Survey Reliability Analysis

SW 9300 Applied Regression Analysis and Generalized Linear Models 3 Credits. Master Syllabus

PSYCHOMETRIC ASSESSMENT OF THE CHINESE LANGUAGE VERSION OF THE ST. GEORGE S RESPIRATORY QUESTIONNAIRE IN TAIWANESE PATIENTS WITH BRONCHIAL ASTHMA

Modeling the Influential Factors of 8 th Grades Student s Mathematics Achievement in Malaysia by Using Structural Equation Modeling (SEM)

HOW TO DESIGN AND VALIDATE MY PAIN QUESTIONNAIRE?

Nursing Practice Today

Translation and Validation of Study Instruments for Cross-Cultural Research

Journal of Nursing Education and Practice, 2013, Vol. 3, No. 11

Confirmatory factor analysis (CFA) of first order factor measurement model-ict empowerment in Nigeria

Packianathan Chelladurai Troy University, Troy, Alabama, USA.

Iranian Journal of Diabetes and Lipid Disorders; 2011; Vol 10, pp 1-6

Personal Style Inventory Item Revision: Confirmatory Factor Analysis

Assessing the Validity and Reliability of a Measurement Model in Structural Equation Modeling (SEM)

Collecting & Making Sense of

No part of this page may be reproduced without written permission from the publisher. (

ESTABLISHING VALIDITY AND RELIABILITY OF ACHIEVEMENT TEST IN BIOLOGY FOR STD. IX STUDENTS

A Study on Adaptation of the Attitudes toward Chemistry Lessons Scale into Turkish

Rong Quan Low Universiti Sains Malaysia, Pulau Pinang, Malaysia

Intercultural Communication Apprehension Scale (PRICA): Validity and Reliability Study in Turkish

Internal structure evidence of validity

Validation in the Cross-Cultural Adaptation of the Korean Version of the Oswestry Disability Index

Factorial Validity and Consistency of the MBI-GS Across Occupational Groups in Norway

Evaluating the Reliability and Validity of the. Questionnaire for Situational Information: Item Analyses. Final Report

Career Counseling and Services: A Cognitive Information Processing Approach

On the usefulness of the CEFR in the investigation of test versions content equivalence HULEŠOVÁ, MARTINA

Cross-cultural influences affect people s perceptions and. Translating Instruments Into Other Languages: Development and Testing Processes

Review of Various Instruments Used with an Adolescent Population. Michael J. Lambert

Relationships between stage of change for stress management behavior and perceived stress and coping

Reliability and Validity of the Hospital Survey on Patient Safety Culture at a Norwegian Hospital

Downloaded from aud.tums.ac.ir at 18:12 IRST on Saturday March 9th (Categorization of Auditory Performance: CAP)

Title: Reliability and validity of the adolescent stress questionnaire in a sample of European adolescents - the HELENA study

Factor structure and psychometric properties of the General Health Questionnaire. (GHQ-12) among Ghanaian adolescents

Role of Callous-Unemotional Traits in prediction of Childhood behavior problems

CHAPTER-III METHODOLOGY

Chapter 3. Psychometric Properties

Statistics for Psychosocial Research Session 1: September 1 Bill

Development of a measure of self-regulated practice behavior in skilled performers

A Short Form of Sweeney, Hausknecht and Soutar s Cognitive Dissonance Scale

The Scoliosis Research Society-22 questionnaire adapted for adolescent idiopathic scoliosis patients in China: reliability and validity analysis

Screening Drug, Alcohol and Substance Abuse the Psychometric Measures

PEER REVIEW HISTORY ARTICLE DETAILS VERSION 1 - REVIEW

Gezinskenmerken: De constructie van de Vragenlijst Gezinskenmerken (VGK) Klijn, W.J.L.

Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist

Abstract. Key words: bias, culture, Five-Factor Model, language, NEO-PI-R, NEO-PI-3, personality, South Africa

PÄIVI KARHU THE THEORY OF MEASUREMENT

Development and validation of the alcohol-related God locus of control scale

The Validity And Reliability Of The Turkish Version Of The Perception Of False Self Scale

An International Study of the Reliability and Validity of Leadership/Impact (L/I)

Transcription:

Journal of Evaluation in Clinical Practice ISSN 1365-2753 Translation, adaptation and validation of instruments or scales for use in cross-cultural health care research: a clear and user-friendly guidelinejep_1434 268..274 Valmi D. Sousa PhD RN 1 and Wilaiporn Rojjanasrirat PhD RNC IBCLC 2 1 Associate Professor, The University of Kansas School of Nursing, Kansas City, Kansas, USA 2 Associate Professor, Graceland University School of Nursing, Independence, Missouri, USA Keywords back-translation, cross-cultural validation, health care research, translation Correspondence Valmi D. Sousa The University of Kansas School of Nursing 3901Rainbow Boulevard Kansas City, KS 66160 USA E-mail: vsousa@kumc.edu Accepted for publication: 3 February 2010 doi:10.1111/j.1365-2753.2010.01434.x Abstract Rationale, aims and objectives The diversity of the population worldwide suggests a great need for cross-culturally validated research instruments or scales. Researchers and clinicians must have access to reliable and valid measures of concepts of interest in their own cultures and languages to conduct cross-cultural research and/or provide quality patient care. Although there are well-established methodological approaches for translating, adapting and validating instruments or scales for use in cross-cultural health care research, a great variation in the use of these approaches continues to prevail in the health care literature. Therefore, the objectives of this scholarly paper were to review published recommendations of cross-cultural validation of instruments and scales, and to propose and present a clear and user-friendly guideline for the translation, adaptation and validation of instruments or scales for cross-cultural health care research. Methods A review of highly recommended methodological approaches to translation, adaptation and cross-cultural validation of research instruments or scales was performed. Recommendations were summarized and incorporated into a seven-step guideline. Each one of the steps was described and key points were highlighted. Example of a project using the proposed steps of the guideline was fully described. Conclusions Translation, adaptation and validation of instruments or scales for crosscultural research is very time-consuming and requires careful planning and the adoption of rigorous methodological approaches to derive a reliable and valid measure of the concept of interest in the target population. Introduction Globalization and migration have contributed to an increasing diversity of the population in many countries, particularly in the USA, where the background of the population is extremely diverse regarding culture, language and ethnicity [1,2]. Therefore, there is a need for relevant cross-cultural research to addresses a number of problems among these multinational and multicultural populations. However, health care researchers conducting crosscultural studies must have access to reliable and cross-validated instruments in other cultures and/or in other languages [3 6]. Findings from cross-cultural research may have great clinical implications for physicians, nurses and other health care professionals who provide care for diverse populations because the delivery of quality care depends on the accurate assessment and deeper understanding of an individual s cultural, linguistic and ethnic background. Background and significance The increase in diverse populations worldwide and the need for cross-cultural and multinational research indicate a great need for clinicians and researchers to have access to reliable and valid instruments or measures cross-validated among diverse cultural segments of the population and/or in other languages [3,4,6]. This would enhance the validity, the generalization and the translation of cross-cultural health care research. Although there are wellestablished methodological approaches for translating, adapting and validating instruments for use in cross-cultural health care research [2 5,7 15], a great variation in the use of these approaches continues to prevail in the health care literature. A recent review of 47 methodological studies focusing on the translation and validation of instruments for cross-cultural research reported that the quality and methodological approaches of the reviewed studies varied greatly [16]. There was no clear 268 2010 Blackwell Publishing Ltd, Journal of Evaluation in Clinical Practice 17 (2011) 268 274

V.D. Sousa and W. Rojjanasrirat Validation of instruments or scales consensus among researchers on how the approaches should be used or combined, a great variation on the qualifications of translators, and a lack of detailed information about the translation, back-translation, validation, testing, and revision and refinement of the instruments. Corroborating these findings, another researcher [2] reported that, unfortunately, translating, adapting and cross-culturally validating research an instrument is treated as an unimportant step of study protocols. Furthermore, the most commonly used and reported methodological approach was the forward translation only, often using an unqualified translator. In addition, the same author [2] reported that only a few researchers have described the use of strategies and steps or processes for the adaptation and/or validation of the instruments, and emphasized that it is not sufficient to forward translate an instrument without carefully evaluating its adaptation and cross-cultural validation. The procedure should consist of a comprehensive process that involves not only translation of an instrument, but also thorough evaluation of its adaptation and cross-cultural validation. These data suggest that despite the existing recommendations and guidelines to use a comprehensive multistep process for translating, adapting and cross-validating instruments, researchers have not been doing this. It may be that the methodological approaches are not clearly presented in a user-friendly format, which makes it difficult for researchers to adopt and follow the recommendations. Therefore, the objectives of this scholarly paper were to review published recommendations of crosscultural validation of instruments and scales, and to propose and present the highly recommended methodological approaches for translating, adapting and validating instruments for crosscultural health care research in a clear and a user-friendly guideline. This would lead to better understanding and use of these approaches by health care researchers, particularly nurse researchers worldwide. Methodological approaches Of the two main categories of translation (symmetrical and asymmetrical), the symmetrical category is the most recommended approach because it refers to faithfulness of meaning and colloquialness in both the source language (SL; original language of the instrument) and the target language (TL; desired language) and not to a literal translation [12]. The purpose of translation is to achieve equivalence between the instrument in the SL and the instrument in the TL [2]. The symmetrical translation is the only category that facilitates the comparison of responses from individuals of one culture to those of another [6,12,13] and the determination of the most relevant types of cross-cultural equivalence: the semantic, conceptual, content, technical and criterion [1,17]. In addition, the process known as centring [7], in which both the SL and the TL of an instrument are equally important, should be used. The process of translation, adaptation and cross-cultural validation of an instrument for use in other cultures, languages and countries requires careful planning and adoption of comprehensive, rigorous and most established methodological approaches [2 5,7 15]. Because there are variations among these approaches, we have incorporated the most recommended ones in a user-friendly guideline to facilitate adoption, consistency and use. Step 1: translation of the original instrument into the target language (forward translation or one-way translation) The instrument in the source (original) language is forward translated to the TL (target language) by at least two independent translators, preferably certified, whose mother language is the desired TL of the instrument. The translators must be bilingual (i.e. fluent in the source and desired TL of the instrument) and preferably bicultural (i.e. having in-depth experience in the culture of the source and desired TL of the instrument). In addition, the two translators must have distinct backgrounds. The first translator must be knowledgeable about health care terminology and the content area of the construct of the instrument in the desired TL. The second translator must be familiar with colloquial phrases, health care slang and jargon, idiomatic expressions, and emotional terms in common use in the desired TL. The second translator should not be knowledgeable about medical terminology and/or the construct of the instrument. This approach will generate two translated versions that contain words and sentences that cover both the medical and the usual spoken language with its cultural nuances. Therefore, choosing well-qualified translators is the key to high-quality translations. If resources are available, translations can also be done by two teams of independent translators (each team of translators must have the same characteristics as the two individual independent translators described above), which may result in higher-quality translations by minimizing the introduction of personal idiosyncrasies when using only two independent translators. 1 Instrument in the SL translated to TL (TL1 and TL2) to produce two forward-translated versions of the instrument. 2 Use two bilingual and bicultural translators whose mother language is the desired TL, but who have distinct backgrounds: One translator must be knowledgeable about health terminology and the content area of the construct of the instrument in the TL. The other translator must be knowledgeable about the cultural and linguistic nuances of the TL. 3 Two independent teams of translators can also be used (each team of translators must have the same characteristics of the two individual independent translators). Step 2: comparison of the two translated versions of the instrument (TL1 and TL2): synthesis I The instructions, the items and the response format of the two forward-translated versions of the instrument (TL1 and TL2) and both the TL1 and the TL2 with the original version of the instrument in the SL are initially compared by a third bilingual and preferably bicultural independent translator regarding ambiguities and discrepancies of words, sentences and meanings. Any 2010 Blackwell Publishing Ltd 269

Validation of instruments or scales V.D. Sousa and W. Rojjanasrirat ambiguities and discrepancies must be discussed and resolved using a committee approach. Consensus should be achieved with the participation of the third translator, the two translators from Step 1, and the investigator and/or other members of the research team. This process will generate the preliminary initial translated version of the instrument in the TL (PI-TL). 1 Use a third independent translator to compare the TL1 and TL2, and to compare both the TL1 and TL2 with the SL version of the instrument. 2 Use a committee approach (third independent individual or translator, translators who participated in Step 1, and investigator and/or other members of research team) to resolve ambiguities and discrepancies and derive the PI-TL) Step 3: blind back-translation (blind backward translation or blind double translation) of the preliminary initial translated version of the instrument The PI-TL is translated back into the SL by two other independent translators with the same qualifications and characteristics described above in Step 1. For this step, the translators mother language should be the SL of the original instrument, and they should be completely blind to the original version of the instrument (they had never seen the original version of the instrument). They will produce two back-translated versions of the instrument. Again, the first translator must be knowledgeable about health care terminology and the content area of the construct of the instrument in the SL, but no prior knowledge of the instrument being backtranslated. The second translator must be familiar with colloquial phrases, health care slang and jargon, idiomatic expressions, and emotional terms in common in the SL. The second translator should not be knowledgeable about medical terminology and/or construct of the instrument and should have no prior knowledge of the instrument being back-translated as well. If resources are available, back-translation can also be done by two teams of translators, which may result in higher-quality back-translations by minimizing the introduction of personal idiosyncrasies when using only one independent back-translator to generate each initial backtranslated version of the instrument. This process will result in two back-translated versions of the instrument in its original SL (B-TL1 and B-TL2). This step allows for clarification of words and sentences used in the translations. As noted in Step 1, choosing well-qualified translators is the key to high-quality back-translations. 1 The PI-TL Back-translated to SL (B-TL1 and B-TL2) to produce two back-translated versions. 2 Use two bilingual and bicultural translators whose mother language is the SL, but who have distinct backgrounds: One translator must be knowledgeable about health terminology and the content area of the construct of the instrument in the SL. The other translator must be knowledgeable about the cultural and linguistic nuances of the SL. 3 Two independent teams of translators can also be used (each team of translators must have the same characteristics of the two individual independent translators). Step 4: comparison of the two back-translated versions of the instrument (B-TL1 and B-TL2): synthesis II Initially, the instructions, items and response format of the two back-translations (B-TL1 and BTL2) are compared by a multidisciplinary committee with the instructions, items and response format of the original instrument in the SL regarding format, wording, and grammatical structure of the sentences, similarity in meaning, and relevance. It is highly recommended that the committee should include at least one methodologist (who can be the investigator and/or a member of the research team), one health care professional who is familiar with the content areas of the construct of the instrument, and all four bilingual and bicultural translators involved in Step 1 (forward translation of the instrument into the TL) and Step 3 (back-translation of the instrument from the TL into the SL). It is also recommended that the developer of the original instrument in the SL participated and provide insights on the construct of the instrument and clarify any questions that might arise. Having at least one monolingual committee member whose mother language is the TL of the instrument would enhance the quality of the pre-final version of the translated instrument. Any ambiguities and discrepancies regarding cultural meaning and colloquialisms or idioms in words and sentences of the instructions, the items, and the response format between the two back-translations (B-TL1 and B-TL2) and between each one of the two back-translations (B-TL1 and B-TL2) and the original instrument in the SL are discussed and resolved through consensus among the committee members to derive a pre-final version of the instrument in the TL (P-FTL). If discrepancies cannot be resolved, it may be necessary to repeat Steps 1 though 4: two other independent bilingual and bicultural translators must be used to translate the original instrument (SL) again to generate two translations, and two other independent bilingual and bicultural translators must be used to backtranslate the translated versions of the instrument (TL) following the same procedure described above (known as the repetition approach). Alternatively, only items that do not retain their original meaning are re-translated and back-translated. The evaluation of the translated and back-translated versions follows the same validation process described above. This process is repeated until no ambiguities or discrepancies are found. These methodological approaches of Step 4 will establish the initial conceptual, semantic and content equivalence of the P-FTL. Conceptual equivalence refers to the degree to which a concept of the items of the instrument exists in both the source and target cultures. Semantic equivalence refers to sentence structure, colloquialisms or idioms that ensure that the meaning of the text or idea of the items of the instrument in the SL is present in the TL. Finally, content equivalence refers to the relevance and pertinence of the text or idea of the items of the instrument in each culture. The committee s role is to evaluate, revise and consolidate the instructions, items and response format of the back-translated instruments that have conceptual, semantic and content equivalency and to develop the P-FTL for pilot and psychometric testing. 270 2010 Blackwell Publishing Ltd

V.D. Sousa and W. Rojjanasrirat Validation of instruments or scales 1 Comparison between the two back-translations (B-TL1 and B-TL2) of the instrument, and between both BTL1 and B-TL2 and the original SL instrument: Evaluate similarity of the instructions, items and response format regarding wording, sentence structure, meaning and relevance. 2 Use a multidisciplinary committee: One methodologist (researcher or a member of the research team). One health care professional. All four bilingual and bicultural translators used in Step 1 and Step 3: two translators whose mother language is the desired TL of the instrument and two translators whose mother language is the SL of the original instrument. 3 If possible, developer of the original instrument should participate in the discussions. 4 If ambiguities and discrepancies cannot be resolved, Steps 1 through 4 may be repeated as many times as necessary. Alternatively, only items that do not retain their original meaning are re-translated and back-translated. Step 5: pilot testing of the pre-final version of the instrument in the target language with a monolingual sample: cognitive debriefing The P-FTL is pilot tested among participants whose language is the TL of the instrument to evaluate the instructions, response format and the items of the instrument for clarity. Participants should be recruited from the target population in which the instrument will be used (e.g. if the instrument measures self-care among individuals with type 2 diabetes, then the sample must consist of individuals with type 2 diabetes). A sample size of 10 40 individuals is recommended [3,18]. Each participant is asked to rate the instructions and items of the scale using a dichotomous scale (clear or unclear). Participants who rate the instructions, response format or any item of the instrument as unclear are asked to provide suggestions as to how to rewrite the statements to make the language clearer. Instructions, response format and items of the instrument that are found to be unclear by at least 20% of the sample must be re-evaluated [19]. Therefore, the minimum inter-rater agreement among the sample is 80%. This step is used to further support the conceptual, semantic and content equivalency of the translated instrument and further improve the structure of sentences used in the instructions and items of the P-FTL to be easily understood by the target population prior to psychometric testing. To further determine the conceptual and content equivalence of the items of the P-FTL, use of an expert panel, is highly recommended. The instructions, response format and the items of the instrument are evaluated for conceptual equivalence (clarity) by six to ten members of an expert panel [20,21] who are knowledgeable about the content areas of the construct of the instrument and the target population in which the instrument will be used and whose mother language is the TL of the instrument. When possible, a committee of 10 members is preferable [20,21]. Each member of the committee who rates the instructions, response format or any item of the instrument as unclear is asked to provide suggestions as to how to rewrite the statements and make the language clearer. Instructions, response format and items of the instrument that are found to be unclear by at least 20% of the committee members must be revised and re-evaluated [19]. The minimum inter-rater agreement among the experts panel is 80%). This process will further determine the conceptual equivalence of the translated instrument. The expert panel is then asked to evaluate each item of the instrument for content equivalence (content-related validity [relevance]) using the following scale: 1 = not relevant; 2 = unable to assess relevance; 3 = relevant but needs minor alteration; 4 = very relevant and succinct. Items classified as 1 (not relevant) or 2 (unable to assess relevance) should be revised [20,21]. Content validity index at the item level (I-CVI) and at the scale level (S-CVI) should be calculated. There are three methods to calculate S-CVI [20 22], but the averaging calculation (S-CVA/ Ave) method is preferred [22]. Using 10 experts, the I-CVI of 0.78 or above [20] and S-CVA/Ave of 0.90 or above [21] are the minimum acceptable indices. Items that do not achieve the minimum acceptable indices are revised and re-evaluated. New content validity indices are calculated. The process continues until acceptable indices of content-related validity or content equivalence are achieved. It is also recommended that the kappa coefficient of agreement be determined to increase confidence in the content validity of the instrument [23]. A kappa of 0.60 is generally the minimum acceptable coefficient to determine good agreement [24]. The purpose of Step 5 is to continue developing the P-FTL for pre-field test for preliminary and/or full psychometric testing. 1 Pilot test of the P-FTL among individuals whose language is the TL of the instrument: Evaluate the instructions, items and response format clarity. Use a sample size of 10 40 participants. 2 It is highly recommended to use an expert panel to further examine the instrument for: Clarity of the instructions, items and response format. Content equivalence (content-related validity) using I-CVI, S-CVI/Ave and Kappa coefficient of agreement. Use a sample of 6 10 experts (10 experts are preferred). Step 6: preliminary psychometric testing of the pre-final version of the translated instrument with a bilingual sample This step is rarely used; however, when a bilingual population is accessible, it is recommended that the instrument be pre-field tested among bilingual individuals (fluent in the SL of the original instrument and the TL of the translated instrument). If this is not possible, skip this step and move to Step 7. In general, the recommendation is to use at least five subjects per item of the instrument to conduct the preliminary psychometric testing of a new instrument [25]. Ideally, the bilingual sample should be from the target population in which the instrument will be used (e.g. adult individuals with type 2 diabetes, African American women with heart failure). However, in many instances, this may be difficult and unrealistic; thus other alternatives can be used such as sampling bilingual college students and faculty or workers in travel agencies, currency exchange agencies, international trade companies, embassies and consulates, and language schools. 2010 Blackwell Publishing Ltd 271

Validation of instruments or scales V.D. Sousa and W. Rojjanasrirat Initially, participants are given the P-FTL and are asked to answer the items. The participants respond to the items of the P-FTL without seeing the original instrument in the SL. After completion of the P-FTL, participants are given the original instrument in the SL and are asked to answer the items. They may complete a demographic questionnaire and/or other instruments of interest. The order of the items of the original instrument must be mixed to be in a different order from that of the items of the P-FTL. Responses on both versions of the instrument are then compared (i.e. interpretation of scores is the same in both cultures) to establish criterion equivalency (a type of construct validity). Statistical analyses used for comparison purposes may consist of descriptive statistics, correlation coefficients, and paired t-test or one-way anova. Scale and item analysis is also used to establish the initial preliminary psychometric properties of the instrument (internal consistency reliability) and to compare these properties of the P-FTL with the SL of the original instrument. When the instrument purpose is to serve as a diagnostic or screening testing, preliminary calculation of sensitivity and specificity is recommended. This Step 6 also determines initial technical equivalency (the method of assessment) and is useful to support the conceptual, semantic, content and construct validity of the P-FTL prior to conducting full psychometric field testing. 1 When possible, pilot test the P-FTL among bilingual individuals to: Compare the P-FTL and the SL instrument in the SL. Establish criterion equivalency and further support the conceptual, semantic, content and construct equivalency of the P-FTL. 2 Use at least five subjects per item of the instrument. 3 Subjects complete the P-FTL first without seeing the original instrument in the SL. 4 Subjects complete the original instrument in the SL in which items have been mixed in different order from the P-FTL. Step 7: full psychometric testing of the pre-final version of the translated instrument in a sample of the target population This last step is used to establish the initial full psychometric properties of the newly translated, adapted and cross-validated instrument with a sample of the target population of interest. The sample size for this step depends on the types of psychometric approaches that will be used. The more complete the psychometric approaches for evaluation of the translated instrument the more confidence will be generated in its reliability and validity properties. In general, per rule of thumb, it is highly recommended to use at least 10 subjects per item of the instrument scale and item analysis and exploratory factor analysis [25 28]. If there is a plan to use confirmatory factor analysis to test the factor structure of the instrument, the recommendation per rule of thumb is approximately 300 500 subjects per item of the instrument [28,29]. Power analysis based on the number of degrees of freedom, an alpha level (0.05 or 0.01), and a desired power (80% or above) can also be calculated [30,31]. The most recommended and commonly used psychometric approaches in this step are estimation of: (1) internal consistency reliability (or sensitivity and specificity); (2) stability reliability (test retest reliability); (3) homogeneity; (4) construct-related validity such as convergent and/or divergent (discriminant) validity; (5) criterion-related validity such as concurrent and/or predictive validity; (6) factor structure of the instrument (dimensionality); and (7) model fit. Although, it is not the purpose of this user-friendly guideline to describe the many statistical approaches that can be used in Step 7, the most common statistical approaches are scale and item analysis, Pearson s correlation analysis, exploratory factor analysis and confirmatory factor analysis. The purpose of the Step 7 is to revise and refine the items of the P-FTL as needed to derive the final psychometrically sound FTL consisting of adequate estimates of reliability, homogeneity, and validity and with a stable factor structure and/or model fit. 1 Full psychometric testing of the P-FTL among individuals from the target population to: Revise and refine the items of the final version of the instrument in the TL. Establish internal consistency reliability (or sensitivity and specificity), stability reliability, homogeneity, construct-related validity, criterion-related validity, factor structure and model fit of the instrument. 2 Use at least 10 subjects per item of the instrument for general psychometric approaches (scale and item analysis, Pearson s correlations and exploratory factor analysis). 3 Use 300 500 subjects for confirmatory factor analysis or conduct a power analysis. Example of a project to translate, adapt and validate a research instrument A project to translate, adapt and cross-validate an instrument for cross-cultural research may take several years; and it is normally conducted using more than one study to adhere to the recommended methodological approaches described above. One study might set as its initial goal to translate, adapt and cross-validate a research instrument using Steps 1, 2, 3, 4 and 5 only. In a second study, the researchers might set a single goal to establish the preliminary psychometrics of the translated instrument with bilingual participants using Step 6. Then, in a third study, the researchers goal might be to establish the initial full psychometric properties of a translated instrument in a sample of the target population of interest using the approaches described in Step 7. Depending on the psychometric approaches used in this third study, other studies might be necessary to continue the development and psychometric evaluation of the translated instrument. To illustrate the use of the methodological steps to translate, adapt and cross-validate an instrument and to evaluate the preliminary and initial full psychometric properties of an instrument, we present an example of a project that used two studies to adhere to the recommended guideline. Study one: cross-cultural equivalence and psychometric properties of the Portuguese version of the depressive cognition scale In this study [6], the depressive cognition scale (DCS) was translated from English into Portuguese using Steps 1 and 2, and 272 2010 Blackwell Publishing Ltd

V.D. Sousa and W. Rojjanasrirat Validation of instruments or scales back-translated from Portuguese into English and cross-culturally validated using Steps 3 and 4. Preliminary psychometric evaluation of the scale was conducted with a bilingual sample using Step 6. Note that Step 5, important to determine clarity of the instructions, response format and sentence structure of the items, was not done because the committee was comprised of three members whose mother language was Portuguese who were convinced that all those aspects of the scale were completely clear. The DCS was originally developed in English to measure depressive cognitions [32]. The theoretical basis for the development of the scale was Beck s theory of depression and Erickson s theory of psychosocial development. The DCS consists of eight items on a 6-point Likerttype scale ranging from 0 (strongly disagree) to 5 (strongly agree). Each item of the scale measures a specific cognition: hopelessness, helplessness, purposelessness, powerlessness, worthlessness, loneliness, emptiness and meaninglessness. Using Steps 1 through 4, the DCS English version was translated into Portuguese by two bilingual translators who were fluent in both English and Portuguese languages to generate two versions of the translated scale. The Portuguese versions of the scale were blindly back-translated into English by two different translators who never saw the original version of the DCS in English. The two versions of the translated scale were compared with each other, and each one of these versions was compared with the original English version of the scale to determine its conceptual, semantic and content equivalence using a panel of three Brazilian bilinguals and an American monolingual expert in the content area with consultation with the translators who participated in the translation and back-translation of the versions of the scale. Ambiguities and discrepancies regarding conceptual and semantic equivalence on two items that measured emptiness and purposelessness were discussed and resolved by the committee members. A final version of the translated scale in Portuguese was derived and named Escala Cognitiva de Depressão (ECD). As we stated previously, we skipped Step 5 and proceeded to Step 6 to evaluate the preliminary psychometrics of the Portuguese version of the scale (ECD), with a bilingual sample of individuals from the general population, mostly students and faculty members of a major Brazilian university. This sample was chosen because of the anticipated difficulty to approach and recruit individuals from the target population of interest (i.e. individuals with diabetes mellitus). We used the recommended sample size of five subjects per item of the scale; therefore, 40 bilingual adults participated in the study. Using a two-step approach for data collection, participants received and responded to the Portuguese version of the scale (ECD) only. Then participants answered a demographic questionnaire and completed the original English version of the scale (DCS). We used descriptive statistics, scale and item analysis, and paired t-tests to compare the responses of participants in both the Portuguese (ECD) and the English (DCS) items of the scales. Descriptive statistics showed strong similarities in participants responses on both the Portuguese and English versions of the scale (identical responses ranged from 78% to 98%). Pearson s correlation coefficients between each item of the ECD and each item of the DCS ranged from 0.64 to 0.96 and were statistically significant (P < 0.001). Estimates of internal consistency reliability and homogeneity were similar in both the ECD and DCS versions of the scale (Cronbach s alphas were 0.79 and 0.78, respectively; item-to-total correlation coefficients ranged from 0.32 to 0.75 on the ECD and from 0.29 to 0.77 on the DCS). There was no statistically significant difference between the overall mean score of the ECD (M = 5.05, SD = 3.49) and the overall mean score of the DCS (M = 5.20, SD = 3.56); t(1,39) = 0.76, P = 0.45. At the item level, a statistically significant difference was found between the mean score of item 2 on the ECD (M = 0.80, SD = 0.76) and the mean score of item 2 on the DCS (M = 0.90, SD = 0.74); t(1,39) = 2.08, P = 0.04. The findings of Step 6 suggested adequate criterion equivalence and supported the conceptual, semantic and content validity of the ECD. The ECD had an adequate internal consistency reliability, homogeneity and initial support for construct validity. The full psychometric properties of the ECD could be further tested among individuals from the target population of interest. Study two: psychometric properties of the Portuguese version of the depressive cognition scale in Brazilian adults with diabetes mellitus The second study [33], was conducted to continue the psychometric evaluation of the ECD using Step 7. The purpose of the study was to evaluate the full psychometric properties of the ECD among adults with diabetes mellitus (i.e. target population of interest). We used the recommended sample size of at least 10 subjects per item of the scale. The sample consisted of 82 Brazilian adults with diabetes mellitus. The estimate of internal consistency reliability was a Cronbach s alpha of 0.88. Scale and item analysis indicated that the inter-item correlation coefficients ranged from 0.37 to 0.69 and the item-to-correlation coefficients ranged from 0.51 to 0.71. Exploratory factor analysis indicated that the ECD was unidimensional, explaining 56.73% of its item variance, and had factor loadings ranging from 0.60 to 0.80 and communality values ranging from 0.36 to 0.67. In addition, the ECD total score was statistically significantly correlated with the Portuguese version of the Beck Depression Inventory (r = 0.24, P < 0.05). The study findings further supported the reliability, homogeneity and construct-related validity of the ECD among Brazilians with diabetes mellitus. Conclusions The diversity of the population worldwide shows a great need for cross-culturally validated instruments or scales. Translation, adaptation and validation of an instrument or scale for cross-cultural research is time-consuming and requires careful planning and adoption of rigorous methodological approaches to derive a reliable and valid measure of the concept of interest in the target population. This scholarly paper has reviewed and incorporated the highly recommended methodological approaches in a clear and user-friendly guideline. The adoption of symmetrical translation using centring process will lead to more accurate adaptation and cross-cultural validation of a translated instrument. The steps of the proposed guideline provide clear directions to cross-culturally validate research instruments or scales. Choosing the right translators and committee members is key aspect that must be considered to enhance the quality of the translation, back-translation and cross-validation of an instrument or scale. Pilot testing of a translated instrument or scale among participants whose language is the TL of the instrument to evaluate the instructions, the items and the 2010 Blackwell Publishing Ltd 273

Validation of instruments or scales V.D. Sousa and W. Rojjanasrirat response format of the instrument for clarity also enhances the quality of the final version of the translated instrument. Finally, using well-established approaches to test the preliminary psychometric properties of the translated instrument or scale among bilingual individuals and full psychometrics of the translated instrument or scale among individuals from the target population of interest will derive a reliable and valid scale in the TL of interest. References 1. Hilton, A. & Skrutkowski, M. (2002) Translating instruments into other languages: development and testing processes. Cancer Nursing, 25 (1), 1 7. 2. Sperber, A. D. (2004) Translation and validation of study instruments for cross-cultural research. Gastroenterology, 126 (1), S124 S128. 3. Beaton, D. E., Bombardier, C., Guillemin, F. & Ferraz, M. B. (2000) Guidelines for the process of cross-cultural adaptation of self-report measures. Spine, 25 (24), 3186 3191. 4. Beaton, D. E., Bombardier, C., Guillemin, F. & Ferraz, M. B. (2002) Recommendations for the cross-cultural adaptation of health status measures. Available at: http://www.dash.iwh.on.ca/assets/images/ pdfs/xculture2002.pdf (last accessed 1 October 2009). 5. Bracken, B. A. & Barona, A. (1991) State of the art procedures for translating, validating and using psychoeducational tests in crosscultural assessment. School Psychology International, 12 (1 2), 119 132. 6. Sousa, V. D., Zauszniewski, J. A., Mendes, I. A. C. & Zanetti, M. L. (2005) Cross-cultural equivalence and psychometric properties of the Portuguese version of the depressive cognition scale. Journal of Nursing Measurement, 13 (2), 87 99. 7. Brislin, R. W. (1970) Back-translation for cross-cultural research. Journal of Cross-Cultural Psychology, 1 (3), 185 216. 8. Brislin, R. W. (1986) The wording and translation of research instruments. In Field Methods in Cross-Cultural Research (eds W. J. Lonner & J. W. Berry), pp. 137 164. Beverly Hills, CA: Sage Publications. 9. Chapman, D. W. & Carter, J. F. (1979) Translation procedures for the cross cultural use of measurement instruments. Educational Evaluation and Policy Analysis, 1 (3), 71 76. 10. Guillemin, F., Bombardier, C. & Beaton, D. (1993) Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. Journal of Clinical Epidemiology, 46 (12), 1417 1432. 11. Jones, E. (1987) Translation of quantitative measures for use in crosscultural research. Nursing Research, 36 (5), 324 327. 12. Jones, E. G. & Kay, M. (1992) Instrumentation in cross-cultural research. Nursing Research, 41 (3), 186 188. 13. Jones, P. S., Lee, J. W., Phillips, L. R., Zhang, X. E. & Jaceldo, K. B. (2001) An adaptation of Brislin s translation model for cross-cultural research. Nursing Research, 50 (5), 300 304. 14. McDermott, M. A. N. & Palchanes, K. (1994) A literature review of the critical elements in translation theory. Image: Journal of Nursing Scholarship, 26 (2), 113 117. 15. Wild, D., Grove, A., Martin, M., Eremenco, S., McElroy, S., Verjee-Lorenz, A. & Erikson, P. (2005) Principles of good practice for the translation and cultural adaptation processs for patient-reported outcomes (PRO) measures: report of the ISPOR task force for translation and cultural adaptation. Value in Health, 8 (2), 94 104. 16. Maneesriwongul, W. & Dixon, J. K. (2004) Instrument translation process: a method review. Journal of Advanced Nursing, 48 (2), 175 186. 17. Beck, C. T., Bernal, H. & Froman, R. D. (2003) Methods to document semantic equivalence of a translated scale. Research in Nursing and Health, 26 (1), 64 73. 18. Sousa, V. D., Hartman, S. W., Miller, E. H. & Carroll, M. A. (2009) New measures of diabetes self-care agency, diabetes self-efficacy, and diabetes self-management for insulin-treated individuals with type 2 diabetes. Journal of Clinical Nursing, 18 (9), 1305 1312. 19. Topf, M. (1986) Three estimates of interrater reliability for nominal data. Nursing Research, 35 (4), 253 255. 20. Lynn, M. R. (1986) Determination and quantification of content validity. Nursing Research, 35 (6), 382 385. 21. Waltz, C. F., Strickland, O. L. & Lenz, E. R. (2005) Measurement in Nursing and Health Research, 3rd edn. New York: Springer Publishing Company. 22. Polit, D. F. & Beck, C. T. (2006) The content validity index: are you sure you know what s being reported? Critique and recommendations. Research in Nursing and Health, 29 (5), 489 497. 23. Wynd, C. A., Schmidt, B. & Schaefer, M. A. (2003) Two quantitative approaches for estimating content validity. Western Journal of Nursing Research, 25 (5), 508 518. 24. Streiner, D. L. & Norman, G. R. (2008) Health Measurement Scales: A Practical Guide for Their Development and Use, 4th edn. New York: Oxford University Press. 25. Nunnally, J. C. & Bernstein, I. H. (1994) Psychometric Theory, 3rd edn. New York: McGraw-Hill. 26. Hair, J. F. J., Anderson, R. E., Tatham, R. L. & Black, W. C. (1998) Multivariate Data Analysis, 5th edn. Englewood Cliffs, NJ: Prentice- Hall, Inc. 27. Stevens, J. (2002) Applied Multivariate Statistics for the Social Sciences, 4th edn. Mahwah, NJ: Lawrence Erlbaum Associates. 28. Tabachnick, B. G. & Fidell, L. S. (2001) Using Multivariate Statistics, 4th edn. Needham Heights, MA: Allyn & Bacon. 29. Gonzalez, R. & Griffen, D. (2001) Testing parameters in structure equation modeling: every one matters. Psychological Methods, 6(3), 258 269. 30. MacCallum, R. C., Browne, M. W. & Sugawara, H. M. (1996) Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1 (2), 130 149. 31. MacCallum, R. C., Browne, M. W. & Cai, L. (2006) Testing differences between nested covariance structure models: power analysis and null hypotheses. Psychological Methods, 11 (1), 19 35. 32. Zauszniewski, J. A. (1995) Development and testing of a measure of depressive cognitions in older adults. Journal of Nursing Measurement, 3 (1), 31 41. 33. Sousa, V. D., Zanetti, M. L., Zauszniewski, J. A., Mendes, I. A. C. & Daguano, M. O. (2008) Psychometric properties of the Portuguese version of the depressive cognition scale in Brazilian adults with diabetes mellitus. Journal of Nursing Measurement, 16 (2), 125 135. 274 2010 Blackwell Publishing Ltd