Write an essay on the process you would follow in developing a psychological assessment measure. Discuss the steps that you would take in this process, including how you would choose items for your test, how you would evaluate the reliability and validity of your test, and the issue of establishing norms. Discuss the theory comprehensively and illustrate your understanding with an example or examples.
Introduction
The process of developing a psychological test is complex and lengthy (Foxcroft & Roodt, 2001), but aspects related to the planning of a psychological test are not always sufficiently emphasised and are sometimes not mentioned at all (Kaplan & Saccuzzo, 1997). When the test is to be used in a multicultural context, attention needs to be paid to the cultural relevance (and potential bias) of the test right from the planning and design phase, instead of only being sensitive to cultural aspects from the item writing phase onwards.
Given that we do not have a long history of developing culturally appropriate tests applicable to diverse groups in South Africa, test developers also need to grapple with basic issues such as which methods of test administration might be appropriate or inappropriate for certain cultural groups and which language to develop the test in. More time needs to be spent in the planning phase exploring and critically considering test design issues.
The first and most important step in developing psychological measures is the planning phase. Planning involves writing out the skeleton of what one aims to achieve. Careful thought needs to go into deciding on the aim of the measure, defining the content of the measure and the key elements of the test plan. A test plan consists of the following aspects: (a) specifying the purpose and rationale for the test as well as the intended target population, (b) defining the construct (content domain) and creating a set of test specifications to guide item writing, (c) choosing the test format, (d) choosing the item format, and (e) specifying the administration and scoring methods (Robertson, 1990).
Specifying the aim of the measure
The first step is to state the aim of the measure, the construct it will assess and how the outcome will be used. If I am conducting this study in South Africa, I will also need to mention that the measure will be used in a multicultural society. I would need to elaborate on what I mean by multicultural by highlighting the context. I would state the age of the test takers and their educational status. This information is important because it may have an impact on the test specifications and design. I would also need to state whether the test would be paper-based or computer-based.
Once that decision is made, I would need to consider whether the test takers are familiar with such tests. Test takers may underperform because they are unfamiliar with the format of the instrument rather than because they have a low level of the attribute being measured. This would threaten the validity of the measure. I would also need to ascertain whether the test will be administered individually or in a group setting.
Because many psychological constructs were developed in Western societies, the emphasis is on individualism. When working in a multicultural society, however, it is important to consider the norms of the society I would be working in. In some cultures, for example, the group identity is valued over the individual identity. This could have an effect on the content of the measure.
Defining the content of the measure
Here I need to figure out what I want to measure and why. This will show me what to focus on during the other steps. A working definition of the construct is needed. This includes identifying exactly what I aim to get out of this research study. To do this I need to embark on a comprehensive literature review. I will see how my topic has been investigated in the past and spot the gaps. I can now make the decision on whether I am conducting a new study or adapting an existing study into the South African context.
Later I will need to make the same decision on the instrument I will use for data gathering. Since I would be working in South Africa, I need to decide on whether separate norms should be developed for test takers from advantaged and disadvantaged schooling backgrounds and/or for urban and rural areas. I would assemble a team of content, language and cultural experts to scrutinise the content being developed. Nell (1994) states that language is a critical moderator variable of test performance.
If the test taker is not proficient in the language of the test, it is difficult to ascertain whether poor performance is due to a language or communication difficulty or to a low level of the construct being measured. I would produce the test in a bilingual format and specify the source language. Work would need to be done to ensure that the construct is meaningful for each group.
Developing the test plan (specifications)
Once the construct to be assessed has been defined and operationalised, a decision needs to be reached regarding what approach will be employed to guide the development of the test content and specifications. Decisions will be made regarding the format to be used (open-ended items, forced-choice items etc.), how they will be scored (objective or subjective tests), and whether time limits will be imposed. The language and cultural experts are once again needed during this step.
Sometimes psychological constructs conceptualised in Western society do not have a known equivalent in African discourse. For such constructs the translated version would need to explain the construct in a way that is closest to the English meaning. This will require more time for the African language test taker. The test specifications should eliminate the possibility of construct bias. The format therefore needs to be standardised for a variety of cultural groups, or it should at least include items that will be considered easy, moderate and difficult by all groups.
Although these steps follow after each other, I will need to go backwards and forwards to ensure content and construct validity.
The second step is item writing. Once the test specifications have been finalised, the team of experts writes or develops the items. The trend in South Africa has been simply to adapt an existing test to accommodate South African test takers. This is not necessarily the easier option, because concepts are not always understood in the same way in different societies. For example, in some societies the term depression is taken to mean simply being very sad. It is therefore important to ensure construct validity even for an English test given to English mother-tongue speakers from a society different to that of the test's origin.
If the assessment measure will be administered to children, face validity can be supported through the use of large print, colour and drawings. The length of the items should also be considered. Reliability should be kept in mind at every step of item writing.
Reviewing the items
An item bank is then developed, and the items are reviewed in terms of whether they meet the content specifications and whether they are well written. Items which do not meet the specifications are removed from the bank before it can be used to generate criterion-referenced tests. The team of experts should focus on content validity and also indicate whether the items are free from stereotyping and potential bias. The experts will then return the item list with recommendations. Items that have been flagged will need to be rewritten or revised.
Assembling and pre-testing the experimental version of the measure
Items need to be arranged in a logical way. Since we are dealing with a multicultural society, we need to ensure that the items are balanced and on appropriate pages. The length of the items in each category needs to be finalised. For long problem-based items, time adjustments need to be made. A decision will already have been made about whether the test is paper-based or computer-based, and the appropriate apparatus needs to be made available.
Pre-testing the experimental version of the measure
The test items have to be administered to a large group of examinees. This sample should be representative of the population for which the eventual test is intended. This will be the norm group.
Item analysis phase
During this phase items are checked for relevance. Again we check whether each item contributes to the reliability and validity of the measure. The characteristics of the items can be evaluated using classical test theory (CTT) or item response theory (IRT). At the item level, the CTT model is relatively simple. CTT does not invoke a complex theoretical model to relate an examinee's ability to success on a particular item. Instead, CTT collectively considers a pool of examinees and empirically examines their success rate on an item (assuming it is dichotomously scored).
This success rate of a particular pool of examinees on an item, well known as the p value of the item, is used as the index for the item difficulty (actually, it is an inverse indicator of item difficulty, with higher value indicating an easier item). The ability of an item to discriminate between higher ability examinees and lower ability examinees is known as item discrimination, which is often expressed statistically as the Pearson product-moment correlation coefficient between the scores on the item (e.g., 0 and 1 on an item scored right-wrong) and the scores on the total test. When an item is dichotomously scored, this estimate is often computed as a point-biserial correlation coefficient.
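The two CTT indices described above can be illustrated with a minimal Python sketch. The response matrix is hypothetical (rows are examinees, columns are dichotomously scored items), and the function names are chosen only for this illustration. Note that the total score here includes the item itself, which is the simple uncorrected form of the item-total correlation.

```python
import math

def item_difficulty(item_scores):
    """p value: proportion of examinees who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when everyone has the same score
    return cov / math.sqrt(var_x * var_y)

def item_analysis(responses):
    """Return (p value, point-biserial discrimination) for every item."""
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # With 0/1 item scores, Pearson r equals the point-biserial coefficient.
        results.append((item_difficulty(item), pearson(item, totals)))
    return results

# Hypothetical data: 6 examinees, 4 items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

results = item_analysis(responses)
for number, (p, r_pb) in enumerate(results, start=1):
    print(f"Item {number}: p = {p:.2f}, discrimination = {r_pb:.2f}")
```

In this made-up data set the first item has the highest p value and is therefore the easiest, which shows why p is an inverse indicator of difficulty.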
IRT, on the other hand, is more theory grounded and models the probabilistic distribution of examinees’ success at the item level. As its name indicates, IRT primarily focuses on the item-level information in contrast to the CTT’s primary focus on test-level information. The IRT framework encompasses a group of models, and the applicability of each model in a particular situation depends on the nature of the test items and the viability of different theoretical assumptions about the test items.
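The essay does not name a specific IRT model. As an illustration only, the sketch below uses the two-parameter logistic (2PL) model, one widely used member of the IRT family, in which the probability of a correct response depends on the examinee's ability (theta), the item's discrimination (a) and the item's difficulty (b). The parameter values are hypothetical.

```python
import math

def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: moderately discriminating, average difficulty.
a, b = 1.2, 0.0

# Tabulate the item characteristic curve across a range of abilities.
curve = [(theta, icc_2pl(theta, a, b)) for theta in (-2, -1, 0, 1, 2)]
for theta, prob in curve:
    print(f"theta = {theta:+d}: P(correct) = {prob:.2f}")
```

An examinee whose ability equals the item difficulty has a probability of 0.5 of answering correctly, and the probability rises as ability increases. A larger value of a makes that rise steeper, meaning the item separates higher and lower ability examinees more sharply.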
Revising and standardizing the final version of the measure
Once the qualitative and quantitative information has been gathered, the test is administered to the large sample for standardization. All the items that were found to be unclear are simplified. Vocabulary and grammar are corrected. Split-half reliability is assessed. The translated version is checked through back-translation (into the source language). The items are finalised for the test. The final database is used to check reliability and validity. The administration and scoring instructions may need to be modified. Then the final version is administered.
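The split-half reliability check mentioned above can be sketched as follows. The sketch assumes an odd-even split of the items and applies the Spearman-Brown correction to estimate the reliability of the full-length test; the essay does not specify how the test is split, so both choices are assumptions, and the data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_half, even_half))

# Hypothetical data: 6 examinees, 6 dichotomously scored items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0],
]

reliability = split_half_reliability(responses)
print(f"Split-half reliability (Spearman-Brown corrected): {reliability:.2f}")
```

The correction is needed because each half contains only half the items, and shorter tests tend to be less reliable than the full test.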
Technical evaluation and establishing norms
The items can be analysed using item response theory. The characteristics of each item may be represented graphically by means of a graph which relates an individual's ability score to their probability of passing the item. Items with large variances are selected. The scores obtained by the norm group on the final test form are referred to as the norms of the test. To compare an individual's score with the norms, their raw score is converted to the same kind of derived score as that in which the test norms are reported (e.g. percentile ranks, McCall's T scores, etc.).
Publishing and ongoing refinements
A test manual is compiled before a measure is published. The manual should make information on the psychometric properties of the test easily understandable. It will be updated from time to time as more information becomes available.
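The conversion of a raw score to a derived score, described under technical evaluation and establishing norms, can be illustrated with a short sketch. The norm-group scores are hypothetical. The percentile rank uses the common definition that counts half of the tied scores, and the T score shown is the simple linear form with a mean of 50 and a standard deviation of 10, which is an assumption made for this illustration (a normalised T score would first convert percentile ranks to normal deviates).

```python
import statistics

def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring below raw_score,
    counting half of those who obtained exactly raw_score."""
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100 * (below + 0.5 * equal) / len(norm_scores)

def linear_t_score(raw_score, norm_scores):
    """Linear T score: mean 50, standard deviation 10 in the norm group."""
    mean = statistics.mean(norm_scores)
    sd = statistics.pstdev(norm_scores)  # the norm group is treated as the reference population
    return 50 + 10 * (raw_score - mean) / sd

# Hypothetical raw scores obtained by a (very small) norm group.
norm_scores = [12, 15, 15, 18, 20, 20, 20, 23, 25, 30]

raw = 20
pr = percentile_rank(raw, norm_scores)
t = linear_t_score(raw, norm_scores)
print(f"Raw score {raw}: percentile rank = {pr:.1f}, T score = {t:.1f}")
```

A real norm group would be far larger, and separate norm tables could be produced for each subgroup if separate norms are judged necessary.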
List the steps that should be followed in the adaptation of an assessment measure for cross-cultural application and briefly explain what each step means.
1. Reasons for adapting measures
Cross-cultural assessment has become a sensitive issue due to specific concerns regarding the use of standardized tests across cultures.
By adapting an instrument, the researcher is able to compare the already-existing data with newly acquired data, thus allowing for cross-cultural studies both on the national and international level. Adaptations also can conserve time and expenses (Hambleton, 1993). Test adaptation can lead to increased fairness in assessment by allowing individuals to be assessed in the language of their choice (Hambleton & Kanjee, 1995).
2. Important considerations when adapting measures
The test can be compromised if there are problems between the test takers and the administrator. The administrator should therefore be familiar with the culture of the test taker. They cannot take it for granted that the test taker has been exposed to the format of the test. Unfamiliarity could lead to the score representing a lack of skill with the format of the test instead of measuring the construct being assessed. Some languages, such as isiZulu, take longer to read, so test takers working in those languages would need more time to complete the test.
3. Designs for adapting measures
Before selecting an assessment instrument for use in counseling or research, counselors and researchers are trained to verify that the test is appropriate for use with their population. This includes investigation of validity, reliability, and appropriate norm groups to which the population is to be compared. Validity and reliability take on additional dimensions in cross-cultural testing as does the question of the appropriate norm group. The instrument must be validly adapted, the test items must have conceptual and linguistic equivalence, and the test and the test items must be bias free (Fouad, 1993; Geisinger, 1994).
Two basic methods for test adaptation have been identified: forward translation and back-translation. In forward translation, the original test in the source language is translated into the target language and then bilinguals are asked to compare the original version with the adapted version (Hambleton, 1993; 1994). In back-translation, the test is translated into the target language and then re-translated back into the source language. This process can be repeated several times. Once the process is complete, the final back-translated version is compared to the original version (Hambleton, 1994). Each of these adaptation processes has its strengths and limitations.
4. Bias analysis and differential item functioning
Another issue that must be considered in cross-cultural assessment is test bias. The test user must ascertain that the test and the test items do not systematically discriminate against one cultural group or another. Test bias may occur when the contents of the test are more familiar to one group than to another or when the tests have differential predictive validity across groups (Fouad, 1994). Culture plays a significant role in cross-cultural assessment.
Whenever tests developed in one culture are used with another culture there is the potential for misinterpretation and stagnation unless cultural issues are considered. Issues of test adaptation, test equivalence and test bias must be considered in order to fully utilize the benefit of cross-cultural assessment.
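The discussion above does not name a specific procedure for detecting differential item functioning. As one illustration, the sketch below computes the Mantel-Haenszel common odds ratio, a widely used DIF statistic, for a single item. Examinees are first matched on total test score, and within each score level the item performance of a reference group and a focal group is compared. The group labels and counts are hypothetical.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    Each stratum is a tuple of counts for examinees at one total-score level:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    A value near 1 suggests no DIF; a value above 1 suggests the item favours
    the reference group among examinees of comparable ability.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # skip empty score levels
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        return float("inf")
    return numerator / denominator

# Hypothetical counts at two matched total-score levels.
no_dif_item = [(8, 2, 8, 2), (5, 5, 5, 5)]
suspect_item = [(9, 1, 6, 4), (6, 4, 3, 7)]

alpha_no_dif = mantel_haenszel_odds_ratio(no_dif_item)
alpha_suspect = mantel_haenszel_odds_ratio(suspect_item)
print(f"Item without DIF: odds ratio = {alpha_no_dif:.2f}")
print(f"Suspect item: odds ratio = {alpha_suspect:.2f}")
```

Matching on total score matters because it separates genuine group differences in the construct from items that behave differently for examinees of the same ability. In practice a significance test and an effect-size classification would accompany the odds ratio before an item is flagged.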
5. Steps for maximizing success in test adaptation
Hambleton (2004) summarised nine key steps that should be addressed when adapting or translating assessment instruments.
6. Challenges related to test adaptation in South Africa
One disadvantage of adaptation is the risk of imposing conclusions based on concepts that exist in one culture but may not exist in the other. There are no guarantees that the concept in the source culture exists in the target culture (Lonner & Berry, 1986).
Another disadvantage of adapting existing tests for use in another culture is that if certain constructs measured in the original version are not found in the target population, or if the construct is manifested in a different manner, the resulting scores can prove to be misleading (Hambleton, 1994). Despite the difficulties associated with using adapted instruments, this practice is important because it allows for greater generalizability and allows for investigation of differences among a growing diverse population. Once the test has been adapted, test equivalence must be determined.
References
Foxcroft, C. D. & Roodt, G. (2009). An introduction to psychological assessment in South Africa. Johannesburg: Oxford University Press.
Hambleton, R. K. (2001). The next generation of the ITC Test Translation and Adaptation Guidelines. European Journal of Psychological Assessment, 17, 164-172.
Hambleton, R. K. (2004). Issues, designs, and technical guidelines for adapting tests into multiple languages and cultures. In R. K. Hambleton, P. F. Merenda, & C. D. Spielberger (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 3-38). Mahwah, NJ: Lawrence Erlbaum Associates.
Van Ede, D. M. (1996). How to adapt a measuring instrument for use with various cultural groups: a practical step-by-step introduction. South African Journal of Higher Education, 10, 153-160.