Contents
- 1 Steps of Construction of Criterion-Referenced Tests
- 1.1 I. Preliminary Considerations
- 1.2 II. Domain Specification and Review
- 1.3 III. Construction of Test Items and Review
- 1.4 IV. Empirical Review
- 1.5 V. Test Assembly
- 1.5.1 a. Determination of test length
- 1.5.2 b. Preparation of parallel forms
- 1.5.3 c. Selection of test items
- 1.5.4 d. Preparation of test directions etc.
- 1.6 VI. Selection of Standard/Criterion Score
- 1.7 VII. Pilot Test Administration
- 1.8 VIII. Finding Validity and Reliability of the Test
- 1.9 IX. Preparation of Manuals
- 2 Domain Description for Criterion-Referenced Tests
Development and Construction of Criterion-Referenced Tests
Some educationists believe that the technique of constructing criterion-referenced tests is similar to that of norm-referenced tests, and that the two testing modes differ only in their purpose and use. It is generally agreed that the purpose of norm-referenced tests is selection and classification, whereas the purpose of criterion-referenced tests is to identify an individual’s learning deficiencies in a given course.
In constructing any test for selection, the main purpose is to develop items capable of discriminating among learners. Only those items are selected which ensure some magnitude of variance in the obtained sets of scores. The technology of norm-referenced test construction is thus based on the assumption that a moderate amount of variance exists in the test scores. The construction of criterion-referenced tests has an altogether different focus. An adequate criterion-referenced test is one which identifies the learning deficiencies of those who took a course taught at a moderately effective level. In an ideal situation a criterion-referenced test may not yield score variance at all. Variance may be present in some measure, but it is not a necessary attribute of a criterion-referenced test. What is desired is (i) a precisely defined behavior domain, and (ii) test items that pinpoint the status of an individual with respect to that well-defined behavior domain. Like any measuring tool, criterion-referenced tests are constructed through a well-sequenced, interconnected set of steps.
There is a variety of criterion-referenced tests, and owing to confusion about the nature and scope of such tests it is difficult to provide clear-cut guidelines for their construction. However, keeping in view the basic tenets of criterion-referenced tests, the following steps can be suggested.
Steps of Construction of Criterion-Referenced Tests
Steps of construction of criterion-referenced tests are as follows:
- Preliminary Considerations
- Domain Specification and Review
- Construction of Test Items and Review
- Empirical Review
- Test Assembly
- Selection of Standard/Criterion Score
- Pilot Test Administration
- Finding Validity and Reliability of the Test
- Preparation of Manuals
I. Preliminary Considerations
The first step in the development of a criterion-referenced test is deciding on the subject-matter area to be covered. The next step is to select the unit on which the test is to be developed. This unit may comprise more than one module/domain/section. Since a domain refers to a particular segment of the content, one may examine the topic and delineate it into segments, each of which can be developed into a well-defined separate domain. Each domain can then be analyzed in terms of facts, concepts, principles, processes, etc., arranged in order of increasing complexity. The description of the domain is very important, as it provides the basis for item writing. The services of content and evaluation experts are necessary for reviewing the domain specifications and validating the items; preferably, content and evaluation expertise should be combined in the same persons.
II. Domain Specification and Review
Having decided on the content elements of the selected domain, the next task is to formulate the instructional objectives or expected learning outcomes, which may be categorized in terms of knowledge, understanding, application, skills, attitudes, etc. These objectives should be stated so precisely that the performance of students can be clearly interpreted in terms of adequacies or inadequacies in the intended learning outcomes. Special effort is needed to translate the content into objectives. Each objective description must clearly define the behavior domain.
After the objectives, i.e. the intended competency statements, have been written, they must be reviewed. The review checks for clarity and coverage: whether the objectives are stated clearly in terms of performance, and whether all important topics/units of content are represented. Statements should be written in simple, clear and precise language.
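A domain specification of the kind described above can be kept in a simple machine-readable form, which makes later review and item tracking easier. The sketch below is one illustrative way to do this; the field names (`title`, `content_elements`, `objectives`) and the example domain are assumptions, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class Objective:
    level: str      # e.g. "knowledge", "understanding", "application"
    statement: str  # precise, performance-oriented wording

@dataclass
class Domain:
    title: str
    content_elements: list              # facts, concepts, principles, in order of complexity
    objectives: list = field(default_factory=list)

# Hypothetical example domain for illustration only
fractions = Domain(
    title="Addition of fractions",
    content_elements=["like denominators", "unlike denominators"],
    objectives=[
        Objective("knowledge", "States the rule for adding fractions with like denominators."),
        Objective("application", "Adds two fractions with unlike denominators correctly."),
    ],
)

# A minimal review pass: every objective must carry a non-empty performance statement
for obj in fractions.objectives:
    assert obj.statement.strip(), "objective lacks a performance statement"
```

Keeping objectives tagged by level in this way also makes it straightforward to check, during review, that each category of learning outcome is actually covered.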
III. Construction of Test Items and Review
Items are to be developed in accordance with the domain description. A number of items might be constructed for any given objective (domain); even a highly specific objective (a well-defined domain) could have a potential item pool of well over several thousand items. At this stage of test construction, the writer is required to invent more and more imaginative ways of creating situations that are congruent with the specifications of the domain definition. Different types of items, such as essay type, short-answer type, fill-in-the-blank (in sentences or in diagrams), alternative-response or true/false, multiple choice in verbal and visual forms, and matching and sequencing types, may be exploited for measuring the different abilities specified in the domain definition.
Review by the teacher is essential to see whether all questions in the forms of the test are congruent with the specific objectives, besides a cursory check for any glaring deficiencies in the test. Prior to the field trial, the test may be re-examined by practicing teachers. The main purpose is to detect any content flaws and to check the congruence of items with the domain description.
Messick (1975) has suggested using not the term content validity but content relevance or content representativeness, because validity is calculated from test scores. The question, then, is whether the items in a domain represent the knowledge, skills and competencies specified in that domain. The assessment of content relevance is therefore of utmost importance in CRT. Content relevance is assessed through logical review by subject and evaluation experts and through empirical review based on field testing.
IV. Empirical Review
Field testing and item analysis have always been a source of ambivalence among advocates of CRT. Perfect congruence is expected between objective and item: by adopting a highly sophisticated technology of item generation, there should be little possibility of items that function less than satisfactorily. Still, experience indicates that there will always be some subjective element in the formulation of objectives, and therefore in the production of items. There is thus room for some kind of item analysis, and revision of test items in light of the item analysis should be carried out to remove any flaw.
V. Test Assembly
At the stage of test assembly decisions about following aspects should be taken:
- Determination of test-length
- Preparation of parallel forms
- Selection of test items from the pool
- Preparation of test directions, practice questions, test booklet, lay-out, scoring keys, answer sheets etc.
a. Determination of test length
Determining test length means deciding the number of test items measuring each objective to include in the test, that is, how many items to sample and how they are to be sampled. Eight items are considered a sufficient basis on which to assess student mastery and make instructional decisions from CRT data.
b. Preparation of parallel forms
The importance of having parallel forms of the test in an instructional program lies in re-testing students whenever necessary: it is more appropriate to test them with a parallel form than with the initial test.
c. Selection of test items
A parallel form is prepared by sampling items from the domain or sub-domain through either a random or a stratified random sampling plan. In a stratified random sampling plan, the items in the domain or sub-domain are first divided into strata based on difficulty levels, content areas or instructional units, and a specified proportion of items is sampled randomly from within each stratum.
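The stratified sampling plan described above can be sketched in a few lines of code. The item pool, the `difficulty` stratum key and the per-stratum quota below are illustrative assumptions; different seeds simply yield different parallel forms from the same pool.

```python
import random

# Hypothetical pool of 30 items, each tagged with its stratum (difficulty level)
pool = [
    {"id": f"item{i:02d}", "difficulty": diff}
    for i, diff in enumerate(["easy"] * 10 + ["medium"] * 10 + ["hard"] * 10)
]

def stratified_sample(pool, per_stratum, key="difficulty", seed=42):
    """Draw a fixed number of items at random from within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in pool:
        strata.setdefault(item[key], []).append(item)
    form = []
    for stratum_items in strata.values():
        form.extend(rng.sample(stratum_items, per_stratum))
    return form

form_a = stratified_sample(pool, per_stratum=4)           # one form
form_b = stratified_sample(pool, per_stratum=4, seed=7)   # a parallel form
print(len(form_a))  # 12 items: 4 from each of the three strata
```

Because every form draws the same quota from each stratum, the parallel forms are balanced on difficulty even though the specific items differ.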
d. Preparation of test directions etc.
The preparation of directions etc. is to be completed before the administration of the test. All these aspects should be carried out with utmost care and accuracy, so that the tasks to be performed at testing time are clear and well understood by the testees. Sufficient care should be taken to ensure that the answer sheets and question papers are legible and free from typographical errors. A sufficient number of practice questions should be provided for each sub-domain test. Two parallel forms should be prepared at this stage. Scoring keys and keyed answer sheets should also be prepared.
VI. Selection of Standard/Criterion Score
More than 20 different methods for setting performance standards (cut-off scores) have been recommended in the literature (Berk, 1984). Standard setting is necessary to assign examinees to ‘mastery’ or ‘non-mastery’ status, that is, to decide whether an examinee has enough of the knowledge, skills and competencies specified in the domain to go ahead. Cut-off scores can be estimated by the Binomial Model and the Bayesian Model.
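A minimal sketch of the binomial model is given below, assuming an eight-item domain test. The true-score proportions for a master (0.9) and a non-master (0.5) are illustrative assumptions, not values prescribed by any particular standard-setting method; the sketch simply shows how each candidate cut score trades off the two kinds of classification error.

```python
from math import comb

def prob_at_least(n, c, p):
    """P(X >= c) for X ~ Binomial(n, p): the chance of reaching cut score c."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

n_items = 8
for cut in range(4, 9):
    pass_master = prob_at_least(n_items, cut, 0.9)      # assumed true master
    pass_nonmaster = prob_at_least(n_items, cut, 0.5)   # assumed true non-master
    print(f"cut {cut}: master passes {pass_master:.2f}, "
          f"non-master passes {pass_nonmaster:.2f}")
```

Raising the cut lowers the chance of a non-master passing but also raises the chance of a true master failing; the cut-off is chosen where that trade-off is acceptable.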
VII. Pilot Test Administration
The test can be administered to try out the domains it covers. The domains being tested can be arranged according to the needs of the teacher and administered one after the other in sequential order. Where a time limit is stipulated for completing the task, the test should be administered accordingly. Students’ responses may be recorded and tabulated in accordance with the scheme of analysis, which is mostly in terms of the specified domain objectives. Any deficiency found in the procedures of administration at this stage should lead to corresponding changes.
VIII. Finding Validity and Reliability of the Test
An important task for a CRT is to establish the validity and reliability of the test. Although validity is considered an important component of any test, criterion-referenced test score validity has received little attention from researchers. The validation procedure for a CRT is different from that for an NRT. It has been argued that ‘content validity’ is a sufficient measure for a CRT. Content coverage is an important consideration in test construction and interpretation, to be sure, but in itself it does not provide validity of the test. Item-analysis (item validity) procedures can be used to establish the validity of the test.
Reliability is concerned with the consistency of a test’s measurement over time. The procedure for assessing the consistency of decisions in a CRT is quite different from that for an NRT.
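For a CRT, consistency is usually assessed on the mastery/non-mastery decisions rather than on raw scores. The sketch below computes two common indices, the agreement proportion p0 and Cohen’s kappa, across two administrations (or parallel forms); the ten examinees’ classifications are made-up illustrative data.

```python
def decision_consistency(decisions_a, decisions_b):
    """Return (p0, kappa) for two parallel sets of master/non-master calls."""
    n = len(decisions_a)
    p0 = sum(a == b for a, b in zip(decisions_a, decisions_b)) / n
    # Chance agreement expected from the marginal mastery rates alone
    pa = sum(decisions_a) / n
    pb = sum(decisions_b) / n
    pc = pa * pb + (1 - pa) * (1 - pb)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

# 1 = classified as master, 0 = non-master, for ten hypothetical examinees
form_a = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
form_b = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
p0, kappa = decision_consistency(form_a, form_b)
print(f"p0 = {p0:.2f}, kappa = {kappa:.2f}")  # p0 = 0.80, kappa = 0.58
```

p0 is the raw proportion of examinees classified the same way both times; kappa corrects that figure for the agreement expected by chance, which is why it is the preferred index for CRT decision consistency.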
IX. Preparation of Manuals
Informational material in the form of manuals is important to aid qualified users in administering the test and interpreting its results. It must include:
(a) a test administrator’s manual, covering specific administrative guidelines and the specific expertise or training required for administering the test; and
(b) a technical data manual.
Domain Description for Criterion-Referenced Tests
A domain is a clearly defined segment of knowledge that, if taught properly, lends itself to the attainment of pre-defined expected outcomes of learning.
Concept of Domain
Criterion-referenced testing shows great diversity in the use of the term ‘domain’. According to Dockrel (1975), ‘a behavioral domain is one which can be detected by an observer by virtue of the pupils being able to do something new as a result of learning’. Since performance criteria are inherent in the concept of criterion-referenced measurement, the specification and organization of relevant behaviors is an important task of test development. Hively (1974) refers to this as ‘domain-referenced measurement or theory of performance’ and emphasizes careful definition of the domain of relevant behaviors associated with an area of knowledge, which is later used to write test items for that domain.
Thus, according to Hively, the concept of domain includes both the specific content area and the behaviors associated with that content. Cronbach (1977) refers to ‘universe specification’, with a focus on skills, and lists the situations in which observations are to be made; observations provide a valid sample of this universe. Ebel (1962) uses the term ‘standard domain of content’, with a focus on content that provides the basis for test construction. Baker (1981) uses ‘behavioral classes’, ‘universe of content’ and ‘prescribed instructional outcomes’. Nitka (1980) uses the terminology of ‘standard domain of content’ and domain or universe specification in relation to behavioral objectives. Popham (1975) prefers to speak of domain-referenced testing in relation to content-general objectives and simplified objectives.
Once we have appreciated the need to delineate the content and behaviours in the form of desired or acceptable criteria of performance with reference to a domain or segment of knowledge, our next concern is to delineate the domains in a particular topic or unit of teaching. How many domains can be identified in a particular unit, and how big or small a domain should be, depends on a number of factors such as the following:
- a) The total time allocated to the unit in the instructional plan.
- b) How much time is available or earmarked for testing the students on that unit?
- c) How homogeneous or heterogeneous are the chunks of content that can be divided into convenient, integrated sub-units or sections which could become the basis for testing?
- d) How many concepts or other content elements are spread over a given unit of learning? The higher the density of new elements per unit of teaching, the larger the number of domains in that unit could be.
- e) To what extent are the content elements in a unit amenable to developing intended learning outcomes? Higher-level content elements like principles and concepts are likely to generate more intended outcomes than content elements like terms and facts.
- f) To what extent is the developer prepared to sacrifice the ‘sufficiency’ aspect at the altar of brevity?
We may use concepts as the most usable form of content element, serving as one of the descriptors. Concepts may be identified in the domain and arranged in order of hierarchy, complexity, development or a sequential order suitable for instructional purposes. This may be followed by the corresponding learning outcomes intended as a product of learning those concepts. Depending upon the nature and scope of a concept, the list of intended learning outcomes relating to knowledge, understanding and application objectives may be arranged in taxonomic order.
An attempt is thus made to define, identify and delineate the domains in a given unit of learning. Domain description involves identifying the content elements and their sequencing, besides formulating the specific intended learning outcomes corresponding to each of the concepts or content elements listed. Once a domain is properly defined, it becomes the basis for test specification in developing criterion-referenced tests. Since criterion-referenced tests are to be used for diagnosing adequacies and inadequacies in students’ learning, their description by the classroom teachers is warranted. Internal review by the developers, followed by external review by other teachers on the basis of consensus, is the usual approach for validating a domain. It then becomes possible for the test developers to formulate test specifications on the validated domain and develop criterion-referenced tests to find out the adequacies and inadequacies in students’ learning.