Teaching Online Contents
- 1 Terms Related to Criterion-Referenced Tests
- 1.1 Domain-Referenced Tests
- 1.2 Content Standard Tests
- 1.3 Universe-Defined Test
- 1.4 Differential Assessment Device (DAD)
- 1.5 Criterion Referenced Differential Assessment Devices (CRDAD)
- 1.6 Objective Based Test
- 1.7 Comparison with DRT
- 1.8 Mastery Tests
- 1.9 Unit Test
- 1.10 Objectives-Based Approach to Outcome Accountability
- 1.11 Performance Tests
Terms Related to Criterion-Referenced Tests
In a criterion-referenced test the scores have a reference to a criterion, and in a norm-referenced test to the norm. Similarly, in a domain-referenced test, scores have a reference to a domain. Here the term “domain” means “a segment of knowledge and behavioral domain is one which can be detected by an observer by virtue of the pupils being able to do something new as a result of learning” (Singh, P., 1983).
Domain-Referenced Tests
Hively’s (1968) system is generally referred to as domain-referenced testing. Hively and his colleagues were wrestling with the problem of how best to define curricular content in science and mathematics for purposes of instructional design. The scheme they came up with, referred to as an item form, consisted of a detailed set of specifications which limited the form of items that measured a particular skill. In later years Hively et al. (1970) described their approach to measurement as domain-referenced testing because an examinee’s performance was referred to a defined domain of learner behavior. He tried to bridge the gap between the statement of behavioral objectives (the criterion) and the CRTs constructed to measure the achievement of that criterion by specifying the domains of behavior; the concept of domain includes both (a) a specific content area and (b) the behaviors associated with this content.
This type of measurement is most suitable when the area to be measured is a domain that can be clearly defined, the number of possible elements in it is within finite bounds, and a sampling frame listing all the elements of the domain exists or can readily be constructed, so that a probability sample of elements to be tested can be drawn from it. In a domain-referenced test the overall score has absolute meaning (criterion-referenced meaning) in the sense of indicating what proportion of some defined domain the examinee has mastered. “A random or stratified random sample of items from a domain is called a domain-referenced test (DRT). It permits the most satisfactory criterion-referenced interpretations because test scores can be interpreted most directly in terms of performance tasks” (Millman, 1975, as quoted in Singh, 1983). DRTs so framed permit the most satisfactory criterion-referenced interpretations because test scores can be interpreted most directly in terms of performance tasks, and also in terms of the percent of the population of tasks the student would answer correctly.
The advantage of the domain-referenced test is that it permits the user to make a special kind of criterion-referenced interpretation, namely, an estimation of an examinee’s domain score or level of functioning, defined as the percent of the population of items the examinee could answer correctly. A disadvantage of the DRT is that in areas where the emphasis is on knowledge and understanding, the effective use of such measures seems less likely.
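The domain-score idea above can be made concrete with a small numeric sketch. The domain size, test length, and examinee ability below are hypothetical figures chosen for illustration, not taken from the source:

```python
import random

# Hypothetical finite domain of 500 items; suppose the examinee could
# answer 350 of them correctly (a true domain score of 70%).
domain = [True] * 350 + [False] * 150
random.seed(1)  # fixed seed so the sketch is reproducible

# A domain-referenced test draws a random sample of items from the domain.
test_items = random.sample(domain, 40)

# The proportion correct on the sample estimates the domain score:
# the percent of the whole item population the examinee could answer.
estimated_score = 100 * sum(test_items) / len(test_items)
true_score = 100 * sum(domain) / len(domain)

print(f"true domain score: {true_score:.1f}%")
print(f"estimate from a 40-item DRT: {estimated_score:.1f}%")
```

Because the sample is random, the test score is an unbiased estimate of the examinee’s standing on the whole domain, which is exactly the criterion-referenced interpretation the text describes.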
A larger domain size can be tolerated in tests used for making major decisions or in gross score reporting. For remedial instruction, in which information about specific skills is needed, smaller domain sizes are desired. Popham identifies three practical considerations in determining domain size:
- First, he suggests that the test maker estimate the amount of instructional time it would take to acquire the skills in the domain. Domains that can be mastered in one instructional session are too small; those requiring an entire semester are too large.
- Second, he suggests that the test user estimate domain size by dividing the total number of high-priority skills and behaviors the test maker wishes to measure by the number of tests he is willing to give.
- A third consideration is to make the domain as large as one can while still maintaining content homogeneity.
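The second consideration reduces to simple arithmetic, sketched here with hypothetical figures (48 skills and 6 tests are made-up numbers, not from the source):

```python
# Popham's second consideration: estimate domain size by dividing the
# total high-priority skills to be measured by the number of tests
# the user is willing to give. All figures below are hypothetical.
total_high_priority_skills = 48  # skills/behaviors the test maker wants measured
tests_willing_to_give = 6        # tests the user is willing to administer

skills_per_domain = total_high_priority_skills / tests_willing_to_give
print(f"each test's domain would cover about {skills_per_domain:.0f} skills")
```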
Content Standard Tests
Ebel (1962) suggested the term content-standard for tests having scores that indicate “the percent of a systematic sample from a defined domain of tasks which an individual has performed successfully” (Ebel, 1962, as quoted in Singh, 1983). The score is based directly on the tasks which make up or provide the context of the test. A key feature of the definition is that the processes by which the scores are obtained (test construction, administration, and scoring) are explicit and objective enough that independent investigators would obtain substantially the same scores for the same persons.
Universe-Defined Test
Osburn (1968) used the term universe-defined to describe “a test constructed and administered in such a way that an examinee’s score on the test provides an unbiased estimate of his score on some explicitly defined universe of item content” (Osburn, 1968, as quoted in Singh, P., 1983). As with the content-standard test, the criteria by which items are, or are not, included in the domain are made explicit, and the items that appear on the test are selected in some systematic way from the universe. The difference between content-standard tests and universe-defined tests lies primarily in Osburn’s use of the formalized schemes originated by Hively, called item forms, to help generate the universe of test items and to describe the items’ salient characteristics.
Differential Assessment Device (DAD)
Differential assessment devices differ in the procedure of test construction and in the completeness of the specification of the population or domain. Some tests rely heavily on procedures for item selection and test validation, as the test is designed to differentiate groups of individuals believed to differ on the attribute purportedly measured by the test. Such instruments are referred to as differential assessment devices.
Criterion Referenced Differential Assessment Devices (CRDAD)
“Criterion-referenced differential assessment device refers to those differential assessment devices which do reference a particular objective or skill with sufficient specification that a criterion-referenced interpretation as well as a norm-referenced one is reasonable” (Millman, 1975, as quoted in Singh, P., 1983). Two broad differences, then, between a domain-referenced test (DRT) and a differential assessment device capable of a criterion-referenced interpretation are (a) the specificity with which the item population is stated, and thus the quality and nature of the criterion-referenced interpretation, and (b) the reliance on empirical methods of test development, and thus the discriminating power of test results. There is a trade-off between the ability to interpret and the ability to discriminate: DRTs maximize the former, DADs the latter. CRDADs, differential assessment devices which permit a criterion-referenced interpretation, constitute the middle ground.
Objective Based Test
Objective-based tests are those whose items have been constructed to measure an instructional objective. Usually such objectives are formulated behaviorally, that is, they describe the type of post-instruction behavior being sought of learners. Quite often the objectives on which such tests are based possess qualities similar to behavioral objectives. A characteristic of such objective-based tests is that the nature of the final test depends largely on the idiosyncrasies of the item writers. There is no reason to believe, for example, that two sets of item writers given the same behaviorally stated objective would construct tests that either looked alike or provided comparable scores if the same examinees took both. There are more ways to match an objective than are represented in the test. Too much slippage is possible in the item format used, the choice of options, the selection of stimulus material, the resulting level of difficulty, and other areas for such congruence to occur.
“It is probably wiser to view objectives-based tests in the same way as we sometimes think of teacher-made classroom tests. All such measuring devices can, if well constructed and sensibly interpreted, serve highly useful educational functions. But to consider them as genuine exemplars of criterion-referenced measurement reflects more generosity than accuracy” (Popham, 1990).
Comparison with DRT
Objectives-based tests usually yield just a dichotomous score that indicates whether the examinee has reached the designated standard of performance corresponding to the specified objective. A domain-referenced test is designed to yield a continuous score scale on which the maximum score represents one hundred percent mastery and the minimum score indicates the total absence of mastery of any part of the domain. The main difference between the objectives-based test and the domain-referenced test (DRT) is that in the objectives-based test no clear domain of content is specified for an objective and items are not considered to be representative of any content domain, while in the DRT items are organized into clusters, each cluster containing a representative set of items from a well-defined content domain measuring an objective.
Mastery Tests
“Mastery tests have been defined as criterion-referenced tests that are administered at the end of an educational treatment to determine whether the student can perform all the tasks specified in the objectives of the programme” (Singh, P., 1983). The mastery concept is closest to the CRT. Several theories of mastery learning have been proposed; the basic premise uniting these approaches is as under:
“If students are normally distributed with respect to aptitude, but the kind and quality of instruction and the amount of time available for learning are made appropriate to the characteristics and needs of each student, the majority of students may be expected to achieve mastery of the subject.” The mastery learning concept has appealed a great deal to educational workers. Sometimes the words criterion and mastery are confused in usage: criterion represents the desired level of achievement or performance, usually specified as tasks defined in terms of terminal behaviors; it refers to a particular standard of performance representing the intended outcome of learning. Mastery refers to the level of an individual’s performance on those specified tasks considered satisfactory, used as a cut-off score to declare one a master or a non-master.
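The criterion/mastery distinction above can be sketched directly: the criterion is the standard fixed before testing, and mastery is a judgment about an individual’s performance against it. The 80% cut-off, task counts, and pupil scores below are hypothetical:

```python
# Hypothetical criterion: the standard of performance set before testing.
CUT_OFF = 0.80  # the pupil must perform 80% of the specified tasks correctly

def classify(tasks_correct: int, tasks_total: int) -> str:
    """Compare a pupil's performance on the specified tasks with the
    criterion and declare the pupil a master or a non-master."""
    performance = tasks_correct / tasks_total
    return "master" if performance >= CUT_OFF else "non-master"

# Made-up pupils: (name, tasks performed correctly out of 25 specified tasks)
pupils = [("Pupil A", 23), ("Pupil B", 19), ("Pupil C", 20)]
for name, correct in pupils:
    print(name, "->", classify(correct, 25))
# Pupil A: 23/25 = 0.92 -> master
# Pupil B: 19/25 = 0.76 -> non-master
# Pupil C: 20/25 = 0.80 -> master (exactly at the cut-off)
```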
The impact of mastery learning has not been, nor is it claimed to be, in stimulating a new kind of test development. Rather, it is in encouraging educators to set as their goal the attainment, by practically all students, of performance standards set prior to instruction, and in offering advice on how such a goal can be realized.
Unit Test
Unit tests are used for classification and for the identification of non-masters for remedial instruction. Unit tests are used after a unit of instruction has been covered in the class, hence the name unit test. These tests can be CRT, DRT, DAD, CRDAD, objectives-based, or mastery tests. The purpose of the unit test is to find out the extent of pupil achievement in respect of that unit in order to provide remedial instruction to the students. Since the primary aim of a unit test is formative evaluation, which is also the primary aim of the various types of tests discussed under the broad category of CRT, a mention of the unit test has been made here. An improved variety of this test is the “unit mastery learning system”. This system is a method of individualized, self-paced learning through repeatable testing, and enables students to attain mastery of the content of one unit before proceeding to the next.
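The repeatable-testing idea behind the unit mastery learning system can be sketched as a simple loop: a student retakes each unit test until the score reaches the cut-off, and only then proceeds to the next unit. The units, attempt scores, and 80% cut-off below are hypothetical:

```python
CUT_OFF = 0.80  # hypothetical per-unit mastery cut-off

# Simulated repeated unit-test scores for one student: each list is the
# sequence of scores on successive attempts at that unit's test.
attempts_per_unit = {
    "Unit 1": [0.55, 0.70, 0.85],  # remedial instruction, then mastery
    "Unit 2": [0.90],              # mastery on the first attempt
    "Unit 3": [0.60, 0.82],        # mastery on the second attempt
}

def attempts_to_mastery(scores):
    """Count the attempts until the score reaches the cut-off; the
    student proceeds to the next unit only after mastery."""
    for attempt, score in enumerate(scores, start=1):
        if score >= CUT_OFF:
            return attempt
    return None  # mastery not yet reached; more remediation needed

for unit, scores in attempts_per_unit.items():
    print(unit, "-> mastered on attempt", attempts_to_mastery(scores))
```

The self-paced character of the system shows up in the varying number of attempts: slower students simply take more testing cycles on a unit before moving on.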
Objectives-Based Approach to Outcome Accountability
This approach is called objectives-based evaluation. It usually takes the form of selecting a set of important objectives, constructing short tests to measure each of these objectives, and then determining whether the objectives have been attained. The performance of teachers who are operating under the same conditions can then be compared. In this method, evaluation of teachers for outcome accountability involves the use of a set of tests to assess pupil performance on the particular objectives with which the school is most concerned.
This approach is designed to be more efficient than a total objectives-based system and involves selecting a few relevant objectives and constructing tests to measure student achievement of them. This approach, which has been suggested for establishing a fair and objective basis for a teacher evaluation system, makes use of what are called performance tests. The objectives chosen for this purpose should deal with a relatively small but important unit of the curriculum in which the students have had no previous instruction. The next step is to assign students to teachers randomly, or by means of fair matching techniques, so that student characteristics and other factors beyond the teacher’s control are counterbalanced among the teachers who are to be evaluated. The teachers are then given a fixed amount of time to teach these objectives, and at the end of that period student performance is assessed. How a teacher’s students do on a series of performance tests is presumed to correlate fairly well with how that teacher’s students do on tests measuring end-of-year kinds of objectives. If the results are satisfactory, then the teacher performance tests can be used as a reasonable proxy for end-of-year outcome measures.
Glass (1978), discussing the modern educational movement and the thinking of educationists, observes that terms such as accountability, mastery learning, and assessment have come to rest, at bottom, on a common notion that a minimal acceptable level of performance on a task can be specified. Whether it goes by the name of mastery, competency, or proficiency, it is the same fundamental notion (Glass, 1978, as quoted in Dubey, S., 1991). The terms criterion-referenced measurement, objective-referenced measurement, and domain-referenced measurement have been used more or less interchangeably by some writers, and still others have used only one of the three terms to cover all the concepts with which they deal that are related to criterion-referenced measurement.