Methods of Procedure of Criterion Referenced Tests (Part-1)





This chapter describes the steps followed in the present investigation. The investigation concerns the identification of terminal competencies in home science at class XII and the evaluation of their achievement by students of Gorakhpur district (U.P.) in relation to several background factors.


Study of the Syllabus of Home Science


    Many home science books for the intermediate level, written by different authors, were available to the investigator. The investigator studied all the available books and selected the one that was properly written and covered the entire home science syllabus prescribed by the U.P. Board of High School and Intermediate Education. The intermediate level syllabus of home science is divided into two papers. The first paper covers human physiology and hygiene, and the second paper covers child welfare and sociology.


Preliminary Considerations


Purpose of the test


    The basic purpose of the test was to identify terminal competencies in home science at the intermediate level and to evaluate their achievement by students of Gorakhpur.


Population of the study


    The population of the study consisted of all the students of Gorakhpur district studying, at the intermediate level, the home science course prescribed by the U.P. Board of High School and Intermediate Education.


Specification of initial test length


The test length was decided from the following angles:

    Firstly, the teachers of intermediate classes were consulted to fix the time allowed for the test. The literature suggests that the longer a test is, the more reliable and valid it is likely to be. It was therefore decided that the tests should have nearly the same duration as the terminal examinations. The test time was fixed at three hours, and one hour was considered enough for the socio-economic status scale.

    Secondly, the test had to be long enough that at least the minimum acceptable accuracy could be obtained in classifying students into the categories of master and non-master, and long enough to cover the entire syllabus. Guilbert (1976) holds that such a decision cannot be made accurately unless there are more than 30 multiple choice questions. The binomial model of score interpretation suggests the same conclusion. In view of these considerations, the test length was set at about 100 items for the first paper of home science and 100 items for the second paper.
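
The binomial reasoning mentioned above can be illustrated with a short sketch. It assumes that every item is an independent trial which the student answers correctly with a probability equal to the proportion of the domain the student has truly mastered, and it borrows the seventy percent mastery standard adopted later in this chapter. The function name and the sample proportion of 0.60 are illustrative assumptions and do not come from the original study.

```python
from math import comb


def prob_classified_master(n_items, true_p, cutoff_percent=70):
    """Probability that a student scores at or above the cutoff.

    Assumes a simple binomial model: n_items independent items, each
    answered correctly with probability true_p (an illustrative assumption).
    """
    # Smallest whole number of correct items that reaches the cutoff.
    k_min = (cutoff_percent * n_items + 99) // 100
    return sum(
        comb(n_items, k) * true_p**k * (1 - true_p) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


# Hypothetical non-master who truly knows 60 percent of the domain.
for n in (10, 30, 100):
    print(n, round(prob_classified_master(n, 0.60), 3))
```

Under these assumptions, the chance of wrongly classifying such a student as a master falls as the number of items rises, which is consistent with the preference for a long test.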


Domain Specification/Blue Print


    Several terms have been used to describe the same concept. Domain specification and item population are two such terms. Another term with virtually the same meaning is table of specifications, or blue print, though in the opinion of several experts on CRT a blue print is not the same thing as an item population: a blue print is a less specific description of the item population. Hambleton (1990, p. 115) observes: “Criterion-referenced test designers must (typically) prepare considerably more detailed content specifications than provided by behavioral objectives to ensure that criterion-referenced test scores can be interpreted in the intended way.”




    Millman (1971) observes: “Traditional item-writing approaches use test plans that often list subject matter topics as row headings and constructs like knowledge and application as column headings. The cells are filled with numbers that designate how many items will measure a certain cognitive skill employing the designed subject matter topic.” Millman (1971) is further of the view that “two item writers given the same test plan and working independently would produce tests likely to correlate only moderately with each other and to have different difficulty levels.” It is clear that the difference between a domain specification and a blue print is one of quality (i.e., of specificity): the domain specification is expected to be more specific than the blue print. An essential step in developing a criterion referenced test is the preparation of a domain description, that is, a clear specification of what the test will measure, which also guides the writing of the test items.
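
The two-way test plan that Millman describes can be pictured as a small table of item counts. The sketch below is only an illustration: the row labels echo the content of the first paper, the column labels are the two constructs named in the quotation, and every count is invented for the example rather than taken from the blue prints in the appendices.

```python
# Hypothetical blue print: rows are subject matter topics, columns are
# cognitive levels, and each cell is the number of items planned.
# All counts are invented for illustration.
blueprint = {
    "Human physiology": {"knowledge": 30, "application": 20},
    "Hygiene": {"knowledge": 30, "application": 20},
}


def row_totals(bp):
    """Number of items planned for each topic."""
    return {topic: sum(cells.values()) for topic, cells in bp.items()}


def column_totals(bp):
    """Number of items planned for each cognitive level."""
    totals = {}
    for cells in bp.values():
        for level, count in cells.items():
            totals[level] = totals.get(level, 0) + count
    return totals


def total_items(bp):
    """Total test length implied by the blue print."""
    return sum(row_totals(bp).values())


print(row_totals(blueprint))
print(column_totals(blueprint))
print(total_items(blueprint))
```

A table like this fixes only how many items fall in each cell. It says nothing about the form or difficulty of each item, which is the gap a more detailed domain specification is meant to close.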


    Berk (1990) identifies the following strategies: instructional and behavioral objectives for mastery testing and, in the context of domain referenced testing, amplified objectives.


    Guilbert (1976) and Harper (1990) describe the conditions a good statement of objective should satisfy. A good statement of objective should contain a description of the performer, the activity or action required, the task or content, the conditions under which the act should take place, and the criteria for judgment (i.e., the minimum level of performance expected of the student). Popham (1990) uses the term “amplified objectives” for objectives that conform to rigorous specifications for their development. The domain specification and objectives (competency statements) prepared for this study are given in Appendix C, and the blue prints are given in Appendices D, E, F and G.


Review of objectives


    After the competency statements or objectives were finalized, they were reviewed by experts in home science and in educational measurement. The objectives were modified in light of the suggestions given by the experts.


Determination of item format


    The investigator decided to use only multiple choice items for the purpose. “The multiple choice item is the most flexible of the objective item types. It can be used to appraise the achievement of the educational objectives that can be measured by paper and pencil test” (Thorndike, 1961, p. 228). Research has shown that almost all important objectives can be measured well by multiple choice questions. Multiple choice items have come into use in most competitive examinations in the country, and therefore even class XII students have some awareness of them. “Multiple choice items seem to both instructors and students to be less susceptible to chance errors resulting from guessing than true false items” (Ebel, 1966, p. 150). Multiple choice items can serve a useful purpose in the measurement of educational objectives. Their advantages seem so apparent that they have become the type most widely used in tests constructed by specialists. Important educational outcomes such as knowledge, comprehension, evaluation and synthesis can be tested by multiple choice items, and the items can be scored easily and with greater speed and accuracy.

    Of course, according to Ebel (1972), a well prepared true false test could give more reliable results than a multiple choice test of the same duration. True false items were nevertheless not used in the present investigation, because several other experts do not seem to share this view and because the chance of getting a high score by guessing alone is higher in a true false test than in a multiple choice test.

Writing multiple choice items


    The number of options used in multiple choice questions differs from test to test, and there is no real reason why it cannot vary among items in the same test. An item must have at least 3 answer choices to be classified as a multiple choice item, and the typical pattern is to have 4 answer choices to reduce the probability of guessing the answer.
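
The effect of the number of answer choices on guessing can be shown with a small calculation. The sketch assumes blind guessing, with every option equally likely to be chosen and every item independent of the others. The two-option case is included only as a comparison with true false items, and the function name is an illustrative assumption.

```python
from math import sqrt


def chance_score(n_items, n_options):
    """Expected score and standard deviation under blind guessing.

    Assumes each item is guessed independently with probability
    1 / n_options of being correct (an illustrative assumption).
    """
    p = 1 / n_options
    mean = n_items * p
    sd = sqrt(n_items * p * (1 - p))
    return mean, sd


# Compare two, three and four options on a 100-item test.
for options in (2, 3, 4):
    mean, sd = chance_score(100, options)
    print(f"{options} options: expected {mean:.1f} correct, sd {sd:.1f}")
```

On a 100-item test, the score expected from guessing alone drops from 50 with two options to 25 with four, which is the sense in which four answer choices reduce the influence of guessing.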


    The multiple choice items for the investigation were written by the present investigator, following the guidelines suggested by Thorndike (1961). An attempt was made to ensure that all the distracters appear to be plausible answers to a student who does not know the answer, and that there is only one correct answer to each item. The present investigator decided to follow the tradition set by prestigious examining bodies, all of which use four options. Many more items were written than were actually to be used in the test, to forestall a shortage of items after the rejection of some of them at various levels of review. One hundred and fifteen items were written for paper I and 115 items for paper II of home science at the intermediate level.


Item Editing


    After the multiple choice items for evaluating the achievement of terminal competencies were written, they were reviewed by different people. A language expert checked them for any difficulty or mistake in the language of the items, and a subject matter specialist checked their content. The selected items were set aside for evaluation purposes.


Assessment of Content Validity


    It should be clear that content validity is important primarily for the measurement of achievement. It matters in particular for a formative test concerned with mastery of one or more specific educational objectives. Buck (1975) says, “Determining the validity of a criterion referenced test is more straightforward than determining the reliability; criterion-referenced measures are validated primarily in terms of the adequacy with which they represent the criterion.” Content validity approaches are the most relevant and best suited to criterion-referenced measures. According to Nunnally (1967), a necessary condition for the content validity of a test is that the “test must stand by itself as an adequate measure of what it is supposed to measure, validity cannot be determined by correlating the test with a criterion, because the test itself is the criterion of performance.” For criterion referenced tests, however, it is helpful to think of content validity as an attempt to establish how accurately the test describes the behavioral domain being measured.

    A group of teachers teaching home science at the intermediate level in different intermediate colleges was consulted. Each item of both test papers was examined to find out whether it measured the competency it was supposed to measure, whether its language and content were commensurate with the class standard and the prescribed textbooks, whether it was free of stereotypes, whether it was free of bias, and whether it met the technical specifications of the subject matter. The items found to lack content validity were rejected.


Try out


    The items were administered to a few individual students of home science at the intermediate level, without giving them any idea of the possible use of the items, to see how the items worked. The items were not tried out in the manner usual in standardized test development, for fear that items would leak to students and thus introduce another uncontrollable source of error into the test scores. The students' difficulties were noted, and any improvements required on the basis of this knowledge were made.


Test assembly


    Singh (1983) suggests preparing two parallel forms, A and B, so that one of the two can be used after post-instructional remedial measures. Two parallel forms were not prepared in this study, because no institution would give the investigator that much time for test administration. The directions for the tests were the same as those given in the guidelines prepared for the students and already distributed to them. The tests for class XII were intended to measure the entire syllabus of home science at the intermediate level, and the number of items included in both papers was adequate for measuring the content.


Selection of criterion score


    There are different methods for determining criterion scores and prescribing standards for differentiating masters from non-masters. These standards vary from seventy percent to ninety percent. In Indian conditions, where the pass mark is thirty-three percent, it would be unrealistic to set a standard above seventy percent. The investigator therefore fixed the mastery standard at seventy percent of items correct for both test papers of home science.
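
The seventy percent standard amounts to a simple decision rule. A minimal sketch follows, in which the function name, the student labels and the scores are hypothetical and the test is assumed to have 100 items.

```python
def is_master(items_correct, n_items, cutoff_percent=70):
    """Return True when the score reaches the mastery standard."""
    # Integer comparison avoids floating point rounding at the boundary.
    return items_correct * 100 >= cutoff_percent * n_items


# Hypothetical scores on a 100-item paper.
scores = {"Student A": 82, "Student B": 70, "Student C": 69}
for name, score in scores.items():
    label = "master" if is_master(score, 100) else "non-master"
    print(name, score, label)
```

A student with exactly seventy items correct is counted as a master under this reading of the rule, while a student with sixty-nine is not.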


Socio-Economic Status Scale


    Kuppuswami's (1962) and Jalolta's (1970) socio-economic status scales were outdated. Incomes have increased, and several consumer goods now in use are not mentioned in those scales. A few years ago television sets, refrigerators and other household gadgets were used only in well-off families, but their use has now become common. Keeping these points in view, it was decided to use the fresh socio-economic status scale prepared by Dr. B. B. Pandey, Reader, and Rajesh Singh, Department of Education, University of Gorakhpur.


    The socio-economic status scale prepared by Dr. B. B. Pandey consists of questions on the economic, educational, cultural and social background of the student.


Continued in: Methods of Procedure of Criterion Referenced Tests (Part-2)
