Optimal Weighting of Entrance Exam Items and Subtests for Constructing a Composite Total Score

Zolfagharnasab, Soleyman; Khodaei, Ebrahim; Yadegarzadeh, Gholamreza

Article type: Research article

Authors

Soleyman Zolfagharnasab ¹

Ebrahim Khodaei ²

Gholamreza Yadegarzadeh ³

¹ Senior Research Expert, Center for Research, Evaluation, Accreditation and Quality Assurance of Higher Education, National Organization of Educational Testing

² Associate Professor, University of Tehran

³ Faculty Member, National Organization of Educational Testing

Abstract

This study was carried out to find optimal weights for the subtests and items of the national entrance examination when building a composite total score. Its ultimate aim was to reduce the measurement error of the composite total score within classical test theory. Weighting was carried out at three levels. First, a 30-item, four-option multiple-choice test in differential calculus, with a sample of 3409 examinees, was weighted at the option level (option popularity percentages, formula score) and at the item level (simple unweighted sum, i.e. the item's effective weight; item factor weight; item difficulty weight). Second, at the subtest level, a national medical residency examination battery of five subtests of equal length (six items each), with a sample of 3572 examinees, was weighted by several methods (mean Pearson correlation coefficient, factor weights, and regression coefficients). In addition, another medical residency battery with subtests of unequal length (45, 26, 24, 6, and 6 items), administered to a group of 3638 examinees, was examined on the basis of the subtests' own effective weights (no weighting applied). The study showed that formula scoring produces the largest error variance of all the methods. Only weighting by item difficulty can change the ranking of examinees in favor of the more qualified ones; the other weighting methods do not raise reliability satisfactorily, and the reliability coefficient of a test is determined from the outset by good items and well-constructed subtests of optimal length.
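To make the option-level and item-level scoring rules concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the conventional correction-for-guessing form of the formula score, R - W/(k - 1) with k = 4 options. Because the abstract does not define the difficulty weight, the sketch also assumes a weight of 1 - p, where p is the proportion of examinees answering the item correctly. All function names are hypothetical.

```python
def number_right(responses, key):
    """Simple unweighted sum: one point per correct answer."""
    return sum(1 for r, k in zip(responses, key) if r == k)


def formula_score(responses, key, n_options=4):
    """Correction-for-guessing score R - W/(k - 1).

    Omitted items are coded as None and are neither rewarded nor penalized.
    """
    right = sum(1 for r, k in zip(responses, key) if r == k)
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return right - wrong / (n_options - 1)


def difficulty_weights(all_responses, key):
    """ASSUMED scheme: weight of item j is 1 - p_j (harder items count more)."""
    n = len(all_responses)
    weights = []
    for j, correct in enumerate(key):
        p = sum(1 for resp in all_responses if resp[j] == correct) / n
        weights.append(1.0 - p)
    return weights


def weighted_score(responses, key, weights):
    """Sum of the weights of the items answered correctly."""
    return sum(w for r, k, w in zip(responses, key, weights) if r == k)
```

Under these assumptions, two examinees with the same number-right score can receive different difficulty-weighted scores, which is the mechanism by which such weights can reorder a ranking.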

Keywords

weighting

composite total score

classical test theory

reliability coefficient

true score

measurement error

formula score

Article title (English)

Optimum Weighting to Entrance Subtests and Their Items to Make Composite Score

Authors (English)

Soleyman Zolfagharnasab ¹

Ebrahim Khodaei ²

Gholamreza Yadegarzadeh ³

Abstract (English)

This research was conducted to weight the subtests of the national entrance examination and their items in order to form a composite score. The aim of the project was to reduce the measurement error associated with the composite score within the classical test theory framework. Weighting was carried out at three levels. First, a 30-item multiple-choice test in differential calculus, with a sample of 3409 examinees, was weighted at the item-choice level (choice popularity percentages and formula score) and at the item level (simple unweighted total or item effective weight, item factor weight, and item difficulty weight). At the subtest level, a battery from the national medical residency examination with five subtests of equal length (six items each), administered to 3572 candidates, was weighted in several ways (average Pearson product-moment correlation coefficients, factor weights, and regression coefficients). Another battery from the national medical residency examination, with five subtests of unequal length (45, 26, 24, 6, and 6 items), administered to 3638 candidates, was studied without weighting, on the basis of the subtests' effective weights alone. The research revealed that the formula score method produces more error than the other procedures. Only weighting by item difficulty could rearrange the ranking of examinees in favor of the more qualified ones. The other weighting methods do not satisfactorily enhance reliability coefficients, which are determined from the outset by appropriate items and well-constructed subtests of optimal length.
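The idea of a subtest's effective weight in an unweighted composite, and the link between subtest reliabilities and composite reliability, can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the procedure used in the study: the effective weight is taken as each subtest's share of composite variance, w_i * cov(X_i, C) / var(C), and composite reliability follows the standard classical formula 1 - sum(w_i^2 * var_i * (1 - r_ii)) / var(C), which assumes uncorrelated errors across subtests. Function names and the reliability values passed in are hypothetical.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance (n - 1 denominator); covariance(x, x) is the variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def composite(subtests, weights):
    """Weighted sum of subtest scores for each examinee."""
    n = len(subtests[0])
    return [sum(w * s[i] for w, s in zip(weights, subtests)) for i in range(n)]


def effective_weights(subtests, weights=None):
    """Share of composite variance attributable to each subtest (sums to 1)."""
    if weights is None:
        weights = [1.0] * len(subtests)  # unweighted composite
    c = composite(subtests, weights)
    var_c = covariance(c, c)
    return [w * covariance(s, c) / var_c for w, s in zip(weights, subtests)]


def composite_reliability(subtests, reliabilities, weights=None):
    """Classical composite reliability, ASSUMING uncorrelated subtest errors."""
    if weights is None:
        weights = [1.0] * len(subtests)
    c = composite(subtests, weights)
    var_c = covariance(c, c)
    error_var = sum(
        w ** 2 * covariance(s, s) * (1.0 - r)
        for w, s, r in zip(weights, subtests, reliabilities)
    )
    return 1.0 - error_var / var_c
```

In an unweighted sum, a long subtest with a large score variance typically dominates the composite even though its nominal weight is 1, which is why batteries with subtests of 45 and 6 items can be examined through their effective weights alone.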

Keywords (English)

weighting

composite score

Classical Test Theory

reliability coefficient

true score

measurement error

formula score

Volume 3, Issue 4 - Serial Number 4
Winter 1392 (Solar Hijri)
Pages 79-104

Educational Measurement and Evaluation Studies
