Equating and Linking Scores in National Exams

Article type: Research Article

Authors

1 Research Expert, National Organization of Educational Testing, Tehran, Iran

2 Associate Professor, Faculty of Psychology and Educational Sciences, University of Tehran, Iran

DOI: 10.22034/emes.2024.561619.2412

Abstract

Objective: According to a resolution of the Supreme Council of the Cultural Revolution, the National Organization of Educational Testing is to administer the national exam twice a year, and each exam score will remain valid for two years. The purpose of this research is to introduce and implement appropriate equating methods within the framework of classical test theory, so that scores from different test administrations can be compared and students selected fairly on their basis.
Methods: Based on the common-item nonequivalent groups design, three differential calculus test forms X, Y, and Z were administered experimentally to three groups of 600, 1,111, and 2,200 examinees. The forms contained 21, 21, and 20 items in total, respectively, each with six common items. Data were analyzed in R with the equate package (Albano, 2018). The equipercentile equating function was used to equate these forms; pre-smoothing with log-linear transformations was carried out so that the parameters were estimated with smaller standard errors and the scores with greater accuracy.
Results: The mean difficulties of test forms X, Y, and Z are 9.03, 7.61, and 7.79, respectively. The three forms also have different skewness values: 0.20, 0.52, and 0.39, respectively.
Conclusion: In the nonequivalent groups design, the differing difficulty of the test forms requires common items, and the right skewness of the score distributions calls for the equipercentile function as the most suitable function for equating the scores; for exams for which common items cannot be prepared, it is suggested, given the right skewness of the score distributions, that their scores be linked with the same equipercentile function.


Article Title [English]

Equating and Linking Scores in National Exams

Authors [English]

  • Soleyman Zolfagharnasab 1
  • Ali Moghadamzadeh 2
  • Negar Sharifi Yeganeh 1
1 Research Expert, National Organization of Educational Testing (NOET), Tehran, Iran
2 Associate Professor, Faculty of Psychology and Education, University of Tehran, Tehran, Iran
Abstract [English]

Objective: According to a resolution of the Supreme Council of the Cultural Revolution, the National Organization of Educational Testing is supposed to hold a national exam twice a year, and each test score will remain valid for two years. The purpose of this research is to introduce and implement appropriate equating methods within the framework of classical test theory, so that test scores from different administrations can be compared and applicants selected in a fair way.
Methods: Based on the common-item nonequivalent groups design, three test forms X, Y, and Z of differential calculus were experimentally administered to groups of 600, 1,111, and 2,200 examinees. The forms contained 21, 21, and 20 items, respectively, with six common items in each form. Data were analyzed in R with the equate package (Albano, 2018). The equipercentile function was used to equate the test forms; pre-smoothing with log-linear transformations was performed so that the parameters were estimated with smaller standard errors and the scores with greater accuracy.
Results: The mean difficulties of these test forms were 9.03, 7.61, and 7.79, respectively. The three forms also differed in skewness: 0.203, 0.518, and 0.392, respectively.
Conclusion: Because the test forms differ in difficulty and their score distributions are right-skewed, it is suggested that they be equated with the equipercentile function in a nonequivalent-groups anchor-test design. For tests for which common items cannot be developed, it is recommended to link their scores with the same equipercentile function.
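The equipercentile mapping described in the abstract can be sketched outside the equate package. The Python sketch below (the study itself used R) computes a midpoint percentile rank for every score point on each form and then inverts the reference form's percentile-rank curve by linear interpolation; the binomial samples, the small frequency adjustment (a crude stand-in for the study's log-linear pre-smoothing), and all function names are illustrative assumptions, not the study's data or code.

```python
import numpy as np

def percentile_ranks(scores, max_score):
    """Midpoint percentile rank of each integer score point:
    P(x) = F(x - 1) + f(x) / 2.  A small constant is added to every
    frequency so the curve is strictly increasing (a crude stand-in
    for log-linear pre-smoothing)."""
    counts = np.bincount(scores, minlength=max_score + 1) + 0.5
    rel = counts / counts.sum()
    cum = np.cumsum(rel)
    return cum - rel / 2

def equipercentile_equate(x_scores, y_scores, max_x, max_y):
    """Map every score point on form X to the form-Y score with the
    same percentile rank, by linearly interpolating the inverse of
    Y's percentile-rank curve."""
    p_x = percentile_ranks(x_scores, max_x)
    p_y = percentile_ranks(y_scores, max_y)
    return np.interp(p_x, p_y, np.arange(max_y + 1))

# Illustrative data only: two 21-item forms of unequal difficulty,
# roughly mimicking the reported mean scores of 9.03 and 7.61.
rng = np.random.default_rng(0)
form_x = rng.binomial(21, 0.43, size=600)    # easier form / abler group
form_y = rng.binomial(21, 0.36, size=1111)   # harder form
conversion = equipercentile_equate(form_x, form_y, 21, 21)
# conversion[9] is the form-Y equivalent of a raw score of 9 on form X
```

Under this mapping a raw score on one form converts to the score on the other form that holds the same percentile rank, which is what makes scores from different administrations comparable.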

Keywords [English]

  • equipercentile
  • pre-smoothing
  • linear equating
  • anchor items
  • national exams

References

Albano, A. D. (2018). Package ‘equate’. Available at: https://cran.r-project.org/web/packages/equate/index.html
Albano, A. D. (2016). equate: An R package for observed-score linking and equating. Journal of Statistical Software, 74, 1-36.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement.
Bahmanabadi, S., Falsafinejad, M., Delavar, A., Farrokhi, N., & Minaei, A. (2020). Identification of Optimal Equating Method in Multidimensional Tests. Educational Measurement and Evaluation Studies, 10(30), 217-264. doi: 10.22034/emes.2020.44489
Brennan, R. L. (2001). Generalizability theory. Iowa Testing Programs, University of Iowa. New York: Springer-Verlag.
Chen, F., Huang, H., & MacGregor, D. (2009). Equating or linking: Basic concepts and a case study. Originally presented at CAL, Washington. Available at: https://faculty.ecnu.edu.cn/picture/article/220/0c/13/03357e474db0b2d5de11abaef0fb/793ecb9d-fe0b-4ff5-bd56-78148d7d4210.pdf.x
Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating. ETS Research Report Series, 2010(2), i-41.
Heh, V. K. (2007). Equating accuracy using small samples in the random groups design (Doctoral dissertation, Ohio University). Available at: https://etd.ohiolink.edu/apexprod/rws_etd/send_file/send?accession=ohiou1178299995&disposition=inline
Hendrickson, A. B., & Kolen, M. J. (2001). IRT Equating of the MCAT. MCAT Monograph.
Kim, S. H., & Cohen, A. S. (1998). A comparison of linking and concurrent calibration under item response theory. Applied psychological measurement, 22(2), 131-143.
Levine, R. (1955). Equating the score scales of alternative forms administered to samples of different ability (ETS Research Bulletin No. 55-23). Princeton, NJ: Educational Testing Service. Available at: https://onlinelibrary.wiley.com/doi/epdf/10.1002/j.2333-8504.1955.tb00266.x
Liang, Z., Zhang, M., Huang, F., Kang, D., & Xu, L. (2021). Application Innovation of Educational Measurement Theory, Method, and Technology in China’s New College Entrance Examination Reform. Chinese/English Journal of Educational Measurement and Evaluation, 2(1), 3.
Livingston, S. A. (2014). Equating test scores (without IRT). Educational Testing Service.
Liu, J., & Low, A. C. (2007). An exploration of kernel equating using SAT® data: Equating to a similar population and to a distant population. ETS Research Report Series, 2007(1), i-22.
MoghadamZade, A. (2015). Optimal Smoothing Method of Data in Test Equating: The Case of TOLIMO and Comprehensive Trial Tests of Iran Educational Testing Organization. Quarterly of Educational Measurement, 6(21), 261-287. doi: 10.22054/jem.2015.5736
Muraki, E., Hombo, C. M., & Lee, Y. W. (2000). Equating and linking of performance assessments. Applied Psychological Measurement, 24(4), 325-337. Available at: https://www.researchgate.net/publication/247742704_Equating_and_Linking_of_Performance_Assessments
Parsaeian, M., NaghiZadeh, S., & Naderi, H. (2018). Selection of the best method of equating using anchor-test design in item response theory. Andishe-ye Amari. Available at: http://andisheyeamari.irstat.ir/article-۱-۵۰۴-fa.html
Rezvanifar, S., Falsafinejad, M., & Delavar, A. (2016). Equating methods. Quarterly of Educational Measurement, 7(26), 1-33. doi: 10.22054/jem.2017.2737.1085
Ryan, J., & Brockmann, F. (2009). A Practitioner's Introduction to Equating with Primers on Classical Test Theory and Item Response Theory. Council of Chief State School Officers.
Schultz, D. P., & Schultz, S. E. (1996). A History of Modern Psychology [1969]. Translated by: Saif, A., Sharifi, H. P., Ali Abadi, K. & Najafi Zand, J. (2005). Dowran publication
Seufert, B. (2012). When, why, and how the business analyst should use linear regression. Available at: https://mobiledevmemo.com/when-why-and-how-you-should-use-linear-regression/
Shea, J. A., & Norcini, J. J. (1995). Equating. Licensure Testing: Purposes, Procedures, and Practices. Edited by Impara JC. Lincoln, NE, Buros Center for Testing, 253-287.
Supreme Council of Cultural Revolution (2021). Policies and criteria for organizing assessment and acceptance of applicants for admission to higher education. Resolution No. 3217. Available at: https://sccr.ir/pro/3217/
Swaminathan, H. (n.d.). Linking and equating of test scores. University of Connecticut. Available at: https://slidetodoc.com/linking-and-equating-of-test-scores-hariharan-swaminathan/
Tian, F. (2011). A comparison of equating/linking using the Stocking-Lord method and concurrent calibration with mixed-format tests in the non-equivalent groups common-item design under IRT. Unpublished doctoral dissertation, Boston College.
von Davier, M., & von Davier, A. A. (2004). A unified approach to IRT scale linking and scale transformations. ETS Research Report Series, 2004(1), i-21.
Zolfagharnasab, S., Khodaei, E., & Yadegarzadeh, G. (2013). Optimum Weighting to Entrance Subtests and Their Items to Make Composite Score. Educational Measurement and Evaluation Studies, 3(4), 79-104.