A Comparison of the Benchmark and Bookmark Standard-Setting Methods for Classifying Performance Levels in a Large-Scale Mathematics Assessment Study

Article type: Research article

Author

Masoud Kabiri, Assistant Professor, Research Institute for Education, Tehran, Iran

DOI: 10.22034/emes.2021.248192

Abstract

Objective: Standard setting is an assessment technique for producing valid classifications of examinees. This study analyzed the effect of applying two standard-setting methods, the benchmark method and the bookmark method, on the results of a large-scale study conducted to assess sixth-grade mathematics learning among students in Tehran.
Methods: The two methods were compared on data from a provincial large-scale assessment administered to 9,720 sixth-grade students in Tehran. In total, participants answered 264 mathematics items, and their responses were analyzed using the plausible-values method.
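As a rough illustration of the plausible-values analysis mentioned above, the sketch below estimates the percentage of students at or above a cut score and pools the estimate across plausible values with Rubin's rules. It is a minimal sketch on simulated data: the reporting scale, the cut score, and the unweighted treatment of students are illustrative assumptions, not the study's actual specifications (the real analysis would also apply the survey's sampling weights).

```python
import numpy as np

# Minimal sketch: estimate the percentage of students at or above a cut
# score from plausible values, pooling across PVs with Rubin's rules.
# All numbers below are simulated and illustrative, not study values.
rng = np.random.default_rng(0)
n_students, n_pv = 9720, 5
pvs = rng.normal(500, 100, size=(n_students, n_pv))   # hypothetical PV matrix
cut = 475.0                                           # hypothetical cut score

props = (pvs >= cut).mean(axis=0)    # proportion reaching the cut, per PV

est = props.mean()                                    # pooled point estimate
within = (props * (1 - props) / n_students).mean()    # avg sampling variance
between = props.var(ddof=1)                           # between-PV variance
se = np.sqrt(within + (1 + 1 / n_pv) * between)       # Rubin's total SE

print(f"Percent at or above cut: {100 * est:.1f} (SE {100 * se:.2f})")
```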
Results: Applying the benchmark method showed that 75, 48, 18, and 2 percent of students, respectively, attained the minimum scores required for the low, intermediate, high, and advanced performance levels. Under this method, 23.9 percent of the items fell into the same level that the content experts had assigned to them. By contrast, comparing the gaps between successive means of the item location parameter with the standard deviation of locations within each performance level called into question the quality of the experts' initial classification used for the bookmark method. Furthermore, examining the effect of five response probabilities (0.52, 0.57, 0.62, 0.67, and 0.75) on the classification of students showed that, although the literature emphasizes a response probability of 0.67, the lowest value (0.52) produced more realistic results than the others, yet it still appears to be a strict standard compared with the benchmark method.
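The response probabilities above enter the bookmark method through the item response model: an RP value fixes how far above an item's difficulty the cut score sits on the ability scale. The sketch below is a minimal illustration assuming a Rasch model and an arbitrary item difficulty of 0 logits (not a study estimate); it computes that offset for the five RP values and shows why RP = 0.52 is the least stringent choice.

```python
import math

# Bookmark RP-to-theta mapping under the Rasch model:
#   P(correct | theta, b) = 1 / (1 + exp(-(theta - b)))
# so the ability at which an item of difficulty b is answered correctly
# with probability rp is  theta_cut = b + ln(rp / (1 - rp)).
def bookmark_cut(b: float, rp: float) -> float:
    return b + math.log(rp / (1 - rp))

b = 0.0  # illustrative item difficulty in logits, not a study estimate
for rp in (0.52, 0.57, 0.62, 0.67, 0.75):
    print(f"RP = {rp:.2f} -> cut score at b {bookmark_cut(b, rp):+.3f} logits")

# Offsets: +0.080, +0.282, +0.490, +0.708, +1.099 logits. A larger RP
# pushes every cut score higher, so fewer students reach each level;
# hence RP = 0.52 yields the least stringent classification.
```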
Conclusion: Standard setting deserves more attention as a technical issue in every assessment in which grading or a pass/fail decision is a consequence of taking the test.

Keywords

  • standard setting
  • benchmark
  • bookmarking
  • math education