Application of polytomous IRT models in scoring high-stakes tests (case study: the lawyer's license exam)

Article type: Research article

Authors

1 PhD in Assessment and Measurement, Faculty of Psychology and Educational Sciences, Allameh Tabataba'i University, Tehran, Iran

2 Associate Professor, Department of Assessment and Measurement, Faculty of Psychology and Educational Sciences, Allameh Tabataba'i University, Tehran, Iran

3 Professor, Department of Assessment and Measurement, Faculty of Psychology and Educational Sciences, Allameh Tabataba'i University, Tehran, Iran

10.22034/emes.2023.563268.2426

Abstract

Objective: The aim of the present study was to compare the measurement accuracy and error of dichotomous and polytomous IRT models in scoring high-stakes ability tests.
Methods: The research population comprised all participants in the national lawyer's license exams of 1396 and 1398 (Iranian calendar), from which 5,000 examinees from the 1396 administration and 5,000 from the 1398 administration were selected by simple random sampling. Data were collected from the participants' item responses. The independent variable was the scoring method and model, and the dependent variable was the model's fit and information (precision); accordingly, the research method is experimental.
Results: Analysis of the findings showed that, among the dichotomous logistic IRT models, the 3-parameter model, and among the nominal polytomous models studied, the 3-parameter model, showed better fit and provided more information on the data under study than the other models.
Conclusion: Given the more favorable fit and information of the 3-parameter dichotomous model and the 3-parameter nominal polytomous model compared with the other models, using these models for scoring can increase measurement accuracy, reduce error, and help make the selection process for lawyer's license applicants fairer.
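For reference, the standard forms of the dichotomous 3-parameter logistic model and of Bock's nominal categories model (the family from which the nominal polytomous models studied here are drawn), together with the item information and conditional standard error that quantify scoring precision, are sketched below. The exact parameterization used in the study is not stated in the abstract, so this is a reference sketch rather than the study's specification.

```latex
% Dichotomous 3PL model: probability of a correct response to item i
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}

% Bock's nominal categories model: probability of selecting option k of item i
P_{ik}(\theta) = \frac{\exp(a_{ik}\theta + c_{ik})}{\sum_{j=1}^{m_i} \exp(a_{ij}\theta + c_{ij})}

% Item information under the 3PL, and the conditional standard error of the
% ability estimate, obtained from the test information (the sum over items)
I_i(\theta) = a_i^2\,\frac{\bigl(P_i(\theta) - c_i\bigr)^2}{(1 - c_i)^2}\,
              \frac{1 - P_i(\theta)}{P_i(\theta)},
\qquad
\mathrm{SE}(\hat\theta) = \frac{1}{\sqrt{\sum_i I_i(\theta)}}
```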

Article title [English]

The application of polytomous IRT models in scoring high-stakes tests (case study: the lawyer's license exam)

Authors [English]

  • Reza Payravi 1
  • Mohammadreza Falsafinejad 2
  • Asghar Minaei 2
  • Ali Delavar 3
  • Ali Farrokhi 3
1 Ph.D. student, Faculty of Psychology and Education, Allameh Tabataba'i University, Tehran, Iran
2 Associate Professor, Department of Educational Measurement, Allameh Tabataba'i University, Tehran, Iran
3 Professor, Department of Educational Measurement, Allameh Tabataba'i University, Tehran, Iran
Abstract [English]

Objective: The aim of this study was to compare the measurement accuracy and error of dichotomous and polytomous IRT models in scoring high-stakes, large-scale ability tests.
Methods: The statistical population comprised all participants in the national lawyer's license exams of 2016 and 2018, from which 5,000 examinees from each administration were selected by simple random sampling. Data were collected from the participants' responses to these exams. The independent variable was the scoring method and model, and the dependent variable was model fit and information (precision); accordingly, the research method is experimental.
Results: The analysis showed that, among the dichotomous logistic IRT models, the 3-parameter model, and among the nominal polytomous models studied, the 3-parameter model, provided better fit and more information on the data under study than the other models.
Conclusion: Considering the more favorable fit and information of the 3-parameter dichotomous model and the 3-parameter nominal polytomous model compared with the other models, using these models in scoring can increase measurement accuracy and reduce error. It can also help make the selection process for lawyer's license applicants fairer.
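As an illustration of the two criteria used to compare the models above (fit and information), the following minimal sketch computes the 3PL test information function and the corresponding conditional standard error over a grid of ability values, and compares two competing models with AIC and BIC computed from their log-likelihoods. The item parameters, parameter counts, and log-likelihood values are hypothetical and are not taken from the study's data.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response to each item at ability theta."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def item_info_3pl(theta, a, b, c):
    """Item information under the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return (a ** 2) * ((p - c) ** 2 / (1.0 - c) ** 2) * ((1.0 - p) / p)

def test_info_and_se(theta_grid, a, b, c):
    """Test information (sum of item informations) and conditional SE over a theta grid."""
    info = np.array([item_info_3pl(t, a, b, c).sum() for t in theta_grid])
    return info, 1.0 / np.sqrt(info)

def aic_bic(log_lik, n_params, n_obs):
    """Information criteria commonly used to compare competing IRT models."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * np.log(n_obs)
    return aic, bic

if __name__ == "__main__":
    # Hypothetical parameters for a short 5-item test (illustration only).
    a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])       # discrimination
    b = np.array([-1.0, -0.3, 0.0, 0.6, 1.2])     # difficulty
    c = np.array([0.20, 0.25, 0.20, 0.25, 0.20])  # pseudo-guessing

    theta_grid = np.linspace(-3.0, 3.0, 13)
    info, se = test_info_and_se(theta_grid, a, b, c)
    for t, i, s in zip(theta_grid, info, se):
        print(f"theta = {t:+.2f}   information = {i:6.3f}   SE = {s:.3f}")

    # Hypothetical log-likelihoods of a 2PL and a 3PL fit to the same responses
    # (100 items, 5,000 examinees); the preferred model has the smaller AIC/BIC.
    print("2PL:", aic_bic(log_lik=-61250.0, n_params=2 * 100, n_obs=5000))
    print("3PL:", aic_bic(log_lik=-60980.0, n_params=3 * 100, n_obs=5000))
```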

Keywords [English]

  • IRT scoring
  • Dichotomous models
  • IRT nominal polytomous models
  • Fairness of assessment
