Identification of Optimal Equating Method in Multidimensional Tests

Document Type: Original Article




Equating is one of the most important issues in educational measurement, and violating its assumptions poses serious challenges. In multidimensional tests, the use of unidimensional equating methods biases the results. The purpose of this study was therefore to identify the optimal equating method for multidimensional tests. Six equating methods, both unidimensional and multidimensional, were compared with each other. The equipercentile method served as the criterion for comparing the other methods, because it is robust to violations of the unidimensionality assumption. The statistical population consisted of all candidates who took the mathematics entrance exam in 2017 and 2018. Mathematics exam data of 5,000 examinees from both years were selected for equating. Test dimensionality was determined using the NOHARM and MPLUS software; the ltm package was used to estimate the unidimensional parameters, and the mirt package in R was used to estimate the multidimensional parameters. Unidimensional IRT observed score and true score equating were conducted with the PIE program, and equipercentile equating was performed using the R equate package. The mirt, mvtnorm, and MASS packages in R were also used for multidimensional equating. The results showed that the best-performing method for equating multidimensional tests was FULL MIRT observed score equating, followed by unidimensionalized MIRT observed score equating, and that unidimensional observed score and true score methods are not efficient under such conditions. It is therefore recommended that the FULL MIRT observed score method be used to equate tests with a multidimensional structure.
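As an illustration of the equipercentile criterion mentioned in the abstract, the following is a minimal Python sketch of unsmoothed equipercentile equating for integer number-correct scores under a random groups design. It is not the study's procedure (the authors used the R equate package); the function names `percentile_ranks` and `equipercentile_equate`, the toy data, and the convention of returning the maximum score plus 0.5 at the upper boundary are assumptions made for this example.

```python
import numpy as np


def percentile_ranks(scores, max_score):
    """Mid-percentile rank (on a 0-1 scale) for each integer score 0..max_score."""
    counts = np.bincount(scores, minlength=max_score + 1)
    rel = counts / counts.sum()      # relative frequency f(x)
    cum = np.cumsum(rel)             # cumulative proportion F(x)
    return (cum - rel) + rel / 2     # F(x - 1) + f(x) / 2


def equipercentile_equate(x_scores, y_scores, max_score):
    """Map each Form X score to the Form Y scale by matching percentile ranks.

    Hypothetical helper: assumes both forms are scored 0..max_score and
    applies no presmoothing or postsmoothing.
    """
    p_x = percentile_ranks(x_scores, max_score)
    g = np.bincount(y_scores, minlength=max_score + 1) / len(y_scores)
    G = np.cumsum(g)                 # cumulative proportion on Form Y
    equated = np.empty(max_score + 1)
    for x, p in enumerate(p_x):
        # Smallest Form Y score whose cumulative proportion exceeds p.
        y_upper = int(np.searchsorted(G, p, side="right"))
        if y_upper > max_score:
            # Assumed boundary convention: top of the Form Y score range.
            equated[x] = max_score + 0.5
            continue
        G_below = G[y_upper - 1] if y_upper > 0 else 0.0
        # Linear interpolation within the interval (y_upper - 0.5, y_upper + 0.5).
        equated[x] = y_upper - 0.5 + (p - G_below) / g[y_upper]
    return equated


# Toy data (made up): Form Y is uniformly one point easier than Form X.
form_x = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4])
form_y = form_x + 1
equated_scores = equipercentile_equate(form_x, form_y, max_score=5)
```

In this sketch `equated_scores[x]` is the Form Y equivalent of Form X score `x`. A comparison like the one in the study would compute such a criterion function on real data and then measure how far each IRT or MIRT equating function departs from it.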

