Person Fit Assessment in the 8th Grade Math Test for TIMSS 2015

Document Type: Original Article

The validity of test scores may be compromised by aberrant response behavior. The present study applies two person-fit statistics to examine response patterns on the TIMSS 2015 8th-grade mathematics test in Australia, Iran, and the Republic of Korea. After students with misfitting response patterns were identified, the effect of including versus excluding their responses on the item parameter estimates was investigated. The changes in item parameter estimates were significant for some items. In addition, the agreement between the two person-fit statistics in classifying students' response patterns was examined, and the results of the two statistics were found to be consistent. The relationship between fitting/misfitting response patterns and student ability was also studied: the relationship between students' ability estimates and their response patterns was significant for one of the two person-fit statistics but not for the other.
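
To make the idea of a person-fit statistic concrete, the following is a minimal sketch of one widely used index, the standardized log-likelihood statistic lz, under a two-parameter logistic model. It is an illustration only: the choice of lz, the 2PL model, the item parameters, the ability value, and the function names `irf_2pl` and `lz_statistic` are all assumptions made for this example and are not taken from the study. The ability parameter is also treated as known here; when it is estimated from the same responses, the statistic no longer follows a standard normal distribution exactly and a correction is usually applied.

```python
import math


def irf_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lz_statistic(responses, theta, a_params, b_params):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    Large negative values indicate a response pattern that is unlikely
    given the model and the ability value theta.
    """
    log_lik = 0.0   # observed log-likelihood of the response pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for x, a, b in zip(responses, a_params, b_params):
        p = irf_2pl(theta, a, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    if variance == 0.0:
        raise ValueError("lz is undefined when every item has P = 0.5")
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, items ordered from easy to hard.
a_params = [1.0, 1.0, 1.0, 1.0, 1.0]
b_params = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0  # ability value assumed known for the illustration

consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right

lz_consistent = lz_statistic(consistent, theta, a_params, b_params)
lz_aberrant = lz_statistic(aberrant, theta, a_params, b_params)
print(f"consistent pattern: lz = {lz_consistent:.2f}")
print(f"aberrant pattern:   lz = {lz_aberrant:.2f}")
```

In this sketch the pattern that misses easy items while answering hard ones correctly receives a strongly negative value, which is the kind of response pattern a person-fit analysis would flag as misfitting before item parameters are re-estimated with and without the flagged students.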

