Sarmad, Zohreh; Bazargan, Abbas & Hejazi, Elaheh (1384 SH). Research methods in behavioral sciences. Tehran: Agah Publications.
Kabiri, Masoud; Karimi, Abdolazim & Bakhshalizadeh, Shahrnaz (1395 SH). National findings of TIMSS 2015: The 20-year trend of science and mathematics education in Iran in international perspective. Research Institute for Education, Madreseh Publications.
Minaei, Asghar & Falsafinejad, Mohammad Reza (1389 SH). Methods for assessing the unidimensionality of items in dichotomous IRT models. Quarterly of Educational Measurement, 1 (3), 71–100.
Albers, C. J.; Meijer, R. R. & Tendeiro, J. N. (2016). Derivation and applicability of asymptotic results for multiple subtests person-fit statistics. Applied Psychological Measurement, 40 (4), 274-288.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.
Childs, R. A. & Jaciw, A. P. (2003). Matrix sampling of items in large-scale assessments. Practical Assessment, Research & Evaluation, 8 (16), 1–9.
Conijn, J. M.; Emons, W. H. M. & Sijtsma, K. (2014). Statistics lz-based person-fit methods for noncognitive multiscale measures. Applied Psychological Measurement, 38 (2), 122-136.
Cui, Y. & Li, J. (2015). Evaluating person fit for cognitive diagnostic assessment. Applied Psychological Measurement, 39 (3), 223-238.
Cui, Y. & Mousavi, A. (2015). Explore the usefulness of person-fit analysis on large-scale assessment. International Journal of Testing, 15 (1), 23-49.
De Champlain, A. F. & Gessaroli, M. F. (1998). Assessing the dimensionality of item response matrices with small sample size and short test lengths. Applied Measurement in Education, 11 (1), 231-253.
Drasgow, F.; Levine, M. V. & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38 (1), 67–86.
Finch, H. & Habing, B. (2005). Comparison of NOHARM and DETECT in item cluster recovery: Counting dimensions and allocating items. Journal of Educational Measurement, 42 (2), 149-170.
Guttman, L. (1944). A basis for scaling qualitative data. American Sociological Review, 9, 139–150.
Harnisch, D. L. & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18 (3), 133–146.
Hendrawan, I.; Glas, C. A. & Meijer, R. R. (2005). The effect of person misfit on classification decisions. Applied Psychological Measurement, 29 (1), 26-44.
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16 (4), 277-298.
Knol, D. L. & Berger, P. F. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26 (3), 457-477.
Lamprianou, I. & Boyle, B. (2004). Accuracy of measurement in the context of mathematics national curriculum tests in England for ethnic minority pupils and pupils who speak English as an additional language. Journal of Educational Measurement, 41 (3), 239–259.
Levine, M. V. & Drasgow, F. (1982). Appropriateness measurement: Review, critique and validating studies. British Journal of Mathematical & Statistical Psychology, 35 (1), 42–56.
Levine, M. V. & Drasgow, F. (1988). Optimal appropriateness measurement. Psychometrika, 53 (2), 161–176.
Magis, D.; Raiche, G. & Beland, S. (2012). A didactic presentation of Snijders's index of person fit with emphasis on response model selection and ability estimation. Journal of Educational & Behavioral Statistics, 37 (1), 57-81.
Martin, M. O.; Mullis, I. V. S. & Hooper, M. (2016). Methods and procedures in TIMSS 2015. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
McDonald, R. P. (1997). Normal-ogive multidimensional model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 258-269). New York: Springer-Verlag.
Meijer, R. R. (1997). Person fit and criterion-related validity: An extension of the Schmitt, Cortina, and Whitney study. Applied Psychological Measurement, 21 (2), 99-113.
Meijer, R. R. & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25 (2), 107-135.
Mousavi, S. A. (2015). The effect of person misfit on item parameter estimation: A simulation study. Doctoral dissertation, University of Alberta.
Mousavi, A.; Tendeiro, J. N. & Younesi, J. (2016). Person fit assessment using the PerFit package in R. The Quantitative Methods for Psychology, 12 (3), 232-242.
Olson, J. F.; Martin, M. O. & Mullis, I. V. S. (2008). TIMSS 2007 Technical Report. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Phillips, S. E. (1986). The effects of deletion of misfitting persons on vertical equating via the Rasch model. Journal of Educational Measurement, 23 (2), 107–118.
Rudner, L. M.; Bracey, G. & Skaggs, G. (1996). The use of a person-fit statistic with one high quality achievement test. Applied Measurement in Education, 9 (1), 91–109.
Rupp, A. A. (2013). A systematic review of the methodology for person fit research in Item Response Theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test & Assessment Modeling, 55 (1), 3-38.
Schmitt, N. S.; Cortina, J. M. & Whitney, D. J. (1993). Appropriateness fit and criterion-related validity. Applied Psychological Measurement, 17 (2), 143-150.
Sijtsma, K. (1986). A coefficient of deviance of response patterns. Kwantitatieve Methoden, 7 (22), 131–145.
Sijtsma, K. & Meijer, R. R. (1992). A method for investigating the intersection of item response functions in Mokken’s nonparametric IRT model. Applied Psychological Measurement, 16 (2), 149-157.
Smith, R. M. (1985). A comparison of Rasch person analysis and robust estimators. Educational & Psychological Measurement, 45 (3), 433–444.
Smith, R. M. (1986). Person fit in the Rasch model. Educational & Psychological Measurement, 46 (2), 359–372.
Snijders, T. A. B. (2001). Asymptotic null distribution of person fit statistics with estimated person parameter. Psychometrika, 66 (3), 331-342.
Stocking, M. L. & Lord, F. M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7 (2), 201-210.
Tatsuoka, K. K. (1984). Caution indices based on item response theory. Psychometrika, 49 (1), 95–110.
Tatsuoka, K. K. & Tatsuoka, M. M. (1983). Spotting erroneous rules of operation by the individual consistency index. Journal of Educational Measurement, 20 (3), 221–230.
Van der Flier, H. (1982). Deviant response patterns and comparability of test scores. Journal of Cross-Cultural Psychology, 13 (3), 267–298.
Wright, B. D. & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
Wright, B. D. & Stone, M. H. (1979). Best test design: Rasch measurement. Chicago: MESA Press.