References
Andrich, D. (2005). Rasch, Georg. Encyclopedia of Social Measurement, 3, 299–306.
Bijani, H. (2018). Effectiveness of a training program on oral performance assessment: The analysis of tasks using the multifaceted Rasch analysis. Journal of Modern Research in English Language Studies, 5(4), 27–53.
Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). New York, NY: Routledge.
DeCotiis, T. A. (1977). An analysis of the external validity and applied relevance of three rating formats. Organizational Behavior and Human Performance, 19, 247–266.
Eckes, T. (2011). Introduction to many-facet Rasch measurement. Frankfurt am Main: Peter Lang.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Engelhard, G., Jr. (1994). Examining rater errors in the assessment of written composition with a many-faceted Rasch model. Journal of Educational Measurement, 31(2), 93–112.
Esfandiari, R. (2014). A many-facet Rasch measurement of bias among Farsi-native speaking raters toward essays written by non-native speakers of Farsi. Journal of Teaching Persian to Speakers of Other Languages, 3(3), 25–54.
Esfandiari, R., & Myford, C. M. (2013). Severity differences among self-assessors, peer-assessors, and teacher assessors rating EFL essays. Assessing Writing, 18(2), 111–131.
Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical methods for rates and proportions (3rd ed.). Hoboken, NJ: Wiley.
Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various coefficients of interrater reliability and agreement. R package version 0.84.1.
Hays, W. L. (1994). Statistics (5th ed.). Belmont, CA: Wadsworth.
Kempf, W. F. (1972). Probabilistische Modelle experimentalpsychologischer Versuchssituationen [Probabilistic models of designs in experimental psychology]. Psychologische Beiträge, 14, 16–37.
Kim, S. C., & Wilson, M. (2009). A comparative analysis of the ratings in performance assessment using generalizability theory and the many-facet Rasch model. Journal of Applied Measurement, 10(4), 408–423.
Knoch, U. (2011). Investigating the effectiveness of individualized feedback to rating behavior: A longitudinal study. Language Testing, 28, 179–200.
Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.
Linacre, J. M. (2006a). Demarcating category intervals. Rasch Measurement Transactions, 19, 1041–1043.
Linacre, J. M. (2014b). A user’s guide to FACETS: Rasch-model computer programs. Chicago: Winsteps.com. Retrieved from http://www.winsteps.com/facets.
Linacre, J. M., & Wright, B. D. (1989). The length of a logit. Rasch Measurement Transactions, 3, 54–55.
Linacre, J. M., & Wright, B. D. (2002). Construction of measures from many-facet data. Journal of Applied Measurement, 3, 484–509.
Ludlow, L. H., & Haley, S. M. (1995). Rasch model logits: Interpretation, use, and transformation. Educational and Psychological Measurement, 55, 967–975.
Masters, G. N. (2010). The partial credit model. In M. L. Nering & R. Ostini (Eds.), Handbook of polytomous item response theory models (pp. 109–122). New York, NY: Routledge.
Micko, H. C. (1970). Eine Verallgemeinerung des Meßmodells von Rasch mit einer Anwendung auf die Psychophysik der Reaktionen [A generalization of Rasch’s measurement model with an application to the psychophysics of reactions]. Psychologische Beiträge, 12, 4–22.
Myers, J. L., Well, A. D., & Lorch, R. F. (2010). Research design and statistical analysis (3rd ed.). New York, NY: Routledge.
Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
Ostini, R., & Nering, M. L. (2006). Polytomous item response theory models. Thousand Oaks, CA: Sage.
Penfield, R. D. (2014). An NCME instructional module on polytomous item response theory models. Educational Measurement: Issues and Practice, 33(1), 36–48.
Robitzsch, A., & Steinfeld, J. (2018a). immer: Item response models for multiple ratings. R package version 1.1-35.
Robitzsch, A., & Steinfeld, J. (2018b). Item response models for human ratings: Overview, estimation methods, and implementation in R. Psychological Test and Assessment Modeling, 60(1), 101–139.
Smith, E. V., Jr., & Kulikowich, J. M. (2004). An application of generalizability theory and many-facet Rasch measurement using a complex problem-solving skills assessment. Educational and Psychological Measurement, 64(4), 617–639.
Stemler, S. E., & Tsai, J. (2008). Best practices in interrater reliability: Three common approaches. In J. W. Osborne (Ed.), Best practices in quantitative methods (pp. 29–49). Los Angeles, CA: Sage.
Tinsley, H. E. A., & Weiss, D. J. (2000). Interrater reliability and agreement. In
Wolfe, E. W. (1997). The relationship between essay reading style and scoring proficiency in a psychometric scoring system. Assessing Writing, 4, 83–106.
Wolfe, E. W., & McVay, A. (2012). Application of latent trait models to identifying substantively interesting raters. Educational Measurement: Issues and Practice, 31(3), 31–37.