Evaluating the Statistical Power of Logistic Regression Analysis in Detecting Differential Functioning of Test Items

Document Type: Original Article



Although logistic regression analysis has been introduced as a method for detecting biased items in psychological and educational tests, few studies have empirically investigated its power. The objectives of this research are to evaluate the statistical power of logistic regression analysis in detecting differential item functioning (DIF) and to investigate the mediating factors that affect that power. Monte Carlo simulation methods were used to answer the research questions. The required data were simulated with the WinGen software according to the mediating factors. The design crossed 3 sample sizes, 2 types of DIF (uniform and non-uniform), 4 magnitudes of DIF, and 3 levels of DIF items embedded in the simulated tests, giving 72 experimental conditions, each replicated 100 times. The results indicate that logistic regression analysis has the desired statistical power, and it is proposed that the method be used particularly when the DIF is uniform and the sample size is very large.
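The simulation logic described above can be illustrated with a minimal Python sketch. It is not the study's procedure: the study generated its data with WinGen, whereas this sketch draws responses directly from a two-parameter logistic (2PL) model. The function names, the 20-item test length, the item parameter distributions, the use of the standardized total score as the matching variable, and the DIF magnitudes are all assumptions chosen for illustration. DIF is tested with the two-degree-of-freedom likelihood ratio test that compares a model containing only the matching score against a model that adds group membership and the score-by-group interaction, and power is estimated as the proportion of replications in which the studied item is flagged.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible in degenerate samples.
        hessian = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


def simulate_responses(rng, theta, a, b):
    """Draw 0/1 responses from a 2PL model: P(correct) = logistic(a * (theta - b))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(float)


def lr_dif_pvalue(item, total, group):
    """p-value of the 2-df likelihood ratio test for uniform plus non-uniform DIF."""
    z = (total - total.mean()) / total.std()  # standardized matching score
    ones = np.ones_like(z)
    compact = np.column_stack([ones, z])
    full = np.column_stack([ones, z, group, z * group])
    g2 = 2.0 * (fit_logistic(full, item) - fit_logistic(compact, item))
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    return np.exp(-max(g2, 0.0) / 2.0)


def estimate_power(n_per_group, b_shift=0.0, a_shift=0.0,
                   n_items=20, n_reps=100, alpha=0.05, seed=0):
    """Proportion of replications in which the studied item (item 0) is flagged."""
    rng = np.random.default_rng(seed)
    # Hypothetical item parameters; the studied item is fixed at a = 1, b = 0.
    a_ref = rng.uniform(0.8, 1.6, n_items)
    b_ref = rng.normal(0.0, 1.0, n_items)
    a_ref[0], b_ref[0] = 1.0, 0.0
    a_foc, b_foc = a_ref.copy(), b_ref.copy()
    a_foc[0] += a_shift  # discrimination difference -> non-uniform DIF
    b_foc[0] += b_shift  # difficulty difference -> uniform DIF
    group = np.repeat([0.0, 1.0], n_per_group)  # 0 = reference, 1 = focal
    flagged = 0
    for _ in range(n_reps):
        theta = rng.normal(0.0, 1.0, 2 * n_per_group)
        responses = np.vstack([
            simulate_responses(rng, theta[:n_per_group], a_ref, b_ref),
            simulate_responses(rng, theta[n_per_group:], a_foc, b_foc),
        ])
        p_value = lr_dif_pvalue(responses[:, 0], responses.sum(axis=1), group)
        flagged += int(p_value < alpha)
    return flagged / n_reps


if __name__ == "__main__":
    print("uniform DIF, n = 1000 per group:", estimate_power(1000, b_shift=0.6))
    print("non-uniform DIF, n = 1000 per group:", estimate_power(1000, a_shift=0.6))
    print("no DIF (Type I error rate):", estimate_power(1000))
```

Calling `estimate_power` over a grid of sample sizes, DIF types, and DIF magnitudes would mirror the crossed design described above; running it with no shift gives the empirical Type I error rate against which the power values can be judged.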

