Differential Item Functioning (DIF) in a Mathematics Test Using Mantel-Haenszel and Item Response Theory

Document Type: Original Article



Abstract: Tests should be fair to all individuals regardless of gender, race, age, or socioeconomic status. Accordingly, the assessment of item bias and differential item functioning (DIF) is very important. In this study, gender DIF was examined using the Mantel-Haenszel method and item response theory (IRT). First, the theoretical framework and the methods for evaluating DIF are introduced. Then, to offer a practical application of the methods, the responses of a stratified random sample of 4000 examinees, consisting of 2200 males and 1800 females, to the 55 items of the Mathematics test of the National University Entrance Exam were analyzed. The results reveal the presence of gender DIF in the test items. Using the Mantel-Haenszel method, 23 items had a significant Mantel-Haenszel index. Based on IRT, 9 items showed gender DIF, all of them in favor of females. A content review of the items with gender DIF indicated that most items favoring females belong to the content domains of functions and equations, while items favoring males belong to trigonometry, geometry, and probability.
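As an illustration of the Mantel-Haenszel approach named in the abstract, the following is a minimal sketch and not the authors' code. It assumes dichotomous (0/1) item scores, examinees matched on total test score, and the widely used ETS delta transformation (-2.35 times the natural log of the common odds ratio). The function name `mantel_haenszel_dif`, the group labels "R" (reference) and "F" (focal), and the toy counts are hypothetical; the abstract does not say which gender served as the reference group.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item, group, total):
    """Return (alpha_MH, MH D-DIF, MH chi-square) for one 0/1 item.

    item  : iterable of 0/1 responses to the studied item
    group : iterable of "R" (reference) or "F" (focal) labels (hypothetical)
    total : iterable of matching scores (e.g. total test score)
    """
    # Per stratum counts: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for u, g, s in zip(item, group, total):
        col = (0 if u == 1 else 1) + (0 if g == "R" else 2)
        strata[s][col] += 1

    num = den = sum_a = sum_exp = sum_var = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with one examinee carries no information
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m_right, m_wrong = a + c, b + d  # correct / incorrect totals
        sum_a += a
        sum_exp += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))

    if den == 0 or sum_var == 0:
        raise ValueError("not enough variation across strata to compute MH statistics")

    alpha = num / den                    # common odds ratio
    delta = -2.35 * math.log(alpha)      # ETS delta scale; negative favors the reference group
    chi2 = (abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var  # continuity-corrected, 1 df
    return alpha, delta, chi2


# Toy data (invented): two score levels, 20 examinees per group per level.
item, group, total = [], [], []
for score, g, n_right, n_wrong in [(1, "R", 10, 10), (1, "F", 5, 15),
                                   (2, "R", 15, 5), (2, "F", 10, 10)]:
    item += [1] * n_right + [0] * n_wrong
    group += [g] * (n_right + n_wrong)
    total += [score] * (n_right + n_wrong)

alpha, delta, chi2 = mantel_haenszel_dif(item, group, total)
print(f"alpha_MH={alpha:.3f}  MH D-DIF={delta:.3f}  chi2={chi2:.3f}")
```

In this toy example the common odds ratio is 3.0, so the reference group has higher odds of answering correctly at each matched score level. A chi-square value above 3.84 would be flagged at the 0.05 level with one degree of freedom.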

