The Analysis of Dimensionality, Testlet Effect and Differential Item Functioning and Impact in Testlet-Based Tests

Document Type: Original Article



The present study investigated how the monolingual or bilingual status of Iranian students affects dimensionality, local item dependence, differential item functioning, and impact for the questions attached to the passages of PIRLS 2011. Dimensionality was analyzed by comparing the unidimensional graded response model with the multidimensional bi-factor item response theory model. Next, local item dependence and impact were analyzed using a two-level bi-factor model and, finally, differential item functioning was examined using the multiple-group bi-factor model of Cai et al. (2011).
The dimensionality analysis showed that the bi-factor model fitted the data better than the graded response model. Furthermore, local item dependence between the questions of two literal traits caused deviation from unidimensionality, and linguistic difference could explain a majority of its variance. The testlet results also showed that the average estimated ability of monolinguals was higher than that of bilinguals. In addition, items showing uniform differential item functioning were more difficult for monolinguals than for bilinguals, whereas monolinguals outperformed bilinguals on multiple-choice items showing non-uniform differential item functioning.
Overall, the results showed that the traits related to the two literal testlets were perceived differently by monolingual and bilingual students and that local item dependence was more evident among bilinguals than among monolinguals. The results also indicated a difference between the performance of monolingual and bilingual students on mixed-format items.
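As a rough sketch of the two models compared in the dimensionality analysis, the block below writes the graded response model and its bi-factor extension in one common slope-intercept parameterization. The notation is an assumption made here for illustration and is not taken from the study: $a_j$ and $d_{jk}$ are the slope and the category intercepts of item $j$, $\theta_{i0}$ is the general reading ability of student $i$, and $\theta_{i,s(j)}$ is the specific factor of the testlet $s(j)$ to which item $j$ belongs.

```latex
% Graded response model (one latent trait), for categories k = 1, ..., K_j
P(X_{ij} \ge k \mid \theta_i)
  = \frac{1}{1 + \exp\!\left[-\left(a_j \theta_i + d_{jk}\right)\right]}

% Bi-factor graded model: one general factor plus one testlet-specific factor per item
P(X_{ij} \ge k \mid \theta_{i0}, \theta_{i,s(j)})
  = \frac{1}{1 + \exp\!\left[-\left(a_{j0}\,\theta_{i0} + a_{js}\,\theta_{i,s(j)} + d_{jk}\right)\right]}

% Category probabilities follow by differencing adjacent cumulative curves,
% with P(X_{ij} \ge 0) = 1 and P(X_{ij} \ge K_j + 1) = 0
P(X_{ij} = k) = P(X_{ij} \ge k) - P(X_{ij} \ge k + 1)
```

Under this parameterization the unidimensional model is the special case in which every testlet slope $a_{js}$ equals zero, so a better fit for the bi-factor model can be read as evidence that the testlet-specific factors capture local item dependence that the single trait leaves unexplained.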


Alper Kose, I., & Demirtasli, N. C. (2012). Comparison of unidimensional and multidimensional models based on item response theory in terms of both variables of test length and sample size. Procedia - Social & Behavioral Sciences, 46, 135–140. https://doi.org/10.1016/j.sbspro.2012.05.082
Baker, C. (2011). Foundations of bilingual education and bilingualism (5th ed.). Bristol: Multilingual Matters.
Beretvas, S. N., & Walker, C. M. (2012). Distinguishing differential testlet functioning from differential bundle functioning using the multilevel measurement model. Educational & Psychological Measurement, 72 (2), 200–223. https://doi.org/10.1177/0013164411412768
Beretvas, S. N., Cawthon, S. W., Lockhart, L. L., & Kaye, A. D. (2012). Assessing Impact, DIF, and DFF in Accommodated Item Scores: A Comparison of Multilevel Measurement Model Parameterizations. Educational & Psychological Measurement, 72 (5).
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Advanced quantitative techniques in the social sciences. Newbury Park, CA: SAGE.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248. https://doi.org/10.1037/a0023350
Chen, T. T., & Fienberg, S. E. (1974). Two-Dimensional Contingency Tables with Both Completely and Partially Cross-Classified Data. Biometrics, 30 (4), 629–642.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43 (2), 145–168. https://doi.org/10.1111/j.1745-3984.2006.00010.x
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13 (4), 354-378. https://doi.org/10.1080/15305058.2013.799067
Duan, J. C., Hardle, W. K., & Gentle, J. E. (2012). Handbook of computational finance. Heidelberg, Dordrecht, London, New York: Springer.
Elosua Oliden, P., & Mujika Lizaso, J. (2014). Impact of family language and testing language on reading performance in a bilingual educational context. Psicothema, 26 (3), 328-335. https://doi.org/10.7334/psicothema2013.344
Fukuhara, H. (2009). A differential item functioning model for testlet-based items using bi-factor multidimensional item response theory model: A Bayesian approach. Electronic Theses, Treatises and Dissertations, Florida State University Libraries.
Gibbons, R. D., & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57 (3), 423-436.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications, Inc.
Holzmüller, H. H., Singh, J., & Nijssen, E. J. (2002). Multicentric cross-national research: A typology and illustration. Retrieved February 19, 2015, from
Kim, S., & Kolen, M. J. (2006). Robustness to Format Effects of IRT Linking Methods for Mixed-Format Tests. Applied Measurement in Education, 19 (4), 357-381.
Lee, Y.-W. (2004). Examining passage-related local item dependence (LID) and measurement construct using Q3 statistics in an EFL reading comprehension test, Language Testing (Vol. 21, pp. 74-100). Princeton, NJ: Educational Testing Service.
Ling Ping, H., & Islam, M. A. (2008). Analyzing Incomplete Categorical Data: Revisiting Maximum Likelihood Estimation (MLE) Procedure. Journal of Modern Applied Statistical Methods, 7 (2), 488-500. https://doi.org/10.22237/jmasm/1225512780
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
MD Desa, Z. N. D. (2012). Bi-factor Multidimensional Item Response Theory Modeling for Subscores Estimation, Reliability, and Classification. Retrieved out 8, 2016, from /10126/MdDesa_ku_0099D_12360_DATA_1.pdf;sequence=1
Moore, D. White (2015). Unidimensional vertical scaling of mixed format tests in the presence of item format effect. Thesis and dissertation, University of Pittsburgh.
Morton, J. B., & Harper, S. N. (2007). What did Simon say? Revisiting the bilingual advantage. Developmental Science, 10, 719–726.
Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 international report: IEA's study of reading literacy achievement in primary school in 35 countries. Retrieved October 23, 2017.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., & Foy, P. (2007). PIRLS 2006 international report: IEA's progress in international reading literacy study in primary school in 40 countries. Retrieved October 23, 2017.
Mullis, I. V. S., Martin, M. O., Foy, P., & Drucker, K. T. (2012). PIRLS 2011 international results in reading. Retrieved October 23, 2017, from P11_IR_FullBook.pdf
Rauch, D. P., & Hartig, J. (2010). Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis. Psychological Test and Assessment Modeling, 52 (4), 354-379.
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing clustering effects at multiple levels. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 4, 59-74.
Ravand, H. (2015). Assessing Testlet Effect, Impact, Differential Testlet, and Item Functioning Using Cross-Classified Multilevel Measurement Modeling. SAGE Open, 5, 1-9. https://doi.org/10.1177/2158244015585607
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28 (3), 237-247.
Smarter Balanced Assessment Consortium (2015). 2013-2014 technical report: Validity, item and test development, pilot test and field test, achievement level setting. Retrieved February 18, 2015, from /wpcontent/uploads/2015/08/201314_Technical_Report.pdf
Syahabuddin, K. (2013). Student English Achievement, Attitude and Behaviour in Bilingual and Monolingual Schools in Aceh, Indonesia. School of Education, Faculty of Education and Arts, Edith Cowan University, Perth, Western Australia.
Tao, W. (2008). Using the score-based testlet method to handle local item dependence. Electronic thesis and dissertation, Boston College.
Thissen, D., Steinberg, L., & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26 (3), 247-260.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. The Journal of Educational Psychology, 16 (7), 433-451.
van de Vijver, F. J. R., & Leung, K. (1997). Cross-cultural psychology series, Vol. 1. Methods and data analysis for cross-cultural research. Thousand Oaks, CA, US: Sage Publications, Inc.
Wainer, H., & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6 (2), 103-118.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26 (1), 109-128.
Wei, L. (2010). BAMFLA: Issues, methods and directions. International Journal of Bilingualism, 14 (1), 3-9.
Yao, C. (2008). Mixed-format test equating: effects of test dimension and common-item sets. Unpublished Doctoral Dissertation, University of Maryland, College Park.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30 (3), 187-213.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing, 27, 119-140. https://doi.org/10.1177/0265532209347363
Zhang, O., Shen, L., & Cannady, M. (2010, April). Polytomous IRT or testlet model: An evaluation of scoring models in small testlet size situations. Paper presented at the Annual Meeting of the 15th International Objective Measurement Workshop, Boulder, CO, USA.
Zenisky, A. L., Hambleton, R. K., & Sireci, S. G. (2003). Effects of local item dependence on the validity of IRT item, test, and ability statistics. Retrieved out 8, 2016, from 134.5458&rep=rep1&type=pdf