Alper Kose, I., & Demirtasli, N. C. (2012). Comparison of unidimensional and multidimensional models based on item response theory in terms of both variables of test length and sample size. Procedia - Social & Behavioral Sciences, 46, 135–140. https://doi.org/10.1016/j.sbspro.2012.05.082
Baker, C. (2011). Foundations of bilingual education and bilingualism (5th ed.). Bristol: Multilingual Matters.
Beretvas, S. N. & Walker, C. M. (2012). Distinguishing differential testlet functioning from differential bundle functioning using the multilevel measurement model. Educational & Psychological Measurement, 72 (2), 200–223. https://doi.org/10.1177/0013164411412768
Beretvas, S. N., Cawthon, S. W., Lockhart, L. L., & Kaye, A. D. (2012). Assessing impact, DIF, and DFF in accommodated item scores: A comparison of multilevel measurement model parameterizations. Educational & Psychological Measurement, 72 (5).
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods (Advanced Quantitative Techniques in the Social Sciences). Newbury Park, CA: SAGE.
Cai, L., Yang, J. S., & Hansen, M. (2011). Generalized full-information item bifactor analysis. Psychological Methods, 16, 221–248. https://doi.org/10.1037/a0023350
Chen, T. T. & Fienberg, S. E. (1974). Two-dimensional contingency tables with both completely and partially cross-classified data. Biometrics, 30 (4), 629–642.
DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43 (2), 145–168. https://doi.org/10.1111/j.1745-3984.2006.00010.x
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13 (4), 354-378. https://doi.org/10.1080/15305058.2013.799067
Duan, J. C., Härdle, W. K., & Gentle, J. E. (2012). Handbook of computational finance. Heidelberg, Dordrecht, London, New York: Springer.
Elosua Oliden, P. & Mujika Lizaso, J. (2014). Impact of family language and testing language on reading performance in a bilingual educational context. Psicothema, 26 (3), 328-335. https://doi.org/10.7334/psicothema2013.344
Fukuhara, H. (2009). A differential item functioning model for testlet-based items using bi-factor multidimensional item response theory model: A Bayesian approach. Electronic Theses, Treatises and Dissertations, Florida State University Libraries.
Gibbons, R. D. & Hedeker, D. R. (1992). Full-information item bi-factor analysis. Psychometrika, 57 (3), 423-436.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications, Inc.
Kim, S. & Kolen, M. J. (2006). Robustness to format effects of IRT linking methods for mixed-format tests. Applied Measurement in Education, 19 (4), 357-381.
Lee, Y.-W. (2004). Examining passage-related local item dependence (LID) and measurement construct using Q3 statistics in an EFL reading comprehension test. Language Testing, 21, 74-100.
Ling Ping, H. & Islam, M. A. (2008). Analyzing incomplete categorical data: Revisiting maximum likelihood estimation (MLE) procedure. Journal of Modern Applied Statistical Methods, 7 (2), 488-500. https://doi.org/10.22237/jmasm/1225512780
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.
Md Desa, Z. N. D. (2012). Bi-factor multidimensional item response theory modeling for subscores estimation, reliability, and classification. Retrieved October 8, 2016, from https://kuscholarworks.ku.edu/bitstream/handle/1808/10126/MdDesa_ku_0099D_12360_DATA_1.pdf;sequence=1
Moore, D. White (2015). Unidimensional vertical scaling of mixed format tests in the presence of item format effect. Thesis and dissertation, University of Pittsburgh.
Morton, J. B. & Harper, S. N. (2007). What did Simon say? Revisiting the bilingual advantage. Developmental Science, 10, 719–726.
Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 international report: IEA’s study of reading literacy achievement in primary school in 35 countries. Retrieved on October 23, 2017.
Mullis, I. V. S., Martin, M. O., Kennedy, A. M., & Foy, P. (2007). PIRLS 2006 international report: IEA’s progress in international reading literacy study in primary school in 40 countries. Retrieved on October 23, 2017.
Rauch, D. P. & Hartig, J. (2010). Multiple-choice versus open-ended response formats of reading test items: A two-dimensional IRT analysis. Psychological Test and Assessment Modeling, 52 (4), 354-379.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. SAGE Open, 5, 1-9. https://doi.org/10.1177/2158244015585607
Rijmen, F. (2011). Hierarchical factor item response theory models for PIRLS: Capturing clustering effects at multiple levels. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 4, 59-74.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28 (3), 237-247.
Smarter Balanced Assessment Consortium. (2015). 2013-2014 technical report: Validity, item and test development, pilot test and field test, achievement level setting. Retrieved February 18, 2015, from http://www.smarterbalanced.org/wpcontent/uploads/2015/08/201314_Technical_Report.pdf
Syahabuddin, K. (2013). Student English achievement, attitude and behaviour in bilingual and monolingual schools in Aceh, Indonesia. School of Education, Faculty of Education and Arts, Edith Cowan University, Perth, Western Australia.
Tao, W. (2008). Using the score-based testlet method to handle local item dependence. Electronic thesis and dissertation, Boston College.
Thissen, D., Steinberg, L., & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26 (3), 247-260.
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. The Journal of Educational Psychology, 16 (7), 433-451.
van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis for cross-cultural research (Cross-Cultural Psychology Series, Vol. 1). Thousand Oaks, CA: Sage Publications, Inc.
Wainer, H. & Thissen, D. (1993). Combining multiple-choice and constructed-response test scores: Toward a Marxist theory of test construction. Applied Measurement in Education, 6 (2), 103-118.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications. Applied Psychological Measurement, 26 (1), 109-128.
Wei, L. (2010). BAMFLA: Issues, methods and directions. International Journal of Bilingualism, 14 (1), 3-9.
Yao, C. (2008). Mixed-format test equating: effects of test dimension and common-item sets. Unpublished Doctoral Dissertation, University of Maryland, College Park.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30 (3), 187-213.
Zhang, B. (2010). Assessing the accuracy and consistency of language proficiency classification under competing measurement models. Language Testing, 27, 119-140. https://doi.org/10.1177/0265532209347363
Zhang, O., Shen, L., & Cannady, M. (2010, April). Polytomous IRT or testlet model: An evaluation of scoring models in small testlet size situations. Paper presented at the Annual Meeting of the 15th International Objective Measurement Workshop, Boulder, CO, USA.