Comparing Methods of Determining Test Factor Structure Using Empirical Data: The Case of the National Entrance Exam in 2016

Document Type: Original Article

Abstract

Objective: The present study aimed to compare methods of dimensionality assessment using National Entrance Exam data and to determine the number of dimensions underlying the exam's data.

Methods: Data from the mathematics (mathematics group), chemistry (experimental sciences group), and philosophy-logic (humanities group) sub-tests of the National Entrance Exam in 2016 AD (1395 solar) were analyzed.

Results: Eleven methods, yielding 34 related indices, together with graphical methods such as hierarchical cluster analysis, exploratory graph analysis, and heat maps, showed that different methods, depending on their nature, produced general factors, specific factors, or clusters of items. The unidimensionality required of the sub-tests did not hold in most cases, and the structure of the 2016 specialized national exam was bi-factor. The resulting structure, however, did not match the specification of the standard bi-factor model (a general factor plus several specific factors that are uncorrelated with one another and with the general factor, with each item loading on exactly one specific factor in addition to the general factor). Instead, besides loading on the general factor, each item was related to more than one specific factor, yielding a complex or relatively complex structure. Factor analysis of the full data and nonlinear factor analysis also showed that gradually increasing the lower asymptote reduced the number of dimensions.

Conclusion: A combination of methods is recommended for determining the dimensionality of the National Entrance Exam. In addition, the degree of general-factor saturation reflected in the item correlations, allowance for the lower asymptote, the treatment of omitted responses in the analysis, and a comparison of results from the full data with those from complete data (data without missing values) can all inform dimensionality assessment. Researchers should also check, with confirmatory factor analysis, the fit of the models extracted by the different exploratory methods, and attend to the interpretability of the extracted model.
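To make the recommended combined approach concrete, the following is a minimal R sketch, not the authors' actual analysis pipeline. It assumes a hypothetical respondents-by-items data frame `items` of dichotomously scored responses, and uses the psych, EGAnet, and mirt packages to run several of the methods named above (parallel analysis, hierarchical item clustering, a heat map, exploratory graph analysis, and a 3PL model that estimates a lower asymptote).

# Minimal sketch; `items` is a hypothetical 0/1 item-response data frame,
# not the actual exam data.
library(psych)   # parallel analysis, tetrachoric correlations, ICLUST
library(EGAnet)  # exploratory graph analysis
library(mirt)    # item response theory models

# Tetrachoric correlations are appropriate for dichotomous items.
tc <- tetrachoric(items)$rho

# 1. Parallel analysis: retain factors whose eigenvalues exceed those of random data.
fa.parallel(items, cor = "tet", fa = "both")

# 2. Hierarchical cluster analysis of items (ICLUST).
iclust(tc)

# 3. Heat map of the item correlation matrix to inspect item clusters visually.
heatmap(tc, symm = TRUE)

# 4. Exploratory graph analysis: dimensions emerge as communities in an item network.
ega_fit <- EGA(items, model = "glasso")
ega_fit$n.dim  # estimated number of dimensions

# 5. A 3PL model estimates a lower asymptote (guessing parameter) per item;
#    as the abstract notes, a higher asymptote can reduce apparent dimensionality.
fit_3pl <- mirt(items, 1, itemtype = "3PL")

Convergence across several such methods, followed by a confirmatory factor analysis of the extracted model, is the kind of triangulation the conclusion recommends over reliance on any single index.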
