Comparing Methods of Determining Test Factor Structure Using Empirical Data: The Case of the 2016 (1395) National University Entrance Exam

Article Type: Research Article

Author

Balal Izanloo, Assistant Professor, Faculty of Psychology and Educational Sciences, Kharazmi University

Abstract

Objective: The aim of this study was to compare methods of determining the number of dimensions underlying National University Entrance Exam data and to determine the number of dimensions present in those data.

Methods: After a review of the theoretical and empirical literature, data from the mathematics (mathematics group), chemistry (experimental sciences group), and philosophy-logic (humanities group) sub-tests of the 2016 (1395) exam were analyzed.

Results: Analyzing the data with eleven dimensionality assessment methods, 34 indices based on those methods, and the graphical methods of hierarchical cluster analysis, exploratory graph analysis, and heat maps showed that, depending on their nature, different methods reflect the general factor(s), the specific factors, or the item clusters present in the tests. The analyses indicated that essential unidimensionality does not hold, in the strict sense of the term, in most cases, and that at least for the 2016 specialized sub-tests the structure is bifactor. The resulting structure, however, does not match the bifactor model described in the literature, i.e., a general factor and several specific factors uncorrelated with one another and with the general factor, such that each item relates to exactly one specific factor in addition to the general factor. Instead, each item relates to more than one specific factor in addition to the general factor, which yields a complex or relatively complex structure. Factor analysis of the full data and nonlinear factor analysis further showed that a gradual increase in the lower asymptote reduces the number of dimensions.

Conclusion: A combination of several methods is recommended for determining the dimensionality of the National Entrance Exam. When analyzing the degree of general-factor saturation, which is reflected in the inter-item correlations, it is also useful to take the lower asymptote into account, to consider how omitted responses are treated, and to compare results from the full data with those from complete data (records without omitted responses). Finally, the fit of the factors extracted by different exploratory methods should be checked with confirmatory methods, and the interpretability of the resulting model should be considered as well.
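
As a concrete illustration of the first group of methods above, the sketch below shows how parallel analysis on tetrachoric correlations and the two graphical methods (hierarchical cluster analysis and a heat map) might be run. It is a minimal example assuming the analyses are done in R with the psych package; `items` is a hypothetical examinee-by-item matrix of 0/1 scores, not the actual exam data.

```r
# Minimal sketch: dimensionality checks on binary item scores.
# `items` is a hypothetical n-examinees x k-items matrix of 0/1 values.
library(psych)

# Tetrachoric correlations suit binary (right/wrong) items; Pearson
# correlations on 0/1 data understate the latent associations.
rho <- tetrachoric(items)$rho

# Parallel analysis: retain factors whose eigenvalues exceed those of
# comparable random data; cor = "tet" requests tetrachorics internally.
fa.parallel(items, cor = "tet", fa = "fa")

# The two graphical methods named in the abstract: hierarchical cluster
# analysis on correlation distances, and a heat map of the correlations.
hc <- hclust(as.dist(1 - rho), method = "ward.D2")
plot(hc, main = "Item clusters")
heatmap(rho, symm = TRUE, main = "Tetrachoric correlations")
```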

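Exploratory graph analysis and the bifactor comparison can be sketched in the same spirit. The code below assumes the EGAnet and mirt packages, the same hypothetical `items` matrix, and an illustrative assignment of 20 items to three specific factors; since a conventional bifactor model ties each item to exactly one specific factor, contrasting it with an exploratory multidimensional model is one way to probe the complex structure reported above.

```r
# Sketch, assuming the EGAnet and mirt packages and 20 items in `items`.
library(EGAnet)
library(mirt)

# Exploratory graph analysis: estimates the number of dimensions from a
# regularized partial-correlation network of the items.
ega <- EGA(items)
ega$n.dim  # suggested number of dimensions

# Conventional bifactor model: a general factor plus specific factors,
# each item assigned (illustratively) to one of three specifics, which
# is exactly the restriction the abstract found too strict.
specific <- rep(1:3, length.out = 20)
fit_bi <- bfactor(items, specific, itemtype = "2PL")

# An exploratory three-factor model lets items load on several factors,
# closer to the complex structure described above.
fit_efa <- mirt(items, 3, itemtype = "2PL")
anova(fit_bi, fit_efa)  # compare relative fit (AIC/BIC)
```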
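
The conclusion's remaining suggestions, gauging the effect of the lower asymptote and confirming an exploratory solution, can also be illustrated with mirt. In the sketch below, the 3PL item type adds a per-item lower-asymptote (guessing) parameter, and the confirmatory specification, including its cross-loading items, is purely hypothetical.

```r
# Sketch: lower-asymptote effect and a confirmatory follow-up (mirt).
library(mirt)

# The 2PL fixes the lower asymptote at 0; the 3PL estimates it per item.
# If the abstract's observation holds, allowing for guessing should
# absorb part of the apparent multidimensionality.
fit_2pl <- mirt(items, 1, itemtype = "2PL")
fit_3pl <- mirt(items, 1, itemtype = "3PL")
anova(fit_2pl, fit_3pl)

# Confirmatory check of an exploratory solution; the item assignments
# (with items 8-10 loading on both factors) are illustrative only.
model <- mirt.model("
  F1  = 1-10
  F2  = 8-20
  COV = F1*F2")
fit_cfa <- mirt(items, model, itemtype = "2PL")
M2(fit_cfa)  # limited-information fit: M2, RMSEA, SRMSR, CFI/TLI
```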

Keywords

  • dimension
  • factor
  • construct
  • factor structure
  • multiple-choice test
  • binary data