Design and Application of a Computerized Adaptive Testing Method for Administering the TOLIMO Test at the National Organization of Educational Testing

Article Type: Research Article

Author

Assistant Professor, Department of Clinical Psychology, Faculty of Psychology and Educational Sciences, Kharazmi University, Tehran, Iran

10.22034/emes.2020.39884

Abstract

Measuring English language proficiency in high-stakes assessments requires a large number of items for paper-and-pencil (P&P) tests, because many examinees take these high-stakes tests every year. The purpose of this study was to design and apply computerized adaptive testing (CAT) as an alternative for measuring English language proficiency at the National Organization of Educational Testing. Unlike traditional P&P tests, in which the sequence of items is fixed and identical for all examinees, CAT uses an optimal, adaptive item-selection procedure: each item is targeted at the provisional ability estimate, and testing continues until a suitable convergence criterion for the ability estimate is reached, which results in a shorter, more reliable, and more efficient measurement process. The English proficiency test examined in the present study is the TOLIMO test. The research was carried out in two phases. In the first phase, a sample of P&P administrations of the TOLIMO test (administrations 114 to 123) was selected, and the items and examinees' abilities were then calibrated. In the second phase, simulated optimal CATs were designed as a baseline for evaluating the accuracy and efficiency of the operational CAT. The results showed that maximum-likelihood ability estimation combined with a fixed-length stopping rule yielded the most accurate estimates of examinees' ability parameters. Moreover, the TOLIMO CAT simulated from the optimal item pool required fewer items than the operational CAT designed from the existing item pool, while producing more accurate ability estimates. Therefore, although the items in the calibrated TOLIMO item pool are adequate, writing items specifically for CAT administration is both more economical and more accurate in estimating the ability parameter. The results also showed that the TOLIMO test, whether administered as the simulated CAT or the operational CAT, performs more efficiently and more precisely than the P&P TOLIMO. These findings indicate that a TOLIMO CAT has considerable potential for efficient and precise measurement of English language ability.
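To make the adaptive loop described above concrete, the following minimal sketch simulates a fixed-length CAT under a three-parameter logistic (3PL) item response model: each item is selected to maximize Fisher information at the provisional ability estimate, and the estimate is updated by maximum likelihood after every response. The item pool, its parameters, the simulated examinee, and the 30-item test length are illustrative assumptions, not the operational TOLIMO pool or settings.

# A minimal sketch of the adaptive loop, NOT the operational TOLIMO CAT:
# pool size, item parameters, the simulated examinee, and the 30-item
# fixed length are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibrated 3PL item pool (discrimination a, difficulty b, guessing c).
N_ITEMS = 500
a = rng.lognormal(mean=0.0, sigma=0.3, size=N_ITEMS)
b = rng.normal(0.0, 1.0, size=N_ITEMS)
c = rng.uniform(0.1, 0.25, size=N_ITEMS)

def p_correct(theta, i):
    """3PL probability of a correct response to item i at ability theta."""
    return c[i] + (1 - c[i]) / (1 + np.exp(-1.7 * a[i] * (theta - b[i])))

def item_information(theta, i):
    """Fisher information of item i at theta (3PL)."""
    p = p_correct(theta, i)
    return (1.7 * a[i]) ** 2 * ((1 - p) / p) * ((p - c[i]) / (1 - c[i])) ** 2

def ml_theta(items, responses, grid=np.linspace(-4, 4, 401)):
    """Maximum-likelihood ability estimate on a grid (simple and robust)."""
    loglik = np.zeros_like(grid)
    for i, u in zip(items, responses):
        p = p_correct(grid, i)
        loglik += u * np.log(p) + (1 - u) * np.log(1 - p)
    return grid[np.argmax(loglik)]

def run_cat(true_theta, test_length=30):
    """Fixed-length CAT: administer the most informative unused item at the
    provisional estimate, score it, and update the ML estimate."""
    administered, responses = [], []
    theta_hat = 0.0                      # neutral starting estimate
    for _ in range(test_length):
        info = np.array([item_information(theta_hat, i) for i in range(N_ITEMS)])
        info[administered] = -np.inf     # never reuse an item
        nxt = int(np.argmax(info))
        u = int(rng.random() < p_correct(true_theta, nxt))   # simulated response
        administered.append(nxt)
        responses.append(u)
        theta_hat = ml_theta(administered, responses)
    return theta_hat, administered

theta_hat, items_used = run_cat(true_theta=0.8)
print(f"final ML estimate: {theta_hat:.2f} after {len(items_used)} items")

Replacing the fixed test length with a standard-error threshold would turn the same loop into a variable-length CAT; among the termination criteria compared in the study, the fixed-length rule paired with maximum-likelihood estimation was the most accurate.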

Keywords


Article Title [English]

Design and Application of a Computerized Adaptive Testing Method for Administering the TOLIMO Test at the National Organization of Educational Testing

Author [English]

  • Maryam Moghadasin
Abstract [English]

Measuring English language proficiency in large-scale assessments normally requires a large number of test items and relies on paper-and-pencil (P&P) formats, because many people participate annually in this type of high-stakes test. The purpose of this research was to design and apply computerized adaptive testing (CAT) as an alternative for assessing English language proficiency at the National Organization of Educational Testing. Unlike traditional P&P tests, in which the sequence of administered items is fixed and identical for all examinees, CAT uses an optimal, adaptive item selection method.

CAT targets each item at the current provisional ability estimate and continues until a suitable convergence criterion for the ability estimate is reached, which results in a shorter, more reliable, and more efficient measurement process. In the current study, the English proficiency test under examination was the TOLIMO test. The research was carried out in two phases. In the first phase, a sample of P&P TOLIMO administrations (114 to 123) was selected, and the items and examinees' abilities were then calibrated. In the second phase, simulated optimal CATs were designed as a baseline for evaluating the accuracy and efficiency of the operational CAT. The results show that maximum-likelihood ability estimation combined with a fixed test length as the termination criterion yields the most accurate estimates of the ability parameter. Moreover, the simulated TOLIMO CAT based on the optimal item pool requires fewer items than the operational CAT designed from the available item pool, while producing more accurate ability estimates. Therefore, although the items in the calibrated TOLIMO item pool are adequate, writing items specifically for CAT administration is both more economical and yields more accurate ability estimates. The study also demonstrates that the TOLIMO test administered as either the simulated CAT or the operational CAT is more efficient and more precise than the P&P TOLIMO. The findings suggest that a TOLIMO CAT has great potential for efficiently and precisely measuring English language ability.
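As a rough illustration of why an item pool designed for CAT can reach a given precision with fewer items than a pool assembled from existing P&P forms, the sketch below compares two hypothetical 2PL pools: one with difficulties spread across the ability scale and one with difficulties clustered near the middle, as in forms built for a single fixed test. The pools, parameters, and the 0.30 target standard error are assumptions for illustration only, not the calibrated TOLIMO pools; the effect is most visible at ability levels away from the middle of the scale.

# Illustrative sketch (not the study's actual pools): how pool design affects
# the number of items needed to reach a target precision. SE(theta) for ML is
# approximately 1 / sqrt(test information), so the greedy count below shows how
# many of the most informative items each pool must supply at a given theta.
import numpy as np

rng = np.random.default_rng(1)
TARGET_SE = 0.30          # assumed precision target
POOL_SIZE = 400

def information(theta, a, b):
    """Fisher information of 2PL items at theta (guessing omitted for brevity)."""
    p = 1 / (1 + np.exp(-1.7 * a * (theta - b)))
    return (1.7 * a) ** 2 * p * (1 - p)

def items_needed(theta, a, b, target_se=TARGET_SE):
    """Count of the most informative items required so that SE <= target_se."""
    info = np.sort(information(theta, a, b))[::-1]
    cumulative = np.cumsum(info)
    needed = int(np.searchsorted(cumulative, 1 / target_se ** 2)) + 1
    return min(needed, info.size)

a = rng.lognormal(0.0, 0.3, POOL_SIZE)
b_spread = rng.uniform(-3, 3, POOL_SIZE)    # difficulties spread over the scale
b_central = rng.normal(0.0, 0.6, POOL_SIZE)  # difficulties clustered at the middle

for theta in (-2.0, 0.0, 2.0):
    n_spread = items_needed(theta, a, b_spread)
    n_central = items_needed(theta, a, b_central)
    print(f"theta={theta:+.1f}: spread pool needs {n_spread:3d} items, "
          f"central pool needs {n_central:3d} items for SE <= {TARGET_SE}")

The 1/sqrt(information) approximation to the standard error is the usual large-sample result for maximum-likelihood ability estimation; it is used here only to make the pool comparison visible without running a full CAT simulation, whereas the study itself contrasted simulated optimal CATs with the operational CAT built from the existing calibrated pool.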

Keywords [English]

  • Computerized adaptive testing (CAT)
  • TOLIMO
  • Ability parameter estimation
  • Item pool
  • Content balance and item exposure