The Effect of the Anchor to Total Test Correlation on Equating Results: A Systematic Review

Article Type: Research Article

Authors

  • Vahideh Asadi 1
  • Ali Moghadamzadeh 2
  • Keyvan Salehi 2

1 PhD Student of Educational Measurement and Evaluation, Faculty of Psychology and Education, University of Tehran, Tehran, Iran
2 Associate Professor, Division of Research and Assessment, Faculty of Psychology and Education, University of Tehran, Tehran, Iran

DOI: 10.22034/emes.2023.1971260.2430

Abstract

Objective: One of the features of the anchor test that can affect the equating process is its correlation with the total test. This systematic review examined the effect of this feature on the equating process and the factors that influence it.
Methods: The terms equating, anchor, and correlation, and combinations of them, were searched in the PubMed, Medline, ERIC, JSTOR, and Wiley databases; on the SAGE, ETS, and ACADEMIA websites; and in the reference lists of several key articles. The search was restricted to English-language sources published between 1950 and 2022.
Results: Based on the inclusion criteria, 18 of the 167 retrieved documents were selected for review. The quality of these studies was assessed using the Quality Assessment Tool for Studies with Diverse Designs (QATSDD). The results showed that test length, test reliability, the statistical characteristics of the anchor, the content structure of the anchor test, and differences in the ability of examinee groups were the most important factors affecting the correlation between the anchor test and the total test. The results also showed that a higher correlation between the two tests improved the quality and accuracy of parameter estimation in the equating process and reduced the standard error of equating (illustrated in the sketches following this abstract).
Conclusion: Given the importance of the correlation between the anchor test and the total test, the value of this correlation and the factors affecting it should be examined carefully during test development, before any equating-related analyses are run, in order to minimize equating error and bias in the results.
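
As one concrete illustration of where this correlation enters the computations, consider the Tucker linear method for the nonequivalent groups with anchor test (NEAT) design, as presented in Kolen and Brennan (2004). The synthetic-population moments are built from the regression of each total score on the anchor, so the anchor-to-total correlation appears directly in the gamma weights. The display below restates those standard formulas for form X; the form Y expressions are analogous with the roles of the two populations reversed.

```latex
% Tucker linear equating for the NEAT design (after Kolen & Brennan, 2004).
% Population 1 takes form X, population 2 takes form Y, V is the common anchor,
% and the synthetic-population weights satisfy w_1 + w_2 = 1.
\[
\gamma_1 \;=\; \frac{\operatorname{cov}_1(X, V)}{\sigma_1^2(V)}
        \;=\; \rho_1(X, V)\,\frac{\sigma_1(X)}{\sigma_1(V)},
\]
\[
\mu_s(X) \;=\; \mu_1(X) \;-\; w_2\,\gamma_1\,\bigl[\mu_1(V) - \mu_2(V)\bigr],
\]
\[
\sigma_s^2(X) \;=\; \sigma_1^2(X)
  \;-\; w_2\,\gamma_1^2\,\bigl[\sigma_1^2(V) - \sigma_2^2(V)\bigr]
  \;+\; w_1 w_2\,\gamma_1^2\,\bigl[\mu_1(V) - \mu_2(V)\bigr]^2,
\]
\[
l_Y(x) \;=\; \mu_s(Y) \;+\; \frac{\sigma_s(Y)}{\sigma_s(X)}\,\bigl[x - \mu_s(X)\bigr].
\]
```

Because the weight is proportional to the anchor-to-total correlation, a weak correlation shrinks the adjustment made for group differences on the anchor, which is one route by which it degrades the accuracy of the equating function.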
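
The claim that a higher anchor-to-total correlation reduces the standard error of equating (SEE) can also be checked numerically. The following minimal sketch is an illustration, not code from any of the reviewed studies: it simulates a NEAT design under a simple one-parameter response model, uses internal anchor length as a convenient lever for the anchor-to-total correlation, applies chained linear equating, and reports the empirical SEE of an equated score across replications. All sample sizes, ability distributions, and item parameters are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_form(abilities, unique_diffs, anchor_diffs):
    """Number-correct scores on a form built from unique items plus a common
    internal anchor, under a 1PL-like response model."""
    diffs = np.concatenate([unique_diffs, anchor_diffs])
    p = 1.0 / (1.0 + np.exp(-(abilities[:, None] - diffs[None, :])))
    resp = (rng.random(p.shape) < p).astype(int)
    return resp.sum(axis=1), resp[:, len(unique_diffs):].sum(axis=1)  # (total, anchor)

def chained_linear(x, x1, v1, y2, v2):
    """Chained linear equating of score x from form X to the form Y scale (NEAT)."""
    v_hat = v1.mean() + v1.std() / x1.std() * (x - x1.mean())      # X -> V, group 1
    return y2.mean() + y2.std() / v2.std() * (v_hat - v2.mean())   # V -> Y, group 2

n_examinees, n_items, n_reps, x_point = 2000, 60, 200, 40.0

for n_anchor in (5, 15, 30):            # longer anchor -> higher r(anchor, total)
    anchor_diffs = rng.normal(0.0, 1.0, n_anchor)
    ux = rng.normal(0.0, 1.0, n_items - n_anchor)   # unique items, form X
    uy = rng.normal(0.0, 1.0, n_items - n_anchor)   # unique items, form Y
    corrs, equated = [], []
    for _ in range(n_reps):             # resample examinees to get an empirical SEE
        theta1 = rng.normal(0.0, 1.0, n_examinees)  # group taking form X
        theta2 = rng.normal(0.3, 1.0, n_examinees)  # nonequivalent group, form Y
        x1, v1 = simulate_form(theta1, ux, anchor_diffs)
        y2, v2 = simulate_form(theta2, uy, anchor_diffs)
        corrs.append(np.corrcoef(x1, v1)[0, 1])
        equated.append(chained_linear(x_point, x1, v1, y2, v2))
    print(f"anchor={n_anchor:2d}  mean r(anchor,total)={np.mean(corrs):.3f}  "
          f"empirical SEE at x={x_point:.0f}: {np.std(equated):.3f}")
```

Under this setup, longer anchors yield higher anchor-to-total correlations and, typically, a smaller spread of the equated scores across replications, matching the pattern the reviewed studies report.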


Keywords

  • Equating
  • Anchor Test
  • Correlation
  • Systematic Review

References

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). American Council on Education.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
Arikan, C. A., & Gelbal, S. (2018). The effect of mini and midi anchor tests on test equating. The International Journal of Progressive Education, 14(2), 148-160. https://doi.org/10.29329/ijpe.2018.139.11
Balla, J. (1988). The effects of reducing correlation of external anchors on test equating methods for the equivalent groups and non-equivalent groups designs. International Journal of Educational Research, 12(4), 409-425. https://doi.org/10.1016/0883-0355(88)90034-1
Braun, H. I., & Holland, P. W. (1982). Observed score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9-49). Academic.
Brennan, R. L., Wang, T., Kim, S., & Seol, J. (2009). Equating recipes. CASMA. https://education.uiowa.edu/sites/education.uiowa.edu/files/2021-11/casma-monograph-1.pdf
Budescu, D. (1985). Efficiency of linear equating as a function of the length of the anchor test. Journal of Educational Measurement, 22(1), 13–20. https://www.jstor.org/stable/1434562
Dorans, N. J. (2004). Equating, concordance, and expectation. Applied Psychological Measurement, 28(4), 227–246. https://doi.org/10.1177/0146621604265031
Dorans, N. J., Liu, J., & Hammond, S. (2008). Anchor test type and population invariance: An exploration across subpopulations and test administrations. Applied Psychological Measurement, 32(1), 81–97. https://doi.org/10.1177/0146621607311580
Dorans, N. J., Moses, T. P., & Eignor, D. R. (2010). Principles and practices of test score equating (RR-10-29). ETS. https://files.eric.ed.gov/fulltext/ED523737.pdf
Dorans, N. J., Moses, T. P., & Eignor, D. R. (2011). Equating test scores: toward best practices. In A. A. von Davier (Ed.), Statistical models for test equating, scaling and linking (pp. 21-42). Springer.
Fenton, L., Lauckner, H., & Gilbert, R. (2015). The QATSDD critical appraisal tool: Comments and critiques. Journal of Evaluation in Clinical Practice, 21(6), 1125-1128. https://doi.org/10.1111/jep.12487
Gonzalez, J., & Wiberg, M. (2017). Applying test equating methods using R. Springer.
Haberman, S., & Dorans, N. J. (2009, April). Scale consistency, drift, stability: Definitions, distinctions, and principles [Paper presentation]. National Council on Measurement in Education, San Diego, CA.
Hagge, S. L. (2010). The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups (Doctoral Dissertation, University of Iowa). https://doi.org/10.17077/etd.bc5ticit
Kanamori, L. F., Xu, C., Hasan, S. S., & Doi, S. A. (2021). Quality versus risk of bias assessment in clinical research. Journal of Clinical Epidemiology, 129, 172-175. https://doi.org/10.1016/j.jclinepi.2020.09.044
Klein, L. W., & Jarjoura, D. (1985). The importance of content representation for common-item equating with nonrandom groups. Journal of Educational Measurement, 22(3), 197–206. http://www.jstor.org/stable/1435033
Kolen, M. J. (2020). Equating with small samples (Commentary). Applied Measurement in Education, 33(1), 77-82. https://doi.org/10.1080/08957347.2019.1674308
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). Springer.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer.
Lasserson, T. J., Thomas, J., & Higgins, J. P. T. (2019). Starting a review. In J. P. T. Higgins, J. Thomas, J. Chandler, M. Cumpston, T. Li, M. J. Page & V. A. Welch (Eds.), Cochrane handbook for systematic reviews of interventions (2nd ed., pp. 1-12). Wiley-Blackwell.
Lin, P., Dorans, N., & Weeks, J. (2016). Linking composite scores: Effects of anchor test length and content representativeness (Research Report No. RR-16-36). Educational Testing Service. https://doi.org/10.1002/ets2.12122
Liu, J., Sinharay, S., Holland, P. W., Feigenbaum, M., & Curley, E. (2009). The effects of different types of anchor tests on observed score equating. ETS.
Liu, J., Sinharay, S., Holland, P. W., Feigenbaum, M., & Curley, E. (2011a). Test score equating using a mini-version anchor and a midi anchor: A case study using SAT data. Journal of Educational Measurement, 48(4), 361-379. https://doi.org/10.1111/j.1745-3984.2011.00150.x
Liu, J., Sinharay, S., Holland, P. W., Feigenbaum, M., & Curley, E. (2011b). Observed score equating using a mini-version anchor and an anchor with less spread of difficulty: A comparison study. Educational and Psychological Measurement, 71, 346–361. https://doi.org/10.1177/0013164410375571
Liu, J., & Walker, M. E. (2007). Score linking issues related to test content changes. In N. J. Dorans, M. Pommerich & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 109-134). Springer.
Livingston, S. A. (2004). Equating test scores (without IRT). ETS. https://www.ets.org/Media/Research/pdf/LIVINGSTON.pdf
Lord, F. M. (1975). A survey of equating methods based on item characteristic curve theory (RB 75-13). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.1975.tb01052.x
Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14(2), 117-138. https://doi.org/10.2307/1434011
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
Marengo, D., Miceli, R., Rosato, R., & Settanni, M. (2018). Placing multiple tests on a common scale using a post-test anchor design: Effects of item position and order on the stability of parameter estimates. Frontiers in Applied Mathematics and Statistics, 4, 1-14. https://doi.org/10.3389/fams.2018.00050
Moses, T., Deng, W., & Zhang, Y. L. (2010). The use of two anchors in nonequivalent groups with anchor test (NEAT) equating. ETS. https://doi.org/10.1002/j.2333-8504.2010.tb02230.x
Moses, T., & Kim, S. (2007). Reliability and the nonequivalent groups with anchor test design (RR-07-16). ETS. https://doi.org/10.1002/j.2333-8504.2007.tb02058.x
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S.,... Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71
Page, M. J., McKenzie, J. E., & Higgins, J. P. T. (2018). Tools for assessing risk of reporting biases in studies and syntheses of studies: A systematic review. BMJ Open, 8(3), 1-16. https://doi.org/10.1136/bmjopen-2017-019703
Petersen, N. S. (2007). Equating: best practices and challenges to best practices. In N. J. Dorans, M. Pommerich & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 59-72). Springer.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). Macmillan.
Petersen, N. S., Marco, G. L., & Stewart, E. E. (1982). A test of the adequacy of linear score equating models. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 71–135). Academic.
Puhan, G. (2010). A comparison of chained linear and poststratification linear equating under different testing conditions. Journal of Educational Measurement, 47(1), 54-75. https://doi.org/10.1111/j.1745-3984.2009.00099.x
Ricker, K. L., & von Davier, A. A. (2007). The impact of anchor test length on equating results in a nonequivalent groups design. ETS.
Ryan, J., & Brockmann, F. (2018). A practitioner’s introduction to equating with primers on classical test theory and item response theory. The Council of Chief State School Officers. https://ccsso.org/sites/default/files/201806/A%20Practitioners%20Introduction%20to%20Equating%20revised%20edition.pdf
Santos, C. M. C., Pimenta, C. A. M., & Nobre, M. R. C. (2007). The PICO strategy for the research question construction and evidence search. Rev Latino-am Enfermagem, 15(3), 508–511. https://doi.org/10.1590/s0104-11692007000300023
Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P): Elaboration and explanation. BMJ, 349, g7647. https://doi.org/10.1136/bmj.g7647
Shea, J. A., & Norcini, J. J. (1995). Licensure testing: Purposes, procedures, and practices. University of Nebraska-Lincoln. https://digitalcommons.unl.edu/buroslicensure/16/
Sinharay, S. (2017). On the choice of anchor test in equating. Educational Measurement: Issues and Practice, 37(4), 1-6. https://doi.org/10.1111/emip.12175
Sinharay, S., Haberman, S., Holland, P., & Lewis, C. (2012). A note on the choice of an anchor test in equating. ETS. https://doi.org/10.1002/j.2333-8504.2012.tb02296.x
Sinharay, S., & Holland, P. W. (2006a). The correlation between the scores of a test and an anchor test. ETS. https://doi.org/10.1002/j.2333-8504.2006.tb02010.x
Sinharay, S., & Holland, P. W. (2006b). Choice of anchor test in equating. ETS. https://doi.org/10.1002/j.2333-8504.2006.tb02040.x
Sinharay, S., & Holland, P. W. (2007). Is it necessary to make anchor tests mini-versions of the tests being equated or can some restrictions be relaxed? Journal of Educational Measurement, 44, 249–275. https://doi.org/10.1111/j.1745-3984.2007.00037.x
Sirriyeh, R., Lawton, R., Gardner, P., & Armitage, G. (2011). Reviewing studies with diverse designs: The development and evaluation of a new tool. Journal of Evaluation in Clinical Practice, 18(4), 746-752. https://doi.org/10.1111/j.1365-2753.2011.01662.x
Suh, Y., Morch, A. A., Kane, M. T., & Ripkey, D. R. (2009). An empirical comparison of five linear equating methods for the NEAT design. Measurement: Interdisciplinary Research and Perspectives, 7(3), 147-173. https://doi.org/10.1080/15366360903418048
Sunnassee, D. (2011). Conditions affecting the accuracy of classical equating methods for small samples under the NEAT design: A simulation study (Doctoral dissertation, University of North Carolina at Greensboro). https://libres.uncg.edu/ir/uncg/listing.aspx?id=8164
Tai, J., Ajjawi, R., Bearman, M., & Wiseman, P. (2020). Conceptualizations and measures of student engagement: A worked example of systematic review. In O. Zawacki-Richter, M. Kerres, S. Bendenlier, M. Bond & K. Buntins (Eds.), Systematic reviews in educational research (pp. 91-110). Springer. https://doi.org/10.1007/978-3-658-27602-7
Trierweiler, T. J., Lewis, C., & Smith, R. L. (2016). Further study of the choice of anchor tests in equating. Journal of Educational Measurement, 53, 498–518. https://doi.org/10.1111/jedm.12128
van der Linden, W. J., & Wiberg, M. (2010). Local observed-score equating with anchor-test designs. Applied Psychological Measurement, 34(8), 620-640. https://doi.org/10.1177/0146621609349803
von Davier, A. A. (2008). New results on the linear equating methods for the non-equivalent-groups design. Journal of Educational and Behavioral Statistics, 33(2), 186-203. https://doi.org/10.3102/1076998608302633
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. Springer.
Wallin, G., Haggstrom, J., & Wiberg, M. (2021). How important is the choice of bandwidth in kernel equating? Applied Psychological Measurement, 45(7-8), 518-535. https://doi.org/10.1177/01466216211040486
Wei, H. (2010, May). Impact of non-representative anchor items on scale stability [Paper presentation]. National Council on Measurement in Education, Denver, CO.
Yang, W. L., & Houang, R. T. (1996, April). The effect of anchor length and equating method on the accuracy of test equating: comparison of linear and IRT-based equating using an anchor-item design [Paper presentation]. American Educational Research Association, New York, NY. https://eric.ed.gov/?id=ED401308
Yi, H. S. (2009). Evaluating the performance of non-equivalent groups anchor test equating under various conditions of anchor test construction. Journal of Educational Evaluation, 22(3), 847-869. https://www.kci.go.kr/kciportal/ci/sereArticleSearch/ciSereArtiView.kci?sereArticleSearchBean.artiId=ART001378603
Zhang, M., & Kolen, M. J. (2013). Effect of the number of common items on equating precision and estimation of the lower bound to the number of common items needed. Center for Advanced Studies in Measurement and Assessment (CASMA). https://www.education.uiowa.edu/casma