The Effect of the Anchor to Total Test Correlation on Equating Results: A Systematic Review

Asadi, Vahideh; Moghadamzadeh, Ali; Salehi, Keyvan

doi:10.22034/emes.2023.1971260.2430

Document Type: Original Article

Authors

¹ PhD Student of Educational Measurement and Evaluation, Faculty of Psychology and Education, University of Tehran, Tehran, Iran

² Associate Professor, Division of Research and Assessment, Faculty of Psychology and Education, University of Tehran, Tehran, Iran

Abstract

Objective: One feature of an anchor test that can affect the equating process is its correlation with the total test. This systematic review examined the effects of this correlation on the equating process and the factors that influence it.
Methods: The terms equating, anchor, and correlation, and combinations of them, were searched in the PubMed, Medline, ERIC, JSTOR, and Wiley databases; on the SAGE, ETS, and ACADEMIA websites; and in the reference lists of some important articles. The search was restricted to English-language sources published from 1950 to 2022.
Results: Based on the inclusion criteria, 18 of the 167 retrieved documents were selected for further analysis. The quality of the documents was assessed using the Quality Assessment Tool for Studies with Diverse Designs (QATSDD). The most important factors affecting the correlation between the anchor test and the total test were test length, test reliability, the statistical characteristics of the anchor, the content structure of the anchor test, and differences in ability between examinee groups. A higher correlation between the two tests improved the quality and accuracy of parameter estimation in the equating process and reduced the standard error of equating.
Conclusion: Given the importance of the correlation between the anchor test and the total test, the value of this correlation and the factors affecting it should be examined carefully during test development, before any equating-related analysis, to minimize errors and biased results.
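
The abstract names test length and test reliability among the factors that drive the anchor-to-total correlation. The sketch below is not taken from the reviewed studies. It is a minimal illustration under classical test theory, and it rests on several assumptions: the anchor is external (scored separately from the total test), all items are parallel with the same hypothetical single-item reliability, measurement errors are uncorrelated, and the two tests' true scores correlate at `true_score_corr`. Under those assumptions the Spearman-Brown formula gives each test's reliability from its length, and the observed correlation is the true-score correlation multiplied by the square root of the product of the two reliabilities. The function names and default values are illustrative only.

```python
import math


def spearman_brown(unit_reliability: float, n_items: int) -> float:
    """Reliability of a test built from n_items parallel items.

    unit_reliability is the (hypothetical) reliability of a single item.
    """
    return n_items * unit_reliability / (1 + (n_items - 1) * unit_reliability)


def anchor_total_correlation(
    n_total: int,
    n_anchor: int,
    unit_reliability: float = 0.05,  # assumed value, for illustration only
    true_score_corr: float = 1.0,    # 1.0 means both tests measure the same construct
) -> float:
    """Expected observed correlation between an external anchor and the total test.

    Uses the classical attenuation relation:
    r_observed = r_true * sqrt(reliability_total * reliability_anchor).
    """
    rel_total = spearman_brown(unit_reliability, n_total)
    rel_anchor = spearman_brown(unit_reliability, n_anchor)
    return true_score_corr * math.sqrt(rel_total * rel_anchor)


if __name__ == "__main__":
    # Longer anchors are more reliable, so the expected correlation rises.
    for n_anchor in (10, 20, 40):
        r = anchor_total_correlation(n_total=60, n_anchor=n_anchor)
        print(f"anchor length {n_anchor:2d}: expected correlation {r:.3f}")
```

Lowering `true_score_corr` below 1.0 is a rough stand-in for an anchor whose content does not mirror the total test. An internal anchor, whose items also count toward the total score, would need a different formula because the two scores share error variance.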

Educational Measurement and Evaluation Studies

Volume 13, Issue 43
October 2023
Pages 7-27
