The Effect of the Anchor to Total Test Correlation on Equating Results: A Systematic Review

Document Type: Original Article


1 PhD Student of Educational Measurement and Evaluation, Faculty of Psychology and Education, University of Tehran, Tehran, Iran

2 Associate Professor, Division of Research and Assessment, Faculty of Psychology and Education, University of Tehran, Tehran, Iran



Objective: One feature of an anchor test that can affect the equating process is its correlation with the total test. This systematic review examined how this correlation affects equating and which factors influence the correlation.
Methods: The terms equating, anchor, and correlation, alone and in combination, were searched in the PubMed, Medline, ERIC, JSTOR, and Wiley databases, on the SAGE, ETS, and ACADEMIA websites, and in the reference lists of some important articles. The search was restricted to English-language sources published from 1950 to 2022.
Results: Based on the inclusion criteria, 18 of the 167 extracted documents were selected for further analysis. Document quality was assessed with the Quality Assessment Tool for Studies with Diverse Designs (QATSDD). Test length, test reliability, the statistical characteristics of the anchor, the content structure of the anchor test, and differences in ability between examinee groups were the most important factors affecting the correlation between the anchor test and the total test. A higher correlation between the two tests improved the quality and accuracy of parameter estimation in the equating process and reduced the standard error of equating.
Conclusion: Given the importance of the correlation between the anchor test and the total test, this correlation and the factors affecting it should be examined carefully during test development, before any equating-related analysis, to minimize errors and biased results.
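
As an illustration of one factor named in the Results (test length), the following minimal sketch simulates how the correlation between an anchor test and a total test can change with anchor length. It is not taken from any of the reviewed studies. Everything in it is an assumption made for illustration: a Rasch response model, standard normal abilities and item difficulties, an external anchor (anchor items are not part of the total test), a 60-item total test, 2000 examinees, and the hypothetical function names `pearson`, `simulate_scores`, and `anchor_total_correlation`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_scores(abilities, n_items, rng):
    """Number-correct scores on n_items Rasch items (assumed N(0, 1) difficulties)."""
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_items)]
    scores = []
    for theta in abilities:
        score = 0
        for b in difficulties:
            # Rasch model: probability of a correct response to this item.
            p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
            score += rng.random() < p_correct
        scores.append(score)
    return scores


def anchor_total_correlation(anchor_length, total_length=60,
                             n_examinees=2000, seed=1):
    """Correlation between external-anchor scores and total-test scores."""
    rng = random.Random(seed)
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    total = simulate_scores(abilities, total_length, rng)
    # External anchor: a separate item set taken by the same examinees.
    anchor = simulate_scores(abilities, anchor_length, rng)
    return pearson(anchor, total)


if __name__ == "__main__":
    # Print the simulated anchor-total correlation for several anchor lengths.
    for length in (5, 10, 20, 40):
        print(length, round(anchor_total_correlation(length), 3))
```

Under these assumptions, longer anchors should tend to give higher anchor-total correlations, because a longer anchor yields a more reliable score. The sketch does not model the other factors listed in the Results, such as content structure or group ability differences.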


