References
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). American Council on Education.
Angoff, W. H. (1984). Scales, norms, and equivalent scores. Educational Testing Service.
Balla, J. (1988). The effects of reducing correlation of external anchors on test equating methods for the equivalent groups and non-equivalent groups designs. International Journal of Educational Research, 12(4), 409-425. https://doi.org/10.1016/0883-0355(88)90034-1
Braun, H. I., & Holland, P. W. (1982). Observed score test equating: A mathematical analysis of some ETS equating procedures. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 9-49). Academic.
Brennan, R. L., Wang, T., Kim, S., & Seol, J. (2009). Equating recipes. CASMA. https://education.uiowa.edu/sites/education.uiowa.edu/files/2021-11/casma-monograph-1.pdf
Dorans, N. J. (2004). Equating, concordance, and expectation. Applied Psychological Measurement, 28(4), 227–246. https://doi.org/10.1177/0146621604265031
Dorans, N. J., Liu, J., & Hammond, S. (2008). Anchor test type and population invariance: An exploration across subpopulations and test administrations. Applied Psychological Measurement, 32(1), 81–97. https://doi.org/10.1177/0146621607311580
Dorans, N. J., Moses, T. P., & Eignor, D. R. (2011). Equating test scores: Toward best practices. In A. A. von Davier (Ed.), Statistical models for test equating, scaling and linking (pp. 21-42). Springer.
Fenton, L., Lauckner, H., & Gilbert, R. (2015). The QATSDD critical appraisal tool: Comments and critiques. Journal of Evaluation in Clinical Practice, 21, 1125-1128. https://doi.org/10.1111/jep.12487
Gonzalez, J., & Wiberg, M. (2017). Applying test equating methods using R. Springer.
Haberman, S., & Dorans, N. J. (2009, April). Scale consistency, drift, stability: Definitions, distinctions, and principles [Paper presentation]. National Council on Measurement in Education, San Diego, CA. http://www.ets.org/legal/index.html
Hagge, S. L. (2010). The impact of equating method and format representation of common items on the adequacy of mixed-format test equating using nonequivalent groups (Doctoral Dissertation, University of Iowa). https://doi.org/10.17077/etd.bc5ticit
Klein, L. W., & Jarjoura, D. (1985). The importance of content representation for common-item equating with nonrandom groups. Journal of Educational Measurement, 22(3), 197–206. http://www.jstor.org/stable/1435033
Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices (2nd ed.). Springer.
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking (3rd ed.). Springer.
Lasserson, T. J., Thomas, J., & Higgins, J. P. T. (2019). Starting a review. In J. P. T. Higgins, J. Thomas, J. Chandler, M. Cumpston, T. Li, M. J. Page & V. A. Welch (Eds.), Cochrane handbook for systematic reviews of interventions (2nd ed., pp. 1-12). Wiley-Blackwell.
Lin, P., Dorans, N., & Weeks, J. (2016). Linking composite scores: Effects of anchor test length and content representativeness (Research Report No. RR-16-36). Educational Testing Service. https://doi.org/10.1002/ets2.12122
Liu, J., Sinharay, S., Holland, P. W., Feigenbaum, M., & Curley, E. (2011a). Test score equating using a mini-version anchor and a midi anchor: A case study using SAT data. Journal of Educational Measurement, 48(4), 361-379. https://doi.org/10.1111/j.1745-3984.2011.00150.x
Liu, J., Sinharay, S., Holland, P. W., Feigenbaum, M., & Curley, E. (2011b). Observed score equating using a mini-version anchor and an anchor with less spread of difficulty: A comparison study. Educational and Psychological Measurement, 71, 346–361. https://doi.org/10.1177/0013164410375571
Liu, J., & Walker, M. E. (2007). Score linking issues related to test content changes. In N. J. Dorans, M. Pommerich & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 109-134). Springer.
Lord, F. M. (1977). Practical applications of item characteristic curve theory. Journal of Educational Measurement, 14(2), 117-138. http://doi.org/10.2307/1434011
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Lawrence Erlbaum.
Marengo, D., Miceli, R., Rosato, R., & Settanni, M. (2018). Placing multiple tests on a common scale using a post-test anchor design: Effects of item position and order on the stability of parameter estimates. Frontiers in Applied Mathematics and Statistics, 4, 1-14. http://doi.org/10.3389/fams.2018.00050
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., ... Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71
Page, M. J., McKenzie, J. E., & Higgins, J. P. T. (2018). Tools for assessing risk of reporting biases in studies and syntheses of studies: A systematic review. BMJ Open, 8(3), 1-16. https://doi.org/10.1136/bmjopen-2017-019703
Petersen, N. S. (2007). Equating: Best practices and challenges to best practices. In N. J. Dorans, M. Pommerich & P. W. Holland (Eds.), Linking and aligning scores and scales (pp. 59-72). Springer.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 221–262). Macmillan.
Petersen, N. S., Marco, G. L., & Stewart, E. E. (1982). A test of the adequacy of linear score equating models. In P. W. Holland & D. B. Rubin (Eds.), Test equating (pp. 71–135). Academic.
Shamseer, L., Moher, D., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., Shekelle, P., & Stewart, L. (2015). Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P): Elaboration and explanation. BMJ, 349, g7647. https://doi.org/10.1136/bmj.g7647
Sirriyeh, R., Lawton, R., Gardner, P., & Armitage, G. (2011). Reviewing studies with diverse designs: The development and evaluation of a new tool. Journal of Evaluation in Clinical Practice, 18(4), 746-752. https://doi.org/10.1111/j.1365-2753.2011.01662.x
Suh, Y., Morch, A. A., Kane, M. T., & Ripkey, D. R. (2009). An empirical comparison of five linear equating methods for the NEAT design. Measurement: Interdisciplinary Research and Perspectives, 7(3), 147-173. https://doi.org/10.1080/15366360903418048
Sunnassee, D. (2011). Conditions affecting the accuracy of classical equating methods for small sample under the NEAT design: A simulation study (Doctoral Dissertation, University of North Carolina). https://libres.uncg.edu/ir/uncg/listing.aspx?id=8164
Tai, J., Ajjawi, R., Bearman, M., & Wiseman, P. (2020). Conceptualizations and measures of student engagement: A worked example of systematic review. In O. Zawacki-Richter, M. Kerres, S. Bedenlier, M. Bond & K. Buntins (Eds.), Systematic reviews in educational research (pp. 91-110). Springer. https://doi.org/10.1007/978-3-658-27602-7
Trierweiler, T. J., Lewis, C., & Smith, R. L. (2016). Further study of the choice of anchor tests in equating. Journal of Educational Measurement, 53, 498–518. https://doi.org/10.1111/jedm.12128
van der Linden, W. J., & Wiberg, M. (2010). Local observed-score equating with anchor-test designs. Applied Psychological Measurement, 34(8), 620-640. https://doi.org/10.1177/0146621609349803
von Davier, A. A. (2008). New results on the linear equating methods for the non-equivalent-groups design. Journal of Educational and Behavioral Statistics, 33(2), 186-203. https://doi.org/10.3102/1076998608302633
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. Springer.
Wei, H. (2010, May). Impact of non-representative anchor items on scale stability [Paper presentation]. National Council on Measurement in Education, Denver, Pearson.
Yang, W. L., & Houang, R. T. (1996, April). The effect of anchor length and equating method on the accuracy of test equating: Comparison of linear and IRT-based equating using an anchor-item design [Paper presentation]. American Educational Research Association, New York, NY. https://eric.ed.gov/?id=ED401308
Zhang, M., & Kolen, M. J. (2013). Effect of the number of common items on equating precision and estimation of the lower bound to the number of common items needed. Center for Advanced Studies in Measurement and Assessment (CASMA). https://www.education.uiowa.edu/casma