Utilizing a Decision-Making Approach to Rank Composite Score Construction Methods

Document Type: Original Article

Author

Assistant Professor, Department of Education, Shahid Chamran University of Ahvaz, Ahvaz, Iran

10.22034/emes.2022.550667.2366

Abstract

Objective: Test batteries are commonly used for decision-making in education, particularly for admission decisions. Several methods exist for constructing composite scores, and each affects the admission decision differently. Which method, then, produces fewer errors?
Methods: The present research was conducted to rank different methods of composite score construction by their conditional standard error of measurement (CSEM). Data from a random sample of 10,000 participants in the Iranian university entrance exam were used to rank six composite score construction methods. Participants' raw scores were the sum of their correct responses. Normalizing and arcsine transformation methods were used to construct scale scores, and nominal, effective, and Shannon weighting schemes were used to combine the subtest scale scores (a sketch of the scaling and weighting steps follows). To rank the composite score construction methods, a new approach based on multiple-attribute decision making (MADM) was employed.
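To make the scaling and weighting steps concrete, here is a minimal Python sketch. It assumes the basic arcsin(sqrt(p)) form of the transformation and a 0-100 reporting range; the function names and toy data are illustrative assumptions, not the study's code:

```python
import numpy as np

# Illustrative sketch of two steps from the Methods: an arcsine
# scale-score transformation and a Shannon (entropy-based) weighting
# scheme. The 0-100 range and all names here are assumptions.

def arcsine_scale(raw, n_items, scale_max=100.0):
    """Map a raw number-correct score to an arcsine scale score.

    Uses the basic arcsin(sqrt(p)) form, which roughly stabilizes the
    score variance across the range, then rescales to [0, scale_max].
    """
    p = np.asarray(raw, dtype=float) / n_items
    theta = np.arcsin(np.sqrt(p))            # radians in [0, pi/2]
    return theta / (np.pi / 2) * scale_max   # rescale to reporting range

def shannon_weights(subtest_scores):
    """Entropy-based (Shannon) weights: subtests whose score columns
    are less uniform carry more information and get larger weights."""
    X = np.asarray(subtest_scores, dtype=float)
    P = X / X.sum(axis=0)                        # column-wise proportions
    P = np.where(P > 0, P, 1e-12)                # guard against log(0)
    k = 1.0 / np.log(X.shape[0])
    entropy = -k * (P * np.log(P)).sum(axis=0)   # entropy per subtest
    d = 1.0 - entropy                            # diversification degree
    return d / d.sum()

# Toy example: 5 examinees x 3 subtests of 20, 40, and 15 items.
raw = np.array([[12, 30, 8], [15, 25, 10], [9, 28, 7],
                [14, 33, 9], [11, 27, 6]])
scaled = np.column_stack([arcsine_scale(raw[:, j], n)
                          for j, n in enumerate([20, 40, 15])])
composite = scaled @ shannon_weights(scaled)
```

A nominal weighting scheme would instead fix each subtest's weight in advance, for example in proportion to its number of items.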
Results: The results revealed that the methods that use the arcsine transformation to construct scale scores, combined with nominal or Shannon weighting schemes for the subtest scale scores, took the highest ranks and produce less error in admission decisions.
Conclusion: Arcsine scale scores, owing to their lower error and simpler conversion, can improve the interpretation and accuracy of composite test scores, whereas the different weighting methods do not affect score accuracy and can be chosen according to test conditions or the test developers' judgment.
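The abstract does not specify the MADM procedure itself. As one standard baseline, a simple additive weighting (SAW) ranking of the six methods over CSEM-based criteria might look like the hypothetical sketch below; the criteria, weights, and numbers are invented for illustration and are not the study's actual decision matrix:

```python
import numpy as np

# Hypothetical SAW (simple additive weighting) ranking of composite
# score construction methods. Smaller CSEM is better, so both criteria
# are treated as costs. All values below are invented.

def saw_rank(decision_matrix, weights, cost_criteria):
    """Score alternatives (rows) over criteria (columns) and rank them.

    cost_criteria: boolean mask, True where smaller is better (e.g. CSEM).
    """
    X = np.asarray(decision_matrix, dtype=float)
    # Linear-scale normalization: benefit -> x / max, cost -> min / x.
    norm = np.where(cost_criteria, X.min(axis=0) / X, X / X.max(axis=0))
    scores = norm @ np.asarray(weights, dtype=float)
    return scores, np.argsort(-scores)  # higher score = better rank

# 6 methods x 2 criteria (mean CSEM, max CSEM), both cost criteria.
D = np.array([[2.1, 3.4], [1.8, 3.0], [2.5, 3.9],
              [1.9, 3.1], [2.7, 4.2], [2.0, 3.3]])
scores, order = saw_rank(D, weights=[0.5, 0.5],
                         cost_criteria=[True, True])
print(order)  # indices of methods from best to worst under this toy data
```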

Keywords

