Comparing the Methods of Assigning Optimum Weights to the Components of Composite Tests

Document Type: Original Article



An important issue in scoring a composite test is how to weight and combine the different component scores to compute examinees' total scores. These weights should be selected so that they not only reflect the psychometric properties of each component and the elements that determine them, but also minimize the difference between each examinee's observed score and true score, which reflects his or her actual ability. In other words, the decision-making framework is designed with respect to considerations such as validity, testing time, and reliability.
Over the last few decades, several approaches have been suggested for obtaining the maximum reliability of composite scores. These fall into two groups: the implicit approach and the explicit approach. The implicit approach involves adding the raw scores and using an IRT model. The explicit approach involves weighting the components by the difficulty of the items, assigning weights to component scores based on the reliability of the components, and weighting the components so as to maximize the validity of the composite scores. In this paper, we introduce the approach of obtaining the maximum reliability in Classical Test Theory and in Item Response Theory. Besides considering the pros and cons of each method, we investigate the estimates of the reliability and the standard error of measurement of the composite scores for data in a simulation study.
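As a rough illustration of the Classical Test Theory side of this comparison, the sketch below computes the reliability and standard error of measurement of a weighted composite and searches for reliability-maximizing weights. It is not the procedure used in the paper: it assumes that measurement errors are uncorrelated across components, that the observed covariance matrix and the component reliabilities are known, and that the leading eigenvector does not sum to zero. The function names and the three-component example values are hypothetical.

```python
import numpy as np


def composite_reliability(weights, cov, reliabilities):
    """Reliability of a weighted composite under CTT.

    Assumes errors are uncorrelated across components, so the composite
    error variance is sum(w_i^2 * var_i * (1 - rel_i)).
    """
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    error_var = np.diag(cov) * (1.0 - rel)      # per-component error variance
    composite_var = w @ cov @ w                 # observed composite variance
    return 1.0 - np.sum(w**2 * error_var) / composite_var


def composite_sem(weights, cov, reliabilities):
    """Standard error of measurement of the weighted composite."""
    w = np.asarray(weights, dtype=float)
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    error_var = np.diag(cov) * (1.0 - rel)
    return float(np.sqrt(np.sum(w**2 * error_var)))


def max_reliability_weights(cov, reliabilities):
    """Weights maximizing composite reliability (generalized eigenproblem).

    Maximizes (w' T w) / (w' S w), where S is the observed covariance and
    T = S - diag(error variances) is the implied true-score covariance.
    """
    cov = np.asarray(cov, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    true_cov = cov - np.diag(np.diag(cov) * (1.0 - rel))
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(cov, true_cov))
    w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return w / w.sum()                          # rescale so weights sum to 1


# Hypothetical three-component test: standard deviations, correlations,
# and component reliabilities are made up for illustration only.
sd = np.array([10.0, 5.0, 8.0])
corr = np.array([[1.0, 0.5, 0.4],
                 [0.5, 1.0, 0.3],
                 [0.4, 0.3, 1.0]])
cov = corr * np.outer(sd, sd)
rel = np.array([0.9, 0.7, 0.8])

w_opt = max_reliability_weights(cov, rel)
print("optimal weights:", w_opt)
print("reliability (optimal):", composite_reliability(w_opt, cov, rel))
print("reliability (equal):  ", composite_reliability(np.ones(3), cov, rel))
print("SEM (optimal):", composite_sem(w_opt, cov, rel))
```

Because composite reliability is unchanged when all weights are multiplied by the same constant, rescaling the weights to sum to one is only a convention. The standard error of measurement, by contrast, depends on the scale of the weights and should be compared only between composites on the same scale.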

