The Role of Smoothing Methods in the Non-linear Transformation of Raw Scores to Normalized Scale Scores

Document Type : Original Article

Authors

Faculty of Psychology and Education, University of Tehran

Abstract

To make scores in test batteries easier to interpret and compare, the raw scores on each test are converted to a common scale, called the scale score. One of the most common methods of transforming raw scores to scale scores is normalization. To investigate the role of frequency pre-smoothing and score post-smoothing in the normalizing scaling method, this study used 10,000 randomly simulated records and 10,000 randomly sampled real records from applicants to Iran's university entrance exam. The role of the smoothing methods was analyzed using the conditional standard error of measurement (CSEM), frequency charts, and statistical indices such as moments. The results showed that the reliability coefficients were high for all scaling methods. However, the charts, moments, and CSEM indicated that normalized scale scores obtained with frequency pre-smoothing were more accurate and had smaller errors. Furthermore, pre-smoothing reduced the fluctuation of score errors across the score scale.
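As a rough illustration of the normalizing transformation described in the abstract, the following minimal sketch converts a raw-score frequency distribution to normalized scale scores using percentile ranks and the inverse normal distribution. Several details are assumptions rather than the study's own choices: the mid-percentile-rank convention, the target scale (mean 50, standard deviation 10), and the function names. The moving-average `presmooth` function is only a simple stand-in to show where frequency pre-smoothing enters the pipeline; it is not the pre-smoothing method used in the study.

```python
from statistics import NormalDist


def presmooth(freqs, window=3):
    """Smooth raw-score frequencies with a moving average.

    Hypothetical stand-in for a frequency pre-smoothing step; the
    window is truncated at both ends of the score range.
    """
    half = window // 2
    smoothed = []
    for i in range(len(freqs)):
        lo = max(0, i - half)
        hi = min(len(freqs), i + half + 1)
        smoothed.append(sum(freqs[lo:hi]) / (hi - lo))
    return smoothed


def normalized_scale_scores(freqs, mean=50.0, sd=10.0):
    """Map raw scores 0..K to normalized scale scores.

    freqs[x] is the (possibly smoothed) frequency of raw score x.
    The target mean and sd are assumed values for illustration.
    """
    n = sum(freqs)
    std_normal = NormalDist()
    scale = []
    below = 0.0  # total frequency below the current raw score
    for f in freqs:
        # Mid-percentile rank as a proportion: P(X < x) + 0.5 * P(X = x)
        p = (below + 0.5 * f) / n
        # Keep p strictly inside (0, 1) so the inverse CDF is defined
        p = min(max(p, 0.5 / n), 1 - 0.5 / n)
        z = std_normal.inv_cdf(p)
        scale.append(mean + sd * z)
        below += f
    return scale


if __name__ == "__main__":
    raw_freqs = [1, 2, 4, 2, 1]  # toy frequencies for raw scores 0..4
    print(normalized_scale_scores(raw_freqs))
    print(normalized_scale_scores(presmooth(raw_freqs)))
```

Comparing the two printed conversions shows how smoothing the frequencies before normalization changes the resulting scale scores, which is the kind of effect the study evaluates with CSEM and moments.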

Keywords

