Book introduction:
This book signals the maturity of the methodologies and technology surrounding test score equating, and the acceptance of these methodologies and of open-source software by the testing industry's experts and academia. Test equating is a statistical procedure for adjusting for test form differences in standardized testing. Test equating methodologies originated in testing companies many decades ago. Because of this pragmatic, operational perspective, many of these methodologies lacked a rigorous theoretical foundation and research base for many years. In the 1980s, test score equating began to receive appropriate attention from researchers and academia (Holland and Rubin 1982). Several notable journal articles and books were written at that time, and the field advanced significantly. The availability of fast personal computers also facilitated the implementation of the established equating methods. For example, psychometricians and technology specialists at Educational Testing Service developed GENASYS, a comprehensive, modular, and sophisticated software system to support the large-scale application of test score analysis and equating with all the methodologies available at the time. The first edition of Kolen and Brennan (1995) became a landmark in equating. A second wave of theoretical interest and development of new methods took place around 2005 (Kolen and Brennan 2004; von Davier et al. 2004; Dorans et al. 2007).
Since 2005, there has been a noticeable shift in test equating. First, there has been increasing interest in and use of test equating all over the world. Second, as illustrated in the edited volume by von Davier (2011a), there has been renewed interest in searching for better methodologies that improve the score conversion in terms of bias (the observed-score equating framework, various kernel methods, local equating), error (kernel equating, equating with exponential families), hypothesis testing (Lagrange multiplier tests for item response theory (IRT) methods, Wald tests for kernel methods), approximations for small-sample equating, and monitoring the quality control of equating results (using time-series approaches). Third, there has been increased interest in equating among university professors of statistics and educational measurement. More importantly, there has been significant software development across different environments, from SAS to R. Moreover, rapid changes in technology have made upgrading proprietary software challenging; society has become more appreciative of the open-source environment, which facilitates transparency, research, and community work.
……