Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing

Sebastian Born; Aron Fink; Christian Spoden; Andreas Frey

Download from

dx.doi.org

More download options

Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing

Sebastian Born, Aron Fink, Christian Spoden & Andreas Frey

Frontiers in Psychology 10 (2019) Copy BIBT_EX

Abstract

The increasing digitalization in the field of psychological and educational testing opens up new opportunities to innovate assessments in many respects (e.g., new item formats, flexible test assembly, efficient data handling). In particular, computerized adaptive testing provides the opportunity to make tests more individualized and more efficient. The newly developed continuous calibration strategy (CCS) from Fink, Born, Spoden, and Frey (2018) makes it possible to construct computerized adaptive tests in application areas where separate calibration studies are not feasible. Due to the goal of reporting on a common metric across test cycles, the equating is crucial for the CCS. The quality of the equating depends on the common items selected and the scale transformation method applied. Given the novelty of the CCS, the aim of the study was to evaluate different equating setups in the CCS and to derive practical recommendations. The impact of different equating setups on the precision of item parameter estimates and on the quality of the equating was examined in a Monte Carlo simulation, based on a fully crossed design with the factors common item difficulty distribution (bimodal, normal, uniform), scale transformation method (mean/mean, mean/sigma, Haebara, Stocking-Lord), and sample size per test cycle (50, 100, 300). The quality of the equating was operationalized by three criteria (proportion of feasible equatings, proportion of drifted items, and error of transformation constants). The precision of the item parameter estimates increased with increasing sample size per test cycle, but no substantial difference was found with respect to the common item difficulty distribution and the scale transformation method. With regard to the feasibility of the equatings, no differences were found for the different scale transformation methods. However, when using the moment methods (mean/mean, mean/sigma), quite extrem levels of error for the transformation constants A and B occurred. Among the characteristic curve method the performance of the Stocking-Lord method was slightly better than for the Haebara method. Thus, while no clear recommendation can be made with regard to the common item difficulty distribution, the characteristic curve methods turned out to be the most favorable scale transformation methods within the CCS.

Cite

Plain text

BibTeX

Formatted text

Zotero

EndNote

Reference Manager

RefWorks

Options

Mark as duplicate

Find it on Scholar

Request removal from index

Revision history

Edit

Keywords

Item response theory (IRT) continous calibration equating simulation computerized adaptive testing (CAT)

Reprint years

DOI

10.3389/fpsyg.2019.01277

My notes

Analytics

Added to PP
2019-06-07

Downloads
5 (#1,510,250)

6 months
2 (#1,232,442)

Historical graph of downloads

How can I increase my downloads?

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references

Applied ethics	Epistemology	History of Western Philosophy	Meta-ethics	Metaphysics	Normative ethics
Philosophy of biology	Philosophy of language	Philosophy of mind	Philosophy of religion	Science Logic and Mathematics	More ...

Evaluating Different Equating Setups in the Continuous Item Pool Calibration for Computerized Adaptive Testing

Abstract

Categories

Keywords

Reprint years

DOI

Links

PhilArchive

External links

Through your library

My notes

Similar books and articles

Analytics

Citations of this work

References found in this work