A Short Guide to Item Response Theory Models
Modern Item Response Theory (IRT) provides a sophisticated framework for
measuring latent traits, such as abilities, knowledge, or attitudes. It is widely used in
education, psychology, healthcare, and other fields where measurement is essential.
Unlike traditional testing approaches, item response theory models account for the
varying difficulty levels of test items and the diverse abilities of test-takers. They are
designed to investigate the relationship between latent traits and observed responses
to test items, with the aim of enhancing the efficiency of measurement.
A main objective of the book is to provide a structured survey of diverse models
that have been put forth, emphasizing both their distinctions and commonalities. It
serves as an introductory guide for beginners and as a resource for
those seeking an overview of the plethora of available models.
The book covers foundational concepts, essential principles, and practical applications of Item Response Theory. In presenting individual models, brevity is maintained, with a focus on conciseness rather than formal complexity. The objective
is to be accessible to a broad audience, emphasizing fundamental concepts and the
interrelationships between models. Starred sections contain further details for the
interested reader.
Some of the topics that are covered in this book are:
• Foundations of IRT: We begin with the binary Rasch model, illustrating the fundamental principles that underpin Item Response Theory, from latent trait basics to
mathematical formulations.
• Ordinal IRT Models: The key idea is to model the probability of an individual’s
response falling into a specific category for each test item. This modeling is based
on the assumption that individuals possess a latent trait (ability or attribute) that
influences their likelihood of responding in a particular category. The ordinal
nature of the responses implies that the categories have a natural ordering.
• Response Models for Count Data: These models are an extension of traditional Item Response
Theory that is designed to handle situations where the outcome variable is a count
(i.e., a non-negative integer). While conventional IRT models are typically used
for binary or ordinal outcomes, IRT for count data accommodates discrete counts,
making it suitable for scenarios where the responses are, for example, the number
of correct answers on a test or the frequency of certain behaviors.
• Extended versions of ordinal response models that account for response styles are
included.
• Thresholds Model: The thresholds model is an overarching model that contains
various common latent trait models as special cases and provides a genuine latent
trait model for continuous responses.
• Tree-based Item Response Models: They include the special topic of Item
Response Trees, which are a type of modeling approach that combines aspects of
binary or polytomous Item Response Theory and decision tree methodologies. The
approach is particularly useful when dealing with complex and multidimensional
latent traits.
• Differential Item Functioning (DIF): DIF refers to a situation where an item on a
test or assessment functions differently for different groups of individuals, even
when those individuals have the same level of the latent trait being measured. DIF
in particular occurs when subgroups of individuals with the same underlying trait
level have different probabilities of responding correctly to a particular test item.
• Explanatory Item Response Models: They are a class of statistical models that
extend traditional Item Response Theory by incorporating explanatory variables
or covariates into the model. The primary goal is to investigate and account
for the influence of additional factors on individuals’ responses to test items.
These models are particularly useful when researchers want to understand how
external variables affect the probability of a certain response category, beyond the
individual’s latent trait.
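To make the list above concrete, here is a minimal Python sketch of the binary Rasch model's response probability and of what differential item functioning means numerically. It is an illustration only, not the book's R code. It assumes the standard logistic form P(correct) = exp(theta - delta) / (1 + exp(theta - delta)), and the names `theta` (person ability), `delta` (item difficulty), and `gamma` (a hypothetical group-specific difficulty shift) are chosen here and may differ from the book's notation.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of a correct response under the binary Rasch model.

    theta: person ability (latent trait); delta: item difficulty.
    Both names are illustrative assumptions, not the book's notation.
    """
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


# One person answering items of increasing difficulty: the probability
# of a correct response falls as the item gets harder.
theta = 0.5
for delta in (-1.0, 0.5, 2.0):
    p = rasch_probability(theta, delta)
    print(f"difficulty {delta:+.1f}: P(correct) = {p:.3f}")

# DIF illustration: two people with the SAME ability, but the item is
# harder for the focal group by a hypothetical shift gamma.
gamma = 0.8
p_reference = rasch_probability(theta, 0.5)
p_focal = rasch_probability(theta, 0.5 + gamma)
print(f"reference group: {p_reference:.3f}, focal group: {p_focal:.3f}")
```

When ability equals difficulty the probability is exactly one half. In the DIF part, the two probabilities differ even though the ability is identical, which is the situation the DIF bullet describes.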
The focus of the book is on modern latent trait theory models which provide
measurement tools that clearly separate person abilities from item parameters.
The much older classical test theory is considered only briefly, in particular to clarify
the distinction between the differing approaches to measurement and investigate their
relationship. The book does not give in-depth analyses of specific data sets, but the
application of models is illustrated by using several data sets from differing areas.
They show how models can be fitted and compared.
In recent years R has become a widely used analysis tool in psychology and the
social sciences. R is open source and various methods have been implemented in
useful packages. All the examples in the book have been computed using R. For
most of the examples, code is made accessible on GitHub
(https://github.com/GerhardTutz/Item-Response-Models). In the book itself only snippets of code are given,
and the book can be read without being familiar with R coding. No introduction to
R is included, but there are many texts that are easily accessible. It should also be
mentioned that new packages are continually being implemented, which is also a reason
why not much space is devoted to R code. Better programs might be around the
corner.
The primary emphasis is placed on unidimensional models which assume that
the latent trait being measured is unidimensional, meaning that it can be adequately
represented by a single underlying construct. These models are useful when the goal is
to measure a single latent trait, such as ability, knowledge, or a specific characteristic.
Multi-dimensional models are considered as tools to account for specific behavioral
characteristics of respondents like response styles, but even then it is assumed that
the trait that is to be measured is unidimensional.
The book is aimed at applied statisticians, researchers working in psychometrics,
educators, graduate students, and anyone who is curious about modeling strategies that
enhance the precision and validity of measurement tools and who has to select
among the multitude of available modeling approaches.
Essential features that distinguish the presentation from other books on latent trait
models include the following:
• The role of binary models as foundational elements of latent trait models is emphasized, as many models currently in use can be derived from these binary models.
It is a structured presentation of models starting with binary models, which are at
the heart of almost all IRT models.
• In particular, polychotomous models not only incorporate binary models but can
also be constructed from them. This construction provides a clearer understanding
of polychotomous models compared to previous representations and results in a
simple taxonomy.
• Ordinal models that account for response styles are considered in a separate
chapter.
• The thresholds model is presented as a general model that allows the consideration
of discrete and continuous responses within a closed framework. It also serves as
a foundational model for the more traditional classical test theory.
• Hierarchical tree-based latent trait models, typically not included in older
textbooks, are considered.
• Latent trait models for count data are included.
• The concept of specific objectivity, also referred to as parameter separability,
is examined with a distinction between theoretical and empirical parameter
separability.
• The presentation also covers more recently developed methods that
emerged at the intersection of advanced data analysis and machine learning, such
as the regularization of estimates.
Thanks to Pascal Jordan, Can Gürer, Martin Spieß, Maria Iannario, Boris Forthmann, Clemens Draxler and Ingrid Maurer for valuable comments that helped to
improve the presentation.
Munich, Germany Gerhard Tutz
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 The Binary Rasch Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Alternative Representations and Uniqueness of Parameters . . . . . 12
2.3 Information Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Separability of Parameters and Model Derivations . . . . . . . . . . . . 15
2.4.1 Separability of Parameters and Specific Objectivity . . . . 16
2.4.2 Derivations and Model Motivations∗ . . . . . . . . . . . . . . . . . 18
2.5 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.5.1 Joint Maximum Likelihood Estimation . . . . . . . . . . . . . . . 21
2.5.2 Conditional Maximum Likelihood Estimation . . . . . . . . . 21
2.5.3 Marginal Maximum Likelihood Estimation . . . . . . . . . . . 23
2.5.4 Conditional Pairwise Maximum Likelihood Estimation . . . . . . . 23
2.5.5 Estimation of Person Parameter . . . . . . . . . . . . . . . . . . . . . 24
2.6 Testing the Rasch Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.1 Diagnostic Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6.2 Item Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.7 R Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3 Extensions of the Rasch Model and Alternative Binary Models . . . . 37
3.1 Homogeneous Monotone Latent Trait Models . . . . . . . . . . . . . . . . 37
3.2 Models with Varying Slopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Three-Parameter Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Ordinal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1 Binary Models as Building Blocks for Ordinal Models . . . . . . . . . 51
4.2 Graded Response Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.1 Properties of Graded Response Models . . . . . . . . . . . . . . 56
4.2.2 Sparser Location and Scale Models∗ . . . . . . . . . . . . . . . . . 60
4.3 Adjacent Categories Models and the Partial Credit Model . . . . . . 63
4.3.1 Partial Credit Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3.2 Sparser Parameterizations: The Rasch Rating
Scale Model∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.4 Sequential Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.5 Common Structure of Ordinal Models . . . . . . . . . . . . . . . . . . . . . . . 80
4.6 Nominal Models and a Family of Models . . . . . . . . . . . . . . . . . . . . 85
4.7 Models for Nonmonotone Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.8 Estimation of Polytomous Models . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.8.1 Marginal Maximum Likelihood Estimation . . . . . . . . . . . 90
4.8.2 Conditional Estimation for Partial Credit Model . . . . . . . 91
4.8.3 Estimation of the Sequential Model Using Binary
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.9 Further Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.10 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5 Extended Ordinal Models: Accounting for Response Styles . . . . . . . . 97
5.1 Tendency to Middle or Extreme Categories in Partial
Credit Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Uncertainty in Partial Credit Models . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.4 Further Approaches to the Modeling of Response Styles . . . . . . . 109
5.5 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6 The Thresholds Model: A Common Framework for Discrete
and Continuous Responses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.1 The Thresholds Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2 Linear Difficulty Functions for Continuous Responses . . . . . . . . . 116
6.3 Non-linear Difficulty Functions for Continuous Responses . . . . . 120
6.4 Further Properties of Thresholds Models . . . . . . . . . . . . . . . . . . . . . 127
6.5 Likert Scales: Discrete or Continuous? . . . . . . . . . . . . . . . . . . . . . . 129
6.6 Flexible Difficulty Functions: A Basis Functions Approach∗ . . . . 132
6.7 Choice of Response and Difficulty Function . . . . . . . . . . . . . . . . . . 135
6.8 Combining Different Types of Items . . . . . . . . . . . . . . . . . . . . . . . . 137
6.9 Estimation and Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.10 Measuring Physical Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7 Classical Test Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.1 The Classical Test Theory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.2 Classical Test Theory and the Measurement Instrument . . . . . . . . 150
7.3 Thresholds Models and Classical Test Theory . . . . . . . . . . . . . . . . 152
7.3.1 Classical Test Theory and the Sum Score . . . . . . . . . . . . . 157
8 Response Models for Count Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.1 Rasch’s Poisson Count Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.2 Negative Binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.3 Conway-Maxwell-Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.4 Count Thresholds Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.6 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
9 Tree-Based Item Response Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.1 Binary Item Response Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2 More General Partition Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.3 Alternative Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.4 Modeling with Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
9.5 A Taxonomy of Models Including Tree-Based Models . . . . . . . . . 183
9.6 Fitting Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
10 Differential Item Functioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10.1 Non-IRT Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
10.2 IRT Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.3 Recursive Partitioning: Tree-Based Approaches . . . . . . . . . . . . . . . 200
10.4 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11 Explanatory Item Response Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
11.1 Subject-Specific Covariates in Ordinal Models . . . . . . . . . . . . . . . 211
11.1.1 The Partial Credit Model with Covariates . . . . . . . . . . . . . 211
11.1.2 The Graded Response Model with Covariates . . . . . . . . . 216
11.2 Covariates in the Threshold Model . . . . . . . . . . . . . . . . . . . . . . . . . . 220
11.3 Including Item-Specific Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . 224
11.4 Fitting of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
12 R Packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
13 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Author Index . . . . . . . . . . . . . . . . . . . . . .
Authors and Affiliations
Gerhard Tutz, Department of Statistics, LMU Munich, Munich, Germany
About the author
Gerhard Tutz is Professor Emeritus at the Department of Statistics, LMU Munich, Germany. His research interests include multivariate statistics, item response theory and categorical data analysis.
Bibliographic Information
Book Title: A Short Guide to Item Response Theory Models
Authors: Gerhard Tutz
Series Title: Statistics for Social and Behavioral Sciences
DOI: https://doi.org/10.1007/978-3-031-87271-6
Publisher: Springer Cham
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2025
Hardcover ISBN: 978-3-031-87270-9 (Published: 03 July 2025)
Softcover ISBN: 978-3-031-87273-0 (Due: 17 July 2026)
eBook ISBN: 978-3-031-87271-6 (Published: 01 July 2025)
Series ISSN: 2199-7357
Series E-ISSN: 2199-7365
Edition Number: 1
Number of Pages: XI, 255
Number of Illustrations: 60 b/w illustrations, 23 illustrations in colour
Topics: Statistical Theory and Methods, Psychometrics, Statistics for Social Sciences, Humanities, Law, Applied Statistics