In recent years, semi-supervised learning has emerged as an exciting new
direction in machine learning research. It is closely related to profound issues of how
to do inference from data, as witnessed by its overlap with transductive inference
(the distinctions are yet to be made precise).
At the same time, because it deals with the situation where relatively few labeled
training points are available but a large number of unlabeled points are given, it is
directly relevant to a multitude of practical problems where it is relatively expensive
to produce labeled data, e.g., the automatic classification of web pages. As a field,
semi-supervised learning uses a diverse set of tools and illustrates, on a small scale,
the sophisticated machinery developed in various branches of machine learning such
as kernel methods or Bayesian techniques.
While working on semi-supervised learning, we have been aware of the lack of
an authoritative overview of the existing approaches. In a perfect world, such an
overview would help both the practitioner and the researcher who wants to enter
this area. A well researched monograph could ideally fill such a gap; however, the
field of semi-supervised learning is arguably not yet sufficiently mature for this.
Rather than writing a book that would come out in three years, we thus decided
to provide an up-to-date edited volume, to which we invited contributions from
many of the leading proponents of the field. To make it more than a mere collection
of articles, we have attempted to ensure that the chapters form a coherent whole
and use consistent notation. Moreover, we have written a short introduction and a
dialogue illustrating some of the ongoing debates in the underlying philosophy of
the field, and we have organized and summarized a comprehensive benchmark of
semi-supervised learning.
Benchmarks are helpful for the practitioner to decide which algorithm should be
chosen for a given application. At the same time, they are useful for researchers
to choose issues to study and further develop. By evaluating and comparing the
performance of many of the presented methods on a set of eight benchmark
problems, this book aims at providing guidance in this respect. The problems are
designed to reflect and probe the different assumptions that the algorithms build
on. All data sets can be downloaded from the book web page at
http://www.kyb.tuebingen.mpg.de/ssl-book/.
Finally, we would like to thank everybody who contributed to the
success of this book project, in particular Karin Bierig, Sabrina Nielebock, Bob
Prior, all chapter authors, and the chapter reviewers.