Validation in Statistics and Machine Learning

6-7 October 2010

Scope and Topics

In statistics and machine learning, the evaluation of algorithms typically relies on their performance on data. This is because, although theoretical guarantees (e.g. consistency results) can be established, it is in general not possible to prove that an algorithm performs well on a particular (unseen) data set. It is therefore vital to ensure that data-based evaluations are reliable. This requirement poses a wide range of open research problems and challenges. These include

the lack of a ground truth to validate results in real-world applications,
the high instability of empirical results in many settings,
the difficulty of making statistics and machine learning research reproducible,
the general over-optimism of published research findings due to pre-publication optimization of the algorithms and publication bias.

This workshop brings together scientists from statistics, machine learning, and their application fields to tackle these challenges. The workshop serves as a platform to critically discuss current shortcomings, to exchange new approaches, and to identify promising future directions of research.