Research Group "Stochastic Algorithms and Nonparametric Statistics"

Research Seminar "Mathematical Statistics" WS 2021/2022

  • Place: The seminar will be hybrid and held via Zoom. Please note that the so-called "3G rule" applies at the Weierstrass Institute. In line with hygiene recommendations, our lecture room ESH has a capacity of only 16 people. If you intend to participate, you must register for our mailing list with Andrea Fiebig. Prior to each talk, a Doodle poll will be created and its link sent by e-mail; anyone who wants to attend in person must sign in there (put their name on the list) before the lecture. If 16 guests have already registered, please follow the streamed talk instead.
  • Time: Wednesdays, 10.00 a.m. - 12.30 p.m.
20.10.2021 N. N.

27.10.2021 N. N.

03.11.2021 Evgeny Stepanov (Russian Academy of Sciences, St. Petersburg)
The story of a fish in a turbulent ocean: How to survive and how to return home (hybrid talk)
Can a fish with limited velocity capabilities reach any point in the (possibly unbounded) ocean? In a recent paper by D. Burago, S. Ivanov and A. Novikov, "A survival guide for feeble fish", an affirmative answer has been given under the condition that the fluid velocity field is incompressible, bounded and has vanishing mean drift. This brilliant result extends some known point-to-point global controllability theorems, although it is substantially nonconstructive. We will give the fish a different recipe for surviving in a turbulent ocean, and show how this is related to structural stability of dynamical systems by providing a constructive way to slightly perturb a divergence-free vector field with vanishing mean drift so as to produce nondissipative dynamics. This immediately leads to closing lemmas for dynamical systems, in particular to C. Pugh's closing lemma, which also says that the fish can eventually return home. Joint work with Sergey Kryzhevich (St. Petersburg).
10.11.2021 Marc Hoffmann (Université Paris-Dauphine)
Part I: 10 am at Weierstrass Institute
Some statistical inference results for interacting particle models in a mean-field limit (hybrid talk)
We propose a systematic, theoretical statistical analysis for systems of interacting diffusions, possibly with common noise and/or degenerate diffusion components, in a mean-field regime. These models are more or less widely used in finance, mean-field games (MFG), systemic risk analysis, behavioural sociology or ecology. We consider several inference issues such as: i) nonparametric estimation of the solution of the underlying Fokker-Planck type equation or the drift of the system, ii) testing for the interaction between components, iii) estimation of the interaction range between particles. This talk is based on joint results with C. Fonte, L. Della Maestra and R. Maillet.
10.11.2021 Nikita Zhivotovskiy (ETH Zürich)
Part II: 2 pm at Humboldt Universität, Dorotheenstr. 1, R. 005. This lecture will only take place on site!
Distribution-free robust linear regression
We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. When learning without assumptions on the covariates, we establish boundedness of the conditional second moment of the response variable as a necessary and sufficient condition for achieving deviation-optimal excess risk bounds. First, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyzak, and Walk. However, in spite of its optimal in-expectation performance, we show that this procedure fails with constant probability for some distributions. Combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order O(d/n) with the optimal sub-exponential tail. Joint work with Jaouad Mourtada (CREST, ENSAE) and Tomas Vaškevičius (University of Oxford).
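As a reading aid for the median-of-means ingredient named in the abstract, here is a minimal sketch of the basic scalar median-of-means estimator of a mean. It is not the estimator constructed in the talk; the function name `median_of_means`, the round-robin blocking, and the `seed` parameter are illustrative assumptions.

```python
import random
import statistics


def median_of_means(values, num_blocks, seed=0):
    """Scalar median-of-means: shuffle, split into blocks, take the median of the block means.

    Illustrative only; this is the textbook scalar version, not the
    regression estimator described in the abstract.
    """
    if not 1 <= num_blocks <= len(values):
        raise ValueError("num_blocks must be between 1 and len(values)")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    # Distribute points round-robin so block sizes differ by at most one.
    blocks = [shuffled[i::num_blocks] for i in range(num_blocks)]
    return statistics.median(statistics.mean(block) for block in blocks)


# One gross outlier among 100 points: it can contaminate only one of the 5 blocks.
data = [1.0] * 99 + [1e6]
robust = median_of_means(data, num_blocks=5)
plain = median_of_means(data, num_blocks=1)  # a single block is just the sample mean
```

With a single block the estimator reduces to the sample mean, which the outlier drags far from 1; with five blocks the outlier affects only one block mean, so the median is unaffected.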
17.11.2021 Christophe Giraud (Institut de Mathématiques d'Orsay, Université Paris-Saclay)
A geometric approach to fair online learning (hybrid talk)
Machine learning is ubiquitous in daily decisions, and producing fair and non-discriminatory predictions is a major societal concern. Various criteria of fairness have been proposed in the literature, and we will start with a (biased!) tour of fairness concepts in machine learning. Many decision problems are of a sequential nature, and efforts are needed to better handle such settings. We consider a general setting of fair online learning with stochastic sensitive and non-sensitive contexts. We propose a unified approach for fair learning in this setting, by interpreting this problem as an approachability problem. This point of view offers a generic way to produce algorithms and theoretical results. Adapting Blackwell’s approachability theory, we exhibit a general necessary and sufficient condition for some learning objectives to be compatible with some fairness constraints, and we characterize the optimal trade-off between the two when they are not compatible. Joint work with E. Chzhen and G. Stoltz.
24.11.2021 Matthew Reimherr (Penn State University)
Online event at a different time: 2 - 4 pm
Pure differential privacy in functional data analysis (online talk)
We consider the problem of achieving pure differential privacy in the context of functional data analysis, or more general nonparametric statistics, where the summary of interest can naturally be viewed as an element of a function space. In this talk I will give a brief overview and motivation for differential privacy before delving into the challenges that arise in the sanitization of an infinite dimensional summary. I will present a new mechanism, called the Independent Component Laplace Process, for achieving privacy followed by several applications and examples.
01.12.2021 Nikita Puchkin (HSE Moscow)
Rates of convergence for density estimation with GANs (online talk)
We undertake a thorough study of the non-asymptotic properties of the vanilla generative adversarial networks (GANs). We derive theoretical guarantees for density estimation with GANs under a proper choice of the deep neural network classes representing generators and discriminators. In particular, we prove that the resulting estimate converges to the true density p* in terms of Jensen-Shannon (JS) divergence at the rate (log n / n)^{2β/(2β+d)}, where n is the sample size and β determines the smoothness of p*. Moreover, we show that the obtained rate is minimax optimal (up to logarithmic factors) for the considered class of densities.
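To make the rate in the abstract concrete, the following small sketch evaluates the exponent 2β/(2β+d) and the quantity (log n / n) raised to that exponent. The function names are hypothetical, constants are ignored, and d is taken to be the dimension of the data, which the abstract does not state explicitly.

```python
import math


def rate_exponent(beta, d):
    """Exponent 2*beta / (2*beta + d) appearing in the stated rate."""
    if beta <= 0 or d <= 0:
        raise ValueError("beta and d must be positive")
    return 2.0 * beta / (2.0 * beta + d)


def js_rate(n, beta, d):
    """Evaluate (log n / n) ** (2*beta / (2*beta + d)), ignoring constants."""
    if n <= 1:
        raise ValueError("n must exceed 1 so that log n is positive")
    return (math.log(n) / n) ** rate_exponent(beta, d)


# Smoothness beta = 1 in dimension d = 2 gives exponent 1/2,
# so the rate is the square root of log n / n.
exponent = rate_exponent(beta=1, d=2)
rate_small = js_rate(n=10**3, beta=1, d=2)
rate_large = js_rate(n=10**6, beta=1, d=2)
```

For fixed β the exponent shrinks as d grows, which is the usual curse of dimensionality in nonparametric rates.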
08.12.2021 Davy Paindaveine (Université libre de Bruxelles)
Hypothesis testing on high-dimensional spheres: The Le Cam approach (hybrid talk)
Hypothesis testing in high dimensions has been one of the most active research topics in the last decade. Both theoretical and practical considerations make it natural to restrict to sign tests, that is, to tests that use observations only through their directions from a given center. This obviously maps the original Euclidean problem to a spherical one, still in high dimensions. With this motivation in mind, we tackle two testing problems on high-dimensional spheres, both under a symmetry assumption that specifies that the distribution at hand is invariant under rotations with respect to a given axis. More precisely, we consider the problem of testing the null hypothesis of uniformity ("detecting the signal") and the problem of testing the null hypothesis that the symmetry axis coincides with a given direction ("learning the signal direction"). We solve both problems by exploiting Le Cam's asymptotic theory of statistical experiments, in a double- or triple-asymptotic framework. Interestingly, contiguity rates depend in a subtle way on how well the parameters involved are identified as well as on a possible further antipodally-symmetric nature of the distribution. In many cases, strong optimality results are obtained from local asymptotic normality. When this cannot be achieved, it is still possible to establish minimax rate optimality.
15.12.2021 N. N.

05.01.2022 Alexandra Suvorikova (WIAS Berlin)
Robust k-means in metric spaces and spaces of probability measures (online talk)
12.01.2022 Martin Wahl (HU Berlin)
Functional estimation in log-concave location-scale families (hybrid talk)
This talk will be concerned with nonasymptotic lower bounds for the estimation of principal subspaces. I will start by reviewing some previous methods, including the local asymptotic minimax theorem and the Grassmann approach. Then I will present a new approach based on a van Trees inequality (i.e. a Bayesian version of the Cramér-Rao inequality) tailored for invariant statistical models. As applications, I will provide nonasymptotic lower bounds for principal component analysis, the matrix denoising problem and the phase synchronization problem.
19.01.2022 Denis Belomestny (Universität Duisburg-Essen)
Achieving optimal sample complexity in reinforcement learning via upper solutions (hybrid talk)
26.01.2022 Pierre Jacob (ESSEC Paris)
Some methods based on couplings of Markov chain Monte Carlo algorithms (online talk)
Markov chain Monte Carlo algorithms are commonly used to approximate a variety of probability distributions, such as posterior distributions arising in Bayesian analysis. I will review the idea of coupling in the context of Markov chains, and how this idea not only leads to theoretical analyses of Markov chains but also to new Monte Carlo methods. In particular, the talk will describe how coupled Markov chains can be used to obtain 1) unbiased estimators of expectations and of normalizing constants, 2) non-asymptotic convergence diagnostics for Markov chains, and 3) unbiased estimators of the asymptotic variance of MCMC ergodic averages.
02.02.2022 Alessandra Menafoglio (MOX - Dept. of Mathematics, Politecnico di Milano)
Object oriented data analysis in Bayes spaces: From distributional data to the analysis of complex shapes (hybrid talk)
In the presence of increasingly massive and heterogeneous data, the statistical modeling of distributional observations plays a key role. Choosing the 'right' embedding space for these data is of paramount importance for their statistical processing, to account for their nature and inherent constraints. Bayes space theory provides a natural embedding space for (spatial) distributional data, and has been successfully applied in varied settings. In this presentation, I will discuss the state-of-the-art methods for the modelling, analysis, and prediction of distributional data, with particular attention to cases when their spatial dependence cannot be neglected. I will embrace the viewpoint of object-oriented spatial statistics (O2S2), a system of ideas for the analysis of complex data with spatial dependence. All the theoretical developments will be illustrated through their application to real data, highlighting the intrinsic challenges of a statistical analysis which follows the Bayes spaces approach. Applications will cover a varied range of fields, from the assessment of COVID-19 on mortality data to the analysis of complex shapes produced in additive manufacturing.
09.02.2022 Erwan Scornet (CMAP Ecole Polytechnique Paris)
Variable importance in random forests (hybrid talk)
Nowadays, machine learning procedures are used in many fields, with the notable exception of so-called sensitive areas (health, justice, defense, to name a few) in which the decisions to be taken are fraught with consequences. In these fields, it is necessary to obtain a precise decision but, to be effectively applied, these algorithms must provide an explanation of the mechanisms that lead to the decision and, in this sense, be interpretable. Unfortunately, the most accurate algorithms today are often the most complex. A classic technique to try to explain their predictions is to calculate indicators corresponding to the strength of the dependence between each input variable and the output to be predicted. In this talk, we will focus on variable importances designed for the original random forest algorithm: the Mean Decrease Impurity (MDI) and the Mean Decrease Accuracy (MDA). We will see how theoretical results provide guidance for their practical uses.
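The idea behind permutation-based importance such as MDA can be sketched as follows: shuffle one input column and record how much the prediction error increases. This is a simplified illustration under stated assumptions, not the exact definition used in the talk or in the original random forest algorithm (which works with out-of-bag samples per tree). The helper names, the toy data, and the stand-in `predict` function are all hypothetical.

```python
import random


def mean_squared_error(predict, rows, targets):
    """Average squared prediction error over a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, seed=0):
    """Increase in mean squared error after shuffling one feature column."""
    baseline = mean_squared_error(predict, rows, targets)
    column = [x[feature] for x in rows]
    random.Random(seed).shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(rows, column)]
    return mean_squared_error(predict, permuted, targets) - baseline


# Toy data (assumption): the target depends on feature 0 only; feature 1 is noise.
rng = random.Random(1)
rows = [[float(i), rng.random()] for i in range(50)]
targets = [2.0 * x[0] for x in rows]


def predict(x):
    # Stand-in for a fitted model (hypothetical); it ignores feature 1.
    return 2.0 * x[0]


importance_relevant = permutation_importance(predict, rows, targets, feature=0)
importance_noise = permutation_importance(predict, rows, targets, feature=1)
```

Shuffling the noise feature leaves the error unchanged, while shuffling the relevant feature breaks its link to the target and increases the error.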
16.02.2022 N. N.

last reviewed: January 20, 2022 by Christine Schneider