Forschungsgruppe "Stochastische Algorithmen und Nichtparametrische Statistik"

Research Seminar "Mathematical Statistics" WS 2021

04.11.2020 Pierre E. Jacob (Harvard University)
exceptionally at 11:30 a.m.! Unbiased Markov chain Monte Carlo methods with couplings
Various tasks in statistics involve numerical integration, for which Markov chain Monte Carlo (MCMC) methods are state-of-the-art. MCMC methods yield estimators that converge to integrals of interest in the limit of the number of iterations. This iterative asymptotic justification is not ideal; first, it stands at odds with current trends in computing hardware, with increasingly parallel architectures; secondly, the choice of "burn-in" and of the total number of iterations are dificult. This talk will describe recently proposed estimators that are exactly unbiased for the expectations of interest, while having an almost surely finite computing cost and a finite variance. They can thus be generated independently in parallel and averaged over. We argue that the removal of the "burn-in" bias can be done for many MCMC algorithms of practical interest without inflating the variance by much. The core idea is to use coupled pairs of Markov chains following the pathbreaking work of Peter Glynn & Chan-Han Rhee. The method also provides practical upper bounds on some distances between the marginal distribution of the chain at a finite step and its invariant distribution. This talk will provide an overview of this line of research, joint work with John O'Leary, Yves Atchade and many others.[>> read more as PDF]
11.11.2020 N.N.

18.11.2020 Matthias Löffler (ETH Zürich)
Computationally efficient sparse clustering
25.11.2020 Alexander Kreiß (KU Leuven)
Correlation bounds, mixing and m-dependence under random time-varying network distances with an application to Cox-processes
We will consider multivariate stochastic processes indexed either by vertices or pairs of vertices of a dynamic network. Under a dynamic network we understand a network with a fixed vertex set and an edge set which changes randomly over time. We will assume that the spatial dependence-structure of the processes conditional on the network behaves in the following way: Close vertices (or pairs of vertices) are dependent, while we assume that the dependence decreases conditionally on that the distance in the network increases. We make this intuition mathematically precise by considering three concepts based on correlation, beta-mixing with time-varying beta-coefficients and conditional independence. These concepts allow proving weak-dependence results, e.g. an exponential inequality. These results will be useful for studying asymptotic regimes in which the network grows (i.e. the number of vertices tends to infinity) and we can expect that a growing network results in a larger distance between most vertices. Since the network is dynamic we allow that vertices move around the network and come close to different vertices. In order to demonstrate the use of these concepts in an application we study the asymptotics (for growing networks) of a goodness of fit test in a dynamic interaction network model based on a Cox-type model for counting processes. This model is then applied to bike-sharing data.
02.12.2020 Egor Klochkov (University of Cambridge)
Robust K-means clustering under two moments
We consider the robust algorithms for the k-means clustering problem where a quantizer is constructed based on N independent observations. Our main results are median of means based non-asymptotic excess distortion bounds {that} hold under the two bounded moments assumption in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of Pollard(1981) who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in R^d. In a special case of clustering in R^d, under two bounded moments, we prove matching (up to constant factors) non-asymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have the sub-Gaussian form, and the proofs are based on the versions of uniform bounds for robust mean estimators. This is joint work with Alexei Kroshnin and Nikita Zhivotovskiy.
11.12.2020 Judith Rousseau (University of Oxford)
Exceptionally on Friday at 10 a.m.! Bayesian nonparametric estimation in multivariate non linear Hawkes processes
For access please use:
16.12.2020 Howard Bondell (University of Melbourne)
Bayesian regression for high-dimensional data using a prior on the model fit
We propose a new class of prior distributions for shrinkage regularization in sparse linear regression, particularly the high dimensional case. Instead of placing a prior on the coefficients themselves, we place a prior on the regression R-squared. This is then distributed to the coefficients in a suitable manner. We show that this framework induces a shrinkage prior that can obtain desirable theoretical properties. Compared to existing shrinkage priors, it can simultaneously achieve both high prior concentration at zero, as well as heavier tails. These two properties combine to provide a higher degree of shrinkage on the irrelevant coefficients, along with less bias in estimation of the larger signals. Both theoretical and practical aspects will be discussed.
06.01.2021 Alexey Naumov (HSE Moscow)
Finite time analysis of linear two-timescale stochastic approximation with Markovian noise
13.01.2021 Vladimir Spokoiny (WIAS Berlin)
Bayesian inference in machine learning
Statistical inference for binary data is one of the central problem in machine learning which can be treated within a nonparametric Bernoulli model. In many situations the underlying Bernoulli parameter may approach the edges zero or one that leads to inference for non-regular models. Properties like asymptotic normality of the MLE or of the posterior distribution are not valid in such irregular cases. The approach of this paper suggests to impose a Gaussian prior on the canonical Bernoulli parameter, or, analogously, to consider a penalized MLE with a quadratic penalty. A Gaussian prior can be viewed as a kind of a barrier preventing the canonical parameter to approach infinity and forcing it to stay within the region of regularity. The main results stay the classical results like asymptotic normality of the posterior distribution or asymptotic normality of the penalized MLE for any configuration of Bernoulli parameters under the assumption that the prior is strong enough in direction. We also demonstrate how the results can be applied to analysis of random graphs, ranking problems, and nonparametric binary classification.
20.01.2021 Sven Wang (Cambridge University)
On polynomial-time computation of high-dimensional posterior measures by Langevin-type algorithms
In this talk, we consider the task of generating random samples of high-dimensional posterior distributions. The main results consist of non-asymptotic computational guarantees for Langevin-type MCMC algorithms which scale polynomially in key quantities such as the dimension of the model, the desired precision level, and the number of statistical measurements. We also show that posterior mean vectors and maximum a posteriori (MAP) estimates are computable in polynomial time, with high probability under the distribution of the data. Those results are derived in a general non-linear regression setting where posterior measures are not necessarily log-concave. The main example is a non-linear PDE model involving the steady-state Schrödinger equation. The talk is based on the preprint .
27.01.2021 N.N.

03.02.2021 N.N.

10.02.2021 Dr. Eftychia Solea (ENSAI)
Nonparametric and high-dimensional functional graphical models (online talk)
We consider the problem of constructing nonparametric undirected graphical models for high-dimensional functional data. Most existing statistical methods in this context assume either a Gaussian distribution on the vertices or linear conditional means. In this article we provide a more flexible model which relaxes the linearity assumption by replacing it by an arbitrary additive form. The use of functional principal components offers an estimation strategy that uses a group lasso penalty to estimate the relevant edges of the graph. We establish concentration inequalities for the resulting estimators allowing both the number of predictors and the number of functional principal components to diverge to infinity with increasing sample size. We also investigate the empirical performance of our method through simulation studies and a real data application.
17.02,2021 Georg Keilbar (Humboldt-Universität zu Berlin)
Finite sample bounds for quantiles using SGD (online talk)
24.02.21 Torsten Hothorn (Universität Zürich)
Understanding and applying transformation models (online talk)

last reviewed: February 15, 2021 by Christine Schneider