Research Group "Stochastic Algorithms and Nonparametric Statistics"

Research Seminar "Mathematical Statistics" Summer Semester 2017

  • Place: Weierstrass-Institute for Applied Analysis and Stochastics, Erhard-Schmidt-Hörsaal, Mohrenstraße 39, 10117 Berlin
  • Time: Wednesdays, 10.00 a.m. - 12.30 p.m.

26.04.17 Jonathan Weed (Massachusetts Institute of Technology, USA)
Optimal rates of estimation for the multi-reference alignment problem
How should one estimate a signal, given only access to noisy versions of the signal corrupted by unknown circular shifts? This simple problem has surprisingly broad applications, in fields from structural biology to aircraft radar imaging. We describe how this model can be viewed as a multivariate Gaussian mixture model whose centers belong to an orbit of a group of orthogonal transformations. This enables us to derive matching lower and upper bounds for the optimal rate of statistical estimation for the underlying signal. These bounds show a striking dependence on the signal-to-noise ratio of the problem. Joint work with Afonso Bandeira and Philippe Rigollet.
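The observation model in this abstract can be sketched in a few lines of Python. This is only an illustration of the data-generating process (a random circular shift plus Gaussian noise) together with a naive align-and-average baseline; it is not the estimator analysed in the talk, and all function names are hypothetical. The signal can only be recovered up to a circular shift, and this kind of alignment is expected to become unreliable when the noise is large.

```python
import random

def circular_shift(signal, shift):
    """Return the signal cyclically shifted to the right by `shift` positions."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]

def sample_observations(signal, sigma, count, rng):
    """Draw `count` observations: uniformly random circular shift plus i.i.d. N(0, sigma^2) noise."""
    n = len(signal)
    observations = []
    for _ in range(count):
        shifted = circular_shift(signal, rng.randrange(n))
        observations.append([x + rng.gauss(0.0, sigma) for x in shifted])
    return observations

def align_to_reference(observation, reference):
    """Rotate `observation` to the shift with the largest inner product with `reference`."""
    n = len(observation)
    def score(shift):
        return sum(a * b for a, b in zip(circular_shift(observation, shift), reference))
    return circular_shift(observation, max(range(n), key=score))

def align_and_average(observations):
    """Naive baseline (an assumption of this sketch): align everything to the first observation, then average."""
    reference = observations[0]
    aligned = [align_to_reference(obs, reference) for obs in observations]
    return [sum(obs[i] for obs in aligned) / len(aligned) for i in range(len(reference))]
```

Viewed as a mixture model, each observation is Gaussian with a center drawn from the orbit of the signal under circular shifts, which is what `sample_observations` simulates.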
03.05.17 Prof. Cristina Butucea (Université Paris-Est Marne-la-Vallée, France)
Local asymptotic equivalence for quantum models
Quantum statistics is concerned with inference for physical systems described by quantum mechanics. After an introduction to the main notions of quantum statistics: quantum states, measurements, channels, we describe nonparametric quantum models. We prove the local asymptotic equivalence (LAE) of i.i.d. quantum pure states and a quantum Gaussian state, in the sense of Le Cam theory. As an application, we show the optimal rates for the estimation of pure states, for the estimation of some quadratic functionals and for the testing of pure states. Surprisingly, a sharp parametric testing rate is obtained in a nonparametric quantum setup. Joint work with M. Guta and M. Nussbaum.
10.05.17 Prof. Marco Cuturi (ENSAE/CREST, Malakoff, France)
A review of regularized optimal transport and applications to Wasserstein barycenters
17.05.17 Shi Chen and Petra Burdejova (Humboldt-Universität zu Berlin)
PCA in an asymmetric norm
24.05.17 Claudia Kirch (Universität Magdeburg)
Frequency domain likelihood approximations for time series bootstrapping and Bayesian nonparametrics
A large class of time series methods is based on Fourier analysis, which can be considered as a whitening of the data, giving rise, for example, to the famous Whittle likelihood. In particular, frequency domain bootstrap methods have been successfully applied in a large range of situations. In this talk, we will first review existing frequency domain bootstrap methodology for stationary time series before generalizing it to locally stationary time series. To this end, we first introduce a moving Fourier transformation that captures the time-varying spectral density in a similar manner as the classical Fourier transform does for stationary time series. We obtain consistent estimators for the local spectral densities and show that the corresponding bootstrap time series correctly mimics the covariance behavior of the original time series. The approach is illustrated by means of some simulations and an application to a wind data set. All time series bootstrap methods implicitly use a likelihood approximation, which could be used explicitly in a Bayesian nonparametric framework for time series. So far, only the Whittle likelihood has been used in this context to obtain a nonparametric Bayesian estimate of the spectral density of stationary time series. In the second part of this talk we generalize this approach based on the implicit likelihood from the autoregressive-aided periodogram bootstrap introduced by Kreiss and Paparoditis (2003). This likelihood combines a parametric approximation with a nonparametric correction, making it particularly attractive for Bayesian applications. Some theoretical results about this likelihood approximation, including posterior consistency in the Gaussian case, are given. The performance is illustrated in simulations and an application to LIGO gravitational wave data.
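For readers unfamiliar with the Whittle likelihood mentioned in this abstract, the following is a minimal sketch, assuming the common convention in which the periodogram is evaluated at the Fourier frequencies 2πj/n and additive constants are dropped. It covers only the classical stationary Whittle approximation, not the bootstrap or the corrected likelihood of the talk; the function names are hypothetical.

```python
import cmath
import math

def periodogram(series):
    """Return (w_j, I(w_j)) with I(w) = |sum_t x_t exp(-i t w)|^2 / (2 pi n)
    at the Fourier frequencies w_j = 2 pi j / n, j = 1, ..., floor((n - 1) / 2)."""
    n = len(series)
    values = []
    for j in range(1, (n - 1) // 2 + 1):
        freq = 2.0 * math.pi * j / n
        dft = sum(x * cmath.exp(-1j * freq * t) for t, x in enumerate(series))
        values.append((freq, abs(dft) ** 2 / (2.0 * math.pi * n)))
    return values

def whittle_log_likelihood(series, spectral_density):
    """Whittle approximation, up to an additive constant: -sum_j [log f(w_j) + I(w_j) / f(w_j)]."""
    return -sum(
        math.log(spectral_density(w)) + i_w / spectral_density(w)
        for w, i_w in periodogram(series)
    )

def white_noise_density(variance):
    """Spectral density of white noise, f(w) = variance / (2 pi), used here as the simplest example."""
    return lambda w: variance / (2.0 * math.pi)
```

In the white-noise family the approximation is maximized at the variance equal to 2π times the average periodogram ordinate, which gives a quick sanity check of the implementation.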
31.05.17 Prof. Juan Carlos Escanciano (Indiana University Bloomington, USA)
Quantile-regression inference with adaptive control of size (the talk takes place in R.406)
Regression quantiles have asymptotic variances that depend on the conditional densities of the response variable given regressors. This paper develops a new estimate of the asymptotic variance of regression quantiles that leads any resulting Wald-type test or confidence region to behave as well in large samples as its infeasible counterpart in which the true conditional response densities are embedded. We give explicit guidance on implementing the new variance estimator to control adaptively the size of any resulting Wald-type test. Monte Carlo evidence indicates the potential of our approach to deliver confidence intervals for quantile regression parameters with excellent coverage accuracy over different quantile levels, data-generating processes and sample sizes. We also include an empirical application. Supplementary material is available online.
07.06.17 No seminar at WIAS due to the IRTG 1792 Conference Modern Econometrics faces Machine Learning

14.06.17 Prof. Jean-Michel Loubes (Université Toulouse III, France)
Kantorovich distance based kernel for Gaussian processes: estimation and forecast (the talk takes place in R.4.13 at HVP11a)
Monge-Kantorovich distances, otherwise known as Wasserstein distances, have received growing attention in statistics and machine learning as a powerful discrepancy measure for probability distributions. Here, we focus on forecasting a Gaussian process indexed by probability distributions. For this, we provide a family of positive definite kernels built using transportation-based distances. We provide a probabilistic understanding of these kernels and characterize the corresponding stochastic processes. We prove that the Gaussian processes indexed by distributions corresponding to these kernels can be efficiently forecast, opening new perspectives in Gaussian process modeling.
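To make the idea of a kernel on probability distributions concrete, here is a small sketch under strong simplifying assumptions: distributions on the real line, represented by samples of equal size, for which the 2-Wasserstein distance reduces to matching sorted points. The squared-exponential form and the `length_scale` parameter are illustrative choices of this sketch, not necessarily the kernel family of the talk.

```python
import math

def wasserstein2_1d(sample_a, sample_b):
    """2-Wasserstein distance between two empirical measures on the real line
    with the same number of points: the optimal coupling matches sorted points."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(sample_a))

def wasserstein_kernel(sample_a, sample_b, length_scale=1.0):
    """Squared-exponential kernel in the Wasserstein distance (illustrative choice)."""
    distance = wasserstein2_1d(sample_a, sample_b)
    return math.exp(-distance ** 2 / (2.0 * length_scale ** 2))

def gram_matrix(samples, length_scale=1.0):
    """Kernel matrix over a list of samples, e.g. as covariance input to a Gaussian process."""
    return [[wasserstein_kernel(a, b, length_scale) for b in samples] for a in samples]
```

In this restricted setting the distance is a rescaled Euclidean distance between sorted sample vectors, so the kernel above is an ordinary Gaussian kernel on those vectors and hence positive definite.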
21.06.17 Prof. Denis Chetverikov (University of California, Los Angeles, USA)
On cross-validated lasso
In this paper, we derive a rate of convergence of the Lasso estimator when the penalty parameter $\lambda$ for the estimator is chosen using $K$-fold cross-validation; in particular, we show that in the model with Gaussian noise and under fairly general assumptions on the candidate set of values of $\lambda$, the prediction norm of the estimation error of the cross-validated Lasso estimator is with high probability bounded from above up to a constant by $(s\log p/n)^{1/2}\cdot\log^{7/8}(pn)$, where $n$ is the sample size of available data, $p$ is the number of covariates, and $s$ is the number of non-zero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence up to a small logarithmic factor $\log^{7/8}(pn)$. In addition, we derive a sparsity bound for the cross-validated Lasso estimator; in particular, we show that under the same conditions as above, the number of non-zero coefficients of the estimator is with high probability bounded from above up to a constant by $s\log^5(pn)$. Finally, we show that our proof technique generates non-trivial bounds on the prediction norm of the estimation error of the cross-validated Lasso estimator even if the assumption of Gaussian noise fails; in particular, the prediction norm of the estimation error is with high probability bounded from above up to a constant by $(s\log^2(pn)/n)^{1/4}$ under mild regularity conditions. (The authors are Denis Chetverikov, Zhipeng Liao, and Victor Chernozhukov.)
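The estimator studied in this abstract can be illustrated as follows. This is a minimal sketch in plain Python, assuming the objective (1/(2n))·||y − Xb||² + λ·||b||₁, cyclic coordinate descent as the solver, and folds formed by index modulo K; none of these implementation choices come from the paper, and all names are hypothetical.

```python
import random

def soft_threshold(value, threshold):
    """Soft-thresholding operator: sign(value) * max(|value| - threshold, 0)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(rows, targets, lam, sweeps=100):
    """Approximately minimize (1/(2n)) * sum_i (y_i - x_i . b)^2 + lam * ||b||_1."""
    n, p = len(rows), len(rows[0])
    beta = [0.0] * p
    residual = list(targets)  # equals y - X beta, with beta = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in rows) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the residual that excludes coordinate j
            rho = sum(rows[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_value = soft_threshold(rho, lam) / col_sq[j]
            delta = new_value - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * rows[i][j]
                beta[j] = new_value
    return beta

def prediction_error(rows, targets, beta):
    """Mean squared prediction error of `beta` on the given data."""
    fitted = [sum(x * b for x, b in zip(row, beta)) for row in rows]
    return sum((y - f) ** 2 for y, f in zip(targets, fitted)) / len(rows)

def cross_validated_lasso(rows, targets, candidates, folds=5):
    """Pick lam from `candidates` by K-fold cross-validation, then refit on all data."""
    n = len(rows)
    def cv_error(lam):
        total = 0.0
        for k in range(folds):
            train = [i for i in range(n) if i % folds != k]
            test = [i for i in range(n) if i % folds == k]
            beta = lasso_coordinate_descent([rows[i] for i in train], [targets[i] for i in train], lam)
            total += len(test) * prediction_error([rows[i] for i in test], [targets[i] for i in test], beta)
        return total / n
    best_lam = min(candidates, key=cv_error)
    return best_lam, lasso_coordinate_descent(rows, targets, best_lam)

def simulate_sparse_data(true_beta, n, sigma, rng):
    """Gaussian design and Gaussian noise, used only to try the sketch out."""
    rows = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(n)]
    targets = [sum(x * b for x, b in zip(row, true_beta)) + rng.gauss(0.0, sigma) for row in rows]
    return rows, targets
```

A possible call is `cross_validated_lasso(rows, targets, [1.0, 0.3, 0.1, 0.03])` on data from `simulate_sparse_data`; the candidate grid plays the role of the candidate set of values of $\lambda$ in the abstract.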
28.06.17 Prof. Bernd Sturmfels (MPI Leipzig)
Geometry of log-concave density estimation
We present recent work with Elina Robeva and Caroline Uhler that establishes a new link between geometric combinatorics and nonparametric statistics. It concerns shape-constrained densities on d-space that are log-concave, with focus on the maximum likelihood estimator (MLE) for weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the optimal log-concave density is piecewise linear and supported on a regular subdivision of the samples. This defines a map from the space of weights to the set of regular subdivisions of the samples, i.e. the face poset of their secondary polytope. We prove that this map is surjective. In fact, every regular subdivision arises in the MLE for some set of weights with positive probability, but coarser subdivisions appear to be more likely to arise than finer ones. To quantify these results, we introduce a continuous version of the secondary polytope, whose dual we name the Samworth body.

12.07.17 Prof. Wolfgang Polonik (University of California at Davis, USA)
Statistical topological data analysis: Rescaling the persistence diagram
A persistence diagram (PD) is one of the basic objects underlying topological data analysis. It is used to analyze topological and geometric features of an underlying space M, assuming availability of a random sample from M. Existing approaches for such analyses will be reviewed briefly, and their benefits and shortcomings will be discussed. Then we introduce ideas for rescaling PDs, which enables the derivation of novel limit theorems for the total k persistence, and other functionals of PDs. The long-term goal of studying the rescaling of PDs is to develop novel types of statistical analysis of persistence diagrams.
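As a small illustration of the functional named in this abstract, the sketch below computes the total k persistence of a diagram, taken here to be the sum of (death − birth)^k over its finite points. To have a diagram to work with, it also builds the 0-dimensional diagram of a sample on the real line, assuming the convention that two points are connected at a scale equal to their distance. The rescaling studied in the talk is not shown, and the function names are hypothetical.

```python
def h0_diagram_1d(points):
    """0-dimensional persistence diagram of points on the real line.

    Assumed convention: every component is born at scale 0 and two points are
    joined at a scale equal to their distance, so the finite death times are the
    gaps between consecutive sorted points. The one component that never dies
    is omitted.
    """
    ordered = sorted(points)
    return [(0.0, right - left) for left, right in zip(ordered, ordered[1:])]

def total_persistence(diagram, k=1):
    """Total k persistence: sum of (death - birth)^k over the finite points of the diagram."""
    return sum((death - birth) ** k for birth, death in diagram if death != float("inf"))
```

For example, the sample 0, 1, 3 has gaps 1 and 2, so its total 1 persistence is 3 and its total 2 persistence is 5 under this convention.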

Last reviewed: March 3, 2017, by Christine Schneider