Research Group "Stochastic Algorithms and Nonparametric Statistics"
Research Seminar "Mathematical Statistics" Winter Semester 2015/2016
last reviewed: December 1, 2015, by Christine Schneider
Xiaohong Chen (Yale University)
Optimal sup-norm rates, adaptivity and inference in nonparametric instrumental
variables regression
Abstract: This paper makes several contributions to the literature on the important yet difficult problem of estimating functions nonparametrically using instrumental variables. First, we derive the minimax optimal sup-norm convergence rates for nonparametric instrumental variables (NPIV) estimation of the structural function h0 and its derivatives. Second, we show that a computationally simple sieve NPIV estimator can attain the optimal sup-norm rates for h0 and its derivatives when h0 is approximated via a spline or wavelet sieve. Our optimal sup-norm rates surprisingly coincide with the optimal L2-norm rates for severely ill-posed problems, and are only up to a [log(n)]^ε (with ε < 1/2) factor slower than the optimal L2-norm rates for mildly ill-posed problems. Third, we introduce a novel data-driven procedure for choosing the sieve dimension optimally. Our data-driven procedure is sup-norm rate-adaptive: the resulting estimators of h0 and its derivatives converge at their optimal sup-norm rates even though the smoothness of h0 and the degree of ill-posedness of the NPIV model are unknown. Finally, we present two non-trivial applications of the sup-norm rates to inference on nonlinear functionals of h0 under low-level conditions. The first is to derive the asymptotic normality of sieve t-statistics for exact consumer surplus and deadweight loss functionals in nonparametric demand estimation when prices, and possibly incomes, are endogenous. The second is to establish the validity of a sieve score bootstrap for constructing asymptotically exact uniform confidence bands for collections of nonlinear functionals of h0. Both applications provide new and useful tools for empirical research on nonparametric models with endogeneity.
Link to the paper: http://arxiv.org/pdf/1508.03365v1.pdf
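For readers unfamiliar with sieve NPIV, the following is a minimal numpy sketch of its 2SLS form on simulated data. The polynomial bases, the fixed dimensions J and K, and the data-generating process are illustrative assumptions; the paper works with spline or wavelet sieves and a data-driven sieve dimension.

```python
import numpy as np

def sieve_npiv(y, x, w, J=5, K=8):
    """Sieve NPIV via 2SLS: regress a basis expansion of x on a basis
    expansion of the instrument w. Polynomial bases and fixed (J, K)
    are simplifying choices for this sketch."""
    Psi = np.vander(x, J, increasing=True)   # n x J sieve basis for h0
    B = np.vander(w, K, increasing=True)     # n x K instrument basis
    P = B @ np.linalg.pinv(B.T @ B) @ B.T    # projector onto span(B)
    c = np.linalg.pinv(Psi.T @ P @ Psi) @ (Psi.T @ P @ y)
    return lambda xs: np.vander(xs, J, increasing=True) @ c

rng = np.random.default_rng(0)
n = 2000
w = rng.uniform(-1, 1, n)                    # instrument
v = rng.normal(size=n)
x = np.clip(0.8 * w + 0.4 * v, -1, 1)        # endogenous regressor
y = np.sin(np.pi * x) + 0.5 * v              # error correlated with x
h_hat = sieve_npiv(y, x, w)
grid = np.linspace(-0.9, 0.9, 50)
print(np.max(np.abs(h_hat(grid) - np.sin(np.pi * grid))))  # sup-norm error
```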
Martin Wainwright (UC Berkeley)
Fast low-rank estimation by projected gradient descent: Statistical and algorithmic
guarantees
Abstract: Optimization problems with rank constraints arise in many applications, including matrix
regression, structured PCA, matrix completion and matrix decomposition problems. An attractive heuristic
for solving such problems is to factorize the low-rank matrix, and to run projected gradient descent on
the nonconvex problem in the lower dimensional factorized space. We provide a general set of conditions
under which projected gradient descent, when given a suitable initialization, converges geometrically to a
statistically useful solution. Our results are applicable even when the initial solution is outside any region
of local convexity, and even when the problem is globally concave.
Based on joint work with Yudong Chen (Cornell University).
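For intuition, here is a minimal sketch of gradient descent on a factorized low-rank objective with spectral initialization, assuming the simplest loss f(U, V) = ||U V^T - M||_F^2 / 2. This toy omits the projection step and the general losses treated in the paper.

```python
import numpy as np

def factored_gd(M, r, steps=300, eta=None):
    """Gradient descent on f(U, V) = 0.5 * ||U V^T - M||_F^2 in the
    factorized space, with spectral initialization. The paper's projected
    step (onto a constraint set) is omitted in this toy version."""
    U0, s, V0t = np.linalg.svd(M, full_matrices=False)
    U = U0[:, :r] * np.sqrt(s[:r])                 # spectral initialization
    V = V0t[:r].T * np.sqrt(s[:r])
    eta = 0.5 / s[0] if eta is None else eta       # step size ~ 1 / ||M||_op
    for _ in range(steps):
        R = U @ V.T - M                            # residual
        U, V = U - eta * R @ V, V - eta * R.T @ U  # simultaneous update
    return U @ V.T

rng = np.random.default_rng(1)
L_star = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))  # rank-3 truth
M = L_star + 0.1 * rng.normal(size=L_star.shape)              # noisy data
print(np.linalg.norm(factored_gd(M, 3) - L_star) / np.linalg.norm(L_star))
```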
Jonathan Rosenblatt (Ben-Gurion University of the Negev, Israel)
On the optimality of averaging in distributed statistical learning
Abstract: A common approach to statistical learning on big data is to randomly split it among m machines and calculate the parameter of interest by averaging their m individual estimates. Focusing on empirical risk minimization, or equivalently M-estimation, we study the statistical error incurred by this strategy. We consider two asymptotic settings: one where the number of samples per machine n → ∞ but the number of parameters p is fixed, and a second high-dimensional regime where both p, n → ∞ with p/n → κ. Most previous works provided only moment bounds on the error incurred by splitting the data in the fixed-p setting. In contrast, we present for both regimes asymptotically exact distributions for this estimation error. In the fixed-p setting, under suitable assumptions, we thus prove that to leading order, averaging is as accurate as the centralized solution. In the high-dimensional setting, we show a qualitatively different behavior: data splitting does incur a first-order accuracy loss, which we quantify precisely. In addition, our asymptotic distributions allow the construction of confidence intervals and hypothesis tests on the estimated parameters. Our main conclusion is that in both regimes, averaging parallelized estimates is an attractive way to speed up computations and save on memory, while incurring a quantifiable and typically moderate excess error.
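The split-and-average strategy is simple to state in code. A short simulation sketch follows, using OLS as the M-estimator (an illustrative choice) and comparing the averaged estimator against the centralized one.

```python
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
N, p, m = 100_000, 10, 20              # total samples, parameters, machines
theta = rng.normal(size=p)
X = rng.normal(size=(N, p))
y = X @ theta + rng.normal(size=N)

# Each machine solves on its n = N/m samples; the estimates are averaged.
parts = np.array_split(np.arange(N), m)
theta_avg = np.mean([ols(X[i], y[i]) for i in parts], axis=0)
theta_central = ols(X, y)              # centralized benchmark

# Fixed-p regime: the two errors agree to leading order. Pushing p toward
# n = N/m moves into the regime with a first-order accuracy loss.
print(np.linalg.norm(theta_avg - theta), np.linalg.norm(theta_central - theta))
```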
Martin Wahl (HU Berlin)
Non-asymptotic upper bounds for the reconstruction error of PCA
Abstract: Principal component analysis (PCA) is a standard tool for dimension reduction. Classical objects of interest are the principal subspaces and their empirical counterparts. In this talk, we focus on the reconstruction error of PCA, and prove a non-asymptotic upper bound for the corresponding excess risk. This bound unifies and improves several upper bounds which were previously obtained by empirical process theory. Moreover, the bound reveals that the excess risk differs considerably from the usual subspace distance based on canonical angles. Our proof relies on the analysis of empirical spectral projectors and uses recent concentration inequalities for sample covariance operators.
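To make the object of study concrete, the sketch below computes the excess reconstruction risk of the empirical rank-k projector under an assumed diagonal population covariance; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, k = 50, 200, 5
lam = 1.0 / np.arange(1, d + 1) ** 2        # assumed population eigenvalues
X = rng.normal(size=(n, d)) * np.sqrt(lam)  # sample with covariance diag(lam)

S = X.T @ X / n                             # sample covariance
_, V = np.linalg.eigh(S)                    # eigenvalues in ascending order
P_hat = V[:, -k:] @ V[:, -k:].T             # empirical rank-k projector

# Reconstruction risk of a projector P under the population covariance
# Sigma is E||X - PX||^2 = tr(Sigma) - tr(P Sigma); the oracle projector
# keeps the top k population directions.
risk_hat = np.sum(lam) - np.sum(lam * np.diag(P_hat))
risk_oracle = np.sum(lam[k:])
print(risk_hat - risk_oracle)               # excess risk, always >= 0
```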
Karim Lounici (Georgia Institute of Technology, USA)
Concentration bounds and asymptotic distribution for the empirical spectral projectors of sample covariance operators
Abstract:
Felix Krahmer (Technische Universität München)
Empirical chaos processes and their application in blind deconvolution
Abstract: The motivation of this talk is the deconvolution of two unknown vectors w and x, each of which is sparse with respect to a generic (but known) basis. That is, one seeks to recover w and x from their circular convolution y = w ∗ x. In this talk, we prove a restricted isometry property for this problem, which then entails convergence guarantees for the non-convex sparse power factorization algorithm via recent work by Lee et al.
A key ingredient of our proof is a tail bound for random processes of the form
\sup_{X \in \mathcal{X}} \sum_{\ell=1}^{L} |\langle b_\ell, X c_\ell \rangle|^2,
where the b_\ell, c_\ell are independent standard Gaussian vectors and \mathcal{X} is a set of matrices. Such processes can be interpreted as the empirical process corresponding to a chaos process. We analyze such processes in terms of the Talagrand \gamma_2 functional. For the blind deconvolution application, our results yield convergence guarantees whenever the sparsity of the signals is less than cL/\log^6(L), where L is the dimension and c is an absolute constant.
This is joint work with Justin Romberg (Georgia Tech) and Ali Ahmed (MIT).
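The forward model itself is easy to simulate. The sketch below generates two vectors that are sparse in generic known bases (random orthonormal bases, an illustrative choice) and forms their circular convolution via the DFT diagonalization.

```python
import numpy as np

rng = np.random.default_rng(4)
L, s = 256, 5                                  # ambient dimension, sparsity

def sparse_coeffs(L, s):
    v = np.zeros(L)
    v[rng.choice(L, s, replace=False)] = rng.normal(size=s)
    return v

# w and x are sparse in generic (here: random orthonormal) known bases.
B = np.linalg.qr(rng.normal(size=(L, L)))[0]
C = np.linalg.qr(rng.normal(size=(L, L)))[0]
w = B @ sparse_coeffs(L, s)
x = C @ sparse_coeffs(L, s)

# Circular convolution y = w * x via the DFT; the blind deconvolution
# task is to recover w and x from y, B and C alone.
y = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)).real
print(np.linalg.norm(y))
```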
Adityanand Guntuboyina (Berkeley)
On Bayes risk lower bounds
Abstract: I will present a general technique for obtaining Bayes risk lower bounds for arbitrary priors in
standard decision theoretic problems. The method leads to generalizations of a variety of classical minimax
bounds. I will describe an application to admissibility. This is based on http://arxiv.org/abs/1410.0503
which is joint work with Xi Chen and Yuchen Zhang.
Christoph Breunig (HU Berlin)
Testing the specification in random coefficient models
Abstract: In this paper, we suggest and analyze a new class of specification tests for random coefficient models. They allow one to assess the validity of central structural features of the model, in particular linearity in coefficients, generalizations of this notion like a known nonlinear functional relationship, or degeneracy of the distribution of a random coefficient, i.e., whether a coefficient is fixed or random, including whether an associated variable can be omitted altogether. Our tests are nonparametric in nature, and use sieve estimators of the characteristic function. We analyze their power theoretically against both global and local alternatives. Moreover, we perform a Monte Carlo simulation study, and apply the tests to analyze the degree of nonlinearity in a heterogeneous random coefficients consumer demand model.
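As a loose illustration of the characteristic-function viewpoint (not the paper's actual test statistic), the following sketch contrasts the empirical characteristic function of outcomes from a random-slope model with that of the degenerate fixed-slope counterpart; the model and all values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.uniform(0, 1, n)
b0 = rng.normal(0.0, 1.0, n)       # random intercept
b1 = rng.normal(2.0, 0.5, n)       # random slope; degenerate if its sd is 0
y = b0 + b1 * x                    # linear random coefficients model

def ecf(t, z):
    """Empirical characteristic function of z at frequencies t."""
    return np.exp(1j * np.outer(t, z)).mean(axis=1)

# Degeneracy-check idea: if the slope were fixed at its mean, the
# characteristic function of y would differ; the gap below is what a
# (much more careful) test statistic would pick up.
t = np.linspace(0.5, 3.0, 10)
y_fixed = b0 + 2.0 * x
print(np.abs(ecf(t, y) - ecf(t, y_fixed)).max())
```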
Jonas Peters (ETH Zürich)
Invariant prediction and causal inference
Abstract: Why are we interested in the causal structure of a data-generating process? In a classical regression problem, for example, we include a variable into the model if it improves the prediction; it seems that no causal knowledge is required. In many situations, however, we are interested in the system's behavior under a change of environment. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction (which uses only direct causes of the target variable as predictors) remains valid even if we intervene on predictor variables or change the whole experimental setting.
In this talk, we propose to exploit invariant prediction for causal inference: given data from different experimental settings, we use invariant models to estimate the set of causal predictors. We provide valid confidence intervals and examine sufficient assumptions under which the true set of causal predictors becomes identifiable. The empirical properties are studied for various data sets, including gene perturbation experiments.
This talk does not require any prior knowledge about causal concepts.
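As a toy rendering of the idea (not the authors' procedure; the invariance check here is a crude two-sample t-test on residual means), the following sketch accepts a predictor set when its regression residuals look invariant across environments and intersects the accepted sets.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def invariant_sets(X, y, env, alpha=0.05):
    """Toy invariant causal prediction: accept a set S when OLS residuals
    of y on X[:, S] pass a mean-equality t-test across environments, then
    intersect all accepted sets. The paper uses proper invariance tests."""
    p = X.shape[1]
    levels = np.unique(env)
    accepted = []
    for r in range(p + 1):
        for S in combinations(range(p), r):
            Xs = X[:, S] if S else np.ones((len(y), 1))
            res = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
            pvals = [stats.ttest_ind(res[env == a], res[env == b]).pvalue
                     for a in levels for b in levels if a < b]
            if min(pvals) > alpha:
                accepted.append(set(S))
    return set.intersection(*accepted) if accepted else set()

rng = np.random.default_rng(6)
n = 2000
env = np.repeat([0, 1], n)
x1 = rng.normal(size=2 * n) + 2.0 * (env == 1)  # cause of y, intervened on
y = 2.0 * x1 + rng.normal(size=2 * n)
x2 = y + rng.normal(size=2 * n)                 # effect of y, not a cause
print(invariant_sets(np.column_stack([x1, x2]), y, env))  # expect {0}
```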
Prof. Markus Haltmeier (Universität Innsbruck)
Extreme value analysis of frame coefficients and applications
Abstract: Consider the problem of estimating a high-dimensional vector from linear observations that are corrupted by additive Gaussian white noise. Many solution approaches for such problems construct an estimate as the most regular element satisfying a bound on the coefficients of the residuals with respect to some frame. In order that the true parameter is feasible, the coefficients of the noise must satisfy the bound. For that purpose we compute the asymptotic distribution of these coefficients. We show that generically a standard Gumbel law results, as it is known from the case of orthonormal bases. However, for highly redundant frames other limiting laws may occur. We discuss applications of such results for thresholding in redundant wavelet or curvelet frames, and for the Dantzig selector.
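The orthonormal-basis case is easy to simulate. The sketch below checks that the maximal absolute coefficient of Gaussian white noise, after the classical extreme-value centering and scaling (with N = 2d as an approximation for absolute values, an assumption of this sketch), is close to a standard Gumbel law.

```python
import numpy as np

rng = np.random.default_rng(7)
d, trials = 4096, 2000

# Coefficients of white noise in an orthonormal basis are i.i.d. N(0, 1),
# so the maximal absolute coefficient is the max of d half-normals.
M = np.abs(rng.normal(size=(trials, d))).max(axis=1)

# Classical extreme-value normalization, with N = 2d since |Z| has two
# Gaussian tails; a * (M - b) is then approximately standard Gumbel.
N = 2 * d
a = np.sqrt(2 * np.log(N))
b = a - (np.log(np.log(N)) + np.log(4 * np.pi)) / (2 * a)
print((a * (M - b)).mean(), np.euler_gamma)  # Gumbel mean = Euler's gamma
```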
Martin Wahl (Universität Mannheim)
Nonparametric estimation in the presence of complex nuisance
components
Abstract: