# Research Group "Stochastic Algorithms and Nonparametric Statistics"

## Research Seminar "Mathematical Statistics" Summer Semester 2015

last reviewed: May 12, 2015, by Christine Schneider

Dr. Natalia Bochkina (University of Edinburgh, UK)

Statistical inference in possibly misspecified nonregular models

Abstract: Finite dimensional statistical models are usually called nonregular if the regularity assumptions (e.g. of the Cramér-Rao inequality) do not hold. For such models, it is possible to construct an estimator with a rate of convergence that is faster than the parametric root-n rate. I will give an overview of such models with the corresponding rates of convergence in the frequentist setting under the assumption that they are well-specified. In a Bayesian approach, I will consider a special case where the "true" value of the parameter for a well-specified model, or the parameter corresponding to the best approximating model from the considered parametric family for a misspecified model, occurs on the boundary of the parameter space. I will show that in this case the posterior distribution (a) asymptotically concentrates around the "true" value of the parameter (or the best approximating value under a misspecified model), (b) has not only Gaussian components as in the case of regular models (the Bernstein–von Mises theorem) but also Gamma distribution components whose form depends on the behaviour of the prior distribution near the boundary, and (c) has a faster rate of convergence in the directions of the Gamma distribution components. One implication of this result is that for some models, there appears to be no lower bound on efficiency of estimating the unknown parameter if it is on the boundary of the parameter space. I will discuss how this result can be used for identifying misspecification in regular models. The results will be illustrated on a problem from emission tomography. This is joint work with Peter Green (University of Bristol).

Torsten Hohage (Universität Göttingen)

Variational regularization of statistical inverse problems

Abstract: We consider variational regularization methods for ill-posed inverse problems described by
operator equations F(x) = y in Banach spaces. One focus of this talk will be on data noise models: We
will present a general framework that allows us to treat many noise models and data fidelity terms in a
unified setting, including Gaussian and Poisson processes, continuous and discrete models, and impulsive
noise models.
Rates of convergence are determined by abstract smoothness conditions called source conditions. In
variational regularization theory these conditions are often formulated in the form of variational inequalities
rather than range conditions for the functional calculus at F'[x†]* F'[x†], where x† denotes the exact
solution. Although this has a number of advantages from a theoretical perspective, there has been a lack
of interpretations of variational source conditions for relevant problems. Here we will show for an inverse
medium scattering problem that Sobolev smoothness of the contrast implies logarithmic variational source
conditions and logarithmic rates of convergence for generalized Tikhonov regularization as the noise level
tends to 0.
Our general results will be illustrated in the context of phase retrieval problems in coherent x-ray imaging,
inverse scattering problems, and parameter identification problems in stochastic differential equations.
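
For orientation, generalized Tikhonov regularization for F(x) = y is commonly written as the minimization below. This is a sketch in assumed notation, not necessarily the speaker's formulation: the data fidelity term S, the penalty R, the regularization parameter α > 0 and the observed data y^obs are symbols introduced here for illustration.

```latex
\hat{x}_\alpha \in \operatorname*{arg\,min}_{x}
  \Big[ \mathcal{S}\big(F(x);\, y^{\mathrm{obs}}\big) + \alpha\, \mathcal{R}(x) \Big]
```

Under this reading, different noise models (for example Gaussian or Poisson data) would correspond to different choices of S, while R encodes prior information about the solution.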

Prof. Keith Knight (University of Toronto, Canada)

ℓ_{∞} estimation in regression

Abstract: ℓ_{∞} estimation is not part of the traditional canon of
applied regression analysis, and for good reason: it is highly
non-robust and potentially very unstable. Nonetheless, in some situations,
minimizing the maximum absolute residual is a worthwhile objective. In
this talk, we will discuss the properties (both asymptotic and
non-asymptotic) of ℓ_{∞} estimation in linear regression and describe
an approach for "rescuing" ℓ_{∞} estimation that can also be
applied to non-parametric regression problems.
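
The non-robustness mentioned in the abstract can be seen in the simplest special case, assumed here purely for illustration: an intercept-only (location) model, where the value minimizing the maximum absolute residual is the midrange (min + max) / 2. The function names and the toy data below are hypothetical and are not taken from the talk.

```python
from statistics import median


def linf_location(values):
    """Midrange: the m that minimizes max_i |y_i - m| in a location model."""
    return (min(values) + max(values)) / 2


def max_abs_residual(values, m):
    """Largest absolute residual of the constant fit m."""
    return max(abs(y - m) for y in values)


# Hypothetical toy data: five well-behaved observations, then one outlier.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
contaminated = clean + [20.0]

for name, data in [("clean", clean), ("contaminated", contaminated)]:
    m_inf = linf_location(data)
    m_med = median(data)
    print(name, "midrange:", m_inf, "median:", m_med,
          "max |residual| at midrange:", max_abs_residual(data, m_inf))
```

A single outlier moves the midrange roughly halfway towards it, while the median barely changes, which is the instability the abstract alludes to.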

Prof. Marius Kloft (Humboldt Universität)

On the statistical properties of L^{p}-Norm multiple kernel learning

Abstract: Reproducing kernel Hilbert space methods have become a popular and versatile
tool with many application areas in statistics and machine learning, the flagship method being the support vector machine.
Nevertheless, a displeasing stumbling block towards the complete automatization of this method remains
that of automatic kernel selection. In the seminal work of Lanckriet et al. (2004),
it was shown that it is computationally feasible to simultaneously learn a support vector machine and a linear combination of kernels;
this approach is dubbed "multiple kernel learning".
In this talk, we discuss a further extension of this methodology, using an ℓ_{q} (q ≥ 1) regularization
of the kernel coefficients, which can be understood as enforcing a degree of soft sparsity.
We present a statistical analysis of the performance of this method in terms of the convergence of its excess loss,
based on precise bounds on its Rademacher complexity.
We will also demonstrate the interest of this approach through applications to bioinformatics and image recognition.

Shih-Kang Chao (HU Berlin)

FASTEC: Factorisable sparse tail event curves

Abstract: High-dimensional multivariate quantile analysis is crucial for many applications, such as risk management and weather analysis. In these applications, the quantile functions q_Y(τ) of a random variable Y, defined by P{Y ≤ q_Y(τ)} = τ, at the "tail" of the distribution, namely at τ close to 0 or 1, such as τ = 1%, 5% or τ = 95%, 99%, are of great interest. The quantile at level τ can be interpreted as the lower (upper) bound with confidence level 1−τ (τ) of the possible outcome of a random variable, and the difference between q_Y(τ) and q_Y(1 − τ) can be interpreted as the τ-range, with τ = 25% being the special case of the interquartile range. Covariance-based methods such as principal component analysis do not yield information on these bounds and are easily corrupted if the data are highly skewed or contain outliers. We propose a conditional quantile based method which enables localized analysis on quantiles and global comovement analysis for the τ-range for high-dimensional data with factors. We call our method FASTEC: FActorisable Sparse Tail Event Curves. The technique is implemented by factorising the multivariate quantile regression with nuclear norm regularization. As the empirical loss function and the nuclear norm are non-smooth, an efficient algorithm which combines smoothing techniques and effective proximal gradient methods is developed, for which explicit deterministic convergence rates are derived. It is shown that the estimator enjoys nonasymptotic oracle properties under a rank sparsity condition. The technique is applied to a multivariate modification of the famous Conditional Autoregressive Value-at-Risk (CAViaR) model of Engle and Manganelli (2004), which is called Sparse Asymmetric Conditional Value-at-Risk (SAMCVaR).
Using a dataset consisting of stock prices of 230 global financial firms over 2007-2010, we confirm the leverage effect documented in previous studies such as Engle and Ng (1993), and furthermore we show that the negative lagged return increases the distribution dispersion mostly by lowering the left tail of the distribution, which does not yield the potential for gain. Finally, a nonparametric extension of our method is proposed and applied to Chinese temperature data collected from 159 weather stations for the classification of temperature seasonality patterns.
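
The quantile and the τ-range described in the abstract can be illustrated with their empirical counterparts. The sketch below is an assumption-laden illustration, not the FASTEC method: it uses the plain order-statistic definition of a sample quantile, and the function names and toy sample are hypothetical.

```python
import math


def empirical_quantile(sample, tau):
    """Smallest sample value q such that the fraction of points <= q is >= tau."""
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(sample)
    # Rank of the order statistic; note that floating-point rounding in
    # tau * n can shift this rank for some non-dyadic tau.
    rank = math.ceil(tau * len(ordered))
    return ordered[rank - 1]


def tau_range(sample, tau):
    """Difference between the (1 - tau)-quantile and the tau-quantile."""
    return empirical_quantile(sample, 1 - tau) - empirical_quantile(sample, tau)


# Hypothetical toy sample; tau = 0.25 gives an interquartile-type range.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
print(empirical_quantile(sample, 0.25), empirical_quantile(sample, 0.75))
print(tau_range(sample, 0.25))
```

Other sample-quantile conventions (for example interpolating between order statistics) give slightly different values on small samples; the order-statistic version is used here only because it mirrors the defining relation P{Y ≤ q_Y(τ)} = τ most directly.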

Henry Horng-Shing Lu (National Chiao Tung University)

Network analysis of big data

Abstract: One great challenge of big data research is to identify the inherent complex network structure efficiently and accurately. We will discuss possible approaches to reconstructing Boolean networks. Specifically, we will prove that (log n) state transition pairs are sufficient and necessary to reconstruct the time delay Boolean network of n nodes with high accuracy if the number of input genes to each gene is bounded. Future developments of methodologies and computation systems for big data research will also be discussed.

Timothy B. Armstrong (Yale University)

Adaptive testing on a regression function at a point

Abstract: We consider the problem of inference on a regression function at a point when
the entire function satisfies a sign or shape restriction under the null. We propose a
test that achieves the optimal minimax rate adaptively over a range of Hölder classes,
up to a log log n term, which we show to be necessary for adaptation. We apply the
results to adaptive one-sided tests for the regression discontinuity parameter under a
monotonicity restriction, the value of a monotone regression function at the boundary,
and the proportion of true null hypotheses in a multiple testing problem.

Alexander Gasnikov (MIPT, Moscow)

On optimization aspects of finding Wasserstein(-Kantorovich) barycenter

Abstract: In this talk we discuss recent work by Marco Cuturi (Kyoto Univ.) et al. on fast algorithms for computing the Wasserstein barycenter (Wb). In our approach we try to reduce the problem to another high-dimensional convex optimization problem. The idea is to freeze the supports of the measures while allowing the cardinalities of the support sets to be large enough. We then have to solve a convex-concave saddle-point optimization problem (Cuturi et al. treated this problem as a nonsmooth convex optimization problem). We propose several new numerical approaches to solving this problem.
1. We generalize the approach of Cuturi and obtain a sharp bound on the rate of convergence. Here we have to solve an inner linear programming problem or entropy-linear problem. We need to propose optimal primal-dual methods for these problems and to choose the optimal relation between the precision of the solution of the inner problem and the precision of the solution of the whole problem. Since we cannot obtain the exact solution of the inner problem, we have to choose a method for the outer problem that is able to work with an inexact oracle. Here we use recent results by Dvurechensky-Gasnikov arXiv:1411.2876, Gasnikov-Dvurechensky-Nesterov arXiv:1411.4218 and Gasnikov-Nesterov et al. arXiv:1410.7719.
2. We propose randomized (mirror descent) and non-randomized (mirror-prox) approaches for the saddle-point problem, which go back to the recent works of Nemirovski-Juditsky http://www2.isye.gatech.edu/~nemirovs/.
3. We propose randomized approaches based on the randomization of a functional that has the form of a sum of a large number of functions (see Agarwal-Bottou arXiv:1410.0723v2 and references therein). This setting is of interest when we have to find the Wb of a huge number of empirical measures.

Prof. Ying Chen (National University of Singapore)

An adaptive functional autoregressive forecasting model to predict electricity price curves

Abstract: Electricity price forecasting is becoming increasingly relevant in competitive energy markets. We provide an approach to predicting whole electricity price curves based on the adaptive functional autoregressive (AFAR) methodology. The AFAR has time varying operators that allow it to be safely used in both stationary and non-stationary situations. Under stationarity, we develop a consistent maximum likelihood (ML) estimator in closed form, where the likelihood function is defined on the parameters' subspace, or sieve. For non-stationary data, the estimation is conducted over an interval of local homogeneity, over which the time varying data generating process can be approximated by an FAR model with constant operators. The local interval is identified in a sequential testing procedure. A simulation study illustrates good finite sample properties of the proposed AFAR modeling. A real data application to forecasting California daily electricity price curves demonstrates superior accuracy of the proposed AFAR modeling compared to several alternatives.

Christoph Breunig (HU Berlin)

Testing the specification in random coefficient models

Abstract: In this paper, we suggest and analyze a new class of specification tests
for random coefficient models. They allow us to assess the validity of
central structural features of the model, in particular linearity in
coefficients, generalizations of this notion like a known nonlinear
functional relationship, or degeneracy of the distribution of a random
coefficient, i.e., whether a coefficient is fixed or random, including
whether an associated variable can be omitted altogether. Our tests are
nonparametric in nature, and use sieve estimators of the characteristic
function. We analyze their power against both global and local
alternatives theoretically. Moreover, we perform a Monte Carlo
simulation study, and apply the tests to analyze the degree of
nonlinearity in a heterogeneous random coefficients consumer demand model.

Jonas Peters (ETH Zürich)

Invariant prediction and causal inference

Abstract: Why are we interested in the causal structure of a data-generating process? In a classical regression problem, for example, we include a variable in the model if it improves the prediction; it seems that no causal knowledge is required. In many situations, however, we are interested in the system's behavior under a change of environment. Here, causal models become important because they are usually considered invariant under those changes. A causal prediction (which uses only direct causes of the target variable as predictors) remains valid even if we intervene on predictor variables or change the whole experimental setting.
In this talk, we propose to exploit invariant prediction for causal inference: given data from different experimental settings, we use invariant models to estimate the set of causal predictors. We provide valid confidence intervals and examine sufficient assumptions under which the true set of causal predictors becomes identifiable. The empirical properties are studied for various data sets, including gene perturbation experiments.
This talk does not require any prior knowledge about causal concepts.

Prof. Markus Haltmeier (Universität Innsbruck)

Extreme value analysis of frame coefficients and applications

Abstract: Consider the problem of estimating a high-dimensional vector from linear observations that are corrupted by additive Gaussian white noise. Many solution approaches for such problems construct an estimate as the most regular element satisfying a bound on the coefficients of the residuals with respect to some frame. In order that the true parameter is feasible, the coefficients of the noise must satisfy the bound. For that purpose we compute the asymptotic distribution of these coefficients. We show that generically a standard Gumbel law results, as it is known from the case of orthonormal bases. However, for highly redundant frames other limiting laws may occur. We discuss applications of such results for thresholding in redundant wavelet or curvelet frames, and for the Dantzig selector.

Martin Wahl (Universität Mannheim)

Nonparametric estimation in the presence of complex nuisance components

Abstract: