Research Group "Stochastic Algorithms and Nonparametric Statistics"
Research Seminar "Mathematical Statistics" Winter semester 2011/12
last reviewed: July 14, 2011, Christine Schneider
Rui M. Castro (Eindhoven University of Technology)
Adaptive Sensing for Sparse Signal Detection and Localization
Abstract: In this talk I will consider the problem of detection and localization of sparse signals in noise. I consider in particular adaptive sensing approaches (also known as active learning or inference using sequential design of experiments). This means the data collection process is sequential and the collection of future data is guided by all data collected so far. This is in contrast with most traditional approaches, where all the data are collected passively before any analysis is done. The added flexibility of sequential designs results in dramatic gains in performance, both for signal estimation and detection.
I consider in particular the setting where the signal of interest is an n-dimensional vector with only s non-zero components, and show that if sequential designs are allowed, the minimum signal component magnitude necessary to guarantee accurate detection or localization needs only to scale with s. This is remarkable since there is no dependence on the extrinsic dimension n. On the other hand, if sequential designs are not allowed, the magnitude of the signal components must necessarily scale with the extrinsic dimension n. Our characterization involves both lower performance bounds and simple but powerful adaptive sensing procedures able to nearly achieve these fundamental limits for a wide range of scenarios. Extensions of these ideas to a compressed sensing scenario might also be discussed.
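For orientation only (this is a schematic version of the scaling the abstract refers to, not the precise statement of the talk), one may think of the Gaussian sequence model
\[
  y_i \;=\; x_i + z_i, \qquad z_i \sim \mathcal{N}(0,1), \quad i = 1, \dots, n,
\]
where the vector x has exactly s non-zero entries, each of magnitude \( \mu \). With purely passive (non-sequential) sampling, reliable detection or localization essentially forces \( \mu \) to grow like \( \sqrt{\log n} \), i.e. with the extrinsic dimension, whereas adaptive sensing procedures can succeed once \( \mu \) exceeds a threshold depending on s alone, up to constants and minor logarithmic terms.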
Samory Kpotufe (MPI Tübingen)
The curse of dimension in nonparametric regression
Abstract: Nonparametric regression consists of learning an arbitrary mapping \( f: \mathcal{X} \to \mathcal{Y} \) from a data set of (X, Y) pairs in which the Y values are corrupted by noise of mean zero. This statistical task is known to be subject to a so-called "curse of dimension": if \( \mathcal{X} \) is a subset of \( \mathbb{R}^D \), and if the only smoothness assumption on f is that it satisfies a Lipschitz condition, it is known that any estimator based on n data points will have an error rate (risk) of \( \Omega(n^{-2/(2+D)}) \). In other words, a data size exponential in D is required to approximate f, which is infeasible in high dimension.
Fortunately, high-dimensional data often has low intrinsic complexity due to correlations between features and/or irrelevant features. Common examples in machine learning are data that lie near a low-dimensional manifold, and sparse data. Some nonparametric regressors perform better in such situations, as their risks depend only on the "intrinsic dimension" of the data, appropriately formalized. These methods, which are "adaptive" to the intrinsic dimension, therefore require no dimension reduction!
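To make the dimension dependence concrete, here is the standard back-of-the-envelope calculation behind the curse and behind the gain from adaptivity (a sketch, not a claim from the talk): for Lipschitz f the minimax risk scales as
\[
  \inf_{\hat f}\; \sup_{f \,\text{Lipschitz}} \; \mathbb{E}\, \| \hat f - f \|^2 \;\asymp\; n^{-2/(2+D)},
\]
so pushing the risk below a tolerance \( \varepsilon \) requires on the order of \( \varepsilon^{-(2+D)/2} \) samples, i.e. a sample size exponential in D. An estimator that adapts to an intrinsic dimension \( d \ll D \) (for instance, data lying near a d-dimensional manifold) instead attains rates of the form \( n^{-2/(2+d)} \), without any explicit dimension reduction step.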
I will start with a short discussion of some adaptive traditional methods such as k-NN regression, then move on to computationally efficient adaptive methods which are less well known. These efficient methods, namely Random Projection tree and tree-kernel-hybrid regressors, have strong theoretical guarantees which are verifiable on a wide range of real-world data.
This talk is based on joint work with Sanjoy Dasgupta and Nakul Verma.
Christian Hafner
(UCL, Belgium)
Volatility of price indices for heterogeneous goods
Abstract: Hedonic regression is a common tool to estimate price indices and has been widely used to construct price indices of markets with heterogeneous goods. Two prominent examples are the real estate and the art markets. Although some efforts have been made to improve the efficiency of parameter estimates, there is no systematic treatment of volatility in these markets. Considering heterogeneous goods as alternative investments, this lack of reliable volatility measures prevents an objective assessment of investment opportunities based on classical mean-variance criteria. Moreover, derivatives on subsets of the traded goods require a precise estimation of the underlying volatility. For example, in art markets, auction houses are interested in derivatives for collections or individual art objects in order to hedge their risks. In this paper we propose a new model which explicitly defines an underlying stochastic process for the price index. The model can be estimated using maximum likelihood and an extended version of the Kalman filter. We derive theoretical properties of the volatility estimator and show that it outperforms the standard estimator. To illustrate the usefulness of the model, we apply it to a large data set of international blue chip artists. (joint work with Fabian Bocart)
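A schematic state-space formulation consistent with the abstract (the concrete specification and its estimation theory are those of the paper) is
\[
  \log P_{it} \;=\; x_{it}^{\top} \beta \;+\; m_t \;+\; \varepsilon_{it}, \qquad
  m_t \;=\; m_{t-1} \;+\; \sigma_t \eta_t,
\]
where \( x_{it} \) collects the hedonic characteristics of good i traded at time t, \( m_t \) is the latent log price index driven by an underlying stochastic process, and \( \sigma_t \) governs its volatility. With \( m_t \) as the state, the model is in state-space form and can be estimated by maximum likelihood via (an extended version of) the Kalman filter, as indicated above.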
Yoosoon Chang
Nonstationarity in Time Series of State Densities
Abstract: This paper proposes a new framework to analyze the nonstationarity in time series of state densities, representing either cross-sectional or intra-period distributions of some underlying economic variables. We regard each state density as a realization of a Hilbertian random variable, and use a functional time series model to fit a given time series of state densities. This allows us to explore various sources of the nonstationarity of such time series. The potential unit roots are identified through functional principal component analysis, and subsequently tested using the generalized eigenvalues of the leading components of the normalized sample variance operator. The asymptotic null distribution of the test statistic is obtained and tabulated. We use the methodology developed in the paper to investigate the state densities given by the cross-sectional distributions of individual earnings and the intra-month distributions of stock returns. We find clear evidence for the presence of strong persistence in their time series.
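One way to read the framework sketched in this abstract (notation introduced here for illustration; the exact model and test statistic are those of the paper) is as a functional autoregression
\[
  f_t \;=\; \Pi f_{t-1} \;+\; \varepsilon_t,
\]
where each demeaned state density \( f_t \) is an element of a Hilbert space and \( \Pi \) is a bounded linear operator. Nonstationarity corresponds to \( \Pi \) acting as the identity (a unit root) on a finite-dimensional subspace; functional principal components isolate the candidate unit-root directions, and the test is built from generalized eigenvalues of the normalized sample variance operator restricted to the leading components.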
Prof. Vladimir Spokoiny (WIAS/HU Berlin)
Semiparametric alternation: convergence and efficiency
Abstract: A general problem of semiparametric estimation is considered when the target parameter \( \theta \) describing the model structure has to be estimated in the presence of a high-dimensional nuisance parameter \( \eta \). The alternating approach assumes that the partial estimation of \( \eta \) under the given target \( \theta \), and vice versa, can be done efficiently even in large dimension, because the related problem is convex or even quadratic. This naturally leads to a procedure based on sequential optimization: one starts from some initial value of \( \theta \) or \( \eta \) and then sequentially estimates one parameter with the other one fixed.
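Written out, the alternation just described reads
\[
  \theta^{(k+1)} \;=\; \operatorname*{arg\,max}_{\theta} \, L\bigl(\theta, \eta^{(k)}\bigr), \qquad
  \eta^{(k+1)} \;=\; \operatorname*{arg\,max}_{\eta} \, L\bigl(\theta^{(k+1)}, \eta\bigr), \qquad k = 0, 1, 2, \dots,
\]
where \( L \) denotes the criterion being maximized (for instance a quasi log-likelihood; the symbol is introduced here only for illustration). Each partial step is tractable because, with the other parameter fixed, the optimization problem is convex or even quadratic.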
Unfortunately, precise theoretical results addressing the overall quality of
such procedures are only available in special cases. One example is
given by linear models. In this case, an alternating procedure converges and is efficient under quite simple
and tractable identifiability conditions.
We aim at developing a novel approach to the systematic study of the quality and efficiency of such iterative procedures, based on the idea of approximating the original model by a linear one in a local vicinity of the central point. This allows for extending algorithmic properties of the procedure, like geometric convergence, from the linear to a general smooth case.
Itai Dattner (TU Eindhoven)
On deconvolution of distribution functions
Abstract: It is well known that rates of convergence of estimators in deconvolution problems are affected by the smoothness of the error density and the density to be estimated. However, the problem of distribution deconvolution is more delicate than what has been considered so far. We derive different rates of convergence with respect to the tail behavior of the error characteristic function. We develop an order-optimal deconvolution estimator and construct its adaptive version, which achieves the optimal rates within a logarithmic factor. Simulation studies comparing the adaptive estimator to other methods are presented and support the superiority of our method. We also present an example with real data and discuss a few relevant extensions of our results.
Based on joint works with Alexander Goldenshluger, Benjamin Reiser and Anatoli Juditsky.
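For orientation, the classical additive-error setup behind this abstract (standard background, not the specific construction of the paper): one observes
\[
  Y_j \;=\; X_j + \varepsilon_j, \qquad j = 1, \dots, n,
\]
where the errors have known density with characteristic function \( \phi_{\varepsilon} \), and the goal is to estimate the distribution function of X. Fourier-type estimators are built from \( \hat\phi_Y(t) / \phi_{\varepsilon}(t) \), with \( \hat\phi_Y \) the empirical characteristic function of the observations, regularized by a kernel and a bandwidth; the attainable rates are then driven by how fast \( |\phi_{\varepsilon}(t)| \) decays, which is the tail behavior of the error characteristic function referred to above.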
Guillaume Lecué (Univ. Marne-la-Vallée)
General oracle inequalities for empirical risk minimization (ERM) procedures, penalized ERM and regularized ERM with applications to high dimensional data analysis
Abstract: We show that empirical risk minimization procedures, regularized empirical risk minimization procedures and penalized empirical risk minimization procedures satisfy non-exact oracle inequalities in a general unbounded framework. The main novelty, in addition to the setup being free of boundedness assumptions, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates.
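To fix the terminology used in this abstract (standard usage, not a quotation): for a risk R and a model class F, an exact oracle inequality bounds the risk of the estimator by the best risk in the class plus a residual term, while a non-exact one allows a multiplicative constant larger than one in front of the oracle risk,
\[
  \text{exact:}\quad R(\hat f) \;\le\; \inf_{f \in F} R(f) \;+\; r_n, \qquad\quad
  \text{non-exact:}\quad R(\hat f) \;\le\; (1+\epsilon) \inf_{f \in F} R(f) \;+\; r_n(\epsilon), \quad \epsilon > 0.
\]
The point made above is that the residual term in the non-exact inequalities can decay at fast rates in regimes where exact inequalities only admit slower rates.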
Enno Mammen (Universität Mannheim)
Flexible Generalized Varying Coefficient Regression Models and
Locally Additive Interaction Models
Abstract: This talk discusses a very general model that can be used widely to analyze the relation between a response and multiple covariates. The model is nonparametric, yet renders easy interpretation of the effects of the covariates. The model accommodates both continuous and discrete random variables for the response and covariates. It is flexible enough to cover both the generalized varying coefficient models and the generalized additive models as special cases. Under a weak condition we give a general theorem stating that the problem of estimating the multivariate mean function is equivalent to that of estimating its univariate component functions. We discuss implications of the theorem for penalized least squares estimators, and then investigate the outcomes in full detail for a kernel-type estimator. The kernel estimator is given as a solution of a system of nonlinear integral equations. We provide an iterative algorithm to solve the system of equations and discuss the theoretical properties of the estimator and the algorithm. Finally, we give simulation results and illustrate the method with a data example. The talk reports on joint work with Young K. Lee and Byeong U. Park.
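A schematic form of the two model classes named in the abstract (the general model of the talk contains both as special cases; the notation is introduced here only for illustration): with link function g, covariates \( X = (X_1, \dots, X_d) \) and an index variable Z,
\[
  g\bigl(\mathbb{E}[Y \mid X, Z]\bigr) \;=\; \sum_{j=1}^{d} f_j(X_j)
  \qquad\text{and}\qquad
  g\bigl(\mathbb{E}[Y \mid X, Z]\bigr) \;=\; \sum_{j=1}^{d} f_j(Z)\, X_j
\]
are the generalized additive and the generalized varying coefficient models, respectively, with univariate component functions \( f_j \) to be estimated, in the talk from a system of nonlinear integral equations.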
Paul von Bünau and Franz Kiraly (TU Berlin)
Abstract: Joint work with Frank Meinecke, Duncan Blythe and K.-R. Müller.
In the first part of this talk, we introduce Stationary Subspace Analysis (SSA) [1], a source-separation method which linearly factorizes observed data into stationary and non-stationary components. We present an SSA algorithm for the mean and covariance matrix, discuss conditions for identifiability, and present our simulation results. In an application to EEG analysis, we demonstrate how SSA reveals meaningful stationary and non-stationary brain sources.
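Schematically (following the model of [1]; the notation is introduced here for illustration), SSA assumes that the observed multivariate signal is a linear mixture of stationary and non-stationary sources,
\[
  x(t) \;=\; A \begin{pmatrix} s^{\mathrm{stat}}(t) \\ s^{\mathrm{non}}(t) \end{pmatrix},
\]
and estimates a projection \( \hat B \) such that the projected signal \( \hat B x(t) \) has approximately constant mean and covariance matrix across epochs of the data; the non-stationary part is recovered from the complementary subspace.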
In the second part of this talk, we present ideal regression, an algebraic-geometric approach to estimating
polynomial relations. Ideal regression is a generalization of classical regression setups. We demonstrate
how to obtain consistent estimators and how to prove properties for these estimators, using techniques
from algebraic geometry. We present a set of possible applications, and show that SSA can be viewed as a
special case of ideal regression [2], exhibiting an algorithmic estimator which we evaluate in simulations.
References
[1] von Bünau, P., Meinecke, F. C., Kiraly, F. J. and Müller, K.-R. Finding Stationary Subspaces in Multivariate Time Series. Phys. Rev. Lett. 103(21):214101, 2009.
[2] Kiraly, F. J., von Bünau, P., Meinecke, F. C., Blythe, D. A. J. and Müller, K.-R. Algebraic Geometric Comparison of Probability Distributions. Oberwolfach Preprint (OWP) 2011-30, 2011.