Research Group "Stochastic Algorithms and Nonparametric Statistics"

Research Seminar "Mathematical Statistics" Summer Semester 2014

Place:	Weierstrass Institute for Applied Analysis and Stochastics
	Erhard-Schmidt-Hörsaal, Mohrenstraße 39, 10117 Berlin
Time:	Wednesdays, 10.00 a.m. - 12.30 p.m.
16.04.14	Martin Weidner (UCL)
	Incidental parameter bias in panel quantile regressions
23.04.14	Philippe Vieu (Université Paul Sabatier, Toulouse)
	How to deal with dimensionality in functional data analysis?
30.04.14	Hajo Holzmann (Universität Marburg)
Mohrenstr. 39, Room 406	Nonparametric identification and estimation in a triangular random coefficient regression model
07.05.14	Matteo Barigozzi (LSE)
Hausvogteiplatz 11a, Room 4.13	Dynamic factor models, cointegration, and error correction mechanisms
14.05.14	Wei Biao Wu (Chicago) First talk:
Mohrenstr. 39, Room 406	An L² test theory for nonstationary time series
	Second talk:
	A sharp strong invariance principle for stationary processes
21.05.14	No Seminar - Berlin Singapore Workshop

28.05.14	Michaël Chichignoud (ETH Zürich)
	On bandwidth selection in empirical risk minimization
04.06.14	Dennis Kristensen (UCL)
	What drives the Yield curve?
11.06.14	Yoosoon Chang (Indiana)
	Regime switching model with endogenous autoregressive latent factor
18.06.14	Piotr Kokoszka (Colorado State University)
	Functional framework for high frequency financial data with focus on regression and predictability of intraday price curves
25.06.14	Mona Eberts (Universität Stuttgart)
	Adaptive rates for support vector machines
02.07.14	Sara van de Geer (ETH Zürich)
	Confidence intervals using the graphical Lasso (joint work with Jana Jankova)
09.07.14

16.07.14

last reviewed: April 16, 2014, Christine Schneider

Martin Weidner (UCL)

Incidental parameter bias in panel quantile regressions

Abstract: This paper studies linear quantile regression (QR) estimators in panel data settings with fixed effects. The estimation error in the fixed effects causes an incidental parameter problem in the parameters of interest, and we work out the first-order asymptotic bias under asymptotics where both N and T grow to infinity. This leading incidental parameter bias is of order 1/T, analogous to the situation in non-linear fixed effect panel models with a smooth objective function. The key technical challenge in deriving our result is that the QR objective function is non-smooth, rendering the existing large-T asymptotic bias results in the panel literature inapplicable. We provide analytic and jackknife bias-corrected estimators and study their performance in Monte Carlo simulations and in an application to the educational achievement of US high-school students.

Philippe Vieu (Université Paul Sabatier, Toulouse)

How to deal with dimensionality in functional data analysis?

Abstract: Functional data are, by nature, infinite-dimensional, and their analysis necessarily requires specific attention to the possible effects of high (in fact, infinite) dimension on the behaviour of statistical procedures. Semi-parametric modelling and variable/model selection are two fields of modern statistics that have developed methodologies for dealing with dimensionality in high-dimensional (but finite) multivariate data analysis. The aim of this talk is to discuss how these multivariate ideas can be adapted to functional data analysis (FDA) and to emphasize that, even though the dimension is infinite in FDA, the continuous structure of the data allows statistical methods to be more efficient (in a sense to be made precise) than in the multivariate setting. In the usual finite-dimensional multivariate setting, semi-parametric ideas have been widely used to balance the trade-off between too little flexibility (the drawback of purely parametric modelling) and dimensionality effects (the main drawback of non-parametric modelling). The first part of the talk will show, through the simple Single Functional Index Model, how semi-parametric modelling behaves for FDA. From a methodological point of view, we will see that this model is rather flexible without being affected by the infinite-dimensionality effect. From an applied point of view, it will be highlighted that functional semi-parametric procedures combine good predictive power with easily interpretable results. The second part will develop specific variable selection procedures for FDA. The methodology takes full account of the continuous structure of the data, leading to rather low computational costs (compared with standard multivariate selection procedures) and again combining good predictive power with easily interpretable results.
The talk will be mainly methodological and centered on the presentation of these two functional methodologies, namely Functional Single Index Modelling and Variable Selection for Continuous Data. It will then end with the analysis of some benchmark real curve datasets.

Hajo Holzmann (Universität Marburg)

Nonparametric identification and estimation in a triangular random coefficient regression model

Abstract: Linear regression models with random coefficients have recently become quite popular in econometrics as a tool for modeling unobserved heterogeneity. The main structural assumption which makes these models identifiable is the independence of the regressors from the random coefficients, that is, the exogeneity of the regressors. We briefly review identification in this situation. Further, we propose nonparametric estimators for the density of the random coefficients in the case of light-tailed and even compactly supported regressors, and derive rates of convergence. In the main part of the talk we will be concerned with a triangular system of linear random coefficient regression models, where the endogenous regressor in the second-stage equation is the response of an additional equation with an exogenous instrument. Without further structural assumptions on the coefficients, we give a surprising non-identifiability result for the intercept in the second-stage equation. Further, we show how identifiability of the density of the coefficients in the second-stage equation can be achieved via a marginal independence assumption. Based on this result, we discuss nonparametric estimation of their joint density. Joint work with Stefan Hoderlein (Boston) and Alexander Meister (Rostock).

Matteo Barigozzi (LSE)

Dynamic factor models, cointegration, and error correction mechanisms

Abstract: In this paper we study Dynamic Factor Models when the factors Ft are I(1) and singular, i.e. rank(Ft) < dim(Ft). By combining the classic Granger Representation Theorem with recent results by Anderson and Deistler on singular stochastic vectors, we prove that, for generic values of the parameters, Ft has an Error Correction representation with two unusual features: (i) the autoregressive matrix polynomial is finite, (ii) the number of error terms is equal to the number of transitory shocks plus the difference between the dimension and the rank of Ft. This result is the basis for the correct specification of an autoregressive model for Ft. Estimation of impulse-response functions is also discussed. Results of an empirical analysis on a US quarterly database support the use of our model.

Michaël Chichignoud (ETH Zürich)

On bandwidth selection in empirical risk minimization

Abstract: The well-known Goldenshluger-Lepski method (GLM) allows one to select multi-dimensional (possibly anisotropic) bandwidths of kernel estimators and provides optimal results in this setting. However, GLM requires a linearity property, which is not satisfied in empirical risk minimization (where a bandwidth is involved in the empirical risk). One typically encounters this issue in local M-estimation, such as local median or local maximum-likelihood estimation, and in statistical learning with noisy data, such as quantile and moment estimation, discriminant analysis and clustering. Many of these studies lead to data-driven procedures selecting isotropic bandwidths; however, none of them allows anisotropic bandwidth selection. We present a novel data-driven selection of anisotropic bandwidths in the general setting of empirical risk minimization. The selection consists of comparing gradient empirical risks (instead of comparing estimators). It can be viewed as a non-trivial extension of GLM to non-linear estimators. This method allows us to derive excess risk bounds, with fast rates of convergence, in noisy clustering, as well as adaptive minimax results for pointwise and global estimation in nonparametric regression.

Dennis Kristensen, University College London, UK

What drives the Yield curve?

Abstract: We develop nonparametric tests of whether term structure dynamics are driven by a finite number of Markov factors in a continuous-time setting. The tests are based on nonparametric estimators of the model developed under the null of the Markov hypothesis and under the alternative, respectively. We then reject the null if the estimators are statistically different from each other. The tests do not rely on particular functional form assumptions and so are able to disentangle the Markov hypothesis from functional form hypotheses. In particular, they allow us to test the hypothesis that the short-term interest rate follows a time-homogeneous univariate Markov diffusion; such a structure is frequently considered in the term structure literature. In an empirical application, we implement estimators and tests on US term structure data.

Yoosoon Chang, Indiana University, USA

Regime switching model with endogenous autoregressive latent factor

Abstract: This paper introduces a model with regime switching, which is driven by an autoregressive latent factor correlated with the innovation to the observed time series. In our model, the mean or volatility process is switched between two regimes, depending upon whether the underlying autoregressive latent factor takes values above or below some threshold level. If the latent factor becomes exogenous, our model reduces to the conventional Markov switching model, and therefore our model may be regarded as an extended Markov switching model allowing for endogeneity in regime switching. Our model is estimated by the maximum likelihood method using a newly developed modified Markov switching filter. For both mean and volatility models that are frequently analyzed in the Markov switching framework, we demonstrate that the presence of endogeneity in regime switching is indeed strong and ubiquitous.

Piotr Kokoszka (Colorado State University)

Functional framework for high frequency financial data with focus on regression and predictability of intraday price curves

Abstract: The talk will introduce the concept of a functional time series and focus on two specific statistical problems for such series, both motivated by intraday price curves. We explain how intraday price curves can be transformed to form an approximately stationary functional time series. We consider a contemporaneous regression of such transformed daily curves on risk factors, which may be daily functions as well. We then present a significance test designed to determine if the shape of an intraday price curve can be predicted from the past shapes of such curves.

Mona Eberts (Universität Stuttgart)

Adaptive rates for support vector machines

Abstract: Support vector machines (SVMs) using Gaussian kernels are one of the standard and state-of-the-art learning algorithms. For such SVMs applied to least squares regression we establish new oracle inequalities. With the help of these oracle inequalities, we derive learning rates that are (essentially) minimax optimal under standard smoothness assumptions on the target function. We further utilize the oracle inequalities to show that the achieved learning rates can be adaptively obtained by a simple data-dependent parameter selection method. Furthermore, in order to reduce computational costs, we develop a localized SVM approach that is based upon a partition of the input space and trains an individual SVM on each cell of the partition. We apply this local SVM to least squares regression using Gaussian kernels and deduce local learning rates that are essentially minimax optimal under some standard smoothness assumptions on the regression function. This gives the first motivation for using local SVMs that is not based on computational requirements but on theoretical predictions on the generalization performance.

Sara van de Geer (ETH Zürich)

Confidence intervals using the graphical Lasso (joint work with Jana Jankova)

Abstract: In recent years, much statistical theory and methodology for high-dimensional problems has been developed. However, the question of statistical inference in the sense of testing and confidence intervals is less well addressed. In this talk, we consider data consisting of i.i.d. copies of a high-dimensional vector X. The aim is to estimate the precision matrix (the inverse of the covariance matrix of X). We use the graphical Lasso as initial estimator and then "de-sparsify" it. Under certain (sparsity) conditions the entries of this new estimator are asymptotically normal. This leads to the construction of asymptotic confidence intervals. We illustrate the theory with a simulation study. We also discuss the extension to other l1-penalized M-estimators and the concept of worst possible sub-directions.
