Research Group "Stochastic Algorithms and Nonparametric Statistics"
Research Seminar "Mathematical Statistics" Summer Semester 2014
last reviewed: April 16, 2014, Christine Schneider
Martin Weidner (UCL)
Incidental parameter bias in panel quantile
regressions
Abstract: This paper studies linear quantile regression (QR) estimators in
panel data settings with fixed effects. The estimation error in the
fixed effects causes an incidental parameter problem in the parameters
of interest, and we work out the first order asymptotic bias under an
asymptotic where both N and T grow to infinity. This leading
incidental parameter bias is of order 1/T, analogous to the situation
in non-linear fixed effect panel models with smooth objective
function. The key technical challenge in deriving our result is that
the QR objective function is non-smooth, rendering the existing large
T asymptotic bias results in the panel literature non-applicable. We
provide analytic and Jackknife bias corrected estimators and study
their performance in Monte Carlo simulations, and in an application to
educational achievement of US high-school students.
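The split-panel (half-panel) jackknife idea behind such corrections can be sketched in a few lines. The sketch below is illustrative only: it applies the correction to a toy estimator (the maximum-likelihood variance, whose O(1/T) bias is known in closed form), not to the paper's panel QR estimator, and all function names are ours.

```python
def var_mle(x):
    """MLE variance estimator: biased by -sigma^2/T, i.e. an O(1/T) bias."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def half_panel_jackknife(x, estimator):
    """Split-panel jackknife: theta_J = 2*theta_hat - (theta_half1 + theta_half2)/2.
    The two half-sample estimates carry twice the O(1/T) bias of the full-sample
    estimate, so the combination cancels the leading bias term."""
    T = len(x)
    theta = estimator(x)
    theta1 = estimator(x[: T // 2])
    theta2 = estimator(x[T // 2 :])
    return 2 * theta - (theta1 + theta2) / 2
```

For the toy estimator the cancellation is exact in expectation: E[var_mle] = sigma^2 (1 - 1/T), and the jackknifed combination has expectation sigma^2 up to terms the estimator does not contain.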
Philippe Vieu (Université Paul Sabatier, Toulouse)
How to deal with dimensionality in functional data analysis?
Abstract: Functional data are, by nature, infinite-dimensional data, and their analysis necessarily requires specific attention to the possible effects of high (in fact, infinite) dimension on the behaviour of statistical procedures. Semi-parametric modelling and variable/model selection are two fields of modern statistics that have developed methodologies for dealing with dimensionality in high- (but finite-) dimensional multivariate data analysis. The aim of this talk is to discuss how these multivariate ideas can be nicely adapted to FDA and to emphasize the fact that, even though in FDA the dimension is infinite, the continuous structure of the data allows statistical methods to be more efficient (in senses to be made precise) than in the multivariate setting.
In usual multi- (but finite-) dimensional settings, semi-parametric ideas have been widely used to balance the trade-off between too little flexibility (the drawback of purely parametric modelling) and dimensional effects (the main drawback of non-parametric modelling). In a first part, it will be shown in this talk, through the simple Single Functional Index Model, how semi-parametric modelling behaves for FDA. From a methodological point of view, one will see how this model is rather flexible without being affected by the infinite-dimensionality effect. From an applied point of view, it will be highlighted how functional semi-parametric statistical procedures combine good predictive power with good interpretability of the results.
In a second part, specific variable selection procedures for FDA will be developed. The methodology takes full account of the continuous structure of the data, leading to rather low computational costs (compared with standard multivariate selection procedures), and again combines good predictive power with good interpretability of the results.
The talk will be mainly methodological and centered around the presentation of these two functional methodologies, namely Functional Single Index Modelling and Variable Selection for Continuous Data. It will end with the presentation of some analyses of benchmark real-curve datasets.
Hajo Holzmann (Universität Marburg)
Nonparametric identification and estimation in a
triangular random coefficient regression model
Abstract: Linear regression models with random coefficients have recently become quite popular in econometrics as a tool for modeling unobserved heterogeneity. The main structural assumption which makes these models identifiable is the independence of the regressors from the random coefficients, that is, the exogeneity of the regressors. We briefly review identification in this situation. Further, we propose nonparametric estimators for the density of the random coefficients in the case of light-tailed and even compactly supported regressors, and derive rates of convergence.
In the main part of the talk we will be concerned with a triangular system of linear random coefficient regression models, where the endogenous regressor in the second-stage equation is the response of an additional equation with an exogenous instrument. Without further structural assumptions on the coefficients, we give a surprising non-identifiability result for the intercept in the second-stage equation. Further, we show how identifiability of the density of the coefficients in the second-stage equation can be achieved via a marginal independence assumption. Based on this result, we discuss nonparametric estimation of their joint density.
Joint work with Stefan Hoderlein (Boston) and Alexander Meister (Rostock)
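The role of the exogeneity assumption in the first part can be illustrated by a minimal simulation: in the model Y = A + B*X with random coefficients (A, B) independent of X, the conditional mean is E[Y | X = x] = E[A] + E[B]*x, so ordinary least squares recovers the mean coefficients (the density estimation in the talk goes far beyond this). All names below are our own illustrative choices.

```python
import random

def simulate_rc(n, seed=0):
    """Draw (Y, X) from the random coefficient model Y = A + B*X,
    with (A, B) drawn independently of X (the exogeneity assumption)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = 1.0 + rng.gauss(0.0, 1.0)  # random intercept, E[A] = 1
        b = 2.0 + rng.gauss(0.0, 1.0)  # random slope,     E[B] = 2
        x = rng.gauss(0.0, 1.0)
        data.append((a + b * x, x))
    return data

def ols_fit(data):
    """OLS of Y on X; consistent for (E[A], E[B]) under exogeneity."""
    n = len(data)
    my = sum(y for y, _ in data) / n
    mx = sum(x for _, x in data) / n
    sxy = sum((x - mx) * (y - my) for y, x in data)
    sxx = sum((x - mx) ** 2 for _, x in data)
    slope = sxy / sxx
    return my - slope * mx, slope
```

With an endogenous regressor (X correlated with A or B) the same OLS fit would no longer recover the mean coefficients, which is the scenario the triangular system addresses.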
Matteo Barigozzi (LSE)
Dynamic factor models, cointegration, and error
correction mechanisms
Abstract: In this paper we study Dynamic Factor Models when the factors Ft are I(1) and singular,
i.e. rank(Ft) < dim(Ft). By combining the classic Granger Representation Theorem with
recent results by Anderson and Deistler on singular stochastic vectors, we prove that,
for generic values of the parameters, Ft has an Error Correction representation with two
unusual features: (i) the autoregressive matrix polynomial is finite, (ii) the number of
error-terms is equal to the number of transitory shocks plus the difference between the
dimension and the rank of Ft. This result is the basis for the correct specification of an
autoregressive model for Ft. Estimation of impulse-response functions is also discussed.
Results of an empirical analysis on a US quarterly database support the use of our model.
Michaël Chichignoud (ETH Zürich)
On bandwidth selection in empirical risk minimization
Abstract: The well-known Goldenshluger-Lepski method (GLM) allows one to select multi-dimensional bandwidths (possibly anisotropic) of kernel estimators and provides optimal results in this setting. However, GLM requires a certain linearity property, which is not satisfied in empirical risk minimization (where a bandwidth is involved in the empirical risk). One typically deals with this issue in local M-estimation, such as the local median estimate or the local maximum-likelihood estimate, and in statistical learning with noisy data, such as in quantile and moment estimation, in discriminant analysis and in clustering. Many of these studies lead to data-driven procedures selecting isotropic bandwidths; however, none of them allows anisotropic bandwidth selection. We present a novel data-driven selection of anisotropic bandwidths in the large setting of empirical risk minimization. The selection consists of comparing gradients of empirical risks (instead of comparing estimators). It can be viewed as a non-trivial improvement of GLM to non-linear estimators. This method allows us to derive excess risk bounds - with fast rates of convergence - in noisy clustering, as well as adaptive minimax results for pointwise and global estimation in nonparametric regression.
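For orientation, here is a pure-Python sketch of the classical isotropic, estimator-comparison Lepski-type rule that the talk's gradient-comparison method generalizes; the kernel smoother, the threshold constant kappa, and the function names are our own illustrative choices, not the procedure of the talk.

```python
import math

def nw_estimate(xs, ys, x0, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel and bandwidth h."""
    ws = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(ws, ys)) / sum(ws)

def lepski_bandwidth(xs, ys, x0, grid, kappa=1.0):
    """Lepski-type rule: keep the largest bandwidth h whose estimate stays
    within the (stochastic) noise level of every estimate computed at a
    smaller bandwidth in the grid."""
    grid = sorted(grid)
    n = len(xs)
    best = grid[0]
    for i, h in enumerate(grid):
        ok = all(
            abs(nw_estimate(xs, ys, x0, h) - nw_estimate(xs, ys, x0, g))
            <= kappa / math.sqrt(n * g)
            for g in grid[:i]
        )
        if ok:
            best = h
    return best
```

The comparison here is between estimators at pairs of bandwidths; the point of the talk is that for non-linear empirical risk minimizers this comparison must be carried out on gradients of empirical risks instead.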
Dennis Kristensen, University College London, UK
What drives the Yield curve?
Abstract: We develop nonparametric tests for term structure dynamics to be driven by a finite number of Markov factors in a continuous-time setting. The tests are based on nonparametric estimators of the model developed under the null of the Markov hypothesis and under the alternative, respectively. We then reject the null if the estimators are statistically different from each other. The tests do not rely on particular functional form assumptions and so are able to disentangle the Markov hypothesis from functional form hypotheses. In particular, they allow us to test the hypothesis that the short-term interest rate follows a time-homogeneous univariate Markov diffusion; such a structure is frequently considered in the term structure literature. In an empirical application, we implement the estimators and tests on US term structure data.
Yoosoon Chang, Indiana University, USA
Regime switching model with endogenous autoregressive latent factor
Abstract: This paper introduces a model with regime switching, which is driven by an autoregressive latent factor correlated with the innovation to the observed time series. In our model, the mean or volatility process switches between two regimes, depending upon whether the underlying autoregressive latent factor takes values above or below some threshold level. If the latent factor becomes exogenous, our model reduces to the conventional Markov switching model, and therefore our model may be regarded as an extended Markov switching model allowing for endogeneity in regime switching. Our model is estimated by the maximum likelihood method using a newly developed modified Markov switching filter. For both mean and volatility models that are frequently analyzed in the Markov switching framework, we demonstrate that the presence of endogeneity in regime switching is indeed strong and ubiquitous.
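The data-generating process described above can be sketched as follows. The parameter values, the way endogeneity is injected (correlating the observation error with the next-period factor innovation), and all names are our illustrative assumptions, not the paper's exact specification.

```python
import math
import random

def simulate_switching_mean(T, rho=0.8, tau=0.0, mu=(-1.0, 1.0), corr=0.5, seed=0):
    """Mean model with regime switching driven by a latent AR(1) factor w_t:
        s_t = 1{w_t >= tau},   y_t = mu[s_t] + e_t,
    where the factor innovation v_{t+1} is correlated with e_t, so that
    today's observed shock feeds into tomorrow's regime (endogeneity)."""
    rng = random.Random(seed)
    w = 0.0
    ys, states = [], []
    for _ in range(T):
        s = 1 if w >= tau else 0          # threshold-crossing regime indicator
        e = rng.gauss(0.0, 1.0)           # observation innovation
        states.append(s)
        ys.append(mu[s] + e)
        # latent factor innovation, correlated with the observation error
        v = corr * e + math.sqrt(1.0 - corr ** 2) * rng.gauss(0.0, 1.0)
        w = rho * w + v                   # next-period latent AR(1) factor
    return ys, states
```

Setting corr = 0 makes the latent factor exogenous, in which case the regime process is a (probit-type) Markov chain and the model collapses to the conventional Markov switching setup, as the abstract notes.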
Piotr Kokoszka (Colorado State University)
Functional framework for high frequency financial data with focus on regression and predictability of intraday price curves
Abstract: The talk will introduce the concept of a functional time series and focus on two specific statistical problems for such series, both motivated by intraday price curves. We explain how intraday price curves can be transformed to form an approximately stationary functional time series. We consider a contemporaneous regression of such transformed daily curves on risk factors, which may be daily functions as well. We then present a significance test designed to determine if the shape of an intraday price curve can be predicted from the past shapes of such curves.
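A standard first step of the kind mentioned in the abstract is to turn each day's price record into a cumulative intraday log-return curve, which is approximately stationary across days; whether this matches the talk's exact transformation is an assumption on our part, and the function name is ours.

```python
import math

def intraday_return_curve(prices):
    """Map one day's intraday prices P(t_1), ..., P(t_m) to the cumulative
    intraday log-return curve R(t_j) = log P(t_j) - log P(t_1).
    Levels drift from day to day; these curves all start at zero and have
    comparable scale, which is what makes the daily sequence workable
    as a stationary functional time series."""
    p0 = math.log(prices[0])
    return [math.log(p) - p0 for p in prices]
```

Each trading day yields one such curve, and the day-indexed sequence of curves is the functional time series on which the regression and predictability analyses operate.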
Mona Eberts (Universität Stuttgart)
Adaptive rates for support vector machines
Abstract: Support vector machines (SVMs) using Gaussian kernels are one of the standard and state-
of-the-art learning algorithms. For such SVMs applied to least squares regression we establish new oracle
inequalities. With the help of these oracle inequalities, we derive learning rates that are (essentially)
minimax optimal under standard smoothness assumptions on the target function. We further utilize the
oracle inequalities to show that the achieved learning rates can be adaptively obtained by a simple data-
dependent parameter selection method.
Furthermore, in order to reduce computational costs, we develop a localized SVM approach that is based
upon a partition of the input space and trains an individual SVM on each cell of the partition. We apply
this local SVM to least squares regression using Gaussian kernels and deduce local learning rates that
are essentially minimax optimal under some standard smoothness assumptions on the regression function.
This gives the first motivation for using local SVMs that is not based on computational requirements but
on theoretical predictions on the generalization performance.
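The partitioning scheme is easy to sketch. For a self-contained illustration the per-cell SVM is replaced below by the simplest possible local learner (the cell mean), so this shows only the localization idea, not an actual SVM; all names are ours.

```python
def fit_partition_models(xs, ys, cells):
    """Localized learning: split the input interval into cells and fit a
    separate predictor on the data falling into each cell. Here the per-cell
    predictor is just the cell mean; the local-SVM approach of the talk
    trains an individual SVM on each cell instead."""
    models = []
    for lo, hi in cells:
        pts = [y for x, y in zip(xs, ys) if lo <= x < hi]
        models.append(sum(pts) / len(pts) if pts else 0.0)
    return models

def predict(x, cells, models):
    """Evaluate the fitted local model responsible for the cell containing x."""
    for (lo, hi), m in zip(cells, models):
        if lo <= x < hi:
            return m
    return models[-1]  # fallback for points outside the partition
```

Because each cell only sees a fraction of the sample, training cost per cell drops sharply, which is the computational motivation the abstract contrasts with the new theoretical one.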
Sara van de Geer
(ETH Zürich)
Confidence intervals using the graphical Lasso (joint work with
Jana Jankova)
Abstract: Over the recent years much statistical theory and methodology for high-dimensional
problems has been developed. However, the question of statistical inference in the sense of testing and
confidence intervals is less well addressed. In this talk, we consider data consisting of i.i.d. copies of a
high-dimensional vector X. The aim is to estimate the precision matrix (the inverse of the covariance
matrix of X). We use the graphical Lasso as initial estimator and then "de-sparsify" it. Under certain
(sparsity) conditions the entries of this new estimator are asymptotically normal. This leads to the
construction of asymptotic confidence intervals. We illustrate the theory with a simulation study. We also
discuss the extension to other l1-penalized M-estimators and the concept of worst possible sub-directions.
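The "de-sparsify" step admits a compact sketch. Assuming the de-sparsified estimator takes the form T = 2*Theta - Theta @ Sigma_hat @ Theta (with Theta the graphical Lasso estimate and Sigma_hat the sample covariance), which is our reading of this line of work rather than a quote from the talk, a minimal pure-Python version is:

```python
def matmul(A, B):
    """Plain matrix product for small dense matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def desparsify(Theta, Sigma_hat):
    """De-sparsified precision matrix estimate T = 2*Theta - Theta*Sigma_hat*Theta.
    The correction undoes the bias of the l1 penalty entrywise, at the cost of
    losing exact sparsity, so that each entry becomes asymptotically normal."""
    TST = matmul(matmul(Theta, Sigma_hat), Theta)
    p = len(Theta)
    return [[2 * Theta[i][j] - TST[i][j] for j in range(p)] for i in range(p)]
```

A useful sanity property: if Theta were exactly the inverse of Sigma_hat, then Theta*Sigma_hat*Theta = Theta and the correction vanishes, leaving the estimator unchanged.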