Collaborator: D. Belomestny, V. Essaoulova, A. Hutt, P. Mathé, D. Mercurio, H.-J. Mucha, J. Polzehl, V. Spokoiny
Cooperation with: F. Baumgart (Leibniz-Institut für Neurobiologie, Magdeburg), R. Brüggemann, Ch. Heyn, U. Simon (Institut für Gewässerökologie und Binnenfischerei, Berlin), P. Bühlmann, A. McNeil (ETH Zürich, Switzerland), C. Butucea (Université Paris 10, France), M.-Y. Cheng (National Taiwan University, Taipeh), A. Daffertshofer (Free University of Amsterdam, The Netherlands), A. Dalalyan (Université Paris 6, France), J. Dolata (Johann Wolfgang Goethe-Universität Frankfurt am Main), L. Dümbgen (University of Bern, Switzerland), J. Fan (Princeton University, USA), J. Franke (Universität Kaiserslautern), R. Friedrich (Universität Münster), F. Godtliebsen (University of Tromsø, Norway), H. Goebl, E. Haimerl (Universität Salzburg), A. Goldenshluger (University of Haifa, Israel), I. Grama (Université de Bretagne-Sud, Vannes, France), J. Horowitz (Northwestern University, Chicago, USA), B. Ittermann (Physikalisch-Technische Bundesanstalt (PTB), Berlin), A. Juditsky (Université de Grenoble, France), I. Molchanov (University of Bern, Switzerland), K.-R. Müller (Fraunhofer FIRST, Berlin), M. Munk (Max-Planck-Institut für Hirnforschung, Frankfurt am Main), S.V. Pereverzev (RICAM, Linz, Austria), H. Riedel (Universität Oldenburg), B. Röhl-Kuhn (Bundesanstalt für Materialforschung und -prüfung (BAM), Berlin), R. von Sachs (Université Catholique de Louvain, Belgium), A. Samarov (Massachusetts Institute of Technology, Cambridge, USA), M. Schrauf (DaimlerChrysler, Stuttgart), S. Sperlich (University Carlos III, Madrid, Spain), U. Steinmetz (Max-Planck-Institut für Mathematik in den Naturwissenschaften, Leipzig), P. Thiesen (Universität der Bundeswehr, Hamburg), G. Torheim (Amersham Health, Oslo, Norway), C. Vial (ENSAI, Rennes, France), Y. Xia (National University of Singapore, Singapore), S. Zwanzig (Uppsala University, Sweden)
Supported by: DFG: DFG-Forschungszentrum ``Mathematik für Schlüsseltechnologien'' (Research Center ``Mathematics for Key Technologies''), project A3; SFB 373 ``Quantifikation und Simulation Ökonomischer Prozesse'' (Quantification and simulation of economic processes), Humboldt-Universität zu Berlin; Priority Program 1114 ``Mathematische Methoden der Zeitreihenanalyse und digitalen Bildverarbeitung'' (Mathematical methods for time series analysis and digital image processing)
Description: The project Statistical data analysis focuses on the development, theoretical investigation, and application of modern nonparametric statistical methods designed to model and analyze complex data structures. Through substantial mathematical contributions, WIAS has gained a leading position in this field, including its applications to problems in technology, medicine, and environmental research, as well as in risk evaluation for financial products.
Methods developed in the institute within this project area can be grouped into the following main classes.
The investigation and development of adaptive smoothing methods have been driven by challenging problems from imaging and time series analysis. Applications to imaging include signal detection in functional Magnetic Resonance Imaging (fMRI), tissue classification in dynamic Magnetic Resonance Imaging (dMRI) experiments, image denoising, the analysis of images containing Poisson counts or binary information, and the analysis of Positron Emission Tomography (PET) data.
Our approach for time series focuses on locally stationary time series models. These methods allow for abrupt changes of model parameters in time. Intended applications for financial time series include volatility modeling, volatility prediction, and risk assessment.
The models and procedures proposed and investigated at WIAS are based on two main approaches: pointwise adaptation, originally proposed in [46] for the estimation of regression functions with discontinuities, and adaptive weights smoothing, proposed in [33] in the context of image denoising.
The main idea of the pointwise adaptive approach is to search, at each design point, for the largest acceptable window that does not contradict the assumed local model, and to use the data within this window to obtain local parameter estimates. This yields estimates with nearly minimal variance under controlled bias.
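The following minimal sketch illustrates the window-growing idea for a one-dimensional local constant model with Gaussian noise of known standard deviation; the function, the window schedule, and the threshold z are illustrative assumptions, not the exact procedure of [46].

```python
import numpy as np

def pointwise_adaptive_mean(y, sigma, halfwidths=(1, 2, 4, 8, 16, 32), z=2.0):
    """At each point, grow a symmetric window while the larger-window mean
    stays within a confidence band of the current estimate; stop at the
    first contradiction, i.e. keep the largest acceptable window."""
    n = len(y)
    fit = np.empty(n)
    for i in range(n):
        est, err = y[i], sigma                      # start from one observation
        for h in halfwidths:
            lo, hi = max(0, i - h), min(n, i + h + 1)
            new_est = y[lo:hi].mean()
            new_err = sigma / np.sqrt(hi - lo)
            if abs(new_est - est) > z * np.sqrt(err**2 + new_err**2):
                break                               # window contradicts local model
            est, err = new_est, new_err
        fit[i] = est
    return fit

# toy example: noisy step function; the jump at x = 0.5 is preserved
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 400)
signal = np.where(x < 0.5, 0.0, 1.0)
y = signal + 0.2 * rng.standard_normal(400)
print(np.abs(pointwise_adaptive_mean(y, sigma=0.2) - signal).mean())
```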
The general concept behind adaptive weights smoothing is structural adaptation. The procedure attempts to recover the unknown local structure from the data in an iterative way, while using the structural information obtained so far to improve the quality of estimation. This approach possesses a number of remarkable properties, such as preservation of edges and contrasts and nearly optimal noise reduction inside large homogeneous regions. It is almost dimension-free and applicable in high-dimensional situations.
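A caricature of the iteration for a one-dimensional local constant model is sketched below, assuming Gaussian noise of known variance; the weight form, the constant lam, and the bandwidth schedule are illustrative choices, not the calibrated procedure of [33].

```python
import numpy as np

def aws_1d(y, sigma, n_iter=6, lam=2.0):
    """Adaptive weights smoothing sketch: weights combine a growing location
    kernel with a penalty on differences of current estimates, so that
    averaging extends inside homogeneous regions but stops at edges."""
    n = len(y)
    theta = y.astype(float)                      # current local estimates
    x = np.arange(n, dtype=float)
    h = 1.0
    for _ in range(n_iter):
        loc = (x[:, None] - x[None, :])**2 / h**2
        pen = (theta[:, None] - theta[None, :])**2 / (lam * sigma**2)
        w = np.exp(-loc) * np.maximum(0.0, 1.0 - pen)   # adaptive weights
        theta = (w @ y) / w.sum(axis=1)
        h *= 1.25                                        # enlarge neighborhood
    return theta
```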
Both ideas have been investigated and applied in a variety of settings.
The use of spatially adaptive smoothing methods allows, in comparison to a voxelwise decision, for improved sensitivity and specificity of signal detection and, in contrast to nonadaptive approaches, preserves information about the shape of the regions of interest. Figure 3 provides a comparison for one fMRI series.
The adaptive weights smoothing approach has been generalized to time-varying GARCH models and semiparametric GARCH models in [37]. The procedure involves new ideas on the localization of GARCH models. Simulations and applications to financial data show that phenomena like long-range dependence and heavy tails, which are often considered inherent to financial time series, can equally be explained by nonstationarity.
Both the semiparametric GARCH(1,1) model and the local constant AWS volatility model can be used to analyze the local stationarity structure. They allow for improved volatility prediction and explain the observed heavy tails of logarithmic returns. Applications of the methodology are intended in cooperation with the project Applied mathematical finance.
Figure 4 illustrates volatility estimates obtained for the time series of DAX values.
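As a crude stand-in for the local constant volatility estimate, the sketch below extends the estimation interval into the past for as long as the mean squared return stays compatible with the shorter-interval estimate; the window schedule and the threshold z are illustrative assumptions, not the tested procedure.

```python
import numpy as np

def local_constant_volatility(returns, windows=(5, 10, 20, 40, 80), z=2.5):
    """For each day, find the longest past interval of homogeneity of the
    squared returns and estimate volatility as its root mean square."""
    r2 = np.asarray(returns, dtype=float)**2
    n = len(r2)
    vol = np.full(n, np.nan)
    for t in range(windows[0], n):
        est = r2[t - windows[0]:t].mean()
        for w in windows[1:]:
            if t - w < 0:
                break
            new = r2[t - w:t].mean()
            se = r2[t - w:t].std() / np.sqrt(w)   # crude error of the mean
            if abs(new - est) > z * se:
                break                              # change point detected
            est = new
        vol[t] = np.sqrt(est)
    return vol
```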
Nonparametric filters often involve filtering parameters. These parameters can be chosen to optimize the performance either locally at each time point or globally over a time interval. In [5], the filtering parameters are obtained by minimizing the prediction error for a large class of filters. In a general martingale setting, with mild conditions on the time series structure and virtually no assumptions on the filters, the adaptive filter, whose filtering parameter is chosen on the basis of historical data, is shown to perform nearly as well as the ideal filter in the class in terms of filtering errors. The theoretical result is also verified by extensive simulations. The approach can be used to choose the order of parametric models such as AR or GARCH processes. It can also be applied to volatility estimation in financial economics.
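The principle can be illustrated by selecting the smoothing parameter of an exponentially weighted moving average filter on a grid by minimizing the historical one-step prediction error; the EWMA filter class and the grid are assumptions made for this sketch, not the general filter class of [5].

```python
import numpy as np

def choose_ewma_lambda(returns, grid=np.linspace(0.80, 0.99, 20)):
    """Return the EWMA parameter minimizing the historical one-step
    prediction error for squared returns (a proxy for volatility)."""
    r2 = np.asarray(returns, dtype=float)**2
    best, best_err = None, np.inf
    for lam in grid:
        pred = np.empty_like(r2)
        pred[0] = r2[0]
        for t in range(1, len(r2)):               # EWMA recursion
            pred[t] = lam * pred[t - 1] + (1 - lam) * r2[t - 1]
        err = np.mean((r2[1:] - pred[1:])**2)     # one-step prediction error
        if err < best_err:
            best, best_err = lam, err
    return best
```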
In [14], the pointwise adaptive approach is extended to tail index estimation. The approach is based on an approximation by an exponential model. The proposed procedure adaptively selects the number of upper order statistics used for estimating the tail of the distribution function. The selection procedure consists of consecutively testing the hypothesis of homogeneity of the estimated parameter against a change-point alternative. The selected number of upper order statistics corresponds to the first detected change point. The main results are non-asymptotic and state the optimality of the proposed method in the ``oracle'' sense.
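A hedged sketch of this selection rule, using the Hill estimator and a simple confidence-band criterion in place of the exact homogeneity test of [14]:

```python
import numpy as np

def adaptive_hill(x, k_max=None, z=3.0):
    """Increase the number k of upper order statistics and stop when the
    Hill estimate leaves a confidence band around the previous estimate,
    mimicking the sequential change-point test; z is an illustrative value."""
    xs = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order stats
    n = len(xs)
    k_max = k_max or n // 2
    logs = np.log(xs[:k_max + 1])                    # data assumed positive
    est, k_sel = None, 2
    for k in range(2, k_max):
        hill = np.mean(logs[:k] - logs[k])           # Hill estimator with k stats
        if est is not None and abs(hill - est) > z * est / np.sqrt(k):
            break                                    # first detected change point
        est, k_sel = hill, k
    return est, k_sel

# Pareto sample with tail index 1/2
rng = np.random.default_rng(1)
print(adaptive_hill(rng.pareto(2.0, 5000) + 1.0))
```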
A similar idea is used for tail index estimation by adaptive weights smoothing in [35].
In cluster analysis, bootstrap samples and subsets are equivalent to particular choices of observation weights. This is exploited to make resampling computationally efficient. The built-in validation runs automatically, with default values for the simulation parameters.
For illustration, observed birth and death rates from 225 countries are investigated. Ranks are used for hierarchical clustering by Ward's minimum variance method. The stability of the result is assessed by random weighting of the observations: 200 such replicates were clustered by Ward's method.
Figure 6 illustrates the obtained clusters (left) and the statistics used to validate the result (right). The solution for the original data is compared with the results from the bootstrap samples by means of the adjusted Rand index R. The left-hand axis and the bars in the graphic correspond to the standard deviation of R, whereas the right-hand axis scales box plots showing the median, mean, and upper and lower 5 percent quantiles of R. The median of R for K = 2 is close to its theoretical maximum value of 1. This means that the two-cluster solution is stable: it is confirmed to a high degree for almost all samples. For more than two clusters, the median (or mean) of the adjusted Rand values is much smaller. Therefore, K = 2 is the most likely number of clusters.
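A sketch of this validation loop, using scikit-learn's Ward clustering and the adjusted Rand index on synthetic data in place of the birth-and-death-rate example; the parameter choices are illustrative.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

def cluster_stability(X, k_range=range(2, 7), n_boot=200, seed=0):
    """For each number of clusters k, compare the Ward clustering of the
    full data with Ward clusterings of bootstrap samples (equivalently,
    randomly weighted observations) via the adjusted Rand index R."""
    rng = np.random.default_rng(seed)
    n = len(X)
    stats = {}
    for k in k_range:
        ref = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
        scores = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)          # bootstrap resampling
            lab = AgglomerativeClustering(n_clusters=k,
                                          linkage="ward").fit_predict(X[idx])
            scores.append(adjusted_rand_score(ref[idx], lab))
        stats[k] = (np.median(scores), np.std(scores))
    return stats

# two well-separated groups: the k = 2 solution should be the stable one
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
print(cluster_stability(X))
```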
Data sets from economics or finance are often high dimensional. Usually many characteristics of a firm or an asset are monitored without knowing which of them are needed to answer specific questions. Data structures often do not allow for simple parametric models. Nonparametric statistical modeling of such data suffers from the ``curse of dimensionality'' (high-dimensional data are very sparse). Fortunately, in many cases the structures in complex high-dimensional data live in low-dimensional, but usually unknown, subspaces. This property can be used to construct efficient procedures that simultaneously identify and estimate the structure inherent in the data set. The most common models in this context are additive models, single- and multi-index models, and partially linear models. These models focus on index vectors or dimension reduction spaces which allow one to reduce the dimensionality of the data without essential loss of information. They generalize classic linear models and constitute a reasonable compromise between overly restrictive linear modeling and overly vague pure nonparametric modeling.
Indirect methods of index estimation, like the nonparametric least squares estimator or the nonparametric maximum likelihood estimator, have been shown to be asymptotically efficient, but their practical applicability is very restricted, because their evaluation leads to an optimization problem in a high-dimensional space, see [20]. In contrast, computationally straightforward direct methods, like the average derivative estimator or sliced inverse regression, behave far from optimally, again due to the ``curse of dimensionality''.
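For concreteness, a toy version of the direct approach: the average derivative estimator recovers the index direction of a single-index model by averaging the gradient of a nonparametric fit. The kernel, bandwidth, and numerical differentiation below are illustrative assumptions.

```python
import numpy as np

def average_derivative_index(X, y, h=0.5, eps=1e-3):
    """Average the numerical gradient of a Nadaraya-Watson fit over the
    sample; for y = g(X @ beta) + noise the result is proportional to beta."""
    n, d = X.shape

    def nw(x0):                                  # Nadaraya-Watson estimate at x0
        w = np.exp(-((X - x0)**2).sum(axis=1) / (2 * h**2))
        return (w @ y) / w.sum()

    grad = np.zeros(d)
    for i in range(n):
        for j in range(d):
            e = np.zeros(d); e[j] = eps
            grad[j] += (nw(X[i] + e) - nw(X[i] - e)) / (2 * eps)
    grad /= n
    return grad / np.linalg.norm(grad)           # normalized index direction

# toy single-index data with beta proportional to (1, 2)
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 2))
beta = np.array([1.0, 2.0]) / np.sqrt(5.0)
y = np.sin(X @ beta) + 0.1 * rng.standard_normal(300)
print(average_derivative_index(X, y))            # roughly (0.45, 0.89)
```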
[17] developed a structural adaptive approach to dimension reduction based on the structural assumptions of single-index and multi-index models. The method allows for an asymptotically efficient estimation of the dimension reduction space and of the link function. [47] improves on these procedures for single- and multi-index models and generalizes them to partially linear models and partially linear multi-index models.
[45] proposes a new method for partially linear models whose nonlinear component is completely unknown. The target of the analysis is the identification of the regressors that enter the model in a nonlinear way, and the complete estimation of the model, including the slope coefficients of the linear component and the link function of the nonlinear component. The procedure allows for selecting the significant regression variables. As a by-product, a test that the nonlinear component is M-dimensional for M = 0, 1, 2, ... is developed. The proposed approach is fully adaptive to the unknown model structure and applies under mild conditions on the model. The only essential assumption is that the dimensionality of the nonlinear component is relatively small. Theoretical results indicate that the procedure provides a prescribed level of the identification error and estimates the linear component with an accuracy of order n^{-1/2}. A numerical study demonstrates a very good performance of the method even for small and moderate sample sizes.
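Not the adaptive procedure of [45], but a classical baseline in the same spirit: the Robinson-type partialling-out estimator for a partially linear model with a one-dimensional nonlinear component, which also attains the n^{-1/2} rate for the linear part. The bandwidth and the toy data are assumptions.

```python
import numpy as np

def robinson_partially_linear(X_lin, z, y, h=0.1):
    """Estimate beta in y = X_lin @ beta + g(z) + eps: kernel-smooth y and
    each column of X_lin on z, then regress residuals on residuals."""
    def smooth(v):                               # Nadaraya-Watson smoother on z
        w = np.exp(-(z[:, None] - z[None, :])**2 / (2 * h**2))
        return (w @ v) / w.sum(axis=1)
    y_res = y - smooth(y)
    X_res = X_lin - np.column_stack([smooth(X_lin[:, j])
                                     for j in range(X_lin.shape[1])])
    beta, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    return beta

# toy model with beta = (1, -0.5) and g(z) = sin(2 pi z)
rng = np.random.default_rng(5)
z = rng.uniform(0, 1, 400)
X_lin = rng.standard_normal((400, 2))
y = X_lin @ np.array([1.0, -0.5]) + np.sin(2 * np.pi * z) \
    + 0.1 * rng.standard_normal(400)
print(robinson_partially_linear(X_lin, z, y))
```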
Ill-posed equations arise frequently in the context of inverse problems, where the aim is to determine unknown characteristics of a physical system from data corrupted by measurement errors.
The problem of reconstructing a planar convex set from noisy observations of its moments is considered in [12]. An estimation method based on pointwise recovery of the support function of the set is developed. We study intrinsic accuracy limitations in the shape-from-moments estimation problem by establishing a lower bound on the rate of convergence of the mean squared error. It is shown that the proposed estimator is nearly optimal in the order sense. An application to tomographic reconstruction is discussed, and it is indicated how the proposed estimation method can be used for recovering edges from noisy Radon data. This constitutes a first step towards adaptive estimation procedures for Positron Emission Tomography (PET).
For ill-posed problems it is often impossible to obtain sensible results unless special methods, such as Tikhonov regularization, are used. Work in this direction is carried out in collaboration with S.V. Pereverzev, RICAM Linz. We study linear problems in which an injective compact operator A acts in some Hilbert space and the equation is disturbed by noise. Under a priori smoothness assumptions on the exact solution x, such problems can be regularized. Within the present paradigm, smoothness is given in terms of general source conditions, expressed through the operator A as $x = \varphi(A^*A)v$, $\|v\| \le R$, for some increasing function $\varphi$ with $\varphi(0) = 0$. This approach allows one to treat regularly and severely ill-posed problems in the same way. The deterministic theory for such equations was developed in [23, 24], including discretization and adaptation to unknown source conditions. The statistical setup is more complicated; however, based on the seminal work [31], we could extend the theory to this case as well, covering several ill-posed problems as studied in [13, 50]. Often one is not interested in computing the complete solution x, but only some functional $\langle z, x \rangle$ of it, where z is given beforehand. In this case the linear functional strategy, as proposed by Anderssen ([1]), is important. The previous analysis of this strategy is extended in [25] to the present setup.
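As a point of reference, a minimal numerical sketch of Tikhonov regularization for a discretized ill-posed linear problem; the integration operator, noise level, and regularization grid are illustrative.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """Tikhonov-regularized solution (A^T A + alpha I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

# ill-conditioned toy problem: discretized integration (Volterra) operator
n = 100
A = np.tril(np.ones((n, n))) / n
x_true = np.sin(np.linspace(0, np.pi, n))
rng = np.random.default_rng(3)
y = A @ x_true + 1e-3 * rng.standard_normal(n)
for alpha in (1e-1, 1e-3, 1e-5):                 # varying regularization strength
    err = np.linalg.norm(tikhonov(A, y, alpha) - x_true) / np.linalg.norm(x_true)
    print(f"alpha = {alpha:.0e}: relative error = {err:.3f}")
```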
The analysis of ill-posed problems under general source conditions raises many new issues and builds bridges between approximation theory and the theory of interpolation in function spaces.
Natural inference problems for parameters of stochastic processes lead to ill-posed inverse problems. A first instance is the problem of nonparametric estimation of the weight measure a in the stochastic delay differential equation
\[ dX(t) = \Big( \int_{-r}^{0} X(t+u) \, a(du) \Big) \, dt + \sigma \, dW(t). \]
A surprising fact is that in the classical scalar diffusion model $dX(t) = b(X(t))\,dt + \sigma(X(t))\,dW(t)$, nonparametric estimation of the drift and diffusion coefficients from discretely observed, low-frequency data likewise constitutes an ill-posed inverse problem.
In cooperation with the project Applied mathematical finance, a root-n consistent Monte Carlo estimator for a diffusion density has been developed ([28]). The approach has been applied to an environmental problem ([3]) and extended to a large class of models for stochastic processes in discrete time ([29]). These models allow, in particular, for a realistic estimation of ruin probabilities in finance.
In [2], new algorithms for the evaluation of American options using consumption processes are proposed. The approach is based on the fact that an American option is equivalent to a European option with an associated consumption process. A new method of sequentially improving an initial approximation, based on step-by-step alternation between lower and upper bounds, is developed. Various smoothing techniques are used to approximate the bounds in each step and hence to reduce the complexity of the algorithm. The results of numerical experiments confirm the efficiency of the proposed algorithms. Applications are intended within the project Applied mathematical finance.
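For orientation, a standard regression-based lower bound for an American put (in the Longstaff-Schwartz style) is sketched below; this is a common baseline, not the consumption-process algorithm of [2], and all parameter values are illustrative.

```python
import numpy as np

def american_put_lower_bound(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                             steps=50, paths=20000, seed=4):
    """Monte Carlo lower bound via a regression-estimated exercise policy."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    Z = rng.standard_normal((paths, steps))
    # risk-neutral geometric Brownian motion at times dt, 2*dt, ..., T
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * Z, axis=1))
    cash = np.maximum(K - S[:, -1], 0.0)        # exercise value at maturity
    for t in range(steps - 2, -1, -1):
        cash *= np.exp(-r * dt)                 # discount one step back
        itm = (K - S[:, t]) > 0                 # regress in-the-money paths only
        if itm.sum() < 10:
            continue
        coef = np.polyfit(S[itm, t], cash[itm], 2)   # continuation value fit
        ex = (K - S[itm, t]) > np.polyval(coef, S[itm, t])
        idx = np.flatnonzero(itm)[ex]
        cash[idx] = K - S[idx, t]               # exercise now on these paths
    return float(np.exp(-r * dt) * cash.mean())

print(american_put_lower_bound())               # close to the American put value
```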
Simulation-extrapolation-type estimators in errors-in-variables models are investigated in [38, 39]. These estimators generalize and improve on the proposals of [7, 48].
References: