Research Group "Stochastic Algorithms and Nonparametric Statistics"

Research Seminar "Mathematical Statistics" Summer Semester 2024

17.04.2024 Dr. Gil Kur (ETH Zürich)
Connections between minimum norm interpolation and local theory of Banach spaces
24.04.2024 Dr. Nicolas Verzelen (INRAE Montpellier)
Computational trade-offs in high-dimensional clustering
01.05.2024 Public Holiday

08.05.2024 Dr. Georg Keilbar & Ratmir Miftachov (HU Berlin)
Shapley curves: A smoothing perspective
This paper addresses the limited statistical understanding of Shapley values as a variable importance measure from a nonparametric (or smoothing) perspective. We introduce population-level Shapley curves to measure the true variable importance, determined by the conditional expectation function and the distribution of the covariates. Having defined the estimand, we derive minimax convergence rates and asymptotic normality under general conditions for the two leading estimation strategies. For finite-sample inference, we propose a novel version of the wild bootstrap procedure, tailored to capture lower-order terms in the estimation of Shapley curves. Numerical studies confirm our theoretical findings, and an empirical application analyzes the determining factors of vehicle prices.
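
For orientation, the population object can be written in the usual game-theoretic form (notation ours, not necessarily the paper's): with regression function m(x) = E[Y | X = x] and subset value v_x(S) = E[m(X) | X_S = x_S], the Shapley curve of coordinate j is

    \phi_j(x) = \sum_{S \subseteq \{1,\dots,d\}\setminus\{j\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,\bigl(v_x(S\cup\{j\}) - v_x(S)\bigr),

and estimation proceeds by plugging a nonparametric smoother into v_x.
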
15.05.2024 Fabian Telschow (HU Berlin)
Estimation of the expected Euler characteristic of excursion sets of random fields and applications to simultaneous confidence bands
The expected Euler characteristic (EEC) of excursion sets of a smooth Gaussian-related random field over a compact manifold can be used to approximate the distribution of its supremum for high thresholds. Viewed as a function of the excursion threshold, the EEC of a Gaussian-related field is expressed by the Gaussian kinematic formula (GKF) as a finite sum of known functions multiplied by the Lipschitz–Killing curvatures (LKCs) of the generating Gaussian field. In the first part of this talk we present consistent estimators of the LKCs as linear projections of "pinned" Euler characteristic (EC) curves obtained from realizations of zero-mean, unit-variance Gaussian processes. As observed data are seldom Gaussian, we generalize these LKC estimators by an unusual use of the Gaussian multiplier bootstrap to obtain consistent estimates of the LKCs of the Gaussian limiting fields of non-stationary statistics. In the second part, we explain applications of LKC estimation and the GKF to simultaneous familywise error rate inference, for example by constructing simultaneous confidence bands and CoPE sets for spatial functional data over complex domains, such as fMRI and climate data, and discuss their benefits and drawbacks compared to other methodologies.
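
Schematically, the GKF referred to above reads (in standard Adler-Taylor notation, which may differ from the speaker's)

    \mathbb{E}\,\chi\bigl(\{x \in M : f(x) \ge u\}\bigr) = \sum_{j=0}^{\dim M} \mathcal{L}_j(M, f)\,\rho_j(u),

where the EC densities \rho_j are known functions of the threshold (for a unit-variance Gaussian field, \rho_0(u) = 1 - \Phi(u) and \rho_j(u) = (2\pi)^{-(j+1)/2} H_{j-1}(u)\, e^{-u^2/2} for j \ge 1, with H_{j-1} the Hermite polynomials), so that estimating the LKCs \mathcal{L}_j determines the whole threshold curve.
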
22.05.2024 Prof. Dr. Vladimir Spokoiny (WIAS Berlin)
Gaussian variational inference in high dimension
We consider the problem of approximating a high-dimensional distribution by a Gaussian one by minimizing the Kullback-Leibler divergence. The main result extends Katsevich and Rigollet (2023) and states that the minimizer is well approximated by the Gaussian distribution whose mean and variance match those of the underlying measure. We also describe the accuracy of this approximation and its range of applicability in terms of the effective dimension. The obtained results can be used for the analysis of various sampling schemes in optimization.
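
In symbols, the object of study is the best Gaussian fit in the variational sense (a schematic formulation, not necessarily the talk's exact setup):

    \widehat q \in \operatorname{arg\,min}_{q = \mathcal N(m,\Sigma)} \mathrm{KL}(q \,\|\, \pi) = \operatorname{arg\,min}_{m,\Sigma}\; \mathbb{E}_q\bigl[\log q(X) - \log \pi(X)\bigr],

with the result stating that \widehat q is close to the Gaussian whose mean and covariance are those of \pi itself, up to an error controlled by the effective dimension.
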
29.05.2024 Prof. Dr. Tailen Hsing (University of Michigan)
A functional-data perspective in spatial data analysis
Nowadays more and more spatiotemporal data can be viewed as functional data. The first part of the talk focuses on the Argo data, a modern oceanography dataset that provides unprecedented global coverage of temperature and salinity measurements in the upper 2,000 meters of the ocean. I will discuss a functional kriging approach to predicting temperature and salinity as smooth functions of depth, as well as a co-kriging approach to predicting oxygen concentration based on temperature and salinity data. In the second part of the talk, I will give an overview of some related topics, including spectral density estimation and variable selection for functional data.
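
One common form of the functional kriging predictor (a generic formulation, not necessarily the exact one used in the talk) predicts the full depth profile at an unobserved location s_0 as an optimal linear combination of observed profiles:

    \widehat Y(s_0; z) = \sum_{i=1}^n \lambda_i\, Y(s_i; z), \qquad \lambda = \operatorname{arg\,min}_{\lambda}\ \mathbb{E}\Bigl\|\, Y(s_0;\cdot) - \sum_{i=1}^n \lambda_i\, Y(s_i;\cdot) \Bigr\|^2,

where z denotes depth and the weights \lambda_i are determined by the spatial covariance structure of the functional field.
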
05.06.2024 Dr. Jia-Jie Zhu (WIAS Berlin)
Wasserstein and beyond: Optimal transport and gradient flows for machine learning and optimization
In the first part of the talk, I will provide an overview of gradient flows over non-negative and probability measures and their applications in modern machine learning tasks, such as variational inference, sampling, training of over-parameterized models, and robust optimization. Then, I will present our recent results on the analysis of a few particularly relevant gradient flows, including the Wasserstein, Hellinger/Fisher-Rao, and reproducing kernel Hilbert space settings. The focus is on the global exponential decay of entropy functionals along gradient flows such as the Hellinger-Kantorovich (a.k.a. Wasserstein-Fisher-Rao) flow, and on a new type of gradient-flow geometry that guarantees convergence when minimizing a maximum mean discrepancy, which we term interaction-force transport.
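
Schematically, for an energy F with first variation \delta F/\delta\mu, the two building blocks combined in the Hellinger-Kantorovich geometry are (signs and normalizations follow standard conventions, which may differ from the talk's)

    \partial_t \mu_t = \nabla\cdot\Bigl(\mu_t\,\nabla \tfrac{\delta F}{\delta\mu}(\mu_t)\Bigr) \quad \text{(Wasserstein)}, \qquad \partial_t \mu_t = -\,\mu_t\,\tfrac{\delta F}{\delta\mu}(\mu_t) \quad \text{(Hellinger/Fisher-Rao)},

and the Hellinger-Kantorovich flow is driven by the sum of the two right-hand sides.
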
12.06.2024 Prof. Dr. Marc Hallin (Université Libre de Bruxelles)
Please note the different room: R.406, 4th floor! The long quest for quantiles and ranks in R^d and on manifolds
Quantiles are a fundamental concept in probability, and an essential tool in statistics, from descriptive to inferential. Still, despite half a century of attempts, no satisfactory and fully agreed-upon definition of the concept, and the dual notion of ranks, is available beyond the well-understood case of univariate variables and distributions. The need for such a definition is particularly critical for variables taking values in R^d, for directional variables (values on the hypersphere), and, more generally, for variables with values on manifolds. Unlike the real line, indeed, no canonical ordering is available on these domains. We show how measure transportation brings a solution to this problem by characterizing distribution-specific (data-driven, in the empirical case) orderings and center-outward distribution and quantile functions (ranks and signs in the empirical case) that satisfy all the properties expected from such concepts while reducing, in the case of real-valued variables, to the classical univariate notion.
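
A minimal empirical illustration in Python of the resulting center-outward ranks, using the optimal-assignment construction with a regular grid on the unit disk (the grid design, sample, and variable names are our own choices):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n_R, n_S = 10, 20                      # radii x directions -> n = 200 grid points
    n = n_R * n_S
    X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 2.0]], size=n)

    # regular grid approximating the "spherical uniform" on the unit disk
    radii = np.arange(1, n_R + 1) / (n_R + 1)
    angles = 2 * np.pi * np.arange(n_S) / n_S
    G = np.array([[r * np.cos(a), r * np.sin(a)] for r in radii for a in angles])

    # empirical center-outward distribution function: optimal assignment X -> G
    cost = ((X[:, None, :] - G[None, :, :]) ** 2).sum(-1)   # squared distances
    _, cols = linear_sum_assignment(cost)
    F_pm = G[cols]                         # F_pm[i]: center-outward image of X[i]
    ranks = np.linalg.norm(F_pm, axis=1)   # radial part plays the role of the rank
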
19.06.2024 Evaluation

26.06.2024 Dr. Clément Berenfeld (Universität Potsdam)
Please note the different room and building: R. 3.13, HVP 11a! A theory of stratification learning
Given an i.i.d. sample from a stratified mixture of immersed manifolds of different dimensions, we study the minimax estimation of the underlying stratified structure. We provide a constructive algorithm that adaptively estimates each mixture component at its optimal dimension-specific rate. The method is based on an ascending hierarchical co-detection of points belonging to different layers, which also identifies the number of layers and their dimensions, assigns each data point to a layer accurately, and estimates tangent spaces optimally. These results hold regardless of any ambient assumptions on the manifolds or on their intersection configurations. They open the way to a broad clustering framework, where each mixture component models a cluster emanating from a specific nonlinear correlation phenomenon.
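
The following Python toy illustrates only one elementary ingredient, per-point dimension estimation via local PCA; the adaptive hierarchical co-detection procedure of the talk is substantially more refined:

    import numpy as np

    def local_dimension(X, k=20, var_explained=0.95):
        """Crude per-point dimension estimate: PCA on the k nearest neighbours."""
        dims = np.empty(len(X), dtype=int)
        for i in range(len(X)):
            d2 = ((X - X[i]) ** 2).sum(1)
            nbrs = X[np.argsort(d2)[1:k + 1]]             # k nearest neighbours
            Z = nbrs - nbrs.mean(0)
            s = np.linalg.svd(Z, compute_uv=False) ** 2   # local PCA spectrum
            dims[i] = np.searchsorted(np.cumsum(s) / s.sum(), var_explained) + 1
        return dims

    # stratified toy mixture in R^2: a circle (1-d) next to a filled disk (2-d)
    rng = np.random.default_rng(2)
    t = rng.uniform(0, 2 * np.pi, 300)
    circle = np.c_[3 + np.cos(t), np.sin(t)]
    disk = rng.uniform(-1, 1, (600, 2))
    disk = disk[(disk ** 2).sum(1) < 1]
    X = np.vstack([circle, disk])
    print(np.bincount(local_dimension(X)))   # rough counts per estimated dimension
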
03.07.2024 Prof. Dr. Celine Duval (Université de Lille)
Geometry of excursion sets: Computing the surface area from discretized points
The excursion sets of a smooth random field carry relevant information in their various geometric measures. After introducing these geometric quantities and showing how they are related to the parameters of the field, we focus on the problem of discretization. From a computational viewpoint, one never has access to the continuous observation of the excursion set, but rather to observations at discrete points in space. It has been reported that, for specific regular lattices of points in dimensions 2 and 3, the usual estimate of the surface area of the excursions remains biased even when the lattice becomes dense in the domain of observation. We show that this limiting bias is invariant to the locations of the observation points and that it only depends on the ambient dimension. (Based on joint work with H. Biermé, R. Cotsakis, E. Di Bernardino, and A. Estrade.)
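
A toy numerical illustration in Python of measuring an excursion perimeter from gridded observations (our own construction via marching squares, not the lattice estimator analyzed in the talk):

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage import measure

    rng = np.random.default_rng(1)
    N = 512                                  # lattice resolution on [0, 1]^2
    field = gaussian_filter(rng.standard_normal((N, N)), sigma=8.0)
    field /= field.std()                     # roughly unit variance
    u = 0.5                                  # excursion threshold

    # marching-squares contours of the level set {field = u}
    contours = measure.find_contours(field, u)
    perimeter = sum(
        np.sqrt(((c[1:] - c[:-1]) ** 2).sum(1)).sum() for c in contours
    ) / N                                    # pixel units -> unit-square units
    print(f"estimated excursion perimeter at u = {u}: {perimeter:.3f}")
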
10.07.2024 Dr. Anya Katsevich (MIT, Cambridge, MA)
Laplace asymptotics in high-dimensional Bayesian inference
Computing integrals against a high-dimensional posterior is the major computational bottleneck in Bayesian inference. A popular technique to reduce this computational burden is to use the Laplace approximation (LA), a Gaussian distribution, in place of the true posterior. We derive a new leading-order asymptotic decomposition of integrals against a high-dimensional Laplace-type posterior, which sheds valuable light on the accuracy of the LA in high dimensions. In particular, we determine the tight dimension dependence of the approximation error, leading to the tightest known Bernstein-von Mises result on the asymptotic normality of the posterior. The decomposition also leads to a simple modification of the LA which yields a higher-order accurate approximation to the posterior. Finally, we prove the validity of the high-dimensional Laplace asymptotic expansion to arbitrary order, which opens the door to approximating the partition function, of use in high-dimensional model selection and many other applications beyond statistics.
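
In symbols, for a posterior \pi_n(x) \propto e^{-n f(x)} on \mathbb{R}^d with mode \widehat x, the LA and the classical Laplace expansion of the partition function read (standard notation, ours)

    \widehat\pi_{\mathrm{LA}} = \mathcal N\bigl(\widehat x,\ (n\,\nabla^2 f(\widehat x))^{-1}\bigr), \qquad \int e^{-n f(x)}\,dx \approx \Bigl(\tfrac{2\pi}{n}\Bigr)^{d/2} \bigl(\det \nabla^2 f(\widehat x)\bigr)^{-1/2}\, e^{-n f(\widehat x)},

and the talk's decomposition quantifies how the error of such approximations grows with the dimension d.
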
17.07.2024


