# Research Group "Stochastic Algorithms and Nonparametric Statistics"

## Research Seminar "Mathematical Statistics" SS 2020

Place: https://zoom.us/j/159082384

Time: Wednesdays, 10.00 a.m. - 12.30 p.m.

### 22.04.2020 Dr. Gabriel Peyre (ENS Paris)

Scaling optimal transport for high dimensional learning

Optimal transport (OT) has recently gained a lot of interest in machine learning. It is a natural tool for comparing probability distributions in a geometrically faithful way. It finds applications in both supervised learning (using geometric loss functions) and unsupervised learning (to perform generative model fitting). OT is, however, plagued by the curse of dimensionality, since it might require a number of samples which grows exponentially with the dimension. In this talk, I will review entropic regularization methods which define geometric loss functions approximating OT with a better sample complexity. More information and references can be found on the website of our book "Computational Optimal Transport": https://optimaltransport.github.io/

### 29.04.2020 Prof. Dr. Thorsten Dickhaus (Universität Bremen)

How many null hypotheses are false?

Under the multiple testing framework, estimating the proportion $\pi_0$ of true null hypotheses is informative for various reasons. On the one hand, in applications like quality control or anomaly detection, the presence of a certain number of untypical data points already indicates the necessity for an intervention, no matter which of the data points are responsible for that. On the other hand, data-adaptive multiple test procedures incorporate an estimate of $\pi_0$ into their decision rules in order to optimize power. Many classical estimators of $\pi_0$ rely on the empirical cumulative distribution function (ecdf) of all marginal $p$-values, and implicitly require that the ecdf of those $p$-values which correspond to true null hypotheses is in some sense close to the main diagonal in the unit square.
We will discuss three sources of violation of the latter requirement, namely (i) discreteness of the statistical model under investigation (the expected ecdf has jumps), (ii) dependencies among the marginal $p$-values (leading to clustering effects), and (iii) testing composite null hypotheses ($p$-values are super-uniform under the null). Modifications of classical estimators of $\pi_0$ will be discussed to tackle these issues. Applications include multiple testing for replicability of scientific discoveries, particularly in the context of biomarker identification. The presentation is based on [1] - [4].

References:

[1] Thorsten Dickhaus, Klaus Straßburger, Daniel Schunk, Carlos Morcillo-Suarez, Thomas Illig, Arcadi Navarro (2012). How to analyze many contingency tables simultaneously in genetic association studies. Statistical Applications in Genetics and Molecular Biology, Vol. 11, No. 4, Article 12.

[2] Thorsten Dickhaus (2013). Randomized p-values for multiple testing of composite null hypotheses. Journal of Statistical Planning and Inference, Vol. 143, No. 11, 1968-1979.

[3] André Neumann, Taras Bodnar, Thorsten Dickhaus (2017). Estimating the Proportion of True Null Hypotheses under Copula Dependency. Research Report 2017:09, Mathematical Statistics, Stockholm University.

[4] Anh-Tuan Hoang, Thorsten Dickhaus (2019). Randomized p-values for multiple testing and their application in replicability analysis. Preprint, arXiv:1912.06982.

### 06.05.2020 Dr. Julia Schaumburg (FU Amsterdam)

Dynamic clustering of multivariate panel data

We propose a dynamic clustering model for studying time-varying group structures in multivariate panel data. The model is dynamic in three ways: First, the cluster means and covariance matrices are time-varying to track gradual changes in cluster characteristics over time. Second, the units of interest can transition between clusters over time based on a Hidden Markov model (HMM).
Finally, the HMM's transition matrix can depend on lagged cluster distances as well as economic covariates. Monte Carlo experiments suggest that the units can be classified reliably in a variety of settings. An empirical study of 299 European banks between 2008Q1 and 2018Q2 suggests that banks have become less diverse over time in key characteristics. On average, approximately 3% of banks transition each quarter. Transitions across clusters are related to cluster dissimilarity and differences in bank profitability.

### 13.05.2020 N. N.

### 20.05.2020 Chiara Amorino (Université d'Evry Paris-Saclay)

Invariant adaptive density estimation for ergodic SDE with jumps over anisotropic classes

### 27.05.2020 N. N.

### 03.06.2020 Ingrid van Keilegom (KU Leuven)

On a semiparametric estimation method for AFT mixture cure models

When studying survival data in the presence of right censoring, it often happens that a certain proportion of the individuals under study do not experience the event of interest and are considered as cured. The mixture cure model is one of the common models that take this feature into account. It depends on a model for the conditional probability of being cured (called the incidence) and a model for the conditional survival function of the uncured individuals (called the latency). This work considers a logistic model for the incidence and a semiparametric accelerated failure time model for the latency part. The estimation of this model is obtained via the maximization of the semiparametric likelihood, in which the unknown error density is replaced by a kernel estimator based on the Kaplan-Meier estimator of the error distribution. Asymptotic theory for consistency and asymptotic normality of the parameter estimators is provided. Moreover, the proposed estimation method is compared with a method proposed by Lu (2010), which uses a kernel approach based on the EM algorithm to estimate the model parameters.
Finally, the new method is applied to data coming from a cancer clinical trial.

### 10.06.2020 Jonathan Niles-Weed (New York University)

At 3 pm due to time shift!

Minimax estimation of smooth densities in Wasserstein distance

We study nonparametric density estimation problems where error is measured in the Wasserstein distance, a metric on probability distributions popular in many areas of statistics and machine learning. We give the first minimax-optimal rates for this problem for general Wasserstein distances, and show that, unlike classical nonparametric density estimation, these rates depend on whether the densities in question are bounded below. Motivated by variational problems involving the Wasserstein distance, we also show how to construct discretely supported measures, suitable for computational purposes, which achieve the minimax rates. Our main technical tool is an inequality giving a nearly tight dual characterization of the Wasserstein distances in terms of Besov norms. Joint work with Q. Berthet.

### 17.06.2020 Francois Bachoc (Toulouse)

### 24.06.2020 N. N.

### 01.07.2020 Eric Moulines (ENS Paris)

### 08.07.2020 N. N.

### 15.07.2020 Torsten Hothorn (University of Zurich)

Score-based transformation learning

Many statistical learning algorithms can be understood as iterative procedures for explaining variation in scores, that is, in the gradient vector of some target function. The statistical interpretation of boosting as functional gradient descent is maybe the most prominent representative, but model-based trees and forests have also been discussed from this point of view. While these algorithms are agnostic with respect to the target function, we specifically discuss scores obtained from the likelihood of fully parameterised transformation models. This model class is sufficiently large and interesting while at the same time allowing for a unified theoretical and computational treatment.
In this line of thinking, we can understand and implement classical procedures, such as the Wilcoxon-Mann-Whitney rank-sum test, the log-rank test, maximally selected rank statistics, or regression trees, and contemporary statistical learning procedures, most importantly random forests and boosting, as extremes in a continuum of increasingly complex models featuring directly interpretable parameters. We discuss prognostic and predictive models of increasing complexity as transformation models for conditional distributions. The estimation of heterogeneous treatment effects from experimental and observational data is presented as one application currently receiving much interest in various disciplines.

last reviewed: May 18, 2020 by Christine Schneider