Research Group "Stochastic Algorithms and Nonparametric Statistics"
Seminar "Modern Methods in Applied Stochastics and Nonparametric Statistics" Winter Semester 2016/2017


11.10.16  Dr. Christian Bayer (WIAS Berlin) 
Lecture room: ESH!  Smoothing the payoff for efficient computation of basket options We consider the problem of pricing basket options in a multivariate Black-Scholes or Variance-Gamma model. From a numerical point of view, pricing such options corresponds to moderate- and high-dimensional numerical integration problems with non-smooth integrands. Due to this lack of regularity, higher-order numerical integration techniques may not be directly available, requiring the use of methods such as Monte Carlo that are specifically designed to work for non-regular problems. We propose to use the inherent smoothing property of the density of the underlying in the above models to mollify the payoff function by means of an exact conditional expectation. The resulting conditional expectation is unbiased and yields a smooth integrand, which is amenable to the efficient use of adaptive sparse-grid cubature. Numerical examples indicate that the high-order method may perform orders of magnitude faster than Monte Carlo or quasi-Monte Carlo methods in dimensions up to 25. (Joint work with Markus Siebenmorgen and Raul Tempone) 
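A minimal sketch (with hypothetical parameters, not the speakers' method) of the problem the talk starts from: a plain Monte Carlo price of a two-asset Black-Scholes basket call. The payoff max(w·S_T - K, 0) has a kink, which is exactly the lack of smoothness that the conditioning step in the talk removes.

```python
import numpy as np

# Illustrative two-asset Black-Scholes basket call, priced by crude
# Monte Carlo.  All parameters below are made up for demonstration.
rng = np.random.default_rng(0)

S0 = np.array([100.0, 100.0])   # spot prices
sigma = np.array([0.2, 0.3])    # volatilities (independent assets)
w = np.array([0.5, 0.5])        # basket weights
K, r, T = 100.0, 0.0, 1.0       # strike, interest rate, maturity
n = 200_000                     # number of Monte Carlo samples

Z = rng.standard_normal((n, 2))
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
payoff = np.maximum(ST @ w - K, 0.0)   # non-smooth (kinked) integrand
price = np.exp(-r * T) * payoff.mean()
```

Because of the kink, the Monte Carlo error decays only like n^(-1/2); the smoothed conditional expectation of the talk makes higher-order cubature applicable instead.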
18.10.16  Kirill Efimov (WIAS Berlin) 
Change: Starting at 4:00 pm!  Adaptive weights clustering (AWC) tba 
25.10.16  Valeriy Avanesov (WIAS Berlin) 
Bootstrap for high-dimensional online multiscale covariance matrix change-point detection During this talk I will present a novel approach (arXiv:1610.03783 [math.ST]) to the important problem of change-point detection. Specifically, we are interested in the detection of an abrupt change in the covariance structure of a high-dimensional random process, a problem which has applications in biostatistics and econometrics. The developed approach is essentially a significance-testing procedure requiring a proper choice of a critical level. To that end, a non-standard bootstrap scheme is proposed and theoretically justified under mild assumptions. The idea of the proof and the simulation study will be discussed as well. 
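An illustrative sketch (not the paper's procedure) of the kind of statistic such methods calibrate: a naive sliding-window comparison of covariance matrices estimated on adjacent windows, with the change point located where the Frobenius distance peaks. The bootstrap scheme of the talk would supply the critical value for such a statistic.

```python
import numpy as np

rng = np.random.default_rng(1)

d, n, h = 5, 400, 100            # dimension, stream length, window size
X = rng.standard_normal((n, d))
X[n // 2 :] *= 2.0               # variance jumps at the midpoint

def window_stat(X, t, h):
    """Frobenius distance between covariances of [t-h, t) and [t, t+h)."""
    left = np.cov(X[t - h : t].T)
    right = np.cov(X[t : t + h].T)
    return np.linalg.norm(left - right, ord="fro")

stats = [window_stat(X, t, h) for t in range(h, n - h)]
t_hat = h + int(np.argmax(stats))   # estimated change-point location
```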

01.11.16  Dr. Pavel Dvurechensky (WIAS Berlin) 
Numerical aspects of Wasserstein distance, Wasserstein barycenters, and Hellinger-Kantorovich distance In this talk, we will consider Wasserstein and Hellinger-Kantorovich distances. The definitions of these distances are based on specific optimization problems. The goal of the talk is to present new numerical methods for the solution of these optimization problems. 
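As background, one standard numerical approach to the optimization problem behind the Wasserstein distance is entropic regularization solved by Sinkhorn iterations (this is a generic method, not necessarily the one presented in the talk). A minimal sketch for two discrete measures:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropically regularized optimal transport between discrete
    measures a, b with cost matrix C; returns the transport plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)    # match the column marginal
        u = a / (K @ v)      # match the row marginal
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 20)
C = (x[:, None] - x[None, :]) ** 2   # squared-distance cost (W_2 type)
a = np.full(20, 1 / 20)              # uniform source measure
b = np.full(20, 1 / 20)              # target measure (here identical)
P = sinkhorn(a, b, C)
cost = np.sum(P * C)                 # regularized transport cost
```

The regularization parameter eps trades off accuracy against the speed and stability of the iterations.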

08.11.16  Dr. Martin Redmann (WIAS Berlin) 
Dimension reduction of large-scale OU processes SPDEs with Lévy noise can be used to model chemical, physical or biological phenomena which contain uncertainties. When discretising these SPDEs in order to solve them numerically, the resulting problem might be of large order. The goal is to save computational time by replacing large-scale systems with systems of low order that capture the main information of the full model. In this talk, we therefore discuss balancing-related model order reduction (MOR) techniques. We summarise already existing results and discuss recent achievements. 
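A toy sketch of the balancing idea (arbitrary small matrices, for illustration only): for a stable linear system dx = Ax dt + B dW with output y = Cx, the controllability and observability Gramians solve Lyapunov equations, and the Hankel singular values indicate how many states carry most of the input-output energy, i.e. the order a balanced-truncation reduced model would keep.

```python
import numpy as np

def lyap(A, W):
    """Solve A X + X A^T + W = 0 via Kronecker vectorization."""
    n = A.shape[0]
    M = np.kron(A, np.eye(n)) + np.kron(np.eye(n), A)
    return np.linalg.solve(M, -W.reshape(-1)).reshape(n, n)

rng = np.random.default_rng(2)
n = 6
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stable drift
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

P = lyap(A, B @ B.T)      # controllability Gramian
Q = lyap(A.T, C.T @ C)    # observability Gramian

# Hankel singular values: square roots of the eigenvalues of P Q.
hsv = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])
r = int(np.sum(hsv > 1e-3 * hsv[0]))   # reduced order at this tolerance
```

In practice one would use a dedicated Lyapunov solver; the Kronecker formulation above is only viable for small n.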

15.11.16  Alexandra Suvorikova (WIAS Berlin) 
Non-asymptotic confidence sets in 2-Wasserstein space and their application to change-point detection In this work, we consider a probabilistic setting where the probability measures are considered to be random objects. We present a procedure for the construction of confidence sets for the population barycenter and discuss its application to change-point detection. 

22.11.16  Prof. Dr. Vladimir Spokoiny (WIAS Berlin) 
Bootstrap confidence sets for spectral projectors of sample covariance Let \( X_{1},\ldots,X_{n} \) be an i.i.d. sample in \( \mathbb{R}^{p} \) with zero mean and covariance matrix \( \Sigma \). The problem of recovering the projector onto the eigenspace of \( \Sigma \) from these observations naturally arises in many applications. A recent technique from \cite{KoltchLounici2015} helps to study the asymptotic distribution of the distance in the Frobenius norm \( \| P_{r} - \hat{P}_{r} \|_{2} \) between the true projector \( P_{r} \) on the subspace of the \( r \)-th eigenvalue and its empirical counterpart \( \hat{P}_{r} \) in terms of the effective trace of \( \Sigma \). This paper offers a bootstrap procedure for building sharp confidence sets for the true projector \( P_{r} \) from the given data. This procedure does not rely on the asymptotic distribution of \( \| P_{r} - \hat{P}_{r} \|_{2} \) or its moments, and it applies for small or moderate sample size \( n \) and large dimension \( p \). The main result states the validity of the proposed procedure for finite samples with an explicit bound on the error of the bootstrap approximation. This bound involves some new sharp results on Gaussian comparison and Gaussian anti-concentration in high dimension. Numerical results confirm the good performance of the method in realistic examples. 
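A simplified sketch of the idea (a generic multiplier bootstrap on the sample covariance, not the paper's exact scheme or guarantees): resample the data with random weights, recompute the top-eigenspace projector, and use the quantile of the Frobenius distances as a critical value for a confidence set around the empirical projector.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 300, 10, 2   # sample size, dimension, target rank

# Data with a decaying spectrum, so the top-r eigenspace is identifiable.
X = rng.standard_normal((n, p)) * np.sqrt(np.arange(p, 0, -1.0))
Sigma_hat = X.T @ X / n

def top_projector(S, r):
    """Orthogonal projector onto the span of the top-r eigenvectors."""
    vals, vecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    U = vecs[:, -r:]
    return U @ U.T

P_hat = top_projector(Sigma_hat, r)

# Multiplier bootstrap: reweight the observations with i.i.d. N(1,1)
# weights and record the distance of the resampled projector to P_hat.
boot = []
for _ in range(200):
    w = 1.0 + rng.standard_normal(n)
    S_b = (X * w[:, None]).T @ X / n
    boot.append(np.linalg.norm(top_projector(S_b, r) - P_hat, "fro"))
q95 = np.quantile(boot, 0.95)   # bootstrap critical value
```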

29.11.16  Priv.-Doz. Dr. John Schoenmakers (WIAS Berlin) 
Density projection schemes for McKean-Vlasov equations and explicit solutions of OU-type equations tba 

06.12.16  Andzhey Koziuk (WIAS Berlin) 
Operator convex functions and a matrix Bernstein-type inequality tba 

13.12.16  Carlos Enrique Améndola Cerón (TU Berlin) 
An introduction to algebraic statistics Algebraic statistics is a relatively new field that has developed rapidly in recent years, using algebraic-geometric methods to provide new tools to analyze and solve statistical problems. In this talk I will give an overview of the exciting interplay between algebra and statistics, and then briefly discuss it in the context of current research on Gaussian mixtures. 

20.12.16  Dr. Karsten Tabelow (WIAS Berlin) 
Cancelled!  MRI data models at low SNR In this talk we elaborate on the effect of low SNR on estimated parameters in models for neuroimaging data. We will present a new method for the local estimation of the noise parameter in the signal distribution and demonstrate how this can be used for improved estimation of model parameters. (Joint work with Joerg Polzehl) 
03.01.17  No Seminar 
10.01.17  No Seminar 
17.01.17  Dr. Mario Maurelli (WIAS Berlin) 
A mean field SDE with Neumann boundary conditions tba 

24.01.17  Paolo Pigato (INRIA Nancy) 
Statistical estimation of the oscillating Brownian motion The oscillating Brownian motion is a classical, simple example of a stochastic differential equation with discontinuous diffusion coefficient. It behaves like a Brownian motion which changes its variance parameter each time it crosses a fixed threshold. We consider here the problem of estimating the parameters of such a process from discrete observations. We propose two natural consistent estimators, which are variants of the integrated volatility estimator and take the occupation times into account. We show the stable convergence of the renormalized estimation errors towards some Gaussian mixture, possibly corrected by a term that depends on the local time. We then consider some applications to financial time series (in particular to volatility modeling), the oscillating Brownian motion being a simple way to account for volatility clustering and the leverage effect. We compare our empirical results with other regime-switching models. (Joint work with Antoine Lejay) 
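A minimal sketch of the estimation idea (Euler simulation with threshold 0 and hypothetical parameters; the talk's estimators and asymptotics are more refined): the realized variance is computed separately on the two sides of the threshold and normalized by the occupation time of each side.

```python
import numpy as np

rng = np.random.default_rng(4)

s_plus, s_minus = 2.0, 0.5     # diffusion coefficients above/below 0
n, dt = 200_000, 1e-3          # number of steps, step size

# Euler simulation of the oscillating Brownian motion.
X = np.empty(n + 1)
X[0] = 0.0
dW = rng.standard_normal(n) * np.sqrt(dt)
for i in range(n):
    sigma = s_plus if X[i] >= 0 else s_minus
    X[i + 1] = X[i] + sigma * dW[i]

# Occupation-time-normalized realized variance on each side.
up = X[:-1] >= 0               # side of the threshold before each step
dX2 = np.diff(X) ** 2
s_plus_hat = np.sqrt(dX2[up].sum() / (up.sum() * dt))
s_minus_hat = np.sqrt(dX2[~up].sum() / ((~up).sum() * dt))
```

Note that the occupation fractions of the two sides are themselves random (an arcsine-law effect), which is why the occupation times must enter the normalization.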

31.01.17  Maurilio Danio Gutzeit (Universität Potsdam) 
Separation rates in testing if two random graphs are based on the same
model We consider two inhomogeneous Erdős-Rényi graphs on the same set of vertices and study the problem of testing whether the random edges of these graphs have the same underlying stochastic structure. In this framework, such a structure is typically represented by a symmetric n by n matrix with entries in [0,1] and zero diagonal: entry (i,j) is the probability of edge (i,j) occurring, and the edges occur independently. Now, identifying the two graphs with such matrices P and Q, respectively, we want to describe the smallest Frobenius and spectral distances between P and Q such that there is a test whose total error probability does not exceed a given level. In particular, we focus on the dependence of these distances on the number n of vertices and the number M of collected samples per graph, i.e. observed adjacency matrices. The talk will exhibit the current state of ongoing joint research with Alexandra Carpentier, Ulrike von Luxburg and Debarghya Ghoshdastidar, and it will include non-asymptotic results from a minimax point of view as well as asymptotic (with respect to n) problem-dependent results. 
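A toy version of the setting (a naive plug-in test with a permutation critical value, not the authors' procedure or rates): estimate the edge-probability matrices from M sampled adjacency matrices per graph and compare the estimates in Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(5)
n_v, M = 30, 20   # vertices, samples per graph

# Hypothetical edge-probability matrices P (null-like) and Q (a block
# of elevated probabilities), both symmetric with zero diagonal.
P = np.full((n_v, n_v), 0.3)
np.fill_diagonal(P, 0)
Q = P.copy()
Q[:15, :15] = 0.7
np.fill_diagonal(Q, 0)

def sample(P, M, rng):
    """Draw M independent undirected adjacency matrices from P."""
    U = rng.random((M, n_v, n_v))
    A = np.triu((U < P).astype(float), 1)
    return A + A.transpose(0, 2, 1)

A, B = sample(P, M, rng), sample(Q, M, rng)
stat = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0), "fro")

# Permutation null: shuffle which samples belong to which graph.
pool = np.concatenate([A, B])
null = []
for _ in range(200):
    idx = rng.permutation(2 * M)
    diff = pool[idx[:M]].mean(axis=0) - pool[idx[M:]].mean(axis=0)
    null.append(np.linalg.norm(diff, "fro"))
reject = stat > np.quantile(null, 0.95)
```

The minimax question of the talk asks how small the distance between P and Q may be, as a function of n and M, while such a test can still succeed.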

07.02.17  Benjamin Stemper (WIAS Berlin) 
Efficient pricing in rough Bergomi tba 

14.02.17  Dr. Karsten Tabelow (WIAS Berlin) 
MRI data models at low SNR In this talk we elaborate the effect of low SNR on estimated parameters in models for neuroimaging data. We will present a new method for the local estimation of the noise parameter in the signal distribution and demonstrate how this can be used for improved estimation of model parameters. (joint work with Joerg Polzehl). 

21.02.17  Andrea Locatelli (Universität Potsdam) 
Adaptation to noise parameters in nonparametric active learning In this talk, I will present the nonparametric active learning classification setting already studied in Minsker (2012a) and show non-trivial strategies that can adapt to the smoothness of the regression function as well as to the noise parameter, and that require strictly weaker assumptions than previous work. We will first study the deterministic setting (i.e. one has access to the true value of the regression function) to build intuition, and then move on to the stochastic setting (noisy binary classification). 

28.02.17  No Seminar due to WIAS DAY 
07.03.17  
14.03.17  Dr. Evgeny Burnaev (Skoltech, IITP, Moscow) 
Cancelled and postponed to 28.03.17!  
21.03.17  Andzhey Koziuk (WIAS Berlin) 
Some ideas on Gaussian comparison and approximation In the talk I plan to discuss new ideas and bottlenecks in the current state of research on how to upper-bound the Kolmogorov distance on Euclidean balls. The talk will largely be a continuation of the one I gave in December 2016 and will pursue the same questions: 1. What is the optimal dimensional dependence (both for comparison and approximation) of the upper bound? 2. What are the ways to improve the Gaussian approximation? 

28.03.17  Dr. Evgeny Burnaev (Skoltech, IITP, Moscow) 
Minimax approach to variable fidelity data interpolation Engineering problems often involve data sources of variable fidelity with different costs of obtaining an observation. In particular, one can use both a cheap low-fidelity function (e.g. a computational experiment with a CFD code) and an expensive high-fidelity function (e.g. a wind tunnel experiment) to generate a data sample in order to construct a regression model of the high-fidelity function. The key question in this setting is how the sizes of the high- and low-fidelity data samples should be selected in order to stay within a given computational budget and maximize the accuracy of the regression model prior to committing resources on data acquisition. In this work we obtain minimax interpolation errors for single and variable fidelity scenarios for multivariate Gaussian process regression. Evaluation of the minimax errors allows us to identify cases when the variable fidelity data provides better interpolation accuracy than exclusively high-fidelity data for the same computational budget. These results allow us to calculate the optimal shares of variable fidelity data samples under a given computational budget constraint. Real and synthetic data experiments suggest that using the obtained optimal shares often outperforms natural heuristics in terms of the regression accuracy. 
References:
Gaussian Models for Data Fusion
- Minimax Error of Interpolation and Optimal Design of Experiments for Variable Fidelity Data (https://goo.gl/CjgLBE)
- Surrogate modeling of multi-fidelity data for large samples (https://goo.gl/QFkrxf)
Gaussian Process Regression and its Properties
- Regression on the Basis of Nonstationary Gaussian Processes with Bayesian Regularization (https://goo.gl/haZsfM)
- The Bernstein-von Mises theorem for regression based on Gaussian Processes (https://goo.gl/9whF3z)
- Properties of the Bayesian parameter estimation of a regression based on Gaussian processes (https://goo.gl/dMd4mD)
Efficient Learning of Gaussian Process Regression
- Computationally Efficient Algorithm for Gaussian Process Regression in Case of Structured Samples (https://goo.gl/uXzDfW)
- Adaptive Design of Experiments Based on Gaussian Processes (https://goo.gl/EjE9kt)
Surrogate Modeling for Industrial Design
- GTApprox: Surrogate modeling for industrial design (https://goo.gl/mz7QtV)
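A generic sketch of the building block behind the talk (plain single-fidelity Gaussian process interpolation with an RBF kernel on a hypothetical test function; the variable-fidelity analysis and minimax errors are the talk's contribution): the interpolation error of such a model is what the minimax results quantify as a function of the sample size.

```python
import numpy as np

def rbf(X1, X2, ell=0.3):
    """Squared-exponential (RBF) kernel with lengthscale ell."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

f = lambda x: np.sin(2 * np.pi * x)   # hypothetical high-fidelity function
X = np.linspace(0.0, 1.0, 15)         # design points
y = f(X)

Xs = np.linspace(0.0, 1.0, 101)       # evaluation grid
K = rbf(X, X) + 1e-8 * np.eye(len(X)) # jitter for numerical stability
mu = rbf(Xs, X) @ np.linalg.solve(K, y)   # GP posterior (interpolant) mean

err = np.max(np.abs(mu - f(Xs)))      # sup-norm interpolation error
```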
last reviewed: March 15, 2017, Christine Schneider