Publications
Monographs

L. Starke, K. Tabelow, Th. Niendorf, A. Pohlmann, eds., Denoising for improved parametric MRI of the kidney: Protocol for nonlocal means filtering, 2216 of Methods in Molecular Biology, Humana, New York, NY., 2021, pp. 565576, (Chapter Published), DOI 10.1007/9781071609781_34 .

P. Friz, M. Hairer, A Course on Rough Paths: With an Introduction to Regularity Structures, Universitext, Springer International Publishing, Basel, 2020, 346 pages, (Monograph Published), DOI 10.1007/9783030415563 .
Articles in Refereed Journals

C. Bellingeri, A. Djurdjevac, P. Friz, N. Tapia, Transport and continuity equations with (very) rough noise, SN Partial Differential Equations and Applications, 2 (2021), pp. 226, DOI 10.1007/s4298502100101y .
Abstract
Existence and uniqueness for rough flows, transport and continuity equations driven by general geometric rough paths are established. 
J. Diehl, K. EbrahimiFard, N. Tapia, Iteratedsums signature, quasisymmetric functions and time series analysis, Seminaire Lotharingien de Combinatoire, 84B (2021), pp. 86/186/12, DOI 10.20347/WIAS.PREPRINT.2736 .
Abstract
We survey and extend results on a recently defined character on the quasishuffle algebra. This character, termed iteratedsums signature, appears in the context of time series analysis and originates from a problem in dynamic time warping. Algebraically, it relates to (multidimensional) quasisymmetric functions as well as (deformed) quasishuffle algebras. 
A. Kroshnin, V. Spokoiny, A. Suvorikova, Statistical inference for BuresWasserstein barycenters, The Annals of Applied Probability, 31 (2021), pp. 12641298, DOI 10.1214/20AAP1618 .

N. Tupitsa, P. Dvurechensky, A. Gasnikov, S. Guminov, Alternating minimization methods for strongly convex optimization, Journal of Inverse and IllPosed Problems, (2021), published online on 16.04.2021, DOI 10.1515/jiip20200074 .
Abstract
We consider alternating minimization procedures for convex optimization problems with variable divided in many block, each block being amenable for minimization with respect to its variable with freezed other variables blocks. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under PolyakŁojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under strong convexity assumption in manyblocks setting we provide an accelerated alternating minimization procedure with linear rate depending on the square root of the condition number as opposed to condition number for the nonaccelerated method. 
D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems, Journal of Inverse and IllPosed Problems, 29 (2021), pp. 385405, DOI 10.1515/jiip20200068 .
Abstract
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only up to a logarithmic factor and the notion of smoothness. By using minibatching technique, we show that the proposed methods with stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems. 
P. Friz, P. Gassiat, P. Pigato , Precise asymptotics: Robust stochastic volatility models, The Annals of Applied Probability, 31 (2021), pp. 896940, DOI 10.1214/20AAP1608 .
Abstract
We present a new methodology to analyze large classes of (classical and rough) stochastic volatility models, with special regard to shorttime and small noise formulae for option prices. Our main tool is the theory of regularity structures, which we use in the form of Bayer et al. (Math. Finance30 (2020) 782832) In essence, we implement a Laplace method on the space of models (in the sense of Hairer), which generalizes classical works of Azencott and Ben Arous on path space and then Aida, InahamaKawabi on rough path space. When applied to rough volatility models, for example, in the setting of Bayer, Friz and Gatheral (Quant. Finance16 (2016) 887904) and FordeZhang (SIAM J. Financial Math.8 (2017) 114145), one obtains precise asymptotics for European options which refine known large deviation asymptotics. 
A. Maltsi, T. Niermann, T. Streckenbach, K. Tabelow, Th. Koprucki, Numerical simulation of TEM images for In(Ga)As/GaAs quantum dots with various shapes, Optical and Quantum Electronics, 52 (2020), pp. 257/1257/11, DOI 10.1007/s1108202002356y .
Abstract
We present a mathematical model and a tool chain for the numerical simulation of TEM images of semiconductor quantum dots (QDs). This includes elasticity theory to obtain the strain profile coupled with the DarwinHowieWhelan equations, describing the propagation of the electron wave through the sample. We perform a simulation study on indium gallium arsenide QDs with different shapes and compare the resulting TEM images to experimental ones. This tool chain can be applied to generate a database of simulated TEM images, which is a key element of a novel concept for modelbased geometry reconstruction of semiconductor QDs, involving machine learning techniques. 
O. Butkovsky, A. Kulik, M. Scheutzow, Generalized couplings and ergodic rates for SPDEs and other Markov models, The Annals of Applied Probability, 30 (2020), pp. 139, DOI 10.1214/19AAP1485 .
Abstract
We establish verifiable general sufficient conditions for exponential or subexponential ergodicity of Markov processes that may lack the strong Feller property. We apply the obtained results to show exponential ergodicity of a variety of nonlinear stochastic partial differential equations with additive forcing, including 2D stochastic NavierStokes equations. Our main tool is a new version of the generalized coupling method. 
O. Butkovsky, M. Scheutzow, Couplings via comparison principle and exponential ergodicity of SPDEs in the hypoelliptic setting, Communications in Mathematical Physics, 379 (2020), pp. 10011034, DOI 10.1007/s0022002003834w .
Abstract
We develop a general framework for studying ergodicity of orderpreserving Markov semigroups. We establish natural and in a certain sense optimal conditions for existence and uniqueness of the invariant measure and exponential convergence of transition probabilities of an orderpreserving Markov process. As an application, we show exponential ergodicity and exponentially fast synchronizationbynoise of the stochastic reaction?diffusion equation in the hypoelliptic setting. This refines and complements corresponding results of Hairer and Mattingly (Electron J Probab 16:658?738, 2011). 
N. Tapia, L. Zambotti, The geometry of the space of branched rough paths, Proceedings of the London Mathematical Society. Third Series, 121 (2020), pp. 220251, DOI 10.1112/plms.12311 .
Abstract
We construct an explicit transitive free action of a Banach space of Hölder functions on the space of branched rough paths, which yields in particular a bijection between theses two spaces. This endows the space of branched rough paths with the structure of a principal homogeneous space over a Banach space and allows to characterize its automorphisms. The construction is based on the BakerCampbellHausdorff formula, on a constructive version of the LyonsVictoir extension theorem and on the HairerKelly map, which allows to describe branched rough paths in terms of anisotropic geometric rough paths. 
A. Ivanova, P. Dvurechensky, A. Gasnikov, Composite optimization for the resource allocation problem, Optimization Methods & Software, published online on 12.02.2020, urlhttps://doi.org/10.1080/10556788.2020.1712599, DOI 10.1080/10556788.2020.1712599 .
Abstract
In this paper we consider resource allocation problem stated as a convex minimization problem with linear constraints. To solve this problem, we use gradient and accelerated gradient descent applied to the dual problem and prove the convergence rate both for the primal iterates and the dual iterates. We obtain faster convergence rates than the ones known in the literature. We also provide economic interpretation for these two methods. This means that iterations of the algorithms naturally correspond to the process of price and production adjustment in order to obtain the desired production volume in the economy. Overall, we show how these actions of the economic agents lead the whole system to the equilibrium. 
M.S. Alkusa, A. Gasnikov, D. Dvinskikh, D.A. Kovalev, F.S. Stonyakin, Accelerated methods for saddlepoint problem, Computational Mathematics and Mathematical Physics, 60 (2020), pp. 17871809, DOI 10.1134/S0965542520110020 .
Abstract
Recently, it has been shown how, on the basis of the usual accelerated gradient method for solving problems of smooth convex optimization, accelerated methods for more complex problems (with a structure) and problems that are solved using various local information about the behavior of a function (stochastic gradient, Hessian, etc.) can be obtained. The term ?accelerated methods? here means, on the one hand, the presence of some unified and fairly general way of acceleration. On the other hand, this also means the optimality of the methods, which can often be proved rigorously. In the present work, an attempt is made to construct in the same way a theory of accelerated methods for solving smooth convexconcave saddlepoint problems with a structure. The main result of this article is the obtainment of in some sense necessary and sufficient conditions under which the complexity of solving nonlinear convexconcave saddlepoint problems with a structure in the number of calculations of the gradients of composites in direct variables is equal in order of magnitude to the complexity of solving bilinear problems with a structure. 
A. Anikin, A. Gasnikov, A. Gornov, D. Kamzolov, Y. Maximov, Y. Nesterov, Efficient numerical methods to solve sparse linear equations with application to PageRank, Optimization Methods & Software, published online on 21.12.2020, urlhttps://doi.org/10.1080/10556788.2020.1858297, DOI 10.1080/10556788.2020.1858297 .
Abstract
Over the last two decades, the PageRank problem has received increased interest from the academic community as an efficient tool to estimate webpage importance in information retrieval. Despite numerous developments, the design of efficient optimization algorithms for the PageRank problem is still a challenge. This paper proposes three new algorithms with a linear time complexity for solving the problem over a boundeddegree graph. The idea behind them is to set up the PageRank as a convex minimization problem over a unit simplex, and then solve it using iterative methods with small iteration complexity. Our theoretical results are supported by an extensive empirical justification using realworld and simulated data. 
S. Athreya, O. Butkovsky, L. Mytnik, Strong existence and uniqueness for stable stochastic differential equations with distributional drift, The Annals of Probability, 48 (2020), pp. 178210, DOI 10.1214/19AOP1358 .

F. Bachoc, A. Suvorikova , D. Ginsbourger, J.M. Loubes, V. Spokoiny, Gaussian processes with multidimensional distribution inputs via optimal transport and Hilbertian embedding, Electronic Journal of Statistics, 14 (2020), pp. 27422772, DOI 10.1214/20EJS1725 .
Abstract
In this work, we propose a way to construct Gaussian processes indexed by multidimensional distributions. More precisely, we tackle the problem of defining positive definite kernels between multivariate distributions via notions of optimal transport and appealing to Hilbert space embeddings. Besides presenting a characterization of radial positive definite and strictly positive definite kernels on general Hilbert spaces, we investigate the statistical properties of our theoretical and empirical kernels, focusing in particular on consistency as well as the special case of Gaussian distributions. A wide set of applications is presented, both using simulations and implementation with real data. 
D. Belomestny, M. Kaledin, J.G.M. Schoenmakers, Semitractability of optimal stopping problems via a weighted stochastic mesh algorithm, Mathematical Finance. An International Journal of Mathematics, Statistics and Financial Economics, 30 (2020), pp. 15911616, DOI 10.1111/mafi.12271 .
Abstract
In this article we propose a Weighted Stochastic Mesh (WSM) algorithm for approximating the value of discrete and continuous time optimal stopping problems. It is shown that in the discrete time case the WSM algorithm leads to semitractability of the corresponding optimal stopping problem in the sense that its complexity is bounded in order by $varepsilon^4log^d+2(1/varepsilon)$ with $d$ being the dimension of the underlying Markov chain. Furthermore we study the WSM approach in the context of continuous time optimal stopping problems and derive the corresponding complexity bounds. Although we can not prove semitractability in this case, our bounds turn out to be the tightest ones among the complexity bounds known in the literature. We illustrate our theoretical findings by a numerical example. 
D. Belomestny, J.G.M. Schoenmakers, V. Spokoiny, B. Zharkynbay, Optimal stopping via reinforced regression, Communications in Mathematical Sciences, 18 (2020), pp. 109121, DOI 10.4310/CMS.2020.v18.n1.a5 .
Abstract
In this note we propose a new approach towards solving numerically optimal stopping problems via boosted regression based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance. 
D. Belomestny, J.G.M. Schoenmakers, Optimal stopping of McKeanVlasov diffusions via regression on particle systems, SIAM Journal on Control and Optimization, 58 (2020), pp. 529550, DOI 10.1137/18M1195590 .
Abstract
In this note we consider the problem of using regression on interacting particles to compute conditional expectations for McKeanVlasov SDEs. We prove general result on convergence of linear regression algorithms and establish the corresponding rates of convergence. Application to optimal stopping and variance reduction are considered. 
I. Chevyrev, P. Friz, A. Korepanov, I. Melbourne, Superdiffusive limits for deterministic fastslow dynamical systems, Probability Theory and Related Fields, 178 (2020), pp. 735770, DOI 10.1007/s00440020009885 .

M. Coghi, J.D. Deuschel, P. Friz, M. Maurelli, Pathwise McKeanVlasov theory with additive noise, The Annals of Applied Probability, 30 (2020), pp. 23552392, DOI 10.1214/20AAP1560 .
Abstract
We take a pathwise approach to classical McKeanVlasov stochastic differential equations with additive noise, as e.g. exposed in Sznitmann [34]. Our study was prompted by some concrete problems in battery modelling [19], and also by recent progress on roughpathwise McKeanVlasov theory, notably CassLyons [9], and then Bailleul, Catellier and Delarue [4]. Such a “pathwise McKeanVlasov theory” can be traced back to Tanaka [36]. This paper can be seen as an attempt to advertize the ideas, power and simplicity of the pathwise appproach, not so easily extracted from [4, 9, 36]. As novel applications we discuss mean field convergence without a priori independence and exchangeability assumption; common noise and reflecting boundaries. Last not least, we generalize DawsonGärtner large deviations to a nonBrownian noise setting. 
J. Diehl, E.F. Kurusch, N. Tapia, Timewarping invariants of multidimensional time series, Acta Applicandae Mathematicae. An International Survey Journal on Applying Mathematics and Mathematical Applications, published online on 14.05.2020, DOI 10.1007/s1044002000333x .
Abstract
In data science, one is often confronted with a time series representing measurements of some quantity of interest. Usually, in a first step, features of the time series need to be extracted. These are numerical quantities that aim to succinctly describe the data and to dampen the influence of noise. In some applications, these features are also required to satisfy some invariance properties. In this paper, we concentrate on timewarping invariants.We show that these correspond to a certain family of iterated sums of the increments of the time series, known as quasisymmetric functions in the mathematics literature. We present these invariant features in an algebraic framework, and we develop some of their basic properties. 
D. Kamzolov, P. Dvurechensky, A. Gasnikov, Universal intermediate gradient method for convex problems with inexact oracle, Optimization Methods & Software, published online on 17.01.2020, urlhttps://doi.org/10.1080/10556788.2019.1711079, DOI 10.1080/10556788.2019.1711079 .

L.Ch. Lin, Y. Chen, G. Pan, V. Spokoiny, Efficient and positive semidefinite preaveraging realized covariance estimator, Statistica Sinica, 31 (2021), pp. 122 (published online on 23.11.2020), DOI 10.5705/ss.202017.0489 .

S. Lu, P. Mathé, S. Pereverzyev, Randomized matrix approximation to enhance regularized projection schemes in inverse problems, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 36 (2020), pp. 085013/1 085013/20, DOI 10.1088/13616420/ab9c44 .
Abstract
The authors consider a randomized solution to illposed operator equations in Hilbert spaces. In contrast to statistical inverse problems, where randomness appears in the noise, here randomness arises in the lowrank matrix approximation of the forward operator, which results in using a Monte Carlo method to solve the inverse problems. In particular, this approach follows the paradigm of the study N. Halko et al 2011 SIAM Rev. 53 217?288, and hence regularization is performed based on the lowrank matrix approximation. Error bounds for the mean error are obtained which take into account solution smoothness and the inherent noise level. Based on the structure of the error decomposition the authors propose a novel algorithm which guarantees (on the mean) a prescribed error tolerance. Numerical simulations confirm the theoretical findings. 
Y. Nesterov , A. Gasnikov, S. Guminov, P. Dvurechensky, Primaldual accelerated gradient methods with smalldimensional relaxation oracle, Optimization Methods & Software, published online on 02.03.2020, urlhttps://doi.org/10.1080/10556788.2020.1731747, DOI 10.1080/10556788.2020.1731747 .

Y.Y. Park, J. Polzehl, S. Chatterjee, A. Brechmann, M. Fiecas, Semiparametric modeling of timevarying activation and connectivity in taskbased fMRI data, Computational Statistics & Data Analysis, 150 (2020), pp. 107006/1107006/14, DOI 10.1016/j.csda.2020.107006 .
Abstract
In functional magnetic resonance imaging (fMRI), there is a rise in evidence that timevarying functional connectivity, or dynamic functional connectivity (dFC), which measures changes in the synchronization of brain activity, provides additional information on brain networks not captured by timeinvariant (i.e., static) functional connectivity. While there have been many developments for statistical models of dFC in restingstate fMRI, there remains a gap in the literature on how to simultaneously model both dFC and timevarying activation when the study participants are undergoing experimental tasks designed to probe at a cognitive process of interest. A method is proposed to estimate dFC between two regions of interest (ROIs) in taskbased fMRI where the activation effects are also allowed to vary over time. The proposed method, called TVAAC (timevarying activation and connectivity), uses penalized splines to model both timevarying activation effects and timevarying functional connectivity and uses the bootstrap for statistical inference. Simulation studies show that TVAAC can estimate both static and timevarying activation and functional connectivity, while ignoring timevarying activation effects would lead to poor estimation of dFC. An empirical illustration is provided by applying TVAAC to analyze two subjects from an eventrelated fMRI learning experiment. 
N. Puchkin, V. Spokoiny, An adaptive multiclass nearest neighbor classifier, ESAIM. Probability and Statistics, 24 (2020), pp. 6999, DOI 10.1051/ps/2019021 .

A. Rastogi, G. Blanchard, P. Mathé, Convergence analysis of Tikhonov regularization for nonlinear statistical inverse problems, Electronic Journal of Statistics, 14 (2020), pp. 27982841, DOI 10.1214/20EJS1735 .
Abstract
We study a nonlinear statistical inverse problem, where we observe the noisy image of a quantity through a nonlinear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization) approach to estimate the quantity for the nonlinear illposed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the concept of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions. 
D. Dvinskikh, S.S. Omelchenko, A. Gasnikov, A.I. Tyurin, Accelerated gradient sliding for minimizing a sum of functions, Doklady Mathematics. Maik Nauka/Interperiodica Publishing, Moscow. English. Translation of the Mathematics Section of: Doklady Akademii Nauk. (Formerly: Russian Academy of Sciences. Doklady. Mathematics)., 101 (2020), pp. 244246, DOI 10.1134/S1064562420030084 .
Abstract
We propose a new way of justifying the accelerated gradient sliding of G. Lan, which allows one to extend the sliding technique to a combination of an accelerated gradient method with an accelerated variance reduction method. New optimal estimates for the solution of the problem of minimizing a sum of smooth strongly convex functions with a smooth regularizer are obtained. 
D. Dvinskikh, A.I. Tyurin, A. Gasnikov, S.S. Omelchenko, Accelerated and unaccelerated stochastic gradient descent in model generality, Mathematical Notes, 108 (2020), pp. 511522, DOI 10.1134/S0001434620090230 .
Abstract
A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom nature. In contrast to earlier results, all estimates in the present paper are obtained in model generality. 
CH. Bayer, M. Redmann, J.G.M. Schoenmakers, Dynamic programming for optimal stopping via pseudoregression, Quantitative Finance, published online on 01.09.2020, urlhttps://doi.org/10.1080/14697688.2020.1780299, DOI 10.1080/14697688.2020.1780299 .
Abstract
We introduce new variants of classical regressionbased algorithms for optimal stopping problems based on computation of regression coefficients by Monte Carlo approximation of the corresponding L^{2} inner products instead of the leastsquares error functional. Coupled with new proposals for simulation of the underlying samples, we call the approach "pseudo regression". We show that the approach leads to asymptotically smaller errors, as well as less computational cost. The analysis is justified by numerical examples. 
CH. Bayer, D. Belomestny, M. Redmann, S. Riedel, J.G.M. Schoenmakers, Solving linear parabolic rough partial differential equations, Journal of Mathematical Analysis and Applications, 490 (2020), pp. 124236/1124236/45, DOI 10.1016/j.jmaa.2020.124236 .
Abstract
We study linear rough partial differential equations in the setting of [Friz and Hairer, Springer, 2014, Chapter 12]. More precisely, we consider a linear parabolic partial differential equation driven by a deterministic rough path W of Hölder regularity α with ⅓ < α ≤ ½ . Based on a stochastic representation of the solution of the rough partial differential equation, we propose a regression Monte Carlo algorithm for spatiotemporal approximation of the solution. We provide a full convergence analysis of the proposed approximation method which essentially relies on the new bounds for the higher order derivatives of the solution in space. Finally, a comprehensive simulation study showing the applicability of the proposed algorithm is presented. 
CH. Bayer, Ch.B. Hammouda, R. Tempone, Hierarchical adaptive sparse grids and quasiMonte Carlo for option pricing under the rough Bergomi model, Quantitative Finance, published online on 20.04.2020, urlhttps://doi.org/10.1080/14697688.2020.1744700, DOI 10.1080/14697688.2020.1744700 .
Abstract
The rough Bergomi (rBergomi) model, introduced recently in Bayer et al. [Pricing under rough volatility. Quant. Finance, 2016, 16(6), 887?904], is a promising rough volatility model in quantitative finance. It is a parsimonious model depending on only three parameters, and yet remarkably fits empirical implied volatility surfaces. In the absence of analytical European option pricing methods for the model, and due to the nonMarkovian nature of the fractional driver, the prevalent option is to use the Monte Carlo (MC) simulation for pricing. Despite recent advances in the MC method in this context, pricing under the rBergomi model is still a timeconsuming task. To overcome this issue, we have designed a novel, hierarchical approach, based on: (i) adaptive sparse grids quadrature (ASGQ), and (ii) quasiMonte Carlo (QMC). Both techniques are coupled with a Brownian bridge construction and a Richardson extrapolation on the weak error. By uncovering the available regularity, our hierarchical methods demonstrate substantial computational gains with respect to the standard MC method. They reach a sufficiently small relative error tolerance in the price estimates across different parameter constellations, even for very small values of the Hurst parameter. Our work opens a new research direction in this field, i.e. to investigate the performance of methods other than Monte Carlo for pricing and calibrating under the rBergomi model. 
CH. Bayer, R.F. Tempone , S. Wolfers, Pricing American options by exercise rate optimization, Quantitative Finance, published online on 07.07.2020, urlhttps://doi.org/10.1080/14697688.2020.1750678, DOI 10.1080/14697688.2020.1750678 .
Abstract
We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and the optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine socalled optimal exercise regions, which consist of points in time and space at which a given option is exercised. In contrast, our method determines the exercise rates of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. The global optimum of this function can be approached gradually when starting from a constant exercise rate. Numerical experiments on vanilla put options in the multivariate BlackScholes model and a preliminary theoretical analysis underline the efficiency of our method, both with respect to the number of timediscretization steps and the required number of degrees of freedom in the parametrization of the exercise rates. Finally, we demonstrate the flexibility of our method through numerical experiments on max call options in the classical BlackScholes model, and vanilla put options in both the Heston model and the nonMarkovian rough Bergomi model. 
P. Dvurechensky, E. Gorbunov, A. Gasnikov, An accelerated directional derivative method for smooth stochastic convex optimization, European Journal of Operational Research, 290 (2021), pp. 601621 (published online on 20.08.2020), DOI 10.1016/j.ejor.2020.08.027 .
Abstract
We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivativefree optimization and gradientbased optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of an unknown nature, but bounded in the absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a nonaccelerated and an accelerated directional derivative method and provide their complexity bounds. Despite that our algorithms do not use gradient information, our nonaccelerated algorithm has a complexity bound which is, up to a factor logarithmic in problem dimension, similar to the complexity bound of gradientbased algorithms. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradientbased algorithm up to a factor of square root of the problem dimension, whereas for existing directional derivative methods this factor is of the order of problem dimension. We also extend these results to strongly convex problems. Finally, we consider derivativefree optimization as a particular case of directional derivative optimization with noise in the directional derivative and obtain complexity bounds for nonaccelerated and accelerated derivativefree methods. Complexity bounds for these algorithms inherit the gain in the dimension dependent factors from our directional derivative methods. 
P. Friz, T. Nilssen, W. Stannat , Existence, uniqueness and stability of semilinear rough partial differential equations, Journal of Differential Equations, 268 (2020), pp. 16861721, DOI 10.1016/j.jde.2019.09.033 .

P. Mathé, M.T. Nair, B. Hofmann, Regularization of linear illposed problems involving multiplication operators, Applicable Analysis. An International Journal, published online 28.04.2020, urlhttps://doi.org/10.1080/00036811.2020.1758308, DOI 10.1080/00036811.2020.1758308 .
Abstract
We study regularization of illposed equations involving multiplication operators when the multiplier function is positive almost everywhere and zero is an accumulation point of the range of this function. Such equations naturally arise from equations based on noncompact selfadjoint operators in Hilbert space, after applying unitary transformations arising out of the spectral theorem. For classical regularization theory, when noisy observations are given and the noise is deterministic and bounded, then noncompactness of the illposed equations is a minor issue. However, for statistical illposed equations with noncompact operators less is known if the data are blurred by white noise. We develop a theory for spectral regularization with emphasis on this case. In this context, we highlight several aspects, in particular, we discuss the intrinsic degree of illposedness in terms of rearrangements of the multiplier function. Moreover, we address the required modifications of classical regularization schemes in order to be used for noncompact statistical problems, and we also introduce the concept of the effective illposedness of the operator equation under white noise. This study is concluded with prototypical examples for such equations, as these are deconvolution equations and certain final value problems in evolution equations. 
J. Polzehl, K. Papafitsoros, K. Tabelow, Patchwise adaptive weights smoothing in R, Journal of Statistical Software, 95 (2020), pp. 127, DOI 10.18637/jss.v095.i06 .
Abstract
Image reconstruction from noisy data has a long history of methodological development and is based on a variety of ideas. In this paper we introduce a new method called patchwise adaptive smoothing, that extends the PropagationSeparation approach by using comparisons of local patches of image intensities to define local adaptive weighting schemes for an improved balance of reduced variability and bias in the reconstruction result. We present the implementation of the new method in an R package aws and demonstrate its properties on a number of examples in comparison with other stateofthe art image reconstruction methods.
Contributions to Collected Editions

D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, in: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, 130 of Proceedings of Machine Learning Research, Journal of Machine Learning Research, 2021, pp. 17381746.

V. Dvurechensky, Datadriven confidence bands for distributed nonparametric regression, in: Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy, S. Agarwal , eds., 125 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 300322.

N. Tupitsa, A. Gasnikov, P. Dvurechensky, S. Guminov, Strongly convex optimization for the dual formulation of optimal transport, in: Mathematical Optimization Theory and Operations Research, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., 1275 of Theoretical Computer Science and General Issues, Springer International Publishing AG, Cham, 2020, pp. 192204, DOI 10.1007/9783030586577_17 .
Abstract
In this paper we experimentally check a hypothesis, that dual problem to discrete entropy regularized optimal transport problem possesses strong convexity on a certain compact set. We present a numerical estimation technique of parameter of strong convexity and show that such an estimate increases the performance of an accelerated alternating minimization algorithm for strongly convex functions applied to the considered problem. 
N. Tupitsa , P. Dvurechensky, A. Gasnikov, C.A. Uribe, Multimarginal optimal transport by accelerated alternating minimization, in: 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 61326137, DOI 10.1109/CDC42340.2020.9304010 .

D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, V. Spokoiny, On the linesearch gradient methods for stochastic optimization, in: Proceedings of the 21th IFAC World Congress, R. Findeisen, S. Hirche, K. Janschek, M. Mönnigmann, eds., 53 of IFAC PapersOnLine, Elsevier, 2020, pp. 17151720, DOI 10.1016/j.ifacol.2020.12.2284 .
Abstract
In this paper we propose several adaptive gradient methods for stochastic optimization. Our methods are based on Armijotype line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient. We consider an accelerated gradient descent for convex problems and gradient descent for nonconvex problems. In the experiments we demonstrate superiority of our methods to existing adaptive methods, e.g. AdaGrad and Adam. 
P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in lowmemory subgradient optimization, in: Numerical Nonsmooth Optimization, A.M. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, S. Taheri, eds., Springer International Publishing, Cham, 2020, pp. 1959, DOI 10.1007/9783030349103_2 .

P. Dvurechensky, A. Gasnikov, S. Omelchenko, A. Tiurin, A stable alternative to Sinkhorn's algorithm for regularized optimal transport, in: Mathematical Optimization Theory and Operations Research. MOTOR 2020, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., Lecture Notes in Computer Science, Springer, Cham, 2020, pp. 406423, DOI 10.1007/9783030499884_28 .

P. Dvurechensky, P. Ostroukhov, K. Safin, S. Shtern, M. Staudigl, Selfconcordant analysis of FrankWolfe algorithms, in: Proceedings of the 37th International Conference on Machine Learning, H. Daumé Iii, A. Singh, eds., 119 of Proceedings of Machine Learning Research (online), 2020, pp. 28142824.
Abstract
Projectionfree optimization via different variants of the FrankWolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a selfconcordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.
Preprints, Reports, Technical Reports

A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zerothorder algorithms for smooth saddlepoint problems, Preprint no. 2827, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2827 .
Abstract, PDF (564 kByte)
Saddlepoint problems have recently gained an increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zerothorder oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convexconcave saddle point problems using zerothorder oracles, and estimate their convergence rate and its dependence on the dimension n of the variable. In particular, our analysis shows that in the case when the feasible set is a direct product of two simplices, our convergence rate for the stochastic term is only by a log n factor worse than for the firstorder methods. We also consider a mixed setup and develop 1/2thorder methods which use zerothorder oracle for the minimization part and firstorder oracle for the maximization part. Finally, we demonstrate the practical performance of our zerothorder and 1/2thorder methods on practical problems. 
CH. Bayer, M. Eigel, L. Sallandt, P. Trunschke, Pricing highdimensional Bermudan options with hierarchical tensor formats, Preprint no. 2821, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2821 .
Abstract, PDF (321 kByte)
An efficient compression technique based on hierarchical tensors for popular option pricing methods is presented. It is shown that the “curse of dimensionality" can be alleviated for the computation of Bermudan option prices with the Monte Carlo leastsquares approach as well as the dual martingale method, both using highdimensional tensorized polynomial expansions. This discretization allows for a simple and computationally cheap evaluation of conditional expectations. Complexity estimates are provided as well as a description of the optimization procedures in the tensor train format. Numerical experiments illustrate the favourable accuracy of the proposed methods. The dynamical programming method yields results comparable to recent Neural Network based methods. 
P.O. Petr Ostroukhov, R.K. Rinat Kamalov, P. Dvurechensky, A. Gasnikov, Tensor methods for strongly convex strongly concave saddle point problems and strongly monotone variational inequalities, Preprint no. 2820, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2820 .
Abstract, PDF (302 kByte)
In this paper we propose three tensor methods for stronglyconvexstronglyconcave saddle point problems (SPP). The first method is based on the assumption of higherorder smoothness (the derivative of the order higher than 2 is Lipschitzcontinuous) and achieves linear convergence rate. Under additional assumptions of first and second order smoothness of the objective we connect the first method with a locally superlinear converging algorithm in the literature and develop a second method with global convergence and local superlinear convergence. The third method is a modified version of the second method, but with the focus on making the gradient of the objective small. Since we treat SPP as a particular case of variational inequalities, we also propose two methods for strongly monotone variational inequalities with the same complexity as the described above. 
A. Agafonov, D. Kamzolov, P. Dvurechensky, A. Gasnikov, Inexact tensor methods and their application to stochastic convex optimization, Preprint no. 2818, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2818 .
Abstract, PDF (333 kByte)
We propose a general nonaccelerated tensor method under inexact information on higher order derivatives, analyze its convergence rate, and provide sufficient conditions for this method to have similar complexity as the exact tensor method. As a corollary, we propose the first stochastic tensor method for convex optimization and obtain sufficient minibatch sizes for each derivative. 
V. Matyukhin, S. Kabanikhin, M. Shishlenin, N. Novikov, A. Vasin, A. Gasnikov, Convex optimization with inexact gradients in Hilbert space and applications to elliptic inverse problems, Preprint no. 2815, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2815 .
Abstract, PDF (1346 kByte)
In this paper we propose the gradient descent type methods to solve convex optimization problems in Hilbert space. We apply it to solve illposed Cauchy problem for Poisson equation and make a comparative analysis with Landweber iteration and steepest descent method. The theoretical novelty of the paper consists in the developing of new stopping rule for accelerated gradient methods with inexact gradient (additive noise). Note that up to the moment of stopping the method “doesn't feel the noise”. But after this moment the noise start to accumulate and the quality of the solution becomes worse for further iterations. 
P. Friz, P. Hager, N. Tapia, Unified signature cumulants and generalized Magnus expansions, Preprint no. 2814, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2814 .
Abstract, PDF (388 kByte)
The signature of a path can be described as its full noncommutative exponential. Following T. Lyons we regard its expectation, the expected signature, as path space analogue of the classical moment generating function. The logarithm thereof, taken in the tensor algebra, defines the signature cumulant. We establish a universal functional relation in a general semimartingale context. Our work exhibits the importance of Magnus expansions in the algorithmic problem of computing expected signature cumulants, and further offers a farreaching generalization of recent results on characteristic exponents dubbed diamond and cumulant expansions; with motivation ranging from financial mathematics to statistical physics. From an affine process perspective, the functional relation may be interpreted as infinitedimensional, noncommutative (“Hausdorff") variation of Riccati's equation. Many examples are given. 
N. Yudin, A. Gasnikov, Flexible modification of GaussNewton method and its stochastic extension, Preprint no. 2813, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2813 .
Abstract, PDF (8261 kByte)
This work presents a novel version of recently developed GaussNewton method for solving systems of nonlinear equations, based on upper bound of solution residual and quadratic regularization ideas. We obtained for such method global convergence bounds and under natural nondegeneracy assumptions we present local quadratic convergence results. We developed stochastic optimization algorithms for presented GaussNewton method and justified sublinear and linear convergence rates for these algorithms using weak growth condition (WGC) and PolyakLojasiewicz (PL) inequality. We show that GaussNewton method in stochastic setting can effectively find solution under WGC and PL condition matching convergence rate of the deterministic optimization method. The suggested method unifies most practically used GaussNewton method modifications and can easily interpolate between them providing flexible and convenient method easily implementable using standard techniques of convex optimization. 
A. Vasin, A. Gasnikov, V. Spokoiny, Stopping rules for accelerated gradient methods with additive noise in gradient, Preprint no. 2812, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2812 .
Abstract, PDF (1129 kByte)
In this article, we investigate an accelerated firstorder method, namely, the method of similar triangles, which is optimal in the class of convex (strongly convex) problems with a Lipschitz gradient. The paper considers a model of additive noise in a gradient and a Euclidean prox structure for not necessarily bounded sets. Convergence estimates are obtained in the case of strong convexity and its absence, and a stopping criterion is proposed for not strongly convex problems. 
D. Belomestny, J.G.M. Schoenmakers, From optimal martingales to randomized dual optimal stopping, Preprint no. 2810, WIAS, Berlin, 2021.
Abstract, PDF (571 kByte)
In this article we study and classify optimal martingales in the dual formulation of optimal stopping problems. In this respect we distinguish between weakly optimal and surely optimal martingales. It is shown that the family of weakly optimal and surely optimal martingales may be quite large. On the other hand it is shown that the Doobmartingale, that is, the martingale part of the Snell envelope, is in a certain sense the most robust surely optimal martingale under random perturbations. This new insight leads to a novel randomized dual martingale minimization algorithm that does`nt require nested simulation. As a main feature, in a possibly large family of optimal martingales the algorithm efficiently selects a martingale that is as close as possible to the Doob martingale. As a result, one obtains the dual upper bound for the optimal stopping problem with low variance. 
A. Neumann, N. Peitek, A. Brechmann, K. Tabelow, Th. Dickhaus, Utilizing anatomical information for signal detection in functional magnetic resonance imaging, Preprint no. 2806, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2806 .
Abstract, PDF (2995 kByte)
We are considering the statistical analysis of functional magnetic resonance imaging (fMRI) data. As demonstrated in previous work, grouping voxels into regions (of interest) and carrying out a multiple test for signal detection on the basis of these regions typically leads to a higher sensitivity when compared with voxelwise multiple testing approaches. In the case of a multisubject study, we propose to define the regions for each subject separately based on their individual brain anatomy, represented, e.g., by socalled Aparc labels. The aggregation of the subjectspecific evidence for the presence of signals in the different regions is then performed by means of a combination function for pvalues. We apply the proposed methodology to real fMRI data and demonstrate that our approach can perform comparably to a twostage approach for which two independent experiments are needed, one for defining the regions and one for actual signal detection. 
F. Besold, V. Spokoiny, Adaptive manifold clustering, Preprint no. 2800, WIAS, Berlin, 2020.
Abstract, PDF (2013 kByte)
Clustering methods seek to partition data such that elements are more similar to elements in the same cluster than to elements in different clusters. The main challenge in this task is the lack of a unified definition of a cluster, especially for high dimensional data. Different methods and approaches have been proposed to address this problem. This paper continues the study originated by [6] where a novel approach to adaptive nonparametric clustering called Adaptive Weights Clustering (AWC) was offered. The method allows analyzing highdimensional data with an unknown number of unbalanced clusters of arbitrary shape under very weak modeling assumptions. The procedure demonstrates a stateoftheart performance and is very efficient even for large data dimension D. However, the theoretical study in [6] is very limited and did not really address the question of efficiency. This paper makes a significant step in understanding the remarkable performance of the AWC procedure, particularly in high dimension. The approach is based on combining the ideas of adaptive clustering and manifold learning. The manifold hypothesis means that high dimensional data can be well approximated by a ddimensional manifold for small d helping to overcome the curse of dimensionality problem and to get sharp bounds on the cluster separation which only depend on the intrinsic dimension d. We also address the problem of parameter tuning. Our general theoretical results are illustrated by some numerical experiments. 
J. Diehl, R. Preiss, M. Ruddy, N. Tapia, The moving frame method for iteratedintegrals: Orthogonal invariants, Preprint no. 2796, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2796 .
Abstract, PDF (286 kByte)
We explore the algebraic properties of a generalized version of the iteratedsums signature, inspired by previous work of F. Kiraly and H. Oberhauser. In particular, we show how to recover the character property of the associated linear map over the tensor algebra by considering a deformed quasishuffle product of words on the latter. We introduce three nonlinear transformations on iteratedsums signatures, close in spirit to Machine Learning applications, and show some of their properties. 
J. Diehl, K. EbrahimiFard, N. Tapia, Generalized iteratedsums signatures, Preprint no. 2795, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2795 .
Abstract, PDF (152 kByte)
We explore the algebraic properties of a generalized version of the iteratedsums signature, inspired by previous work of F. Király and H. Oberhauser. In particular, we show how to recover the character property of the associated linear map over the tensor algebra by considering a deformed quasishuffle product of words on the latter. We introduce three nonlinear transformations on iteratedsums signatures, close in spirit to Machine Learning applications, and show some of their properties. 
CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, V. Spokoiny, Reinforced optimal control, Preprint no. 2792, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2792 .
Abstract, PDF (591 kByte)
Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109?121, 2020] proposes to reinforce the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method?s efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis. 
CH. Bayer, P. Hager, S. Riedel, J.G.M. Schoenmakers, Optimal stopping with signatures, Preprint no. 2790, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2790 .
Abstract, PDF (407 kByte)
We propose a new method for solving optimal stopping problems (such as American option pricing in finance) under minimal assumptions on the underlying stochastic process. We consider classic and randomized stopping times represented by linear functionals of the associated rough path signature, and prove that maximizing over the class of signature stopping times, in fact, solves the original optimal stopping problem. Using the algebraic properties of the signature, we can then recast the problem as a (deterministic) optimization problem depending only on the (truncated) expected signature. The only assumption on the process is that it is a continuous (geometric) random rough path. Hence, the theory encompasses processes such as fractional Brownian motion which fail to be either semimartingales or Markov processes. 
R. Krawchenko, C.A. Uribe, A. Gasnikov, P. Dvurechensky, Distributed optimization with quantization for computing Wasserstein barycenters, Preprint no. 2782, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2782 .
Abstract, PDF (946 kByte)
We study the problem of the decentralized computation of entropyregularized semidiscrete Wasserstein barycenters over a network. Building upon recent primaldual approaches, we propose a sampling gradient quantization scheme that allows efficient communication and computation of approximate barycenters where the factor distributions are stored distributedly on arbitrary networks. The communication and algorithmic complexity of the proposed algorithm are shown, with explicit dependency on the size of the support, the number of distributions, and the desired accuracy. Numerical results validate our algorithmic analysis. 
J. Diehl, K. EbrahimiFard, N. Tapia, Tropical time series, iteratedsum signatures and quasisymmetric functions, Preprint no. 2760, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2760 .
Abstract, PDF (236 kByte)
Driven by the need for principled extraction of features from time series, we introduce the iteratedsums signature over any commutative semiring. The case of the tropical semiring is a central, and our motivating, example, as it leads to features of (realvalued) time series that are not easily available using existing signaturetype objects. 
CH. Bayer, F. Harang, P. Pigato, Logmodulated rough stochastic volatility models, Preprint no. 2752, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2752 .
Abstract, PDF (595 kByte)
We propose a new class of rough stochastic volatility models obtained by modulating the powerlaw kernel defining the fractional Brownian motion (fBm) by a logarithmic term, such that the kernel retains square integrability even in the limit case of vanishing Hurst index H. The soobtained logmodulated fractional Brownian motion (logfBm) is a continuous Gaussian process even for H = 0. As a consequence, the resulting superrough stochastic volatility models can be analysed over the whole range of Hurst indices between 0 and 1/2, including H = 0, without the need of further normalization. We obtain the usual power law explosion of the skew as maturity T goes to 0, modulated by a logarithmic term, so no flattening of the skew occurs as H goes to 0. 
CH. Bayer, J. Qiu, Y. Yao, Pricing options under rough volatility with backward SPDEs, Preprint no. 2745, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2745 .
Abstract, PDF (553 kByte)
In this paper, we study the option pricing problems for rough volatility models. As the framework is nonMarkovian, the value function for a European option is not deterministic; rather, it is random and satisfies a backward stochastic partial differential equation (BSPDE). The existence and uniqueness of weak solutions is proved for general nonlinear BSPDEs with unbounded random leading coefficients whose connections with certain forwardbackward stochastic differential equations are derived as well. These BSPDEs are then used to approximate American option prices. A deep learningbased method is also investigated for the numerical approximations to such BSPDEs and associated nonMarkovian pricing problems. Finally, the examples of rough Bergomi type are numerically computed for both European and American options. 
S. Riedel, Semiimplicit Taylor schemes for stiff rough differential equations, Preprint no. 2734, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2734 .
Abstract, PDF (538 kByte)
We study a class of semiimplicit Taylortype numerical methods that are easy to implement and designed to solve multidimensional stochastic differential equations driven by a general rough noise, e.g. a fractional Brownian motion. In the multiplicative noise case, the equation is understood as a rough differential equation in the sense of T. Lyons. We focus on equations for which the drift coefficient may be unbounded and satisfies a onesided Lipschitz condition only. We prove wellposedness of the methods, provide a full analysis, and deduce their convergence rate. Numerical experiments show that our schemes are particularly useful in the case of stiff rough stochastic differential equations driven by a fractional Brownian motion. 
CH. Bayer, P. Friz, N. Tapia, Stability of deep neural networks via discrete rough paths, Preprint no. 2732, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2732 .
Abstract, PDF (365 kByte)
Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks. In particular we derive stability bounds in terms of the total pvariation of trained weights for any p ≥ 1. 
V. Avanesov, Datadriven confidence bands for distributed nonparametric regression, Preprint no. 2729, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2729 .
Abstract, PDF (316 kByte)
Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity rendering them inapplicable to the modern massive datasets. To that end a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide and conquer approach, splitting the data into a number of partitions, obtaining the local estimates and finally averaging them. In this paper we suggest a novel computationally efficient fully datadriven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$confidence bands. We rigorously demonstrate validity of the algorithm. Another contribution of the paper is a minimaxoptimal highprobability bound for the averaged estimator, complementing and generalizing the known risk bounds. 
R.J.A. Laeven, J.G.M. Schoenmakers, N.F.F. Schweizer, M. Stadje, Robust multiple stopping  A pathwise duality approach, Preprint no. 2728, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2728 .
Abstract, PDF (576 kByte)
In this paper we develop a solution method for general optimal stopping problems. Our general setting allows for multiple exercise rights, i.e., optimal multiple stopping, for a robust evaluation that accounts for model uncertainty, and for general reward processes driven by multidimensional jumpdiffusions. Our approach relies on first establishing robust martingale dual representation results for the multiple stopping problem which satisfy appealing pathwise optimality (almost sure) properties. Next, we exploit these theoretical results to develop upper and lower bounds which, as we formally show, not only converge to the true solution asymptotically, but also constitute genuine upper and lower bounds. We illustrate the applicability of our general approach in a few examples and analyze the impact of model uncertainty on optimal multiple stopping strategies. 
A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, D. Pasechnyuk, Oracle complexity separation in convex optimization, Preprint no. 2711, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2711 .
Abstract, PDF (424 kByte)
Ubiquitous in machine learning regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e. for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part. 
D. Kamzolov, A. Gasnikov, P. Dvurechensky, On the optimal combination of tensor optimization methods, Preprint no. 2710, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2710 .
Abstract, PDF (284 kByte)
We consider the minimization problem of a sum of a number of functions having Lipshitz p th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing to obtain nearoptimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize nearoptimal tensor methods to work with inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of a different order. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of a different order. 
F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, S. Artamonov, V. Piskunova, Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model, Preprint no. 2709, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2709 .
Abstract, PDF (463 kByte)
In this paper we propose a general algorithmic framework for firstorder methods in optimization in a broad sense, including minimization problems, saddlepoint problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, levelset methods, Bregman proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal conditional gradient method and universal method for variational inequalities with composite structure. These method works for smooth and nonsmooth problems with optimal complexity without a priori knowledge of the problem smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for VIs with such operator. We also generalize our framework for relatively strongly convex objectives and strongly monotone variational inequalities. 
M. Redmann, S. Riedel, RungeKutta methods for rough differential equations, Preprint no. 2708, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2708 .
Abstract, PDF (393 kByte)
We study RungeKutta methods for rough differential equations which can be used to calculate solutions to stochastic differential equations driven by processes that are rougher than a Brownian motion. We use a Taylor series representation (Bseries) for both the numerical scheme and the solution of the rough differential equation in order to determine conditions that guarantee the desired order of the local error for the underlying RungeKutta method. Subsequently, we prove the order of the global error given the local rate. In addition, we simplify the numerical approximation by introducing a RungeKutta scheme that is based on the increments of the driver of the rough differential equation. This simplified method can be easily implemented and is computational cheap since it is derivativefree. We provide a full characterization of this implementable RungeKutta method meaning that we provide necessary and sufficient algebraic conditions for an optimal order of convergence in case that the driver, e.g., is a fractional Brownian motion with Hurst index 1/4 < H ≤ 1/2. We conclude this paper by conducting numerical experiments verifying the theoretical rate of convergence. 
M. Redmann, Ch. Bayer, P. Goya, Lowdimensional approximations of highdimensional asset price models, Preprint no. 2706, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2706 .
Abstract, PDF (363 kByte)
We consider highdimensional asset price models that are reduced in their dimension in order to reduce the complexity of the problem or the effect of the curse of dimensionality in the context of option pricing. We apply model order reduction (MOR) to obtain a reduced system. MOR has been previously studied for asymptotically stable controlled stochastic systems with zero initial conditions. However, stochastic differential equations modeling price processes are uncontrolled, have nonzero initial states and are often unstable. Therefore, we extend MOR schemes and combine ideas of techniques known for deterministic systems. This leads to a method providing a good pathwise approximation. After explaining the reduction procedure, the error of the approximation is analyzed and the performance of the algorithm is shown conducting several numerical experiments. Within the numerics section, the benefit of the algorithm in the context of option pricing is pointed out. 
M. Ghani Varzaneh, S. Riedel, A dynamical theory for singular stochastic delay differential equations II: Nonlinear equations and invariant manifolds, Preprint no. 2701, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2701 .
Abstract, PDF (383 kByte)
Building on results obtained in [GVRS], we prove Local Stable and Unstable Manifold Theorems for nonlinear, singular stochastic delay differential equations. The main tools are rough paths theory and a semiinvertible Multiplicative Ergodic Theorem for cocycles acting on measurable fields of Banach spaces obtained in [GVR]. 
CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, Randomized optimal stopping algorithms and their convergence analysis, Preprint no. 2697, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2697 .
Abstract, PDF (367 kByte)
In this paper we study randomized optimal stopping problems and consider corresponding forward and backward Monte Carlo based optimization algorithms. In particular we prove the convergence of the proposed algorithms and derive the corresponding convergence rates. 
C. Bellinger, A. Djurdjevac, P. Friz, N. Tapia, Transport and continuity equations with (very) rough noise, Preprint no. 2696, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2696 .
Abstract, PDF (320 kByte)
Existence and uniqueness for rough flows, transport and continuity equations driven by general geometric rough paths are established. 
S. Guminov, P. Dvurechensky, A. Gasnikov, On accelerated alternating minimization, Preprint no. 2695, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2695 .
Abstract, PDF (343 kByte)
Alternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual to the entropyregularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over known bound $1/k$ for general AM methods and for the Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradienttype methods in practice as it is free of the choice of the stepsize and is adaptive to the local smoothness of the problem. We show that the proposed method is primaldual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy regularized optimal transport problem and show experimentally, that it outperforms Sinkhorn's algorithm. 
P. Dvurechensky, A. Gasnikov, P. Ostroukhov, A.C. Uribe, A. Ivanova, Nearoptimal tensor methods for minimizing gradient norm, Preprint no. 2694, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2694 .
Abstract, PDF (423 kByte)
Motivated by convex problems with linear constraints and, in particular, by entropyregularized optimal transport, we consider the problem of finding approximate stationary points, i.e. points with the norm of the objective gradient less than small error, of convex functions with Lipschitz pth order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the methods presented in the same paper do not have optimal complexity bounds. We propose two optimal up to logarithmic factors methods with complexity bounds with respect to the initial objective residual and the distance between the starting point and solution respectively 
P. Dvurechensky, M. Staudigl, C.A. Uribe , Generalized selfconcordant Hessianbarrier algorithms, Preprint no. 2693, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2693 .
Abstract, PDF (647 kByte)
Many problems in statistical learning, imaging, and computer vision involve the optimization of a nonconvex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interiorpoint technique building on the Hessianbarrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl, [SIAM J. Opt. 2019 29(3), pp. 21002127], where the Riemannian metric is induced by a generalized selfconcordant function. This class of functions is sufficiently general to include most of the commonly used barrier functions in the literature of interior point methods. We prove global convergence to an approximate stationary point of the method, and in cases where the feasible set admits an easily computable selfconcordant barrier, we verify worstcase optimal iteration complexity of the method. Applications in nonconvex statistical estimation and Lpminimization are discussed to given the efficiency of the method. 
E. Gorbunov, D. Dvinskikh, A. Gasnikov, Optimal decentralized distributed algorithms for stochastic convex optimization, Preprint no. 2691, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2691 .
Abstract, PDF (616 kByte)
We consider stochastic convex optimization problems with affine constraints and develop several methods using either primal or dual approach to solve it. In the primal case we use special penalization technique to make the initial problem more convenient for using optimization methods. We propose algorithms to solve it based on Similar Triangles Method with Inexact Proximal Step for the convex smooth and strongly convex smooth objective functions and methods based on Gradient Sliding algorithm to solve the same problems in the nonsmooth case. We prove the convergence guarantees in smooth convex case with deterministic firstorder oracle. We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, RRRMAACSA and SSTM_sc. All methods use stochastic dual oracle. SPDSTM is the stochastic primaldual modification of STM and it is applied for the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. RRRMAACSA is an accelerated stochastic method based on the restarts of RRMAACSA and SSTM_sc is just stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball and use stochastic dual firstorder oracle. We develop convergence analysis for these methods for the unbiased and biased oracles respectively. Finally, we apply all aforementioned results and approaches to solve decentralized distributed optimization problem and discuss optimality of the obtained results in terms of communication rounds and number of oracle calls per node. 
F. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C.A. Uribe, D. Pasechnyuk, S. Artamonov, Gradient methods for problems with inexact model of the objective, Preprint no. 2688, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2688 .
Abstract, PDF (785 kByte)
We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce inexact model of the objective, which as a particular cases includes inexact oracle [19] and relative smoothness condition [43]. We analyze gradient method which uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework we consider three particular problems. The first one is clustering by electorial model introduced in [49]. The second one is approximating optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third one is devoted to approximating optimal transport barycenter and we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments. 
V. Avanesov, Nonparametric change point detection in regression, Preprint no. 2687, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2687 .
Abstract, PDF (329 kByte)
This paper considers the prominent problem of changepoint detection in regression. The study suggests a novel testing procedure featuring a fully datadriven calibration scheme. The method is essentially a black box, requiring no tuning from the practitioner. The approach is investigated from both theoretical and practical points of view. The theoretical study demonstrates proper control of firsttype error rate under H0 and power approaching 1 under H1. The experiments conducted on synthetic data fully support the theoretical claims. In conclusion, the method is applied to financial data, where it detects sensible changepoints. Techniques for changepoint localization are also suggested and investigated 
V. Avanesov, How to gamble with nonstationary Xarmed bandits and have no regrets, Preprint no. 2686, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2686 .
Abstract, PDF (287 kByte)
In Xarmed bandit problem an agent sequentially interacts with environment which yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards across some number of time steps. The problem and its variations have been a subject of numerous studies, suggesting sublinear and sometimes optimal strategies. The given paper introduces a new variation of the problem. We consider an environment, which can abruptly change its behaviour an unknown number of times. To that end we propose a novel strategy and prove it attains sublinear cumulative regret. Moreover, the obtained regret bound matches the best known bound for GPUCB for a stationary case, and approaches the minimax lower bound in case of highly smooth relation between an action and the corresponding reward. The theoretical result is supported by experimental study. 
F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, V. Piskunova, Inexact model: A framework for optimization and variational inequalities, Preprint no. 2679, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2679 .
Abstract, PDF (414 kByte)
In this paper we propose a general algorithmic framework for firstorder methods in optimization in a broad sense, including minimization problems, saddlepoint problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, levelset methods, proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and nonsmooth problems with optimal complexity without a priori knowledge of the problem smoothness. We also generalize our framework for strongly convex objectives and strongly monotone variational inequalities. 
K. EbrahimiFard, F. Patras, N. Tapia, L. Zambotti, Wick polynomials in noncommutative probability: A grouptheoretical approach, Preprint no. 2677, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2677 .
Abstract, PDF (292 kByte)
Wick polynomials and Wick products are studied in the context of noncommutative probability theory. It is shown that free, boolean and conditionally free Wick polynomials can be defined and related through the action of the group of characters over a particular Hopf algebra. These results generalize our previous developments of a Hopf algebraic approach to cumulants and Wick products in classical probability theory. 
P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in lowmemory subgradient optimization, Preprint no. 2676, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2676 .
Abstract, PDF (413 kByte)
One of the main goals in the development of nonsmooth optimization is to cope with high dimensional problems by decomposition, duality or Lagrangian relaxation which greatly reduces the number of variables at the cost of worsening differentiability of objective or constraints. Small or medium dimensionality of resulting nonsmooth problems allows to use bundletype algorithms to achieve higher rates of convergence and obtain higher accuracy, which of course came at the cost of additional memory requirements, typically of the order of n2, where n is the number of variables of nonsmooth problem. However with the rapid development of more and more sophisticated models in industry, economy, finance, et all such memory requirements are becoming too hard to satisfy. It raised the interest in subgradientbased lowmemory algorithms and later developments in this area significantly improved over their early variants still preserving O(n) memory requirements. To review these developments this chapter is devoted to the blackbox subgradient algorithms with the minimal requirements for the storage of auxiliary results, which are necessary to execute these algorithms. To provide historical perspective this survey starts with the original result of N.Z. Shor which opened this field with the application to the classical transportation problem. The theoretical complexity bounds for smooth and nonsmooth convex and quasiconvex optimization problems are briefly exposed in what follows to introduce to the relevant fundamentals of nonsmooth optimization. Special attention in this section is given to the adaptive stepsize policy which aims to attain lowest complexity bounds. Unfortunately the nondifferentiability of objective function in convex optimization essentially slows down the theoretical low bounds for the rate of convergence in subgradient optimization compared to the smooth case but there are different modern techniques that allow to solve nonsmooth convex optimization problems faster then dictate lower complexity bounds. In this work the particular attention is given to Nesterov smoothing technique, Nesterov Universal approach, and Legendre (saddle point) representation approach. The new results on Universal Mirror Prox algorithms represent the original parts of the survey. To demonstrate application of nonsmooth convex optimization algorithms for solution of hugescale extremal problems we consider convex optimization problems with nonsmooth functional constraints and propose two adaptive Mirror Descent methods. The first method is of primaldual variety and proved to be optimal in terms of lower oracle bounds for the class of Lipschitzcontinuous convex objective and constraints. The advantages of application of this method to sparse Truss Topology Design problem are discussed in certain details. The second method can be applied for solution of convex and quasiconvex optimization problems and is optimal in a sense of complexity bounds. The conclusion part of the survey contains the important references that characterize recent developments of nonsmooth convex optimization.
Talks, Poster

S. Breneis, On variation functions and their moduli of continuity (online talk), Methods of nonlinear analysis in differential and integral equations (Online Event), May 15  16, 2021, Rzeszów University of Technology, Department of Nonlinear Analysis, Rzeszów, Poland, May 16, 2021.

O. Butkovsky, New coupling techniques for exponential ergodicity of SPDEs in the hypoelliptic and effectively elliptic settings, Applied and Computational Mathematics Research Seminar, Tulane University, School of Science and Engineering, New Orleans, USA, April 30, 2021.

O. Butkovsky, Regularization by noise for PDEs: A stochastic sewing approach (online talk), Theory of Probability and Its Applications: P. L. Chebyshev  200 (The 6th International Conference on Stochastic Methods) (Online Event), May 17  22, 2021, Steklov Mathematical Institute of Russian Academy of Sciences, Moscow, Russian Federation, May 22, 2021.

O. Butkovsky, Skew fractional Brownian motion: Going beyond the CatellierGubinelli setting (online talk), 14th OxfordBerlin Young Researchers Meeting on Applied Stochastic Analysis (Online Event), February 10  12, 2021, University of Oxford, Mathematical Institute, UK, February 11, 2021.

N. Tapia, Approximation of controlled rough paths (online talk), 14th OxfordBerlin Young Researchers Meeting on Applied Stochastic Analysis (Online Event), February 10  12, 2021, University of Oxford, Mathematical Institute, UK, February 10, 2021.

N. Tapia, Unified signature cumulants and generalized magnus expansions, Analyse stochastique trajectorielle et applications Pathwise Stochastic Analysis and Applications (Online Event), March 8  12, 2021, C.I.R.M., France, March 11, 2021.

D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, The 24th International Conference on Artificial Intelligence and Statistics (Online Event), April 13  15, 2021.

D. Dvinskikh, Decentralized algorithms for Wasserstein barycenters (online talk), Moscow Conference on Combinatorics and Applications (Online Event), May 31  June 4, 2021, Moscow Institute of Physics and Technology, School of Applied Mathematics and Computer Science, Moscow, Russian Federation, June 2, 2021.

A. Suvorikova, Optimal transport in machine learning (onlne talk), HDI Lab Seminar, HSE University, Faculty of Computer Science (Online Event), Moskau, Russian Federation, April 29, 2021.

CH. Bayer, A pricing BSPDE for rough volatility (online talk), MATH4UQ Seminar (Online Event), RWTH Aachen University, Mathematics for Uncertainty Quantification, April 6, 2021.

P. Dvurechensky, Accelerated gradient methods and their applications to Wasserstein barycenter problem, TheXIII international scientific conference and young scientist school``Theory and Numerics of Inverse and Illposed Problems", April 12  22, 2021, Mathematical Center in Akademgorodok, Novosibirsk, Russian Federation, April 14, 2021.

P. Dvurechensky, Wasserstein barycenters from the computational perspective (online talk), Moscow Conference on Combinatorics and Applications (Online Event), May 31  June 4, 2021, Moscow Institute of Physics and Technology, School of Applied Mathematics and Computer Science, Moscow, Russian Federation, June 2, 2021.

P. Friz, New perspectives on rough paths, signatures and signature cumulants (online talk), DataSig Seminar Series (Online Event), University of Oxford, Mathematical Institute, UK, May 6, 2021.

P. Friz, Rough stochastic differential equations (online talk), Analyse stochastique trajectorielle et applications Pathwise Stochastic Analysis and Applications (Online Event), March 8  12, 2021, C.I.R.M., France, March 8, 2021.

K. Papafitsoros, Optimization with learninginformed differential equation constraints and its applications (online talk), Seminar Modern Methods in Applied Stochastics and Nonparametric Statistics, WIAS Berlin, March 16, 2021.

V. Spokoiny, Bayesian inference in Bernoulli model with application to ranking from pairwise comparison (online talk), Data Seminar, Université Grenoble Alpes, Laboratoire Jean Kuntzmann, France, January 21, 2021.

K. Tabelow, MaRDI  The mathematical research data Initiative within the NFDI (online talk), SFB 1294 Colloquium (Online Event), Universität Potsdam, Institut für Mathematik, April 16, 2021.

V. Avanesov, Datadriven confidence bands for distributed nonparametric regression (online talk), The 33rd Annual Conference on Learning Theory (COLT 2020) (Online Event), July 9  12, 2020, Graz, Austria, July 10, 2020.

O. Butkovsky, Exponential ergodicity of orderpreserving SPDEs in the hypoelliptic setting via new coupling techniques (online talk), BernoulliIMS One World Symposium 2020 (Online Event), August 24  28, 2020, August 25, 2020.

O. Butkovsky, New coupling techniques for exponential ergodicity of SPDEs in the hypoelliptic and effectively elliptic settings (online talk), Probability Seminar, HeriotWatt University, Edinburgh, UK, October 21, 2020.

O. Butkovsky, Regularization by noise for PDEs  A stochastic sewing approach (online talk), FOR 2402 Seminar (Online Event), Technische Universtität Berlin, Institut für Mathematik, November 12, 2020.

O. Butkovsky, Regularization by noise for SDEs and related systems: A tale of two approaches, Eighth BielefeldSNU joint Workshop in Mathematics, February 24  26, 2020, Universität Bielefeld, Fakultät für Mathematik, February 24, 2020.

O. Butkovsky, Skew stochastic heat equation and regularization by noise for PDEs (online talk), BernoulliIMS One World Symposium 2020 (Online Event), August 24  28, 2020, August 27, 2020.

S. Riedel, Optimal stopping: a signature approach (online talk), 13th Annual ERC BerlinOxford Young Researchers Meeting on Applied Stochastic Analysis, WIAS Berlin, June 9, 2020.

S. Riedel, RungeKutta methods for rough differential equations (online talk), The DNA Seminar (spring 2020), Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, June 24, 2020.

N. Tapia, Free Wick polynomials (online talk), Arbeitsgruppenseminar Analysis, Universität Potsdam, Institut für Mathematik, April 24, 2020.

N. Tapia, Higher order iteratedsums signatures (online talk), DataSig Seminar, University of Oxford, Mathematical Institute, UK, April 2, 2020.

N. Tapia, Transport and continuity equations with rough noise (online talk), The DNA Seminar (spring 2020), Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, April 22, 2020.

N. Tapia, Transport equations with low regularity rough noise, Young Researchers between Geometry and Stochastic Analysis, February 12  14, 2020, University of Bergen, Norway, February 13, 2020.

N. Tapia , Signatures in shape analysis (online talk), Geometry of curves in time series and shape analysis (Online Event), August 11  14, 2020, MaxPlanckInstitut für Mathematik in den Naturwissenschaften (MiS), Leipzig, August 14, 2020.

A. Kroshnin, A. Suvorikova, V. Spokoiny, Statistical inference on BuresWasserstein space: From theory to practice, Math of Machine Learning 2020, Sochi, Russian Federation, February 19  22, 2020.

D. Dvinskikh, A. Gasnikov, SA vs SAA for population Wasserstein barycenter calculation, Math of Machine Learning 2020, Sochi, Russian Federation, February 19  22, 2020.

D. Dvinskikh, A. Gasnikov, Two approaches for population Wasserstein barycenter problem: Stochastic averaging versus sample average approximation (online talks), XII Summer School on Operations Research, Data and Decision Making, May 19  21, 2020, Faculty of Informatics, Mathematics and Computer Science, Nizhny Novgorod, Russian Federation, May 20, 2020.

A. Suvorikova, Change point detection in highdimensional data (online talk), Joint AramcoHSE Reserach Seminar, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, April 15, 2020.

A. Suvorikova, Optimal transport and its application in bioinformatics, Second HSEYandex Autumn School on Generative Models, November 24  27, 2020, National Research University Higher School of Economics, Moscow, Russian Federation, November 27, 2020.

A. Suvorikova, Shapebased domain adaptation, Statistical Seminar of HDI Lab, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, March 2, 2020.

A. Suvorikova, Shapebased domain adaptation via optimal transportation (online talk), Machine Learning Online Seminar, MaxPlanckInstitut für Mathematik in den Naturwissenschaften (MiS), Leipzig, April 1, 2020.

A. Suvorikova, Userfriendly optimal transport (online talk), Chromatine Seminar (Online Event), Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russian Federation, August 11, 2020.

CH. Bayer, Pricing American options by exercise rate optimization, Research Seminar on Insurance Mathematics and Stochastic Finance, Eidgenössische Technische Hochschule Zürich, Switzerland, January 9, 2020.

CH. Bayer, Pricing American options by exercise rate optimization, MathriskINRIA / LPSM ParisDiderot Seminaire, Inria Paris Research Centre, France, February 6, 2020.

CH. Bayer, Pricing American options by exercise rate optimization, Lunch at the Lab, University of Calgary, Department of Mathematics and Statistics, Canada, March 3, 2020.

CH. Bayer, Rough volatility, Summer School 2020 and Annual Meeting of the BerlinOxford IRTG 2544 ``Stochastic Analysis in Interaction'', September 14  17, 2020, Döllnsee, September 15, 2020.

P. Dvurechensky, Distributed optimization for Wasserstein barycenters, Workshop on PDE Constrained Optimization under Uncertainty and Mean Field Games, January 28  29, 2020, WIAS Berlin, January 29, 2020.

P. Dvurechensky, Distributed optimization for Wasserstein barycenters (online talk), 15th INFORMS Telecommunication and Network Analytics Conference 2020 (Online Event), October 20  21, 2020, INFORMS Telecommunications & Networks Analytics, Catonsville, USA, October 21, 2020.

P. Friz, Diamonds, cumulants, signature cumulants and Magnus expansion (online talk), Geometry of Curves in Time Series and Shape Analysis (Online Event), August 11  14, 2020, MaxPlanckInstitut für Mathematik in den Naturwissenschaften, Leipzig, August 14, 2020.

TH. Koprucki, K. Tabelow, T. Streckenbach, T. Niermann, A. Maltsi, Modelbased geometry reconstruction of TEM images, MATH+ Day 2020 (Online Event), Berlin, November 6, 2020.

P. Mathé, Bayesian inverse problems with noncommuting operators, University of Edinburgh, School of Mathematics, UK, February 14, 2020.

V. Spokoiny, Advanced Statistical Methods, February 11  March 3, 2020, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation.

V. Spokoiny, Bayes inference for nonlinear inverse problems, Workshop ``Statistics meets Machine Learning'', January 26  31, 2020, Mathematisches Forschungsinstitut Oberwolfach, January 28, 2020.

K. Tabelow, MaRDI: Mathematical research data initiative, Leibniz NFDIKonferenz (Online Event), August 19, 2020.
External Preprints

M. Coghi, W. Dreyer, P. Gajewski, C. Guhlke, P. Friz, M. Maurelli, A McKeanVlasov SDE and particle system with interactionfrom reflecting boundaries, Preprint no. 2102.12315v1/12102.12315v1/37, Cornell University Library, arXiv.org, 2021.

A. Agafonov, P. Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, A. Daneshmand, An accelerated secondorder method for distributed stochastic optimization, Preprint no. arXiv:2103.14392, Cornell University Library, arXiv.org, 2021.
Abstract
We consider distributed stochastic optimization problems that are solved with master/workers computation architecture. Statistical arguments allow to exploit statistical similarity and approximate this problem by a finitesum problem, for which we propose an inexact accelerated cubicregularized Newton's method that achieves lower communication complexity bound for this setting and improves upon existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers. 
A. Beznosikov, P. Dvurechensky, A. Koloskova, V. Samokhin, S.U. Stich, A. Gasnikov, Decentralized local stochastic extragradient for variational inequalities, Preprint no. arXiv:2106.08315, Cornell University, 2021.
Abstract
We consider decentralized stochastic variational inequalities where the problem data is distributed across many participating devices (heterogeneous, or nonIID data setting). We propose a novel method  based on stochastic extragradient  where participating devices can communicate over arbitrary, possibly timevarying network topologies. This covers both the fully decentralized optimization setting and the centralized topologies commonly used in Federated Learning. Our method further supports multiple local updates on the workers for reducing the communication frequency between workers. We theoretically analyze the proposed scheme in the strongly monotone, monotone and nonmonotone setting. As a special case, our method and analysis apply in particular to decentralized stochastic minmax problems which are being studied with increased interest in Deep Learning. For example, the training objective of Generative Adversarial Networks (GANs) are typically saddle point problems and the decentralized training of GANs has been reported to be extremely challenging. While SOTA techniques rely on either repeated gossip rounds or proximal updates, we alleviate both of these requirements. Experimental results for decentralized GAN demonstrate the effectiveness of our proposed algorithm. 
A. Daneshmand, G. Scutari, P. Dvurechensky, A. Gasnikov, Newton method over networks is fast up to the statistical precision, Preprint no. arXiv:2102.06780, Cornell University, 2021.

E. Gladin, A. Sadiev, A. Gasnikov, P. Dvurechensky, A. Beznosikov, M. Alkousa, Solving smooth minmin and minmax problems by mixed oracle algorithms, Preprint no. arXiv:2103.00434, Cornell University, 2021.
Abstract
In this paper, we consider two types of problems that have some similarity in their structure, namely, minmin problems and minmax saddlepoint problems. Our approach is based on considering the outer minimization problem as a minimization problem with inexact oracle. This inexact oracle is calculated via inexact solution of the inner problem, which is either minimization or a maximization problem. Our main assumptions are that the problem is smooth and the available oracle is mixed: it is only possible to evaluate the gradient w.r.t. the outer block of variables which corresponds to the outer minimization problem, whereas for the inner problem only zerothorder oracle is available. To solve the inner problem we use accelerated gradientfree method with zerothorder oracle. To solve the outer problem we use either inexact variant of Vaydya's cuttingplane method or a variant of accelerated gradient method. As a result, we propose a framework that leads to nonasymptotic complexity bounds for both minmin and minmax problems. Moreover, we estimate separately the number of first and zerothorder oracle calls which are sufficient to reach any desired accuracy. 
E. Gorbunov, M. Danilova, I. Shibaev, P. Dvurechensky, A. Gasnikov, Nearoptimal high probability complexity bounds for nonsmooth stochastic optimization with heavytailed noise, Preprint no. arXiv:2106.05958, Cornell University, 2021.
Abstract
Thanks to their practical efficiency and random nature of the data, stochastic firstorder methods are standard for training largescale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for nonsmooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negativepower or logarithmic but under an additional assumption of subGaussian (lighttailed) noise distribution that may not hold in practice, e.g., in several NLP tasks. In our paper, we resolve this issue and derive the first highprobability convergence results with logarithmic dependence on the confidence level for nonsmooth convex stochastic optimization problems with nonsubGaussian (heavytailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Höldercontinuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the nonsmooth setting. 
V. Novitskii, A. Gasnikov, Improved exploiting higher order smoothness in derivativefree optimization and continuous bandit, Preprint no. arXiv:2101.03821, Cornell University, 2021.

A. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky, A. Gasnikov, Decentralized distributed optimization for saddle point problems, Preprint no. arXiv:2102.07758, Cornell University, 2021.

A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over timevarying graphs, Preprint no. arXiv:2103.15598, Cornell University Library, arXiv.org, 2021.
Abstract
We consider a distributed stochastic optimization problem that is solved by a decentralized network of agents with only local communication between neighboring agents. The goal of the whole system is to minimize a global objective function given as a sum of local objectives held by each agent. Each local objective is defined as an expectation of a convex smooth random function and the agent is allowed to sample stochastic gradients for this function. For this setting we propose the first accelerated (in the sense of Nesterov's acceleration) method that simultaneously attains optimal up to a logarithmic factor communication and oracle complexity bounds for smooth strongly convex distributed stochastic optimization. We also consider the case when the communication graph is allowed to vary with time and obtain complexity bounds for our algorithm, which are the first upper complexity bounds for this setting in the literature. 
V. Tominin , Y. Tominin , E. Borodich , D. Kovalev, A. Gasnikov, P. Dvurechensky, On accelerated methods for saddlepoint problems with composite structure, Preprint no. arXiv:2103.09344, Cornell University, 2021.
Abstract
We consider stronglyconvexstronglyconcave saddlepoint problems with general nonbilinear objective and different condition numbers with respect to the primal and the dual variables. First, we consider such problems with smooth composite terms, one of which having finitesum structure. For this setting we propose a variance reduction algorithm with complexity estimates superior to the existing bounds in the literature. Second, we consider finitesum saddlepoint problems with composite terms and propose several algorithms depending on the properties of the composite terms. When the composite terms are smooth we obtain better complexity bounds than the ones in the literature, including the bounds of a recently proposed nearlyoptimal algorithms which do not consider the composite structure of the problem. If the composite terms are proxfriendly, we propose a variance reduction algorithm that, on the one hand, is accelerated compared to existing variance reduction algorithms and, on the other hand, provides in the composite setting similar complexity bounds to the nearlyoptimal algorithm which is designed for noncomposite setting. Besides that, our algorithms allow to separate the complexity bounds, i.e. estimate, for each part of the objective separately, the number of oracle calls that is sufficient to achieve a given accuracy. This is important since different parts can have different arithmetic complexity of the oracle, and it is desired to call expensive oracles less often than cheap oracles. The key thing to all these results is our general framework for saddlepoint problems, which may be of independent interest. This framework, in turn is based on our proposed Accelerated MetaAlgorithm for composite optimization with probabilistic inexact oracles and probabilistic inexactness in the proximal mapping, which may be of independent interest as well. 
P. Dvurechensky, D. Kamzolov, A. Lukashevich, S. Lee, E. Ordentlich, C.A. Uribe, A. Gasnikov, Hyperfast secondorder local solvers for efficient statistically preconditioned distributed optimization, Preprint no. arXiv:2102.08246, Cornell University, 2021.

P. Dvurechensky, M. Staudigl, S. Shtern, Firstorder methods for convex optimization, Preprint no. arXiv:2101.00935, Cornell University, 2021.
Abstract
Firstorder methods for solving convex optimization problems have been at the forefront of mathematical optimization in the last 20 years. The rapid development of this important class of algorithms is motivated by the success stories reported in various applications, including most importantly machine learning, signal processing, imaging and control theory. Firstorder methods have the potential to provide low accuracy solutions at low computational complexity which makes them an attractive set of tools in largescale optimization problems. In this survey we cover a number of key developments in gradientbased optimization methods. This includes nonEuclidean extensions of the classical proximal gradient method, and its accelerated versions. Additionally we survey recent developments within the class of projectionfree methods, and proximal versions of primaldual schemes. We give complete proofs for various key results, and highlight the unifying aspects of several optimization algorithms. 
O. Butkovsky, K. Dareiotis , M. Gerencsér, Approximation of SDEs  A stochastic sewing approach, Preprint no. arXiv:1909.07961, Cornell University, 2020.

M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov, I. Shibaev, Recent theoretical advances in nonconvex optimization, Preprint no. arXiv:2012.06188, Cornell University, 2020.

S. Athreya, O. Butkovsky, K. Lê, L. Mytnik, Wellposedness of stochastic heat equation with distributional drift and skew stochastic heat equation, Preprint no. arXiv:2011.13498, Cornell University, 2020.

C. Bellingeri, P. Friz, M. Gerencsér, Singular paths spaces and applications, Preprint no. arXiv:2003.03352, Cornell University, 2020.

A. Beznosikov, V. Samokhin, A. Gasnikov, Local SGD for saddlepoint problems, Preprint no. arXiv:2010.13112, Cornell University, 2020.

E. Gorbunov, A. Rogozin, A. Beznosikov, D. Dvinskikh, A. Gasnikov, Recent theoretical advances in decentralized distributed convex optimization, Preprint no. arXiv:2011.13259, Cornell University, 2020.
Abstract
In the last few years, the theory of decentralized distributed convex optimization has made significant progress. The lower bounds on communications rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the nondistributed setup. In particular, we provide our recent results that have not been published yet, and that could be found in details only in arXiv preprints. 
N. Puchkin, A. Timofeev, V. Spokoiny, Manifoldbased time series forecasting, Preprint no. arXiv:2012.08244, Cornell University, 2020.
Abstract
Prediction for high dimensional time series is a challenging task due to the curse of dimensionality problem. Classical parametric models like ARIMA or VAR require strong modeling assumptions and time stationarity and are often overparametrized. This paper offers a new flexible approach using recent ideas of manifold learning. The considered model includes linear models such as the central subspace model and ARIMA as particular cases. The proposed procedure combines manifold denoising techniques with a simple nonparametric prediction by local averaging. The resulting procedure demonstrates a very reasonable performance for reallife econometric time series. We also provide a theoretical justification of the manifold estimation procedure. 
N. Puchkin, V. Spokoiny, E. Stepanov, D. Trevisan, Reconstruction of manifold embeddings into Euclidean spaces via intrinsic distances, Preprint no. arXiv:2012.13770, Cornell University, 2020.
Abstract
We consider one of the classical manifold learning problems, that of reconstructing up to an almost isometry an embedding of a compact connected Riemannian manifold in a Euclidean space given the information on intrinsic distances between points from its almost dense subset. It will be shown that the most popular methods in data science to deal with such a problem, the classical Multidimensional scaling (MDS) and the Maximum variance unfolding (MVU) actually miss the point and may provide results very far from an isometry (and even may give no biLipshitz embedding). We will then provide an easy variational formulation of this problem which leads to an algorithm always providing an almost isometric imbedding with given controlled small distortion of original distances. 
A. Rastogi, P. Mathé, Inverse learning in Hilbert scales, Preprint no. arXiv:2002.10208, Cornell University, 2020.
Abstract
We study the linear illposed inverse problem with noisy data in the statistical learning setting. Approximate reconstructions from random noisy data are sought with general regularization schemes in Hilbert scale. We discuss the rates of convergence for the regularized solution under the prior assumptions and a certain link condition. We express the error in terms of certain distance functions. For regression functions with smoothness given in terms of source conditions the error bound can then be explicitly established. 
A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zerothorder algorithms for smooth saddlepoint problems, Preprint no. arXiv:2009.09908, Cornell University, 2020.
Abstract
In recent years, the importance of saddlepoint problems in machine learning has increased. This is due to the popularity of GANs. In this paper, we solve stochastic smooth (strongly) convexconcave saddlepoint problems using zerothorder oracles. Theoretical analysis shows that in the case when the optimization set is a simplex, we lose only logn times in the stochastic convergence term. The paper also provides an approach to solving saddlepoint problems, when the oracle for one of the variables has zero order, and for the second  first order. Subsequently, we implement zerothorder and 1/2thorder methods to solve practical problems. 
I. Shibaev, P. Dvurechensky, A. Gasnikov, Zerothorder methods for noisy Höldergradient functions, Preprint no. arXiv:2006.11857, Cornell University, 2020.

D. Tiapkin, A. Gasnikov, P. Dvurechensky, Stochastic saddlepoint optimization for Wasserstein barycenters, Preprint no. arXiv:2006.06763, Cornell University, 2020.
Abstract
We study the computation of nonregularized Wasserstein barycenters of probability measures supported on the finite set. The first result gives a stochastic optimization algorithm for the discrete distribution over the probability measures which is comparable with the current best algorithms. The second result extends the previous one to the arbitrary distribution using kernel methods. Moreover, this new algorithm has a total complexity better than the Stochastic Averaging approach via the Sinkhorn algorithm in many cases. 
N. Tupitsa, P. Dvurechensky, A. Gasnikov, C.A. Uribe , Multimarginal optimal transport by accelerated alternating minimization, Preprint no. arXiv:2004.02294, Cornell University Library, arXiv.org, 2020.
Abstract
We consider a multimarginal optimal transport, which includes as a particular case the Wasserstein barycenter problem. In this problem one has to find an optimal coupling between m probability measures, which amounts to finding a tensor of the order m. We propose an accelerated method based on accelerated alternating minimization and estimate its complexity to find the approximate solution to the problem. We use entropic regularization with sufficiently small regularization parameter and apply accelerated alternating minimization to the dual problem. A novel primaldual analysis is used to reconstruct the approximately optimal coupling tensor. Our algorithm exhibits a better computational complexity than the stateoftheart methods for some regimes of the problem parameters. 
D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems, Preprint no. arXiv:1904.09015, Cornell University, 2020.
Abstract
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only up to a logarithmic factor and the notion of smoothness. By using minibatching technique, we show that the proposed methods with stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems. 
D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, A. Tyurin, V. Spokoiny, Adaptive gradient descent for convex and nonconvex stochastic optimization, Preprint no. arXiv:1911.08380, Cornell University, 2020.

D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, Preprint no. arXiv:2010.04677, Cornell University, 2020.

D. Dvinskikh, Stochastic approximation versus sample average approximation for population Wasserstein barycenter calculation, Preprint no. arXiv:2001.07697, Cornell University, 2020.

P. Dvurechensky, K. Safin, S. Shtern, M. Staudigl, Generalized selfconcordant analysis of FrankWolfe algorithms, Preprint no. arXiv:2010.01009, Cornell University, 2020.
Abstract
Projectionfree optimization via different variants of the FrankWolfe (FW) method has become one of the cornerstones in large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with selfconcordance like properties. Such generalized selfconcordant (GSC) functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex. Indeed, in a number of applications, e.g. inverse covariance estimation or distanceweighted discrimination problems in support vector machines, the loss is given by a GSC function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. This paper closes this apparent gap in the literature by developing provably convergent FW algorithms with standard O(1/k) convergence rate guarantees. If the problem formulation allows the efficient construction of a local linear minimization oracle, we develop a FW method with linear convergence rate. 
P. Dvurechensky, S. Shtern, M. Staudigl, P. Ostroukhov, K. Safin, Selfconcordant analysis of FrankWolfe algorithms, Preprint no. arXiv:2002.04320, Cornell University, 2020.
Abstract
Projectionfree optimization via different variants of the FrankWolfe (FW) method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a selfconcordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k), k being the iteration counter. If the problem can be represented by a local linear minimization oracle, we are the first to propose a FW method with linear convergence rate without assuming neither strong convexity nor a Lipschitz continuous gradient. 
P. Friz, J. Gatheral, R. Radoičić, Forests, cumulants, martingales, Preprint no. arXiv:2002.01448, Cornell University, 2020.
Abstract
This work is concerned with forest and cumulant type expansions of general random variables on a filtered probability spaces. We establish a "broken exponential martingale" expansion that generalizes and unifies the exponentiation result of Alòs, Gatheral, and Radoičić and the cumulant recursion formula of Lacoin, Rhodes, and Vargas. Specifically, we exhibit the two previous results as lower dimensional projections of the same generalized forest expansion, subsequently related by forest reordering. Our approach also leads to sharp integrability conditions for validity of the cumulant formula, as required by many of our examples, including iterated stochastic integrals, Lévy area, Bessel processes, KPZ with smooth noise, WienerItô chaos and "rough" stochastic (forward) variance models. 
P. Friz, P. Pigato, J. Seibel, The step stochastic volatility model (SSVM), Preprint no. May 7, Available at SSRN's eLibrary: urlhttps://ssrn.com/abstract=3595408 or urlhttp://dx.doi.org/10.2139/ssrn.3595408, 2020.
Abstract
Stochastic Volatility Models (SVMs) are ubiquitous in quantitative finance. But is there a Markovian SVM capable of producing extreme (T^(1/2)) shortdated implied volatility skew? We here propose a modification of a given SVM "backbone", Heston for instance, to achieve just this  without adding jumps or nonMarkovian "rough" fractional volatility dynamics. This is achieved via nonsmooth leverage function, such as a step function. The resulting Step Stochastic Volatility Model (SSVM) is thus a parametric example of local stochastic volatility model (LSVM). From an IT perspective, SSVM amounts to trivial modifications in the code of existing SVM implementations. From a QF perspective, SSVM offers new flexibility in smile modelling and towards assessing model risk. For comparison, we then exhibit the marketinduced leverage function for LSVM, calibrated with the particle method.
Research Groups
 Partial Differential Equations
 Laser Dynamics
 Numerical Mathematics and Scientific Computing
 Nonlinear Optimization and Inverse Problems
 Interacting Random Systems
 Stochastic Algorithms and Nonparametric Statistics
 Thermodynamic Modeling and Analysis of Phase Transitions
 Nonsmooth Variational Problems and Operator Equations