Statistical inverse problems

Mathematical research topic of WIAS

DE |
EN

In many applications the quantities of interest can be observed only indirectly, or they must be derived from other measurements. Often the measurements are noisy and the reconstruction of the quantities of interest from noisy measurements is unstable. In this case the problem is an ill-posed (inverse) problem. Such problems are met in tomography and other image generating technologies, but the range of potential applications is much wider. A specific area of application is model calibration, i.e. when a model depends on additional parameters which need to be fitted to given data.

The reconstruction of the quantities of interest from such noisy and indirect obervations gives rise to statistical inverse problems, if the nature of the noise is statistical. Then we meet (nonparametric) statistical estimation problems which may be solved in a traditional sense by providing an estimator for the quantities of interest, but which may also be solved, when performing a Bayes analysis, which then provides us with additional information and which thus allows for the quantification of the uncertainty inerent in the reconstruction.

The studies in this topical area range from fundamental mathematical questions to application specific investigations. They involve expertise in regularization theory, non-parametric statistics, and within the specific application.

In recent years important contributions were made to the analysis of linear and nonlinear statistical inverse problems. One objective was the design of efficient reconstruction methods and the understanding of a proper parameter choice (model selection) for the used regularization scheme. Bayesian methods were also studied, both theoretically and in practical applications (MCMC methods). One important application was within the industrial project BOP.

Publications

Monographs

N. Tupitsa, P. Dvurechensky, D. Dvinskikh, A. Gasnikov, Section: Computational Optimal Transport, P.M. Pardalos, O.A. Prokopyev, eds., Encyclopedia of Optimization, Springer International Publishing, Cham, published online on 11.07.2023 pages, (Chapter Published), DOI 10.1007/978-3-030-54621-2_861-1 .
A. Gasnikov, D. Dvinskikh, P. Dvurechensky, E. Gorbunov, A. Beznosikov, A. Lobanov, Randomized gradient-free methods in convex optimization, P.M. Pardalos , O.A. Prokopyev, eds., Encyclopedia of Optimization, Springer International Publishing, Cham, published online on 08.09.2023 pages, (Chapter Published), DOI 10.007/978-3-030-54621-2_859-1 .
M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov, I. Shibaev, Chapter: Recent Theoretical Advances in Non-convex Optimization, A. Nikeghbali, P.M. Pardalos, A.M. Raigorodskii, M.Th. Rassias, eds., 191 of Springer Optimization and Its Applications, Springer, Cham, 2022, pp. 79--163, (Chapter Published), DOI 10.1007/978-3-031-00832-0_3 .
M. Hintermüller, K. Papafitsoros, Chapter 11: Generating Structured Nonsmooth Priors and Associated Primal-dual Methods, in: Processing, Analyzing and Learning of Images, Shapes, and Forms: Part 2, R. Kimmel, X.-Ch. Tai, eds., 20 of Handbook of Numerical Analysis, Elsevier, 2019, pp. 437--502, (Chapter Published), DOI 10.1016/bs.hna.2019.08.001 .
M. Hintermüller, M. Hinze, J. Sokołowski, S. Ulbrich, eds., Special issue to honour Guenter Leugering on his 65th birthday, 1 of Control & Cybernetics, Systems Research Institute, Polish Academy of Sciences, Warsaw, 2019, (Collection Published).
M. Hintermüller, J.F. Rodrigues, eds., Topics in Applied Analysis and Optimisation -- Partial Differential Equations, Stochastic and Numerical Analysis, CIM Series in Mathematical Sciences, Springer Nature Switzerland AG, Cham, 2019, 396 pages, (Collection Published).
P. Deuflhard, M. Grötschel, D. Hömberg, U. Horst, J. Kramer, V. Mehrmann, K. Polthier, F. Schmidt, Ch. Schütte, M. Skutella, J. Sprekels, eds., MATHEON -- Mathematics for Key Technologies, 1 of EMS Series in Industrial and Applied Mathematics, European Mathematical Society Publishing House, Zurich, 2014, 453 pages, (Collection Published).

Articles in Refereed Journals

D. Pasechniuk, P. Dvurechensky, C.A. Uribe, A. Gasnikov, Decentralised convex optimisation with probability-proportional-to-size quantization, EURO Journal on Computational Optimization, 13 (2025), published online on 22.07.2025, DOI 10.1016/j.ejco.2025.100113 .
P. Dvurechensky, G. Iommazzo, S. Shtern, M. Staudigl, A conditional gradient homotopy method with applications to semidefinite programming, IMA Journal of Numerical Analysis, (2025), pp. 8208252/1 -- 8208252/54, DOI 10.1093/imanum/draf059 .
E. Gorbunov, M. Danilova, I. Shibaev, P. Dvurechensky, A. Gasnikov, High-probability complexity bounds for non-smooth stochastic convex optimization with heavy-tailed noise, Journal of Optimization Theory and Applications, 203 (2024), pp. 2679--2738, DOI 10.1007/s10957-024-02533-z .
Abstract
Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice, e.g., in several NLP tasks. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
A. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky, A. Gasnikov, Decentralized saddle point problems via non-Euclidean mirror prox, Optimization Methods & Software, 40 (2025), pp. 1127--1152, DOI 10.1080/10556788.2023.2280062 .
P. Dvurechensky, P. Ostroukhov, A. Gasnikov, C.A. Uribe, A. Ivanova, Near-optimal tensor methods for minimizing the gradient norm of convex functions and accelerated primal-dual tensor methods, Optimization Methods & Software, 39 (2024), pp. 1068--1103, DOI 10.1080/10556788.2023.2296443 .
P. Dvurechensky, M. Staudigl, Hessian barrier algorithms for non-convex conic optimization, Mathematical Programming. A Publication of the Mathematical Programming Society, 209 (2025), pp. 171--229 (published online on 04.03.2024), DOI 10.1007/s10107-024-02062-7 .
Abstract
We consider the minimization of a continuous function over the intersection of a regular cone with an affine set via a new class of adaptive first- and second-order optimization methods, building on the Hessian-barrier techniques introduced in [Bomze, Mertikopoulos, Schachinger, and Staudigl, Hessian barrier algorithms for linearly constrained optimization problems, SIAM Journal on Optimization, 2019]. Our approach is based on a potential-reduction mechanism and attains a suitably defined class of approximate first- or second-order KKT points with the optimal worst-case iteration complexity O(??2) (first-order) and O(??3/2) (second-order), respectively. A key feature of our methodology is the use of self-concordant barrier functions to construct strictly feasible iterates via a disciplined decomposition approach and without sacrificing on the iteration complexity of the method. To the best of our knowledge, this work is the first which achieves these worst-case complexity bounds under such weak conditions for general conic constrained optimization problems.
M. Eigel, R. Gruhlke, D. Sommer, Less interaction with forward models in Langevin dynamics: Enrichment and homotopy, SIAM Journal on Applied Dynamical Systems, 23 (2024), pp. 870--1908, DOI 10.1137/23M1546841 .
Abstract
Ensemble methods have become ubiquitous for the solution of Bayesian inference problems. State-of-the-art Langevin samplers such as the Ensemble Kalman Sampler (EKS), Affine Invariant Langevin Dynamics (ALDI) or its extension using weighted covariance estimates rely on successive evaluations of the forward model or its gradient. A main drawback of these methods hence is their vast number of required forward calls as well as their possible lack of convergence in the case of more involved posterior measures such as multimodal distributions. The goal of this paper is to address these challenges to some extend. First, several possible adaptive ensemble enrichment strategies that successively enlarge the number of particles in the underlying Langevin dynamics are discusses that in turn lead to a significant reduction of the total number of forward calls. Second, analytical consistency guarantees of the ensemble enrichment method are provided for linear forward models. Third, to address more involved target distributions, the method is extended by applying adapted Langevin dynamics based on a homotopy formalism for which convergence is proved. Finally, numerical investigations of several benchmark problems illustrates the possible gain of the proposed method, comparing it to state-of-the-art Langevin samplers.
A. Agafonov, D. Kamzolov, P. Dvurechensky, A. Gasnikov, Inexact tensor methods and their application to stochastic convex optimization, Optimization Methods & Software, 39 (2024), pp. 42--83 (published online in Nov. 2023), DOI 10.1080/10556788.2023.2261604 .
N. Kornilov, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, Gradient free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact, Computational Management Science, 20 (2023), pp. 37/1--37/43, DOI 10.1007/s10287-023-00470-2 .
A. Ivanova, P. Dvurechensky, E. Vorontsova, D. Pasechnyuk, A. Gasnikov, D. Dvinskikh, A. Tyurin, Oracle complexity separation in convex optimization, Journal of Optimization Theory and Applications, 193 (2022), pp. 462--490, DOI 10.1007/s10957-022-02038-7 .
Abstract
Ubiquitous in machine learning regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e. for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.
E. Gorbunov, P. Dvurechensky, A. Gasnikov, An accelerated method for derivative-free smooth stochastic convex optimization, SIAM Journal on Optimization, 32 (2022), pp. 1210--1238, DOI 10.1137/19M1259225 .
Abstract
We consider an unconstrained problem of minimization of a smooth convex function which is only available through noisy observations of its values, the noise consisting of two parts. Similar to stochastic optimization problems, the first part is of a stochastic nature. On the opposite, the second part is an additive noise of an unknown nature, but bounded in the absolute value. In the two-point feedback setting, i.e. when pairs of function values are available, we propose an accelerated derivative-free algorithm together with its complexity analysis. The complexity bound of our derivative-free algorithm is only by a factor of n??? larger than the bound for accelerated gradient-based algorithms, where n is the dimension of the decision variable. We also propose a non-accelerated derivative-free algorithm with a complexity bound similar to the stochastic-gradient-based algorithm, that is, our bound does not have any dimension-dependent factor. Interestingly, if the solution of the problem is sparse, for both our algorithms, we obtain better complexity bound if the algorithm uses a 1-norm proximal setup, rather than the Euclidean proximal setup, which is a standard choice for unconstrained problems.
P. Dvurechensky, K. Safin, S. Shtern, M. Staudigl, Generalized self-concordant analysis of Frank--Wolfe algorithms, Mathematical Programming. A Publication of the Mathematical Programming Society, 198 (2023), pp. 255--323 (published online on 29.01.2022), DOI 10.1007/s10107-022-01771-1 .
Abstract
Projection-free optimization via different variants of the Frank--Wolfe method has become one of the cornerstones of large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance like properties. Such generalized self-concordant functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex, making them a challenging class of functions for first-order methods. Indeed, in a number of applications, such as inverse covariance estimation or distance-weighted discrimination problems in binary classification, the loss is given by a generalized self-concordant function having potentially unbounded curvature. For such problems projection-free minimization methods have no theoretical convergence guarantee. This paper closes this apparent gap in the literature by developing provably convergent Frank?Wolfe algorithms with standard O(1/k) convergence rate guarantees. Based on these new insights, we show how these sublinearly convergent methods can be accelerated to yield linearly convergent projection-free methods, by either relying on the availability of a local liner minimization oracle, or a suitable modification of the away-step Frank--Wolfe method.
M. Eigel, R. Gruhlke, M. Marschall, Low-rank tensor reconstruction of concentrated densities with application to Bayesian inversion, Statistics and Computing, 32 (2022), pp. 27/1--27/27, DOI 10.1007/s11222-022-10087-1 .
Abstract
A novel method for the accurate functional approximation of possibly highly concentrated probability densities is developed. It is based on the combination of several modern techniques such as transport maps and nonintrusive reconstructions of low-rank tensor representations. The central idea is to carry out computations for statistical quantities of interest such as moments with a convenient reference measure which is approximated by an numerical transport, leading to a perturbed prior. Subsequently, a coordinate transformation leads to a beneficial setting for the further function approximation. An efficient layer based transport construction is realized by using the Variational Monte Carlo (VMC) method. The convergence analysis covers all terms introduced by the different (deterministic and statistical) approximations in the Hellinger distance and the Kullback-Leibler divergence. Important applications are presented and in particular the context of Bayesian inverse problems is illuminated which is a central motivation for the developed approach. Several numerical examples illustrate the efficacy with densities of different complexity.
E. Vorontsova, A. Gasnikov, P. Dvurechensky, A. Ivanova, D. Pasechnyuk, Numerical methods for the resource allocation problem in a computer network, Computational Mathematics and Mathematical Physics, 61 (2021), pp. 297--328, DOI 10.1134/S0965542521020135 .
A. Gasnikov, D. Dvinskikh, P. Dvurechensky, D. Kamzolov, V. Matyukhin, D. Pasechnyuk, N. Tupitsa, A. Chernov, Accelerated meta-algorithm for convex optimization, Computational Mathematics and Mathematical Physics, 61 (2021), pp. 17--28, DOI 10.1134/S096554252101005X .
N. Tupitsa, P. Dvurechensky, A. Gasnikov, S. Guminov, Alternating minimization methods for strongly convex optimization, Journal of Inverse and Ill-Posed Problems, 29 (2021), pp. 721--739, DOI 10.1515/jiip-2020-0074 .
Abstract
We consider alternating minimization procedures for convex optimization problems with variable divided in many block, each block being amenable for minimization with respect to its variable with freezed other variables blocks. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under Polyak-Łojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under strong convexity assumption in many-blocks setting we provide an accelerated alternating minimization procedure with linear rate depending on the square root of the condition number as opposed to condition number for the non-accelerated method.
A. Rastogi, G. Blanchard, P. Mathé, Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems, Electronic Journal of Statistics, 14 (2020), pp. 2798--2841, DOI 10.1214/20-EJS1735 .
Abstract
We study a non-linear statistical inverse problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization) approach to estimate the quantity for the non-linear ill-posed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the concept of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions.
P. Mathé, Bayesian inverse problems with non-commuting operators, Mathematics of Computation, 88 (2019), pp. 2897--2912, DOI 10.1090/mcom/3439 .
Abstract
The Bayesian approach to ill-posed operator equations in Hilbert space recently gained attraction. In this context, and when the prior distribution is Gaussian, then two operators play a significant role, the one which governs the operator equation, and the one which describes the prior covariance. Typically it is assumed that these operators commute. Here we extend this analysis to non-commuting operators, replacing the commutativity assumption by a link condition. We discuss its relation to the commuting case, and we indicate that this allows to use interpolation type results to obtain tight bounds for the contraction of the posterior Gaussian distribution towards the data generating element.
M. Eigel, M. Marschall, R. Schneider, Sampling-free Bayesian inversion with adaptive hierarchical tensor representations, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 34 (2018), pp. 035010/1--035010/29, DOI 10.1088/1361-6420/aaa998 .
Abstract
The statistical Bayesian approach is a natural setting to resolve the ill-posedness of inverse problems by assigning probability densities to the considered calibration parameters. Based on a parametric deterministic representation of the forward model, a sampling-free approach to Bayesian inversion with an explicit representation of the parameter densities is developed. The approximation of the involved randomness inevitably leads to several high dimensional expressions, which are often tackled with classical sampling methods such as MCMC. To speed up these methods, the use of a surrogate model is beneficial since it allows for faster evaluation with respect to calibration parameters. However, the inherently slow convergence can not be remedied by this. As an alternative, a complete functional treatment of the inverse problem is feasible as demonstrated in this work, with functional representations of the parametric forward solution as well as the probability densities of the calibration parameters, determined by Bayesian inversion. The proposed sampling-free approach is discussed in the context of hierarchical tensor representations, which are employed for the adaptive evaluation of a random PDE (the forward problem) in generalized chaos polynomials and the subsequent high-dimensional quadrature of the log-likelihood. This modern compression technique alleviates the curse of dimensionality by hierarchical subspace approximations of the involved low rank (solution) manifolds. All required computations can be carried out efficiently in the low-rank format. A priori convergence is examined, considering all approximations that occur in the method. Numerical experiments demonstrate the performance and verify the theoretical results.
S. Lu, P. Mathé, S. Pereverzyev, Balancing principle in supervised learning for a general regularization scheme, Applied and Computational Harmonic Analysis. Time-Frequency and Time-Scale Analysis, Wavelets, Numerical Algorithms, and Applications, 48 (2020), pp. 123--148, DOI 10.1016/j.acha.2018.03.001 .
A. Anikin, A. Gasnikov, P. Dvurechensky, A. Turin, A. Chernov, Dual approaches to the minimization of strongly convex functionals with a simple structure under affine constraints, Computational Mathematics and Mathematical Physics, 57 (2017), pp. 1262--1276.
D. Belomestny, H. Mai, J.G.M. Schoenmakers, Generalized Post--Widder inversion formula with application to statistics, Journal of Mathematical Analysis and Applications, 455 (2017), pp. 89--104.
Abstract
In this work we derive an inversion formula for the Laplace transform of a density observed on a curve in the complex domain, which generalizes the well known Post-Widder formula. We establish convergence of our inversion method and derive the corresponding convergence rates for the case of a Laplace transform of a smooth density. As an application we consider the problem of statistical inference for variance-mean mixture models. We construct a nonparametric estimator for the mixing density based on the generalized Post-Widder formula, derive bounds for its root mean square error and give a brief numerical example.
S. Bürger, P. Mathé, Discretized Lavrent'ev regularization for the autoconvolution equation, Applicable Analysis. An International Journal, 96 (2017), pp. 1618--1637, DOI 10.1080/00036811.2016.1212336 .
Abstract
Lavrent?ev regularization for the autoconvolution equation was considered by Janno J. in Lavrent?ev regularization of ill-posed problems containing nonlinear near-to-monotone operators with application to autoconvolution equation, Inverse Prob. 2000;16:333?348. Here this study is extended by considering discretization of the Lavrent?ev scheme by splines. It is shown how to maintain the known convergence rate by an appropriate choice of spline spaces and a proper choice of the discretization level. For piece-wise constant splines the discretized equation allows for an explicit solver, in contrast to using higher order splines. This is used to design a fast implementation by means of post-smoothing, which provides results, which are indistinguishable from results obtained by direct discretization using cubic splines.
M. Hintermüller, C.N. Rautenberg, T. Wu, A. Langer, Optimal selection of the regularization function in a generalized total variation model. Part II: Algorithm, its analysis and numerical tests, Journal of Mathematical Imaging and Vision, 59 (2017), pp. 515--533.
Abstract
Based on the generalized total variation model and its analysis pursued in part I (WIAS Preprint no. 2235), in this paper a continuous, i.e., infinite dimensional, projected gradient algorithm and its convergence analysis are presented. The method computes a stationary point of a regularized bilevel optimization problem for simultaneously recovering the image as well as determining a spatially distributed regularization weight. Further, its numerical realization is discussed and results obtained for image denoising and deblurring as well as Fourier and wavelet inpainting are reported on.
M. Hintermüller, C.N. Rautenberg, Optimal selection of the regularization function in a weighted total variation model. Part I: Modeling and theory, Journal of Mathematical Imaging and Vision, 59 (2017), pp. 498--514.
Abstract
Based on the generalized total variation model and its analysis pursued in part I (WIAS Preprint no. 2235), in this paper a continuous, i.e., infinite dimensional, projected gradient algorithm and its convergence analysis are presented. The method computes a stationary point of a regularized bilevel optimization problem for simultaneously recovering the image as well as determining a spatially distributed regularization weight. Further, its numerical realization is discussed and results obtained for image denoising and deblurring as well as Fourier and wavelet inpainting are reported on.
P. Mathé, S.V. Pereverzev, Complexity of linear ill-posed problems in Hilbert space, Journal of Complexity, 38 (2017), pp. 50--67.
D. Belomestny, J.G.M. Schoenmakers, Statistical inference for time-changed Lévy processes via Mellin transform approach, Stochastic Processes and their Applications, 126 (2016), pp. 2092--2122.
K. Lin, S. Lu, P. Mathé, Oracle-type posterior contraction rates in Bayesian inverse problems, Inverse Problems and Imaging, 9 (2015), pp. 895--915.
P. Mathé, Adaptive discretization for signal detection in statistical inverse problems, Applicable Analysis. An International Journal, 94 (2015), pp. 494--505.
S. Anzengruber, B. Hofmann, P. Mathé, Regularization properties of the sequential discrepancy principle for Tikhonov regularization in Banach spaces, Applicable Analysis. An International Journal, 93 (2014), pp. 1382--1400.
S. Lu, P. Mathé, Discrepancy based model selection in statistical inverse problems, Journal of Complexity, 30 (2014), pp. 290--308.
C. Marteau, P. Mathé, General regularization schemes for signal detection in inverse problems, Mathematical Methods of Statistics, 23 (2014), pp. 176--200.
R.I. Boţ, B. Hofmann, P. Mathé, Regularizability of ill-posed problems and the modulus of continuity, Analysis and Applications, 32 (2013), pp. 299--312.
E. Burnaev, A. Zaytsev, V. Spokoiny, Non-asymptotic properties for Gaussian field regression, Automation and Remote Control, 74 (2013), pp. 1645--1655.
Q. Jin, P. Mathé, Oracle inequality for a statistical Raus--Gfrerer-type rule, SIAM ASA J. Uncertainty Quantification, 1 (2013), pp. 386--407.
A. Zaitsev, E. Burnaev, V. Spokoiny, Properties of the posterior distribution of a regression model based on Gaussian random fields, Automation and Remote Control, 74 (2013), pp. 1645--1655.
B. Hofmann, P. Mathé, Some note on the modulus of continuity for ill-posed problems in Hilbert space, Trudy Instituta Matematiki i Mekhaniki UrO RAN, 18 (2012), pp. 34--41.
G. Blanchard, P. Mathé, Discrepancy principle for statistical inverse problems with application to conjugate gradient iteration, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 28 (2012), pp. 115011/1--115011/23.
B. Hofmann, P. Mathé, Parameter choice in Banach space regularization under variational inequalities, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 28 (2012), pp. 104006/1--104006/17.
S.M.A. Becker, Regularization of statistical inverse problems and the Bakushinskii veto, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 27 (2011), pp. 115010/1--115010/22.
Abstract
In the deterministic context Bakushinskii's theorem excludes the existence of purely data-driven convergent regularization for ill-posed problems. We will prove in this work that in the statistical setting we can either construct a counter example or develop an equivalent formulation depending on the considered class of probability distributions. Hence, Bakushinskii's theorem does not generalize to the statistical context, although this has often been assumed in the past. To arrive at this conclusion, we will deduce from the classic theory new concepts for a general study of statistical inverse problems and perform a systematic clarification of the key ideas of statistical regularization.
F. Bauer, P. Mathé, Parameter choice methods using minimization schemes, Journal of Complexity, 27 (2011), pp. 68--85.
Abstract
In this paper we establish a generalized framework, which allows to prove convergenence and optimality of parameter choice schemes for inverse problems based on minimization in a generic way. We show that the well known quasi-optimality criterion falls in this class. Furthermore we present a new parameter choice method and prove its convergence by using this newly established tool.
J. Flemming, B. Hofmann, P. Mathé, Sharp converse results for the regularization error using distance functions, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 27 (2011), pp. 025006/1--025006/18.
Abstract
In the analysis of ill-posed inverse problems the impact of solution smoothness on accuracy and convergence rates plays an important role. For linear ill-posed operator equations in Hilbert spaces and with focus on the linear regularization schema we will establish relations between the different kinds of measuring solution smoothness in a point-wise or integral manner. In particular we discuss the interplay of distribution functions, profile functions that express the regularization error, index functions generating source conditions, and distance functions associated with benchmark source conditions. We show that typically the distance functions and the profile functions carry the same information as the distribution functions, and that this is not the case for general source conditions. The theoretical findings are accompanied with examples exhibiting applications and limitations of the approach.
P. Mathé, U. Tautenhahn, Enhancing linear regularization to treat large noise, Journal of Inverse and Ill-Posed Problems, 19 (2011), pp. 859--879.
P. Mathé, U. Tautenhahn, Regularization under general noise assumptions, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 27 (2011), pp. 035016/1--035016/15.
Abstract
The authors explain how the major results which were obtained recently in Eggermont et al (2009 Inverse Problems 25 115018) can be derived from a more general perspective of recent regularization theory. By pursuing this further, the authors provide a general view on regularization under general noise assumptions, including weakly and strongly controlled noise. The prospect is not to generalize previous work in this direction, but rather to envision the intrinsic structure present in regularization under general noise assumptions. In particular, the authors find variants of the discrepancy and the Lepski$vrm i$ principle to choose the regularization parameter, albeit within different context and under different assumptions.
D. Belomestny, Spectral estimation of the fractional order of a Lévy process, The Annals of Statistics, 38 (2010), pp. 317--351.
B. Hoffmann, P. Mathé, H. VON Weizsäcker, Regularization in Hilbert space under unbounded operators and general source conditions, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 25 (2009), pp. 115013/1--115013/15.
D. Belomestny, G.N. Milstein, V. Spokoiny, Regression methods in pricing American and Bermudan options using consumption processes, Quantitative Finance, 9 (2009), pp. 315--327.
Abstract
Here we develop methods for efficient pricing multidimensional discrete-time American and Bermudan options by using regression based algorithms together with a new approach towards constructing upper bounds for the price of the option. Applying sample space with payoffs at the optimal stopping times, we propose sequential estimates for continuation values, values of the consumption process, and stopping times on the sample paths. The approach admits constructing both low and upper bounds for the price by Monte Carlo simulations. The methods are illustrated by pricing Bermudan swaptions and snowballs in the Libor market model.
P. Mathé, S.V. Pereverzev, The use of higher order finite difference schemes is not dangerous, Journal of Complexity, 25 (2009), pp. 3--10.
V. Spokoiny, C. Vial, Parameter tuning in pointwise adaptation using a propagation approach, The Annals of Statistics, 37 (2009), pp. 2783--2807.
Abstract
This paper discusses the problem of adaptive estimating a univariate object like the value of a regression function at a given point or a linear functional in a linear inverse problem. We consider an adaptive procedure originated from Lepski (1990) which selects in a data-driven way one estimate out of a given class of estimates ordered by their variability. A serious problem with using this and similar procedures is the choice of some tuning parameters like thresholds. Numerical results show that the theoretically recommended proposals appear to be too conservative and lead to a strong oversmoothing effects. A careful choice of the parameters of the procedure is extremely important for getting the reasonable quality of estimation. The main contribution of this paper is the new approach for choosing the parameters of the procedure by providing the prescribed behavior of the resulting estimate in the simple parametric situation. We establish a non-asymptotical “oracle” bound which shows that the estimation risk is, up to a logarithmic multiplier, equal to the risk of the “oracle” estimate which is optimally selected from the given family. A numerical study demonstrates the nice performance of the resulting procedure in a number of simulated examples.
B. Hofmann, P. Mathé, M. Schieck, Modulus of continuity for conditionally stable ill-posed problems in Hilbert space, Journal of Inverse and Ill-Posed Problems, 16 (2008), pp. 567-585.
P. Mathé, B. Hofmann, Direct and inverse results in variable Hilbert scales, Journal of Approximation Theory, 154 (2008), pp. 77--89.
P. Mathé, B. Hofmann, How general are general source conditions?, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 24 (2008), pp. 015009/1--015009/5.
P. Mathé, N. Schöne, Regularization by projection in variable Hilbert scales, Applicable Analysis. An International Journal, 2 (2008), pp. 201-- 219.
B. Hofmann, P. Mathé, S.V. Pereverzev, Regularization by projection: Approximation theoretic aspects and distance functions, Journal of Inverse and Ill-Posed Problems, 15 (2007), pp. 527--545.
P. Mathé, U. Tautenhahn, Error bounds for regularization methods in Hilbert scales by using operator monotonicity, Far East Journal of Mathematical Sciences (FJMS), 24 (2007), pp. 1--21.
Abstract
For solving linear ill-posed problems with noisy data regularization methods are required. In the present paper regularized approximations are obtained by a general regularization scheme in Hilbert scales. We exploit operator monotonicity of certain functions for deriving order optimal error bounds that characterize the accuracy of the regularized approximations. These error bounds are obtained under general smoothness conditions
P. Mathé, B. Hofmann, Analysis of profile functions for general linear regularization methods, SIAM Journal on Numerical Analysis, 45 (2007), pp. 1122--1141.
Abstract
The stable approximate solution of ill-posed linear operator equations in Hilbert spaces requires regularization. Tight bounds for the noise-free part of the regularization error are constitutive for bounding the overall error. Norm bounds of the noise-free part which decrease to zero along with the regularization parameter are called profile functions and are subject of our analysis. The interplay between properties of the regularization and certain smoothness properties of solution sets, which we shall describe in terms of source-wise representations is crucial for the decay of associated profile functions. On the one hand, we show that a given decay rate is possible only if the underlying true solution has appropriate smoothness. On the other hand, if smoothness fits the regularization, then decay rates are easily obtained. If smoothness does not fit, then we will measure this in terms of some distance function. Tight bounds for these allow us to obtain profile functions. Finally we study the most realistic case when smoothness is measured with respect to some operator which is related to the one governing the original equation only through a link condition. In many parts the analysis is done on geometric basis, extending classical concepts of linear regularization theory in Hilbert spaces. We emphasize intrinsic features of linear ill-posed problems which are frequently hidden in the classical analysis of such problems.
A. Goldenshluger, V. Spokoiny, Recovering convex edges of image from noisy tomographic data, IEEE Transactions on Information Theory, 52 (2006), pp. 1322--1334.
D. Belomestny, M. Reiss, Spectral calibration of exponential Lévy models, Finance and Stochastics, 10 (2006), pp. 449--474.
Abstract
We investigate the problem of calibrating an exponential Lévy model based on market prices of vanilla options. We show that this inverse problem is in general severely ill-posed and we derive exact minimax rates of convergence. The estimation procedure we propose is based on the explicit inversion of the option price formula in the spectral domain and a cut-off scheme for high frequencies as regularisation.
P. Mathé, U. Tautenhahn, Interpolation in variable Hilbert scales with application to inverse problems, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 22 (2006), pp. 2271--2297.
Abstract
For solving linear ill-posed problems with noisy data regularization methods are required. In the present paper regularized approximations in Hilbert scales are obtained by a general regularization scheme. The analysis of such schemes is based on new results for interpolation in Hilbert scales. Error bounds are obtained under general smoothness conditions.
J. Polzehl, V. Spokoiny, Propagation-separation approach for local likelihood estimation, Probability Theory and Related Fields, 135 (2006), pp. 335--362.
Abstract
The paper presents a unified approach to local likelihood estimation for a broad class of nonparametric models, including, e.g., regression, density, Poisson and binary response models. The method extends the adaptive weights smoothing (AWS) procedure introduced by the authors [Adaptive weights smoothing with applications to image sequentation. J. R. Stat. Soc., Ser. B 62, 335-354 (2000)] in the context of image denoising. The main idea of the method is to describe a greatest possible local neighborhood of every design point in which the local parametric assumption is justified by the data. The method is especially powerful for model functions having large homogeneous regions and sharp discontinuities. The performance of the proposed procedure is illustrated by numerical examples for density estimation and classification. We also establish some remarkable theoretical non-asymptotic results on properties of the new algorithm. This includes the “propagation” property which particularly yields the root-$n$ consistency of the resulting estimate in the homogeneous case. We also state an “oracle” result which implies rate optimality of the estimate under usual smoothness conditions and a “separation” result which explains the sensitivity of the method to structural changes.
D. Belomestny, Reconstruction of the general distribution from the distribution of some statistics, Theory of Probability and its Applications, 49 (2005), pp. 1--15.
Abstract
We investigate the problem of characterizing the distribution of independent identically distributed random variables $X_1,ldots,X_m$ (general distribution) by the distribution of linear statistics and statistics of maximum with positive coefficients. Necessary and sufficient conditions are found under which such a characterization takes place.
A. Goldenshluger, V. Spokoiny, On the shape-from-moments problem and recovering edges from noisy Radon data, Probability Theory and Related Fields, 128 (2004), pp. 123--140.
D. Belomestny, Constraints on distributions imposed by properties of linear forms, ESAIM. Probability and Statistics, 7 (2003), pp. 313-328.
Abstract
Let $(X_1,Y_1)...,(X_m,Y_m)$ be $m$ independent identically distributed bivariate vectors and $L_1=b_1X_1+...+b_mX_m$ ,$L_2=b_1Y_1+...+b_mY_m$ are two linear forms with positive coefficients. We study two problems: under what conditions does the equidistribution of $L_1$ and $L_2$ imply the same property for $X_1$ and $Y_1$, and under what conditions does the independence of $L_1$ and $L_2$ entail independence of $X_1$ and $Y_1$? Some analytical sufficient conditions are obtained and it is shown that in general they can not be weakened.
P. Mathé, S.V. Pereverzev, Discretization strategy for linear ill-posed problems in variable Hilbert scales, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 19 (2003), pp. 1263-1277.
Abstract
The authors study the regularization of projection methods for solving linear ill-posed problems with compact and injective linear operators in Hilbert spaces. Smoothness of the unknown solution is given in terms of general source conditions, such that the framework of variable Hilbert scale s is suitable. The structure of the error is analyzed in terms of the noise level, the regularization parameter and as a function of other parameters, driving the discretization. As a result, a strategy is proposed, which automatically adapts to the unknown source condition, uniformly for certain classes, and provides the optimal order of accuracy.
P. Mathé, S.V. Pereverzev, Direct estimation of linear functionals from indirect noisy observations, Journal of Complexity, 18 (2002), pp. 500--516.
Abstract
The authors study the efficiency of the linear functional strategy, as introduced by Anderssen (1986), for inverse problems with observations blurred by Gaussian white noise with known intensity $delta$. The optimal accuracy is presented and it is shown, how this can be achieved by a linear--functional strategy based on the noisy observations. This optimal linear--functional strategy is obtained from Tikhonov regularization of some dual problem. Next, the situation is treated, when only a finite number of noisy observations, given beforehand is available. Under appropriate smoothness assumptions best possible accuracy still can be attained, if the number of observations corresponds to the noise intensity in a proper way. It is also shown, that, at least asymptotically this number of observations cannot be reduced.
P. Mathé, Stable summation of orthogonal series with noisy coefficients, Journal of Approximation Theory, 117 (2002), pp. 66--80.
Abstract
We study the recovery of continuous functions from Fourier coefficients with respect to certain given orthonormal systems, blurred by noise. For deterministic noise this is a classical ill--posed problem. Emphasis is laid on a priori smoothness assumptions on the solution, which allows to apply regularization to reach the best possible accuracy. Results are obtained for systems obeying norm growth conditions. In the white noise setting mild additional assumptions have to be made to have accurate bounds. We finish our study with the recovery of functions from noisy coefficients with respect to the Haar system.
P. Mathé, S.V. Pereverzev, Optimal discretization of inverse problems in Hilbert scales. Regularization and self-regularization of projection methods, SIAM Journal on Numerical Analysis, 38 (2001), pp. 1999--2021.
Abstract
We study the efficiency of the approximate solution of ill--posed problems, based on discretized observations, which we assume to be given afore--hand. We restrict ourselves to problems which can be formulated in Hilbert scales. Within this framework we shall quantify the degree of ill--posedness, provide general conditions on projection schemes to achieve the best possible order of accuracy. We pay particular attention on the problem of self--regularization vs. Tikhonov regularization. Moreover, we study the information complexity. Asymptotically, any method, which achieves the best possible order of accuracy must use at least such amount of noisy observations. We accomplish our study with two specific problems, Abel's integral equation and the recovery of continuous functions from noisy coefficients with respect to a given orthonormal system, both classical ill--posed problems.

Contributions to Collected Editions

E. Gorbunov, A. Sadiev, M. Danilova, S. Horváth, G. Gidel, P. Dvurechensky, A. Gasnikov, P. Richtárik, High-probability convergence for composite and distributed stochastic minimization and variational inequalities with heavy-tailed noise, in: International Conference on Machine Learning, 21--27 July 2024, Vienna, Austria, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp, eds., 235 of Proceedings of Machine Learning Research, 2024, pp. 15951--16070.
P. Dvurechensky, M. Staudigl, Barrier algorithms for constrained non-convex optimization, in: International Conference on Machine Learning, 21--27 July 2024, Vienna, Austria, R. Salakhutdinov, Z. Kolter, K. Heller, A. Weller, N. Oliver, J. Scarlett, F. Berkenkamp, eds., 235 of Proceedings of Machine Learning Research, 2024, pp. 12190--12214.
S. Abdurakhmon, M. Danilova, E. Gorbunov, S. Horvath, G. Gauthier, P. Dvurechensky, P. Richtarik, High-probability bounds for stochastic optimization and variational inequalities: The case of unbounded variance, in: Proceedings of the 40th International Conference on Machine Learning, A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett, eds., 202 of Proceedings of Machine Learning Research, 2023, pp. 29563--29648.
A. Agafonov, P. Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, A. Daneshmand, An accelerated second-order method for distributed stochastic optimization, in: 60th IEEE Conference on Decision and Control (CDC), IEEE, 2021, pp. 2407--2413, DOI 10.1109/CDC45484.2021.9683400 .
A. Daneshmand, G. Scutari, P. Dvurechensky, A. Gasnikov, Newton method over networks is fast up to the statistical precision, in: Proceedings of the 38th International Conference on Machine Learning, 139 of Proceedings of Machine Learning Research, 2021, pp. 2398--2409.
E. Gladin, A. Sadiev, A. Gasnikov, P. Dvurechensky, A. Beznosikov, M. Alkousa, Solving smooth min-min and min-max problems by mixed oracle algorithms, in: Mathematical Optimization Theory and Operations Research: Recent Trends, A. Strekalovsky, Y. Kochetov, T. Gruzdeva, A. Orlov , eds., 1476 of Communications in Computer and Information Science book series (CCIS), Springer International Publishing, Basel, 2021, pp. 19--40, DOI 10.1007/978-3-030-86433-0_2 .
Abstract
In this paper, we consider two types of problems that have some similarity in their structure, namely, min-min problems and min-max saddle-point problems. Our approach is based on considering the outer minimization problem as a minimization problem with an inexact oracle. This inexact oracle is calculated via an inexact solution of the inner problem, which is either minimization or maximization problem. Our main assumption is that the available oracle is mixed: it is only possible to evaluate the gradient w.r.t. the outer block of variables which corresponds to the outer minimization problem, whereas for the inner problem, only zeroth-order oracle is available. To solve the inner problem, we use the accelerated gradient-free method with zeroth-order oracle. To solve the outer problem, we use either an inexact variant of Vaidya's cutting-plane method or a variant of the accelerated gradient method. As a result, we propose a framework that leads to non-asymptotic complexity bounds for both min-min and min-max problems. Moreover, we estimate separately the number of first- and zeroth-order oracle calls, which are sufficient to reach any desired accuracy.
S. Guminov, P. Dvurechensky, N. Tupitsa, A. Gasnikov, On a combination of alternating minimization and Nesterov's momentum, in: Proceedings of the 38th International Conference on Machine Learning, 139 of Proceedings of Machine Learning Research, 2021, pp. 3886--3898.
Abstract
Alternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual to the entropy-regularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over known bound $1/k$ for general AM methods and for the Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradient-type methods in practice as it is free of the choice of the step-size and is adaptive to the local smoothness of the problem. We show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy regularized optimal transport problem and show experimentally, that it outperforms Sinkhorn's algorithm.
A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over time-varying graphs, in: 2021 IEEE 60th Annual Conference on Decision and Control (CDC), IEEE, 2021, pp. 3367--3373, DOI 10.1109/CDC45484.2021.9683400 .
A. Sadiev , A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zeroth-order algorithms for smooth saddle-point problems, in: Mathematical Optimization Theory and Operations Research: Recent Trends, A. Strekalovsky, Y. Kochetov, T. Gruzdeva, A. Orlov , eds., 1476 of Communications in Computer and Information Science book series (CCIS), Springer International Publishing, Basel, 2021, pp. 71--85, DOI 10.1007/978-3-030-86433-0_5 .
Abstract
Saddle-point problems have recently gained an increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order oracles, and estimate their convergence rate and its dependence on the dimension n of the variable. In particular, our analysis shows that in the case when the feasible set is a direct product of two simplices, our convergence rate for the stochastic term is only by a factor worse than for the first-order methods. Finally, we demonstrate the practical performance of our zeroth-order methods on practical problems.
N. Tupitsa, A. Gasnikov, P. Dvurechensky, S. Guminov, Strongly convex optimization for the dual formulation of optimal transport, in: Mathematical Optimization Theory and Operations Research, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., 1275 of Theoretical Computer Science and General Issues, Springer International Publishing AG, Cham, 2020, pp. 192--204, DOI 10.1007/978-3-030-58657-7_17 .
Abstract
In this paper we experimentally check a hypothesis, that dual problem to discrete entropy regularized optimal transport problem possesses strong convexity on a certain compact set. We present a numerical estimation technique of parameter of strong convexity and show that such an estimate increases the performance of an accelerated alternating minimization algorithm for strongly convex functions applied to the considered problem.
N. Tupitsa , P. Dvurechensky, A. Gasnikov, C.A. Uribe, Multimarginal optimal transport by accelerated alternating minimization, in: 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 6132--6137, DOI 10.1109/CDC42340.2020.9304010 .
D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, V. Spokoiny, On the line-search gradient methods for stochastic optimization, in: Proceedings of the 21th IFAC World Congress, R. Findeisen, S. Hirche, K. Janschek, M. Mönnigmann, eds., 53 of IFAC PapersOnLine, Elsevier, 2020, pp. 1715--1720, DOI 10.1016/j.ifacol.2020.12.2284 .
Abstract
In this paper we propose several adaptive gradient methods for stochastic optimization. Our methods are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient. We consider an accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate superiority of our methods to existing adaptive methods, e.g. AdaGrad and Adam.
P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in low-memory subgradient optimization, in: Numerical Nonsmooth Optimization, A.M. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, S. Taheri, eds., Springer International Publishing, Cham, 2020, pp. 19--59, DOI 10.1007/978-3-030-34910-3_2 .
P. Dvurechensky, A. Gasnikov, S. Omelchenko, A. Tiurin, A stable alternative to Sinkhorn's algorithm for regularized optimal transport, in: Mathematical Optimization Theory and Operations Research. MOTOR 2020, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., Lecture Notes in Computer Science, Springer, Cham, 2020, pp. 406--423, DOI 10.1007/978-3-030-49988-4_28 .
P. Dvurechensky, P. Ostroukhov, K. Safin, S. Shtern, M. Staudigl, Self-concordant analysis of Frank--Wolfe algorithms, in: Proceedings of the 37th International Conference on Machine Learning, H. Daumé Iii, A. Singh, eds., 119 of Proceedings of Machine Learning Research (online), 2020, pp. 2814--2824.
Abstract
Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.
D. Dvinskikh, E. Gorbunov, A. Gasnikov, A. Dvurechensky, C.A. Uribe, On primal and dual approaches for distributed stochastic convex optimization over networks, in: 2019 IEEE 58th Conference on Decision and Control (CDC), IEEE Xplore, 2019, pp. 7435--7440, DOI 10.1109/CDC40024.2019.9029798 .
Abstract
We introduce a primal-dual stochastic gradient oracle method for distributed convex optimization problems over networks. We show that the proposed method is optimal in terms of communication steps. Additionally, we propose a new analysis method for the rate of convergence in terms of duality gap and probability of large deviations. This analysis is based on a new technique that allows to bound the distance between the iteration sequence and the optimal point. By the proper choice of batch size, we can guarantee that this distance equals (up to a constant) to the distance between the starting point and the solution.
P. Dvurechensky, D. Dvinskikh, A. Gasnikov, C.A. Uribe, A. Nedić, Decentralize and randomize: Faster algorithm for Wasserstein barycenters, in: Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett, eds., Curran Associates, Inc., 2018, pp. 10760--10770.
L. Bogulubsky, P. Dvurechensky, A. Gasnikov, G. Gusev, Y. Nesterov, A. Raigorodskii, A. Tikhonov, M. Zhukovskii, Learning supervised PageRank with gradient-based and gradient-free optimization methods, in: Advances in Neural Information Processing Systems 29, D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett, eds., Curran Associates, Inc., 2016, pp. 4907--4915.
A. Chernov, P. Dvurechensky, A. Gasnikov, Fast primal-dual gradient method for strongly convex minimization problems with linear constraints, in: Discrete Optimization and Operations Research -- 9th International Conference, DOOR 2016, Vladivostok, Russia, September 19--23, 2016, Proceedings, Y. Kochetov, M. Khachay, V. Beresnev, E. Nurminski, P. Pardalos, eds., 9869 of Theoretical Computer Science and General Issues, Springer International Publishing Switzerland, Cham, 2016, pp. 391--403.
M. Arias Chao, D.S. Lilley, P. Mathé, V. Schlosshauer, Calibration and uncertainty quantification of gas turbines performance models, in: ASME Turbo Expo 2015: Turbine Technical Conference and Exposition, Volume 7A: Structures and Dynamics, ASME and Alstom Technologie AG, 2015, pp. V07AT29A001--V07AT29A012.
G. Blanchard, N. Krämer, Kernel partial least squares is universally consistent, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010), Y.W. Teh, M. Titterington, eds., 9 of JMLR Workshop and Conference Proceedings, Journal of Machine Learning Research, Cambridge, MA, USA, 2010, pp. 57--64.

Preprints, Reports, Technical Reports

A. Mielke, Some notes on the Hellinger distance and various Fisher--Rao distances, Preprint no. 3222, WIAS, Berlin, 2025, DOI 10.20347/WIAS.PREPRINT.3222 .
Abstract, PDF (380 kByte)
These expository notes introduce the Hellinger distance on the set of all measures and the induced Fisher-Rao distances for subsets of measures, such as probability measures or Gaussian measures. The historical background is highlighted and the relations and the distinct features of the two distances are discussed. Moreover, we provide a dynamic characterization of absolutely continuous curves in the Hellinger spaces in terms of the growth equation, which replaces the continuity equation in the theory of optimal transport.
D. Sommer, R. Gruhlke, M. Kirstein, M. Eigel, C. Schillings, Generative modelling with tensor train approximations of Hamilton--Jacobi--Bellman equations, Preprint no. 3078, WIAS, Berlin, 2023, DOI 10.20347/WIAS.PREPRINT.3078 .
Abstract, PDF (2117 kByte)
Sampling from probability densities is a common challenge in fields such as Uncertainty Quantification (UQ) and Generative Modelling (GM). In GM in particular, the use of reverse-time diffusion processes depending on the log-densities of Ornstein-Uhlenbeck forward processes are a popular sampling tool. In [5] the authors point out that these log-densities can be obtained by solution of a Hamilton-Jacobi-Bellman (HJB) equation known from stochastic optimal control. While this HJB equation is usually treated with indirect methods such as policy iteration and unsuper-vised training of black-box architectures like Neural Networks, we propose instead to solve the HJB equation by direct time integration, using compressed polynomials represented in the Tensor Train (TT) format for spatial discretization. Crucially, this method is sample-free, agnostic to normalization constants and can avoid the curse of dimensionality due to the TT compression. We provide a complete derivation of the HJB equation?s action on Tensor Train polynomials and demonstrate the performance of the proposed time-step-, rank- and degree-adaptive integration method on a nonlinear sampling task in 20 dimensions.
M. Eigel, Ch. Miranda, J. Schütte, D. Sommer, Approximating Langevin Monte Carlo with ResNet-like neural network architectures, Preprint no. 3077, WIAS, Berlin, 2023, DOI 10.20347/WIAS.PREPRINT.3077 .
Abstract, PDF (795 kByte)
We sample from a given target distribution by constructing a neural network which maps samples from a simple reference, e.g. the standard normal distribution, to samples from the target. To that end, we propose using a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm. Based on LMC perturbation results, we show approximation rates of the proposed architecture for smooth, log-concave target distributions measured in the Wasserstein-2 distance. The analysis heavily relies on the notion of sub-Gaussianity of the intermediate measures of the perturbed LMC process. In particular, we derive bounds on the growth of the intermediate variance proxies under different assumptions on the perturbations. Moreover, we propose an architecture similar to deep residual neural networks and derive expressivity results for approximating the sample to target distribution map.
R. Gruhlke, M. Eigel, Low-rank Wasserstein polynomial chaos expansions in the framework of optimal transport, Preprint no. 2927, WIAS, Berlin, 2022, DOI 10.20347/WIAS.PREPRINT.2927 .
Abstract, PDF (10 MByte)
A unsupervised learning approach for the computation of an explicit functional representation of a random vector Y is presented, which only relies on a finite set of samples with unknown distribution. Motivated by recent advances with computational optimal transport for estimating Wasserstein distances, we develop a new Wasserstein multi-element polynomial chaos expansion (WPCE). It relies on the minimization of a regularized empirical Wasserstein metric known as debiased Sinkhorn divergence.
As a requirement for an efficient polynomial basis expansion, a suitable (minimal) stochastic coordinate system X has to be determined with the aim to identify ideally independent random variables. This approach generalizes representations through diffeomorphic transport maps to the case of non-continuous and non-injective model classes M with different input and output dimension, yielding the relation Y=M(X) in distribution. Moreover, since the used PCE grows exponentially in the number of random coordinates of X, we introduce an appropriate low-rank format given as stacks of tensor trains, which alleviates the curse of dimensionality, leading to only linear dependence on the input dimension. By the choice of the model class M and the smooth loss function, higher order optimization schemes become possible. It is shown that the relaxation to a discontinuous model class is necessary to explain multimodal distributions. Moreover, the proposed framework is applied to a numerical upscaling task, considering a computationally challenging microscopic random non-periodic composite material. This leads to tractable effective macroscopic random field in adopted stochastic coordinates.
A. Andresen, V. Spokoiny, Critical dimension in profile semiparametric estimation, Preprint no. 1776, WIAS, Berlin, 2013, DOI 10.20347/WIAS.PREPRINT.1776 .
Abstract, PDF (358 kByte)
This paper revisits the classical inference results for profile quasi maximum likelihood estimators (profile MLE) in the semiparametric estimation problem. We mainly focus on two prominent theorems: the Wilks phenomenon and Fisher expansion for the profile MLE are stated in a new fashion allowing finite samples and model misspecification. The method of study is also essentially different from the usual analysis of the semiparametric problem based on the notion of the hardest parametric submodel. Instead we apply the local bracketing and the upper function devices from Spokoiny (2011). This novel approach particularly allows to address the important issue of the effective target and nuisance dimension and it does not involve any pilot estimator of the target parameter. The obtained nonasymptotic results are surprisingly sharp and yield the classical asymptotic statements including the asymptotic normality and efficiency of the profile MLE. The general results are specified to the important special cases of an i.i.d. sample.
D. Belomestny, Spectral estimation of the fractional order of a Lévy process, Preprint no. 1491, WIAS, Berlin, 2010, DOI 10.20347/WIAS.PREPRINT.1491 .
Abstract, Postscript (2247 kByte), PDF (347 kByte)
We consider the problem of estimating the fractional order of a Levy process from low frequency historical and options data. An estimation methodology is developed which allows us to treat both estimation and calibration problems in a unified way. The corresponding procedure consists of two steps: the estimation of a conditional characteristic function and the weighted least squares estimation of the fractional order in spectral domain. While the second step is identical for both calibration and estimation, the first one depends on the problem at hand. Minimax rates of convergence for the fractional order estimate are derived, the asymptotic normality is proved and a data-driven algorithm based on aggregation is proposed. The performance of the estimator in both estimation and calibration setups is illustrated by a simulation study.

Talks, Poster

O. Butkovsky, Mini-course: Stochastic sewing with applications (Part 1), PDF Seminar, Beijing Institute of Technology, School of Mathematics and Statistics, China, July 1, 2025.
O. Butkovsky, Mini-course: Stochastic sewing with applications (Part 2), PDF Seminar, Beijing Institute of Technology, School of Mathematics and Statistics, China, July 2, 2025.
O. Butkovsky, Weak uniqueness for singular stochastic equations driven by fractional Brownian motion, PDF Seminar, Beijing Institute of Technology, School of Mathematics and Statistics, China, July 3, 2025.
J. Schütte, Approximating Langevin Monte Carlo with ResNet-like neural network architectures, 94th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM 2024), Session 25.02 ``Various Topics in Computational and Mathematical Methods in Datascience'', March 18 - 22, 2024, Otto-von-Guericke-Universität Magdeburg, March 20, 2024.
D. Sommer, Approximating Langevin Monte Carlo with ResNet-like neural network architecture (online talk), University of Tokyo, Graduate School of Mathematical Sciences, Japan, March 13, 2024.
D. Sommer, Approximating Langevin Monte Carlo with ResNet-like neural network architectures, SIAM Conference on Uncertainty Quantification (UQ24), Minisymposium MS25 ``Nonlinear Approximation of High-Dimensional Functions: Compositional, Low-Rank and Sparse Structures'', February 27 - March 1, 2024, Savoia Excelsior Palace Trieste and Stazione Marittima, Italy, February 27, 2024.
M. Eigel, Generative modelling with tensor compressed HJB approximations, 11th International Conference ``Inverse Problems: Modeling and Simulation'' (IPMS 2024), Minisymposium M7 ``Bayesian, Variational, and Optimization Techniques for Inverse Problems in Stochastic PDEs'', May 26 - June 1, 2024, Paradise-Bay Hotel, Cirkewwa, Malta, May 31, 2024.
M. Eigel, Topology optimisation under uncertainties, modern tensor compression & empirical SFEM, UQ Colloquium, Bosch Research Campus, Renningen, October 11, 2024.
O. Klein, A model for a magneto mechanical device: Forward and inverse uncertainty quantization, Leibniz MMS Days 2024, Kaiserslautern, April 10 - 12, 2024.
O. Klein, On a model for a magneto mechanical device: Forward and inverse uncertainty quantification, 2nd Workshop of the MATH+ Thematic Einstein Semester ``Mathematics of Small Data Analysis'', Berlin, January 17 - 19, 2024.
L. Schmeller, Gel models for phase separation at finite strains, 93rd Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), May 29 - June 2, 2023, Technische Universität Dresden, June 2, 2023.
V. Aksenov, Simulation of Wasserstein gradient flows with low-rank tensor methods for sampling, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), Minisymposium 00837 ``Particle Methods for Bayesian Inference'', August 20 - 25, 2023, Waseda University, Tokyo, Japan, August 24, 2023.
A. Selahi, Recovering battery ageing dynamics with invertible neuronal networks, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), August 20 - 25, 2023, Waseda University, Tokyo, Japan, August 23, 2023.
A. Selahi, Recovery of battery ageing dynamics using Bayesian inference, 93rd Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), May 29 - June 2, 2023, Technische Universität Dresden, June 1, 2023.
D. Sommer, Computing log-densities of reverse-time diffusion processes through Hamilton--Jacobi--Bellman equations (online talk), University of Tokyo, Graduate School of Mathematical Sciences, Japan, November 30, 2023.
D. Sommer, Less interaction with forward models in Langevin dynamics, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, France, June 27, 2023.
M. Eigel, Accelerated interacting particle systems with low-rank tensor compression for Bayesian inversion, 5th International Conference on Uncertainty Quantification in Computational Science and Engineering (UNCECOMP 2023), MS9 ``UQ and Data Assimilation with Sparse, Low-rank Tensor, and Machine Learning Methods'', June 12 - 14, 2023, Athens, Greece, June 14, 2023.
M. Eigel, Accelerated interacting particle transport for Bayesian inversion, 10th International Congress on Industrial and Applied Mathematics (ICIAM 2023), Minisymposium 00966 ``Theoretical and Computational Advances in Measure Transport'', August 20 - 25, 2023, Waseda University, Tokyo, Japan, August 21, 2023.
A. Selahi, Homogenization in PDE-based battery models and recovery of ageing dynamics via Bayesian inference (online talk), Second Conference of Young Applied Mathematicians (Hybrid Event), September 18 - 22, 2022, Arenzano, Italy, September 21, 2022.
P. Dvurechensky, Accelerated alternating minimization methods, 20th French German Portuguese Conference in Optimization, May 3 - 6, 2022, University of Porto, School of Economics and Management, Portugal, May 5, 2022.
P. Dvurechensky, Generalized self-concordant analysis of Frank--Wolfe algorithms, 19th Workshop on Advances in Continuous Optimization (EUROPT 2022), July 29 - 30, 2022, NOVA University Lisbon, School of Science and Technology, Portugal, July 30, 2022.
P. Dvurechensky, Multimarginal optimal transport by accelerated alternating minimization (online talk), SIAM Conference on Imaging Science (IS22) (Online Event), Minisymposium MS94: ``Multi-Marginal Optimal Transport'', March 21 - 25, 2022, March 24, 2022.
O. Klein, On forward and inverse uncertainty quantification for a model for a magneto mechanical device involving a hysteresis operator (joint work with Carmine Stefano Clemente and Daniele Davino, Universitá degli Studi di Sannio, Italy), MURPHYS 2022 -- Interdisciplinary Conference on Multiple Scale Systems, Systems with Hysteresis, May 29 - June 3, 2022, Silesian University, Ostravice, Czech Republic, May 31, 2022.
M. Landstorfer, A. Selahi, M. Heida, M. Eigel, Recovery of battery ageing dynamics with multiple timescales, MATH+-Day 2022, Technische Universität Berlin, November 18, 2022.
V. Spokoiny, Bayesian inference for nonlinear inverse problems, SFB 1294 Annual Meeting 2022, September 13 - 14, 2022, Universität Potsdam, Institut für Mathematik, September 13, 2022.
V. Spokoiny, Bayesian optimization by Laplace iterations, Workshop on Statistical Inference and Convex Optimization, June 13 - 15, 2022, Université Grenoble Alpes, Laboratoire Jean Kuntzmann, France, June 13, 2022.
P. Dvurechensky, Accelerated gradient methods and their applications to Wasserstein barycenter problem (online talk), The XIII International Scientific Conference and Young Scientist School ``Theory and Numerics of Inverse and Ill-posed Problems'' (Online Event), April 12 - 22, 2021, Mathematical Center in Akademgorodok, Novosibirsk, Russian Federation, April 14, 2021.
P. Dvurechensky, Newton method over networks is fast up to the statistical precision (online talk), Thirty-eighth International Conference on Machine Learning (Online Event), July 18 - 24, 2021, Carnegie Mellon University, Pittsburgh, USA, July 20, 2021.
P. Dvurechensky, On a combination of alternating minimization and Nesterov's momentum (online talk), Thirty-eighth International Conference on Machine Learning (Online Event), July 18 - 24, 2021, Carnegie Mellon University, Pittsburgh, USA, July 20, 2021.
P. Dvurechensky, Primal-dual accelerated gradient methods with alternating minimization (online talk), Conference Optimization without Borders, July 12 - 18, 2021, Sirius University of Science and Technology, Sochi, Russian Federation, July 15, 2021.
P. Dvurechensky, Wasserstein barycenters from the computational perspective (online talk), Moscow Conference on Combinatorics and Applications (Online Event), May 31 - June 4, 2021, Moscow Institute of Physics and Technology, School of Applied Mathematics and Computer Science, Moscow, Russian Federation, June 2, 2021.
M. Hintermüller, Mathematics of quantitative MRI (online talk), The 5th International Symposium on Image Computing and Digital Medicine (ISICDM 2021), December 17 - 20, 2021, Guilin, China, December 18, 2021.
M. Hintermüller, Optimization with learning-informed differential equation constraints and its applications (online talk), Deep Learning and Inverse Problems (MDLW02), September 27 - October 1, 2021, Isaac Newton Institute for Mathematical Sciences (Hybrid Event), Oxford, UK, October 1, 2021.
M. Hintermüller, Quantitative imaging: Physics integrated and machine learning based models in MRI (online talk), MATH-IMS Joint Applied Mathematics Colloquium Series, The Chinese University of Hong Kong, Center for Mathematical Artificial Intelligence, China, December 3, 2021.
K. Papafitsoros, A. Kofler, Classical vs. data driven regularization methods in imaging (online tutorial), MATH+ Thematic Einstein Semester on Mathematics of Imaging in Real-Word Challenges, Berlin, October 29, 2021.
M. Hintermüller, A function space framework for structural total variation regularization with applications in inverse problems, 71st Workshop: Advances in Nonsmooth Analysis and Optimization (NAO2019), June 25 - 30, 2019, International School of Mathematics ``Guido Stampacchia'', Erice, Italy, June 26, 2019.
M. Hintermüller, A function space framework for structural total variation regularization with applications in inverse problems, Thematic Programme ``Modern Maximal Monotone Operator Theory: From Nonsmooth Optimization to Differential Inclusions'', Workshop ``Nonsmooth and Variational Analysis'', January 28 - February 1, 2019, Erwin Schrödinger International Institute for Mathematics and Physics, Vienna, Austria, February 1, 2019.
M. Hintermüller, Applications in image processing, Workshop on Efficient Operator Splitting Techniques for Complex System and Large Scale Data Analysis, January 15 - 18, 2019, Sanya, China, January 14, 2019.
M. Hintermüller, Structural total variation regularization with applications in inverse problems, 90th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM), DFG Priority Programme 1962 ``Non Smooth and Complementarity-Based Distributed Parameter Systems: Simulation and Hierarchical Optimization'', February 18 - 22, 2019, Technische Universität Wien, Austria, February 19, 2019.
M. Marschall, Adaptive low-rank approximation in Bayesian inverse problems, 3rd International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP 2019), Minisymposium 6--IV ``Uncertainty Computations with Reduced Order Models and Low-Rank Representations'', June 24 - 26, 2019, Crete, Greece, June 25, 2019.
M. Marschall, Complexity reduction in Bayesian inverse problems by low-rank tensor representation, Robert Bosch GmbH, Corporate Research -- Advanced Engineering Computer Vision Systems (CR/AEC4), Hildesheim, April 16, 2019.
M. Marschall, Low-rank surrogates in Bayesian inverse problems, 19th French-German-Swiss Conference on Optimization (FGS'2019), Minisymposium 1 ``Recent Trends in Nonlinear Optimization 1'', September 17 - 20, 2019, Nice, France, September 17, 2019.
M. Marschall, Random domains in PDE problems with low-rank surrogates. Forward and backward, Physikalisch-Technische Bundesanstalt, Arbeitsgruppe 8.41 ``Mathematische Modellierung und Datenanalyse'', Berlin, April 10, 2019.
P. Mathé, Relating direct and inverse Bayesian problems via the modulus of continuity, Stochastic Computation and Complexity (ibcparis2019), April 15 - 16, 2019, Institut Henri Poincaré, Paris, France, April 16, 2019.
P. Mathé, Relating direct and inverse problems via the modulus of continuity, The Chemnitz Symposium on Inverse Problems 2019, September 30 - October 2, 2019, Technische Universität Chemnitz, Fakultät für Mathematik, Frankfurt a. M., October 1, 2019.
P. Mathé, The role of the modulus of continuity in inverse problems, Forschungsseminar Inverse Probleme, Technische Universität Chemnitz, Fachbereich Mathematik, August 13, 2019.
M. Eigel, Adaptive Galerkin FEM for stochastic forward and inverse problems, Optimisation and Numerical Analysis Seminars, University of Birmingham, School of Mathematics, UK, February 15, 2018.
M. Eigel, Adaptive tensor methods for forward and inverse problems, SIAM Conference on Uncertainty Quantification (UQ18), Minisymposium 122 ``Low-Rank Approximations for the Forward- and the Inverse Problems III'', April 16 - 19, 2018, Garden Grove, USA, April 19, 2018.
M. Hintermüller, Automated regularization parameter choice rule in image processing, Workshop ``New Directions in Stochastic Optimisation'', August 19 - 25, 2018, Mathematisches Forschungsinstitut Oberwolfach, August 23, 2018.
M. Hintermüller, Bilevel optimisation in automated regularisation parameter selection in image processing, WIAS--PGMO Workshop on Nonsmooth and Stochastic Optimization, June 26, 2018, Humboldt-Universität zu Berlin, June 26, 2018.
M. Hintermüller, Bilevel optimization and some "parameter learning" applications in image processing, SIAM Conference on Imaging Science, Minisymposium MS5 ``Learning and Adaptive Approaches in Image Processing'', June 5 - 8, 2018, Bologna, Italy, June 5, 2018.
M. Marschall, Bayesian inversion with adaptive low-rank approximation, Analysis, Control and Inverse Problems for PDEs -- Workshop of the French-German-Italian LIA (Laboratoire International Associe) COPDESC on Applied Analysis, November 26 - 30, 2018, University of Naples Federico II and Accademia Pontaniana, Italy, November 29, 2018.
P. Mathé, Bayesian inverse problems with non-commuting operators, Chemnitz September of Applied Mathematics 2018, Chemnitz Symposium on Inverse Problems, September 27 - 28, 2018, Technische Universität Chemnitz, Fakultät für Mathematik, September 28, 2018.
P. Mathé, Complexity of linear ill-posed problems in Hilbert space, Stochastisches Kolloquium, Georg-August-Universität Göttingen, Institut für Mathematische Stochastik, February 7, 2018.
M. Eigel, Adaptive stochastic FE for explicit Bayesian inversion with hierarchical tensor representations, Institut National de Recherche en Informatique et en Automatique (INRIA), SERENA (Simulation for the Environment: Reliable and Efficient Numerical Algorithms) research team, Paris, France, June 1, 2017.
R. Gruhlke, Multi-scale failure analysis with polymorphic uncertainties for optimal design of rotor blades, Frontiers of Uncertainty Quantification in Engineering (FrontUQ 2017), September 6 - 8, 2017, München, September 6, 2017.
M. Hintermüller, Bilevel optimization and applications in imaging, Workshop ``Emerging Developments in Interfaces and Free Boundaries'', January 22 - 28, 2017, Mathematisches Forschungsinstitut Oberwolfach.
M. Hintermüller, Bilevel optimization and applications in imaging, Mathematisches Kolloquium, Universität Wien, Austria, January 18, 2017.
M. Hintermüller, Bilevel optimization and some ``parameter learning'' applications in image processing, LMS Workshop ``Variational Methods Meet Machine Learning'', September 18, 2017, University of Cambridge, Centre for Mathematical Sciences, UK, September 18, 2017.
A. Koziuk, Bootstrap for the regression problem with instrumental variables, Haindorf Seminar 2017, January 24 - 28, 2017, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, Hejnice, Czech Republic, January 26, 2017.
M. Marschall, Bayesian inversion using hierarchical tensors, 88th Annual Meeting of the International Association of Applied Mathematics and Mechanics (GAMM 2017), Section S15 ``Uncertainty Quantification'', March 6 - 10, 2017, Bauhaus Universität Weimar/Technische Universität Ilmenau, Weimar, March 8, 2017.
M. Marschall, Sampling-free Bayesian inversion with adaptive hierarchical tensor representation, Frontiers of Uncertainty Quantification in Engineering (FrontUQ 2017), September 6 - 8, 2017, München, September 7, 2017.
M. Marschall, Sampling-free Bayesian inversion with adaptive hierarchical tensor representation, International Conference on Scientific Computation and Differential Equations (SciCADE2017), MS21 ``Tensor Approximations of Multi-Dimensional PDEs'', September 11 - 15, 2017, University of Bath, UK, September 14, 2017.
P. Mathé, Bayesian inverse problems with non-commuting operators, Statistical Foundations of Uncertainty Quantification for Inverse Problems Workshop, June 19 - 22, 2017, University of Cambridge, Center for Mathematical Sciences, UK, June 21, 2017.
P. Mathé, Complexity of supervised learning, ibc-paris2017 : Information Based Complexity, High-Dimensional Problems, March 14 - 15, 2017, Institut Henri Poincaré, Paris, France, March 15, 2017.
P. Mathé, Numerical integration (mini course), November 20 - December 4, 2017, Fudan University, School of Mathematical Sciences, China.
N. Buzun, Multiplier bootstrap for change point detection, Mathematical Statistics and Inverse Problems, February 8 - 12, 2016, Faculté des Sciences de Luminy, France, February 11, 2016.
N. Buzun, Multiplier bootstrap for change point detection, Spring School ``Structural Inference 2016", March 13 - 18, 2016, DFG Forschergruppe FOR 1735, Lübeck, Germany, March 14, 2016.
M. Eigel, Bayesian inversion using hierarchical tensor approximations, SIAM Conference on Uncertainty Quantification, Minisymposium 67 ``Bayesian Inversion and Low-rank Approximation (Part II)'', April 5 - 8, 2016, Lausanne, Switzerland, April 6, 2016.
T. Wu, Bilevel optimization and applications in imaging sciences, August 24 - 25, 2016, Shanghai Jiao Tong University, Institute of Natural Sciences, China.
P. Dvurechensky, Gradient and gradient-free methods for pagerank algorithm learning, Workshop on Modern Statistics and Optimization, February 23 - 24, 2016, Russian Academy of Sciences, Institute for Information Transmission Problems, Moscow, Russian Federation, February 24, 2016.
P. Dvurechensky, Random gradient-free methods for web-page ranking model learning, 30th annual conference of the Belgian Operational Research Society, January 27 - 29, 2016, Louvain-la-Neuve, Belgium, January 28, 2016.
M. Hintermüller, K. Papafitsoros, C. Rautenberg, A fine scale analysis of spatially adapted total variation regularisation, Imaging, Vision and Learning based on Optimization and PDEs, Bergen, Norway, August 29 - September 1, 2016.
M. Hintermüller, Bilevel optimization and applications in imaging, Imaging, Vision and Learning based on Optimization and PDEs, August 29 - September 1, 2016, Bergen, Norway, August 30, 2016.
M. Hintermüller, Bilevel optimization for a generalized total-variation model, SIAM Conference on Imaging Science, Minisymposium ``Non-Convex Regularization Methods in Image Restoration'', May 23 - 26, 2016, Albuquerque, USA, May 26, 2016.
M. Hintermüller, Optimal selection of the regularisation function in a localised TV model, SIAM Conference on Imaging Science, Minisymposium ``Analysis and Parameterisation of Derivative Based Regularisation'', May 23 - 26, 2016, Albuquerque, USA, May 24, 2016.
P. Mathé, Complexity of linear ill-posed problems in Hilbert space, IBC on the 70th anniversary of Henryk Wozniakowski, August 29 - September 2, 2016, Banach Center, Bedlewo, Poland, August 31, 2016.
P. Mathé, Complexity of linear ill-posed problems in Hilbert space, Chemnitz Symposium on Inverse Problems, September 22 - 23, 2016, Technische Universität Chemnitz, Fakultät für Mathematik, September 22, 2016.
P. Mathé, Discrepancy based model selection in statistical inverse problems, Mathematical Statistics and Inverse Problems, February 8 - 12, 2016, Centre International de Rencontres Mathématiques (CIRM), Luminy, France, February 11, 2016.
V. Spokoiny, Gradient and gradient-free methods for pagerank algorithm learning, Workshop on Modern Statistics and Optimization, February 23 - 24, 2016, Russian Academy of Sciences, Institute for Information Transmission Problems, Moscow, Russian Federation.
P. Dvurechensky, Semi-supervised pagerank model learning with gradient-free optimization methods, Traditional Youth School ``Control, Information and Optimization'', June 14 - 20, 2015, Moscow, Russian Federation, June 17, 2015.
P. Mathé, Minimax signal detection in statistical inverse problems, Algorithms and Complexity for Continuous Problems, September 21 - 25, 2015, Schloss Dagstuhl, September 25, 2015.
P. Mathé, A random surfer in the internet, German-Estonian Academic Week ``Academica'', October 13 - 15, 2014, University of Tartu, Faculty of Mathematics and Computer Science, Estonia, October 14, 2014.
P. Mathé, Bayesian analysis of statistical inverse problems, Colloquium of the Faculty of Mathematics and Computer Science, University of Tartu, Estonia, October 13, 2014.
P. Mathé, Bayesian regularization of statistical inverse problems, Rencontres de Statistique Mathématique: Nouvelles Procédures pour Nouvelles Données, December 15 - 19, 2014, Centre International de Rencontres Mathématiques (CIRM), Marseille, France, December 17, 2014.
P. Mathé, Convergence of Bayesian schemes in inverse problems, Kolloquium über Angewandte Mathematik, Georg-August-Universität Göttingen, Fakultät für Mathematik und Informatik, June 10, 2014.
P. Mathé, Merging regularization theory into Bayesian inverse problems, Chemnitz Symposium on Inverse Problems, September 15 - 19, 2014, Universität Chemnitz, Fachbereich Mathematik, September 18, 2014.
P. Mathé, Oracle-type posterior contraction rates in Bayesian inverse problems, International Workshop ``Advances in Optimization and Statistics'', May 15 - 16, 2014, Russian Academy of Sciences, Institute of Information Transmission Problems (Kharkevich Institute), Moscow, May 16, 2014.
P. Mathé, Parameter estimation by thresholding, Mathematical Modelling and Simulation Workshop, WIAS Berlin, April 15, 2014.
P. Mathé, Smoothness beyond differentiability, Seminar for Doctoral Candidates, University of Tartu, Faculty of Mathematics and Computer Science, Estonia, October 15, 2014.
A. Andresen, Finite sample analysis of maximum likelihood estimators and convergence of the alternating procedure, 29th European Meeting of Statisticians (EMS), July 20 - 25, 2013, Eötvös Loránd University, Budapest, Hungary, July 20, 2013.
H. Mai, Estimating a subordinators density, DynStoch 2013, April 17 - 19, 2013, University of Copenhagen, Department of Mathematical Sciences, Denmark, April 17, 2013.
P. Mathé, Bayes analysis for model calibration, Alstom Workshop, WIAS, May 7, 2013.
P. Mathé, Signal detection in inverse problems, PreMoLab Workshop on: Advances in predictive modeling and optimization, May 16 - 17, 2013, WIAS-Berlin, May 16, 2013.
P. Mathé, Signal detection in inverse problems, Mathematical Modelling and Analysis ( MMA2013 ) and Approximation Methods and Orthogonal Expansions ( AMOE2013 ), May 27 - 31, 2013, University of Tartu, Institute of Mathematics, Estonia, May 29, 2013.
P. Mathé, Statistical Inverse Problems, Applied Math Seminar, University of Warwick, Mathematics Institute, Coventry, UK, October 18, 2013.
S. Becker, Image processing via orientation scores, Workshop ``Computational Inverse Problems'', October 23 - 26, 2012, Mathematisches Forschungsinstitut Oberwolfach, October 25, 2012.
A. Andresen, Non asymptotic Wilks phenomenon in semiparametric estimation, PreMoLab: Moscow-Berlin Stochastic and Predictive Modeling, May 31 - June 1, 2012, Russian Academy of Sciences, Institute for Information Transmission Problems (Kharkevich Institute), Moscow, May 31, 2012.
A. Andresen, Nonasymptotic Wilks phenomenon in semiparametric estimation, 2. Structural Inference Day, WIAS, April 23, 2012.
A. Andresen, Nonasymptotic Wilks phenomenon in semiparametric estimation, Haindorf Seminar 2012 (Klausurtagung des SFB 649), February 9 - 12, 2012, Humboldt-Universität zu Berlin, Wirtschaftswissenschaftliche Fakultät, Hejnice, Czech Republic, February 10, 2012.
P. Mathé, An oracle-type bound for a statistical RG-rule, Nonparametric and High-dimensional Statistics, December 17 - 21, 2012, Centre International de Rencontres Mathématiques (CIRM), Marseille, France, December 20, 2012.
P. Mathé, Diskrepanz-basierte Parameterwahl in statistischen inversen Problemen, Technische Universität Chemnitz, Fakultät für Mathematik, September 19, 2012.
P. Mathé, Regularization of statistical inverse problems in Hilbert space, Journées Statistiques du Sud 2012, June 20 - 22, 2012, Université Toulouse, Institut National des Sciences Appliquées, France, June 20, 2012.
P. Mathé, Statistische Datenanalyse unter BOP, Workshop on Simulation in Industrial Process Engineering, WIAS, September 6, 2012.
P. Mathé, Using the discrepancy principle in statistical inverse problems, Regularisation symposium, Australian National University, Mathematical Sciences Institute, Canberra, November 22, 2012.
V. Spokoiny, Basics of modern parametric statistics, February 13 - 28, 2012, Independent University of Moscow, Center for Continuous Mathematical Education, Russian Federation.
V. Spokoiny, Bernstein--von Mises theorem for quasi posteriors, International Workshop on Recent Advances in Time Series Analysis (RATS 2012), June 8 - 12, 2012, University of Cyprus, Department of Mathematics and Statistics, Protaras, June 9, 2012.
V. Spokoiny, Bernstein--von Mises theorem for quasi posteriors, Workshop ``Frontiers in Nonparametric Statistics'', March 11 - 17, 2012, Mathematisches Forschungsinstitut Oberwolfach, March 12, 2012.
V. Spokoiny, Bernstein--von Mises theorem for quasi posteriors, Workshop II on Financial Time Series Analysis: High-dimensionality, Non-stationarity and the Financial Crisis, June 19 - 22, 2012, National University of Singapore, Institute for Mathematical Sciences, June 21, 2012.
V. Spokoiny, Bernstein--von Mises theorem for quasi posteriors, Workshop on Recent Developments in Statistical Multiscale Methods, July 16 - 18, 2012, Georg-August-Universität Göttingen, Institut für Mathematische Stochastik, July 17, 2012.
V. Spokoiny, Bernstein--von Mises theorem for quasi posteriors, PreMoLab Seminar, Russian Academy of Sciences, Institute for Information Transmission Problems (Kharkevich Institute), Moscow, March 15, 2012.
V. Spokoiny, Parametric estimation: Modern view, PreMoDay I, February 24, 2012, Russian Academy of Sciences, Institute for Information Transmission Problems (Kharkevich Institute), Moscow, February 24, 2012.
V. Spokoiny, Some methods of modern statistics, Information Technology and Systems 2, August 19 - 25, 2012, Russian Academy of Sciences, Institute for Information Transmission Problems (Kharkevich Institute), Petrozavodsk, August 20, 2012.
P. Mathé, Conjugate gradient iteration with noisy data, Foundations of Computational Mathematics (FoCM'11), July 4 - 14, 2011, Budapest, Hungary, July 6, 2011.
V. Spokoiny, Alternating and semiparametric efficiency, École Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), Rennes, France, September 16, 2011.
V. Spokoiny, Modern parametric theory, September 13 - 16, 2011, École Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), Rennes, France.
P. Mathé, Conjugate gradient iteration under white noise, International Conference on Scientific Computing (SC2011), October 10 - 14, 2011, Università di Cagliari, Dipartimento di Matematica e Informatica, Cagliari, Italy, October 14, 2011.
V. Spokoiny, Semiparametric estimation alternating and efficiency, École Nationale de la Statistique et de l'Administration Économique (ENSAE), Paris, France, December 12, 2011.
M. Becker, Self-intersection local times: Exponential moments in subcritical dimensions, Excess Self-Intersections and Related Topics, December 6 - 10, 2010, Centre international de rencontres mathématiques (CIRM), Luminy, France, December 6, 2010.
S. Becker, Regularization of statistical inverse problems and the Bakushinskii veto, Rencontres de Statistiques Mathématiques 10, December 13 - 17, 2010, Centre International de Rencontres Mathématiques (CIRM), Luminy, France, December 17, 2010.
N. Krämer, Conjugate gradient regularization --- A statistical framework for partial least squares regression, 4th Workshop on Partial Least Squares and Related Methods for Cutting-edge Research in Experimental Sciences, May 10 - 11, 2010, École Supérieure d'Électricité, Department of Information Systems and Decision Sciences, Gif-sur-Yvette, France, May 11, 2010.
D. Belomestny, Estimating the distribution of jumps in regular affine models: Uniform rates of convergence, Leipziger Stochastik-Tage, March 1 - 5, 2010, Universität Leipzig, Fakultät für Mathematik und Informatik, March 2, 2010.
D. Belomestny, Statistical inference for multidimensional timechanged Levy processes based on low--frequency data, 28th European Meeting of Statisticians, August 19 - 23, 2010, University of Piraeus, Department of Statistics and Insurance Science, Greece, August 23, 2010.
G. Blanchard, N. Krämer, Kernel partial least squares is universally consistent, AI & Statistics 2010, Sardinia, Italy, May 13 - 15, 2010.
G. Blanchard, N. Krämer, Optimal rates for conjugate gradient regularization, AI & Statistics 2010, Sardinia, Italy, May 13 - 15, 2010.
P. Mathé, Analysis of inverse problems under general smoothness assumptions, 5th International Conference on Inverse Problems: Modeling and Simulation, May 24 - 29, 2010, Izmir University, Department of Mathematics and Computer Sciences, Antalya, Turkey, May 25, 2010.
P. Mathé, Regularization under general noise assumptions, Chemnitz Symposium on Inverse Problems 2010, September 23 - 24, 2010, Chemnitz University of Technology, Department of Mathematics, September 23, 2010.
P. Mathé, Warum und wie rechnen wir mit allgemeinen Quelldarstellungen?, Seminar Inverse Probleme (Fr. C. Böckmann), Universität Potsdam, Institut für Mathematik, January 12, 2010.
D. Belomestny, Estimation of the jump activity of a Lévy process from low frequency data, Haindorf Seminar 2009, February 12 - 15, 2009, Humboldt-Universität zu Berlin, CASE -- Center for Applied Statistics and Economics, Hejnice, Czech Republic, February 12, 2009.
D. Belomestny, Spectral estimation of the fractional order of a Lévy process, Workshop ``Statistical Inference for Lévy Processes with Applications to Finance'', July 15 - 17, 2009, EURANDOM, Eindhoven, Netherlands, July 16, 2009.
G. Blanchard, Convergence du gradient conjugué fonctionnel pour l'apprentissage statistique, École Normale Supérieure, Paris, France, March 16, 2009.
P. Mathé, Approximation theoretic aspects in variable Hilbert scales, 2nd Dolomites Workshop on Constructive Approximation and Applications, September 3 - 10, 2009, Università degli Studi di Verona, Dipartimento di Informatica, Italy, September 5, 2009.
P. Mathé, Discretization under general smoothness assumptions, Monday Lecture Series, Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria, June 29, 2009.
P. Mathé, Inverse problems in different settings, Seminar ``Algorithms and Complexity for Continuous Problems'', September 20 - 25, 2009, Schloss Dagstuhl, September 25, 2009.
P. Mathé, On some minimization-based heuristic parameter choice in inverse problems, Chemnitz-RICAM Symposium on Inverse Problems, Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria, July 14, 2009.
P. Mathé, Typical behavior in heuristic parameter choice, Kick-Off Meeting of Mini Special Semester on Inverse Problems, May 18 - 22, 2009, Johann Radon Institute for Computational and Applied Mathematics (RICAM), Linz, Austria, May 18, 2009.
V. Spokoiny, Saddle point model selection for inverse problems, Workshop ``Challenges in Statistical Theory: Complex Data Structures and Algorithmic Optimization'', August 23 - 29, 2009, Mathematisches Forschungsinstitut Oberwolfach, August 26, 2009.
P. Mathé, Average case analysis of inverse problems, Symposium on Inverse Problems 2008, September 25 - 26, 2008, TU Chemnitz, Fachbereich Mathematik, September 25, 2008.

External Preprints

D. Pasechniuk, P. Dvurechensky, C.A. Uribe, A. Gasnikov, Decentralised convex optimisation with probability-proportional-to-size quantization, Preprint no. arXiv:2501.18312, Cornell University, , DOI 10.48550/arXiv.2501.18312 .
A. Vasin, V. Krivchenko, D. Kovalev, F. Stonyakin, N. Tupitsa, P. Dvurechensky, M. Alkousa, N. Kornilov, A. Gasnikov, On solving minimization and min-max problems by first-order methods with relative error in gradients, Preprint no. arXiv:2503.06628, Cornell University, , DOI 10.48550/arXiv.2503.06628 .
P. Dvurechensky, M. Staudigl, Barrier algorithms for constrained non-convex optimization, Preprint no. arXiv:2404.18724, Cornell University, 2024, DOI 10.48550/arXiv.2404.18724 .
O. Yufereva, M. Persiianov, P. Dvurechensky, A. Gasnikov, D. Kovalev, Decentralized convex optimization on time-varying networks with application to Wasserstein barycenters, Preprint no. arXiv:2205.15669, Cornell University, 2023, DOI 10.48550/arXiv.2205.15669 .
E. Gladin, A. Gasnikov, P. Dvurechensky, Accuracy certificates for convex minimization with inexact Oracle, Preprint no. arXiv:2310.00523, Cornell University, 2023, DOI 10.48550/arXiv.2310.00523 .
Abstract
Accuracy certificates for convex minimization problems allow for online verification of the accuracy of approximate solutions and provide a theoretically valid online stopping criterion. When solving the Lagrange dual problem, accuracy certificates produce a simple way to recover an approximate primal solution and estimate its accuracy. In this paper, we generalize accuracy certificates for the setting of inexact first-order oracle, including the setting of primal and Lagrange dual pair of problems. We further propose an explicit way to construct accuracy certificates for a large class of cutting plane methods based on polytopes. As a by-product, we show that the considered cutting plane methods can be efficiently used with a noisy oracle even thought they were originally designed to be equipped with an exact oracle. Finally, we illustrate the work of the proposed certificates in the numerical experiments highlighting that our certificates provide a tight upper bound on the objective residual.
N. Kornilov, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, Gradient free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact, Preprint no. arXiv:2304.02442, Cornell University, 2023, DOI 10.48550/arXiv.2304.02442 .
N. Kornilov, E. Gorbunov, M. Alkousa, F. Stonyakin, P. Dvurechensky, A. Gasnikov, Intermediate gradient methods with relative inexactness, Preprint no. arXiv:2310.00506, Cornell University, 2023, DOI 10.48550/arXiv.2310.00506 .
Abstract
This paper is devoted to first-order algorithms for smooth convex optimization with inexact gradi- ents. Unlike the majority of the literature on this topic, we consider the setting of relative rather than absolute inexactness. More precisely, we assume that an additive error in the gradient is propor- tional to the gradient norm, rather than being globally bounded by some small quantity. We propose a novel analysis of the accelerated gradient method under relative inexactness and strong convex- ity and improve the bound on the maximum admissible error that preserves the linear convergence of the algorithm. In other words, we analyze how robust is the accelerated gradient method to the relative inexactness of the gradient information. Moreover, based on the Performance Estimation Problem (PEP) technique, we show that the obtained result is optimal for the family of accelerated algorithms we consider. Motivated by the existing intermediate methods with absolute error, i.e., the methods with convergence rates that interpolate between slower but more robust non-accelerated algorithms and faster, but less robust accelerated algorithms, we propose an adaptive variant of the intermediate gradient method with relative error in the gradient.
D.A. Pasechnyuk, M. Persiianov, P. Dvurechensky, A. Gasnikov, Algorithms for Euclidean-regularised optimal transport, Preprint no. arXiv:2307.00321, Cornell University, 2023, DOI 10.48550/arXiv.2307.00321 .
M. Alkousa, A. Gasnikov, P. Dvurechensky, A. Sadiev, L. Razouk, An approach for non-convex uniformly concave structured saddle point problem, Preprint no. arXiv:2202.06376, Cornell University, 2022, DOI 10.48550/arXiv.2202.06376 .
Abstract
Recently, saddle point problems have received much attention due to their powerful modeling capability for a lot of problems from diverse domains. Applications of these problems occur in many applied areas, such as robust optimization, distributed optimization, game theory, and many applications in machine learning such as empirical risk minimization and generative adversarial networks training. Therefore, many researchers have actively worked on developing numerical methods for solving saddle point problems in many different settings. This paper is devoted to developing a numerical method for solving saddle point problems in the non-convex uniformly-concave setting. We study a general class of saddle point problems with composite structure and Hölder-continuous higher-order derivatives. To solve the problem under consideration, we propose an approach in which we reduce the problem to a combination of two auxiliary optimization problems separately for each group of variables, outer minimization problem w.r.t. primal variables, and inner maximization problem w.r.t the dual variables. For solving the outer minimization problem, we use the Adaptive Gradient Method, which is applicable for non-convex problems and also works with an inexact oracle that is generated by approximately solving the inner problem. For solving the inner maximization problem, we use the Restarted Unified Acceleration Framework, which is a framework that unifies the high-order acceleration methods for minimizing a convex function that has Hölder-continuous higher-order derivatives. Separate complexity bounds are provided for the number of calls to the first-order oracles for the outer minimization problem and higher-order oracles for the inner maximization problem. Moreover, the complexity of the whole proposed approach is then estimated.
A. Gasnikov, A. Novitskii, V. Novitskii, F. Abdukhakimov, D. Kamzolov, A. Beznosikov, M. Takáč, P. Dvurechensky, B. Gu, The power of first-order smooth optimization for black-box non-smooth problems, Preprint no. arXiv:2201.12289, Cornell University, 2022, DOI 10.48550/arXiv.2201.12289 .
Abstract
Gradient-free/zeroth-order methods for black-box convex optimization have been extensively studied in the last decade with the main focus on oracle calls complexity. In this paper, besides the oracle complexity, we focus also on iteration complexity, and propose a generic approach that, based on optimal first-order methods, allows to obtain in a black-box fashion new zeroth-order algorithms for non-smooth convex optimization problems. Our approach not only leads to optimal oracle complexity, but also allows to obtain iteration complexity similar to first-order methods, which, in turn, allows to exploit parallel computations to accelerate the convergence of our algorithms. We also elaborate on extensions for stochastic optimization problems, saddle-point problems, and distributed optimization.
P. Dvurechensky, S. Shtern, M. Staudigl, A conditional gradient homotopy method with applications to semidefinite programming, Preprint no. arXiv:2207.03101, Cornell University, 2022, DOI 10.48550/arXiv.2207.03101 .
A. Agafonov, P. Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, A. Daneshmand, An accelerated second-order method for distributed stochastic optimization, Preprint no. arXiv:2103.14392, Cornell University Library, arXiv.org, 2021.
Abstract
We consider distributed stochastic optimization problems that are solved with master/workers computation architecture. Statistical arguments allow to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves lower communication complexity bound for this setting and improves upon existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers.
E. Gladin, A. Sadiev, A. Gasnikov, P. Dvurechensky, A. Beznosikov, M. Alkousa, Solving smooth min-min and min-max problems by mixed oracle algorithms, Preprint no. arXiv:2103.00434, Cornell University Library, arXiv.org, 2021.
Abstract
In this paper, we consider two types of problems that have some similarity in their structure, namely, min-min problems and min-max saddle-point problems. Our approach is based on considering the outer minimization problem as a minimization problem with inexact oracle. This inexact oracle is calculated via inexact solution of the inner problem, which is either minimization or a maximization problem. Our main assumptions are that the problem is smooth and the available oracle is mixed: it is only possible to evaluate the gradient w.r.t. the outer block of variables which corresponds to the outer minimization problem, whereas for the inner problem only zeroth-order oracle is available. To solve the inner problem we use accelerated gradient-free method with zeroth-order oracle. To solve the outer problem we use either inexact variant of Vaydya's cutting-plane method or a variant of accelerated gradient method. As a result, we propose a framework that leads to non-asymptotic complexity bounds for both min-min and min-max problems. Moreover, we estimate separately the number of first- and zeroth-order oracle calls which are sufficient to reach any desired accuracy.
E. Gorbunov, M. Danilova, I. Shibaev, P. Dvurechensky, A. Gasnikov, Near-optimal high probability complexity bounds for non-smooth stochastic optimization with heavy-tailed noise, Preprint no. arXiv:2106.05958, Cornell University Library, arXiv.org, 2021.
Abstract
Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds with the dependence on the confidence level that is either negative-power or logarithmic but under an additional assumption of sub-Gaussian (light-tailed) noise distribution that may not hold in practice, e.g., in several NLP tasks. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over time-varying graphs, Preprint no. arXiv:2103.15598, Cornell University Library, arXiv.org, 2021.
Abstract
We consider a distributed stochastic optimization problem that is solved by a decentralized network of agents with only local communication between neighboring agents. The goal of the whole system is to minimize a global objective function given as a sum of local objectives held by each agent. Each local objective is defined as an expectation of a convex smooth random function and the agent is allowed to sample stochastic gradients for this function. For this setting we propose the first accelerated (in the sense of Nesterov's acceleration) method that simultaneously attains optimal up to a logarithmic factor communication and oracle complexity bounds for smooth strongly convex distributed stochastic optimization. We also consider the case when the communication graph is allowed to vary with time and obtain complexity bounds for our algorithm, which are the first upper complexity bounds for this setting in the literature.
V. Tominin , Y. Tominin , E. Borodich , D. Kovalev, A. Gasnikov, P. Dvurechensky, On accelerated methods for saddle-point problems with composite structure, Preprint no. arXiv:2103.09344, Cornell University Library, arXiv.org, 2021.
Abstract
We consider strongly-convex-strongly-concave saddle-point problems with general non-bilinear objective and different condition numbers with respect to the primal and the dual variables. First, we consider such problems with smooth composite terms, one of which having finite-sum structure. For this setting we propose a variance reduction algorithm with complexity estimates superior to the existing bounds in the literature. Second, we consider finite-sum saddle-point problems with composite terms and propose several algorithms depending on the properties of the composite terms. When the composite terms are smooth we obtain better complexity bounds than the ones in the literature, including the bounds of a recently proposed nearly-optimal algorithms which do not consider the composite structure of the problem. If the composite terms are prox-friendly, we propose a variance reduction algorithm that, on the one hand, is accelerated compared to existing variance reduction algorithms and, on the other hand, provides in the composite setting similar complexity bounds to the nearly-optimal algorithm which is designed for non-composite setting. Besides that, our algorithms allow to separate the complexity bounds, i.e. estimate, for each part of the objective separately, the number of oracle calls that is sufficient to achieve a given accuracy. This is important since different parts can have different arithmetic complexity of the oracle, and it is desired to call expensive oracles less often than cheap oracles. The key thing to all these results is our general framework for saddle-point problems, which may be of independent interest. This framework, in turn is based on our proposed Accelerated Meta-Algorithm for composite optimization with probabilistic inexact oracles and probabilistic inexactness in the proximal mapping, which may be of independent interest as well.
P. Dvurechensky, M. Staudigl, Hessian barrier algorithms for non-convex conic optimization, Preprint no. arXiv:2111.00100, Cornell University Library, arXiv.org, 2021.
Abstract
We consider the minimization of a continuous function over the intersection of a regular cone with an affine set via a new class of adaptive first- and second-order optimization methods, building on the Hessian-barrier techniques introduced in [Bomze, Mertikopoulos, Schachinger, and Staudigl, Hessian barrier algorithms for linearly constrained optimization problems, SIAM Journal on Optimization, 2019]. Our approach is based on a potential-reduction mechanism and attains a suitably defined class of approximate first- or second-order KKT points with the optimal worst-case iteration complexity O(??2) (first-order) and O(??3/2) (second-order), respectively. A key feature of our methodology is the use of self-concordant barrier functions to construct strictly feasible iterates via a disciplined decomposition approach and without sacrificing on the iteration complexity of the method. To the best of our knowledge, this work is the first which achieves these worst-case complexity bounds under such weak conditions for general conic constrained optimization problems.
M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov, I. Shibaev, Recent theoretical advances in non-convex optimization, Preprint no. arXiv:2012.06188, Cornell University, 2020.
A. Rastogi, P. Mathé, Inverse learning in Hilbert scales, Preprint no. arXiv:2002.10208, Cornell University, 2020.
Abstract
We study the linear ill-posed inverse problem with noisy data in the statistical learning setting. Approximate reconstructions from random noisy data are sought with general regularization schemes in Hilbert scale. We discuss the rates of convergence for the regularized solution under the prior assumptions and a certain link condition. We express the error in terms of certain distance functions. For regression functions with smoothness given in terms of source conditions the error bound can then be explicitly established.
A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zeroth-order algorithms for smooth saddle-point problems, Preprint no. arXiv:2009.09908, Cornell University, 2020.
Abstract
In recent years, the importance of saddle-point problems in machine learning has increased. This is due to the popularity of GANs. In this paper, we solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order oracles. Theoretical analysis shows that in the case when the optimization set is a simplex, we lose only logn times in the stochastic convergence term. The paper also provides an approach to solving saddle-point problems, when the oracle for one of the variables has zero order, and for the second - first order. Subsequently, we implement zeroth-order and 1/2th-order methods to solve practical problems.
D. Tiapkin, A. Gasnikov, P. Dvurechensky, Stochastic saddle-point optimization for Wasserstein barycenters, Preprint no. arXiv:2006.06763, Cornell University, 2020.
Abstract
We study the computation of non-regularized Wasserstein barycenters of probability measures supported on the finite set. The first result gives a stochastic optimization algorithm for the discrete distribution over the probability measures which is comparable with the current best algorithms. The second result extends the previous one to the arbitrary distribution using kernel methods. Moreover, this new algorithm has a total complexity better than the Stochastic Averaging approach via the Sinkhorn algorithm in many cases.
N. Tupitsa, P. Dvurechensky, A. Gasnikov, C.A. Uribe , Multimarginal optimal transport by accelerated alternating minimization, Preprint no. arXiv:2004.02294, Cornell University Library, arXiv.org, 2020.
Abstract
We consider a multimarginal optimal transport, which includes as a particular case the Wasserstein barycenter problem. In this problem one has to find an optimal coupling between m probability measures, which amounts to finding a tensor of the order m. We propose an accelerated method based on accelerated alternating minimization and estimate its complexity to find the approximate solution to the problem. We use entropic regularization with sufficiently small regularization parameter and apply accelerated alternating minimization to the dual problem. A novel primal-dual analysis is used to reconstruct the approximately optimal coupling tensor. Our algorithm exhibits a better computational complexity than the state-of-the-art methods for some regimes of the problem parameters.
P. Dvurechensky, K. Safin, S. Shtern, M. Staudigl, Generalized self-concordant analysis of Frank--Wolfe algorithms, Preprint no. arXiv:2010.01009, Cornell University, 2020.
Abstract
Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones in large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance like properties. Such generalized self-concordant (GSC) functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex. Indeed, in a number of applications, e.g. inverse covariance estimation or distance-weighted discrimination problems in support vector machines, the loss is given by a GSC function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. This paper closes this apparent gap in the literature by developing provably convergent FW algorithms with standard O(1/k) convergence rate guarantees. If the problem formulation allows the efficient construction of a local linear minimization oracle, we develop a FW method with linear convergence rate.
P. Dvurechensky, S. Shtern, M. Staudigl, P. Ostroukhov, K. Safin, Self-concordant analysis of Frank--Wolfe algorithms, Preprint no. arXiv:2002.04320, Cornell University, 2020.
Abstract
Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k), k being the iteration counter. If the problem can be represented by a local linear minimization oracle, we are the first to propose a FW method with linear convergence rate without assuming neither strong convexity nor a Lipschitz continuous gradient.
A. Rastogi, G. Blanchard, P. Mathé, Convergence analysis of Tikhonov regularization for non-linear statistical inverse learning problems, Preprint no. arXiv:1902.05404, Cornell University Library, arXiv.org, 2019.
Abstract
We study a non-linear statistical inverse learning problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization, MOR) approach to reconstruct the estimator of the quantity for the non-linear ill-posed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the ansatz of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions.
F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, A. Kroshnin, V. Piskunova, Inexact model: A framework for optimization and variational inequalities, Preprint no. arXiv:1902.00990, Cornell University Library, arXiv.org, 2019.
C.A. Uribe, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, A. Nedić, Distributed computation of Wasserstein barycenters over networks, Preprint no. arXiv:1803.02933, Cornell University Library, arXiv.org, 2018.
P. Dvurechensky, D. Dvinskikh, A. Gasnikov, C.A. Uribe, A. Nedić, Decentralize and randomize: Faster algorithm for Wasserstein barycenters, Preprint no. arXiv:1806.03915, Cornell University Library, arXiv.org, 2018.
Abstract
We study the problem of decentralized distributed computation of a discrete approximation for regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. Particularly, we assume that there is a network of agents/machines/computers where each agent holds a private continuous probability measure, and seeks to compute the barycenter of all the measures in the network by getting samples from its local measure and exchanging information with its neighbors. Motivated by this problem, we develop and theoretically analyze a novel accelerated primal-dual stochastic gradient method for general stochastic convex optimization problems with linear equality constraints. Then, we apply this method to the decentralized distributed optimization setting to propose a new algorithm for the distributed semi-discrete regularized Wasserstein barycenter problem. The proposed algorithm can be executed over arbitrary networks that are undirected, connected and static, using the local information only. Moreover, we show explicit non-asymptotic complexity in terms of the problem parameters. Finally, we show the effectiveness of our method on the distributed computation of the regularized Wasserstein barycenter of univariate Gaussian and von Mises distributions, as well as on some applications to image aggregation.
P. Mathé, Bayesian inverse problems with non-commuting operators, Preprint no. arXiv:1801.09540, Cornell University Library, arXiv.org, 2018.
Abstract
The Bayesian approach to ill-posed operator equations in Hilbert space recently gained attraction. In this context, and when the prior distribution is Gaussian, then two operators play a significant role, the one which governs the operator equation, and the one which describes the prior covariance. Typically it is assumed that these operators commute. Here we extend this analysis to non-commuting operators, replacing the commutativity assumption by a link condition. We discuss its relation to the commuting case, and we indicate that this allows to use interpolation type results to obtain tight bounds for the contraction of the posterior Gaussian distribution towards the data generating element.
M. Hintermüller, N. Strogies, On the identification of the friction coefficient in a semilinear system for gas transport through a network, Preprint, DFG SFB Transregio 154 ``Mathematical Modelling, Simulation and Optimization using the Example of Gas Networks'', 2017.
L. Bogolubsky, P. Dvurechensky, A. Gasnikov, G. Gusev, Y. Nesterov, A. Raigorodskii, A. Tikhonov, M. Zhukovskii, Learning supervised PageRank with gradient-based and gradient-free optimization methods, Preprint no. arXiv:1603.00717, Cornell University Library, arXiv.org, 2016.
A. Chernov, P. Dvurechensky, A. Gasnikov, Fast primal-dual gradient method for strongly convex minimization problems with linear constraints, Preprint no. arXiv:1605.02970, Cornell University Library, arXiv.org, 2016.
S.W. Anzengruber, B. Hofmann, P. Mathé, Regularization properties of the discrepancy principle for Tikhonov regularization in Banach spaces, Preprint no. 12, Technische Universität Chemnitz, Fakultät für Mathematik, 2012.
B. Hofmann, P. Mathé, Parameter choice in Banach space regularization under variational inequalities, Preprint no. 05, Technische Universität Chemnitz, Fakultät für Mathematik, 2012.
V. Spokoiny, Parametric estimation. Finite sample theory, Preprint no. arXiv:1111.3029, Cornell University Library, arXiv.org, 2012.
V. Spokoiny, Roughness penalty, Wilks phenomenon, and Bernstein--von Mises theorem, Preprint no. arXiv:1205.0498, Cornell University Library, arXiv.org, 2012.
R.I. Boţ, B. Hofmann, P. Mathé, Regularizability of ill-posed problems and the modulus of continuity, Preprint no. 17, Technische Universität Chemnitz, Fakultät für Mathematik, 2011.
B. Hofmann, P. Mathé, Some note on the modulus of continuity for ill-posed problems in Hilbert space, Preprint no. 07, Technische Universität Chemnitz, Fakultät für Mathematik, 2011.
G. Blanchard, P. Mathé, Discrepancy principle for statistical inverse problems with application to conjugate gradient iteration, Preprint no. 07, Universität Potsdam, Institut für Mathematik, 2011.

Contact

Prof. Dr. Vladimir Spokoiny