Optimal transport, also known as the transportation problem, is a mathematical framework used to measure the dissimilarity between two probability distributions. It originated from the field of operations research in the mid20th century but has found applications in various fields including economics, computer vision, machine learning, image processing, quantum chemistry, diffusion processes, and many more. It has been very influential in connecting partial differential equations, geometry, and probability. In the context of probability distributions, optimal transport aims to find the most costeffective way to transform one distribution into another. Each distribution is considered as a collection of resources or mass, and the transportation cost is determined by a distance metric or a cost function between these masses. The optimal transport problem seeks to find a transport plan, i.e., a probability measure on the product space, that minimizes this cost, effectively "moving" the mass from one distribution to another in the most efficient manner possible. One of the most common distance metrics used in optimal transport is the KantorovichWasserstein distance (also known as the Earth Mover's Distance), where the cost function is given via the Euclidean distance
Consider grayscale images where each pixel represents a mass. By normalizing pixel intensities, discrete probability measures on a pixel grid arise. The transportation cost between pixels is the squared Euclidean distance. The goal is to transport one image to another using a transportation plan represented by a matrix. The minimum transportation cost, found by varying the plan, gives the KantorovichWasserstein distance, a form of optimal transport distance. This distance is applicable to various objects beyond images but involves intensive computations. Despite this, it captures object geometry well compared to simple distances like Euclidean. The Wasserstein barycenter defines the mean probability measure of a set of measures by minimizing the sum of squared KantorovichWasserstein distances to all measures in the set. Besides images, the probability measures or histograms can model other realworld objects like videos, texts, genetic sequences, protein backbones, etc.
Histograms and transportation plan (left). Euclidean and 2Wasserstein barycenter of images (right).
Statistical Optimal Transport
Quite often modern datasets appear to be too complex for being properly described by linear models. The absence of linearity makes impossible the direct application of standard inference techniques and requires a development of a new toolbox taking into account properties of the underlying space. At WIAS, an approach based on optimal transportation theory was developed that is a convenient instrument for the analysis of complex data sets. This includes the applicability of the classical resampling technique referred to as multiplier bootstrap for the case of 2Wasserstein space. Moreover, the convergence and concentration properties of the empirical barycenters are studied, and the setting is extended to the BuresWasserstein distance. The obtained theoretical results are used for the introduction of Gaussian processes indexed by multidimensional distributions: positive definite kernels between multivariate distributions are constructed via Hilbert space embedding relying on optimal transport maps.Computational Optimal Transport
Calculating the Wasserstein barycenter of m measures is a computationally hard optimization problem that includes repeated computation of m KantorovichWasserstein distances. Moreover, in the largescale setup, storage, and processing of transportation plans, required to calculate KantorovichWasserstein distances, can be intractable for computation on a single computer. On the other hand, recent studies on distributed learning and optimization algorithms demonstrated their efficiency for statistical and optimization problems over arbitrary networks of computers or other devices with inherently distributed data, i.e., the data is produced by a distributed network of sensors or the transmission of information is limited by communication or privacy constraints, i.e., only a limited amount of information can be shared across the network.
Evolution of local approximation of the 2Wasserstein barycenter on each node in the network for von Mises distributions (left), images from MNIST dataset (middle), brain images from IXI dataset (right). As the iteration counter increase, local approximations converge to the same distribution which is an approximation for the barycenter. Simulations made by César A. Uribe for the paper P. Dvurechensky, D. Dvinskikh, A. Gasnikov, C. A. Uribe, and A. Nedic. Decentralize and randomize: Faster algorithm for Wasserstein barycenters. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. CesaBianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, NIPS'18, pages 1078310793. Curran Associates, Inc., 2018. arXiv:1802.04367.
Unbalanced optimal transport and partial differential equations
Based on the dynamical formulation of the KantorovichWasserstein distance by BenamouBrenier, new classes of transportation distances were constructed by generalizing the action functional that is minimized over all connected curves as well as the continuity equation. These "dissipation distances" contain in general additional terms that account for absorption and generation of mass or even reactions between the different components in the vectorial case. In particular, the additional terms lift the restriction of equal mass of the measures, leading to unbalanced optimal transport. The characterization of the induced geometry is immensely complicated. For a special case of dissipation distance with additional absorption and generation of mass, this was possible and led to the definition of the HellingerKantorovich distance. The latter can be seen as an infimal convolution of the KantorovichWasserstein distance and the HellingerKakuktani distance. Several equivalent and useful formulations for this new notion of distance were derived, shortest connecting curves, socalled geodesics, were characterized, convexity conditions for functionals along these geodesics were established, and gradient flows were studied.Highlights
Optimal transport for imaging
Together with partners from HU Berlin, image segmentation were studied using optimal transport distances as a fidelity term within the project “Optimal transport for imaging” of The Berlin Mathematics Research Center MATH+. To obtain further computational efficiency, a novel algorithm was proposed equipped with a quantization of the information transmitted between the nodes of the computational network.Publications
Articles in Refereed Journals

M. Liero, A. Mielke, G. Savaré, Fine properties of geodesics and geodesic $lambda$convexity for the HellingerKantorovich distance, Archive for Rational Mechanics and Analysis, 247 (2023), pp. 112/1112/73, DOI 10.1007/s00205023019411 .
Abstract
We study the fine regularity properties of optimal potentials for the dual formulation of the HellingerKantorovich problem (HK), providing sufficient conditions for the solvability of the primal Monge formulation. We also establish new regularity properties for the solution of the HamiltonJacobi equation arising in the dual dynamic formulation of HK, which are sufficiently strong to construct a characteristic transportdilation flow driving the geodesic interpolation between two arbitrary positive measures. These results are applied to study relevant geometric properties of HK geodesics and to derive the convex behaviour of their Lebesgue density along the transport flow. Finally, exact conditions for functionals defined on the space of measures are derived that guarantee the geodesic lambdaconvexity with respect to the HellingerKantorovich distance. 
TH. Frenzel, M. Liero, Effective diffusion in thin structures via generalized gradient systems and EDPconvergence, Discrete and Continuous Dynamical Systems  Series S, 14 (2021), pp. 395425, DOI 10.3934/dcdss.2020345 .
Abstract
The notion of EnergyDissipationPrinciple convergence (EDPconvergence) is used to derive effective evolution equations for gradient systems describing diffusion in a structure consisting of several thin layers in the limit of vanishing layer thickness. The thicknesses of the sublayers tend to zero with different rates and the diffusion coefficients scale suitably. The FokkerPlanck equation can be formulated as gradientflow equation with respect to the logarithmic relative entropy of the system and a quadratic Wassersteintype gradient structure. The EDPconvergence of the gradient system is shown by proving suitable asymptotic lower limits of the entropy and the total dissipation functional. The crucial point is that the limiting evolution is again described by a gradient system, however, now the dissipation potential is not longer quadratic but is given in terms of the hyperbolic cosine. The latter describes jump processes across the thin layers and is related to the Marcelinde Donder kinetics. 
A. Kroshnin, V. Spokoiny, A. Suvorikova, Statistical inference for BuresWasserstein barycenters, The Annals of Applied Probability, 31 (2021), pp. 12641298, DOI 10.1214/20AAP1618 .

V. Laschos, A. Mielke, Geometric properties of cones with applications on the HellingerKantorovich space, and a new distance on the space of probability measures, Journal of Functional Analysis, 276 (2019), pp. 35293576, DOI 10.1016/j.jfa.2018.12.013 .
Abstract
By studying general geometric properties of cone spaces, we prove the existence of a distance on the space of Probability measures that turns the HellingerKantorovich space into a cone space over the space of probabilities measures. Here we exploit a natural twoparameter scaling property of the HellingerKantorovich distance. For the new space, we obtain a full characterization of the geodesics. We also provide new geometric properties for the original space, including a twoparameter rescaling and reparametrization of the geodesics, localangle condition and some partial Ksemiconcavity of the squared distance, that it will be used in a future paper to prove existence of gradient flows. 
M. Liero, A. Mielke, G. Savaré, Optimal entropytransport problems and a new HellingerKantorovich distance between positive measures, Inventiones mathematicae, 211 (2018), pp. 9691117 (published online on 14.12.2017), DOI 10.1007/s0022201707598 .
Abstract
We develop a full theory for the new class of Optimal EntropyTransport problems between nonnegative and finite Radon measures in general topological spaces. They arise quite naturally by relaxing the marginal constraints typical of Optimal Transport problems: given a couple of finite measures (with possibly different total mass), one looks for minimizers of the sum of a linear transport functional and two convex entropy functionals, that quantify in some way the deviation of the marginals of the transport plan from the assigned measures. As a powerful application of this theory, we study the particular case of Logarithmic EntropyTransport problems and introduce the new HellingerKantorovich distance between measures in metric spaces. The striking connection between these two seemingly far topics allows for a deep analysis of the geometric properties of the new geodesic distance, which lies somehow between the wellknown HellingerKakutani and KantorovichWasserstein distances. 
E. Cinti, F. Otto, Interpolation inequalities in pattern formation, Journal of Functional Analysis, 271 (2016), pp. 10431376.
Abstract
We prove some interpolation inequalities which arise in the analysis of pattern formation in physics. They are the strong version of some already known estimates in weak form that are used to give a lower bound of the energy in many contexts (coarsening and branching in micromagnetics and superconductors). The main ingredient in the proof of our inequalities is a geometric construction which was first used by Choksi, Conti, Kohn, and one of the authors in [4] in the study of branching in superconductors. 
M. Liero, A. Mielke, G. Savaré, Optimal transport in competition with reaction: The HellingerKantorovich distance and geodesic curves, SIAM Journal on Mathematical Analysis, 48 (2016), pp. 28692911.
Abstract
We discuss a new notion of distance on the space of finite and nonnegative measures on Ω ⊂ ℝ ^{d}, which we call HellingerKantorovich distance. It can be seen as an infconvolution of the wellknown KantorovichWasserstein distance and the HellingerKakutani distance. The new distance is based on a dynamical formulation given by an Onsager operator that is the sum of a Wasserstein diffusion part and an additional reaction part describing the generation and absorption of mass. We present a full characterization of the distance and some of its properties. In particular, the distance can be equivalently described by an optimal transport problem on the cone space over the underlying space Ω. We give a construction of geodesic curves and discuss examples and their general properties. 
K. Disser, M. Liero, On gradient structures for Markov chains and the passage to Wasserstein gradient flows, Networks and Heterogeneous Media, 10 (2015), pp. 233253.
Abstract
We study the approximation of Wasserstein gradient structures by their finitedimensional analog. We show that simple finitevolume discretizations of the linear FokkerPlanck equation exhibit the recently established entropic gradientflow structure for reversible Markov chains. Then, we reprove the convergence of the discrete scheme in the limit of vanishing mesh size using only the involved gradientflow structures. In particular, we make no use of the linearity of the equations nor of the fact that the FokkerPlanck equation is of second order. 
M. Erbar, J. Maas, D.R.M. Renger, From large deviations to Wasserstein gradient flows in multiple dimensions, Electronic Communications in Probability, 20 (2015), pp. 112.
Abstract
We study the large deviation rate functional for the empirical distribution of independent Brownian particles with drift. In one dimension, it has been shown by Adams, Dirr, Peletier and Zimmer [ADPZ11] that this functional is asymptotically equivalent (in the sense of Gammaconvergence) to the JordanKinderlehrerOtto functional arising in the Wasserstein gradient flow structure of the FokkerPlanck equation. In higher dimensions, part of this statement (the lower bound) has been recently proved by Duong, Laschos and Renger, but the upper bound remained open, since the proof in [DLR13] relies on regularity properties of optimal transport maps that are restricted to one dimension. In this note we present a new proof of the upper bound, thereby generalising the result of [ADPZ11] to arbitrary dimensions. 
M. Liero, A. Mielke, Gradient structures and geodesic convexity for reactiondiffusion systems, Philosophical Transactions of the Royal Society A : Mathematical, Physical & Engineering Sciences, 371 (2013), pp. 20120346/120120346/28.
Abstract
We consider systems of reactiondiffusion equations as gradient systems with respect to an entropy functional and a dissipation metric given in terms of a socalled Onsager operator, which is a sum of a diffusion part of Wasserstein type and a reaction part. We provide methods for establishing geodesic lambdaconvexity of the entropy functional by purely differential methods, thus circumventing arguments from mass transportation. Finally, several examples, including a driftdiffusion system, provide a survey on the applicability of the theory. We consider systems of reactiondiffusion equations as gradient systems with respect to an entropy functional and a dissipation metric given in terms of a socalled Onsager operator, which is a sum of a diffusion part of Wasserstein type and a reaction part. We provide methods for establishing geodesic lambdaconvexity of the entropy functional by purely differential methods, thus circumventing arguments from mass transportation. Finally, several examples, including a driftdiffusion system, provide a survey on the applicability of the theory.
Contributions to Collected Editions

A. Kroshnin, D. Dvinskikh, P. Dvurechensky, A. Gasnikov, N. Tupitsa, C.A. Uribe, On the complexity of approximating Wasserstein barycenter, in: International Conference on Machine Learning, 915 June 2019, Long Beach, California, USA, 97 of Proceedings of Machine Learning Research, 2019, pp. 35303540.
Abstract
We study the complexity of approximating Wassertein barycenter of discrete measures, or histograms by contrasting two alternative approaches, both using entropic regularization. We provide a novel analysis for our approach based on the Iterative Bregman Projections (IBP) algorithm to approximate the original nonregularized barycenter. We also get the complexity bound for alternative acceleratedgradientdescentbased approach and compare it with the bound obtained for IBP. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to ", which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximalIBP algorithm, which can be seen as a proximal gradient method, which uses IBP on each iteration to make a proximal step. We also consider the question of scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
Preprints, Reports, Technical Reports

M. Eigel, Ch. Miranda, J. Schütte, D. Sommer, Approximating Langevin Monte Carlo with ResNetlike neural network architectures, Preprint no. 3077, WIAS, Berlin, 2023, DOI 10.20347/WIAS.PREPRINT.3077 .
Abstract, PDF (795 kByte)
We sample from a given target distribution by constructing a neural network which maps samples from a simple reference, e.g. the standard normal distribution, to samples from the target. To that end, we propose using a neural network architecture inspired by the Langevin Monte Carlo (LMC) algorithm. Based on LMC perturbation results, we show approximation rates of the proposed architecture for smooth, logconcave target distributions measured in the Wasserstein2 distance. The analysis heavily relies on the notion of subGaussianity of the intermediate measures of the perturbed LMC process. In particular, we derive bounds on the growth of the intermediate variance proxies under different assumptions on the perturbations. Moreover, we propose an architecture similar to deep residual neural networks and derive expressivity results for approximating the sample to target distribution map. 
M. Eigel, R. Gruhlke, D. Sommer, Less interaction with forward models in Langevin dynamics, Preprint no. 2987, WIAS, Berlin, 2022, DOI 10.20347/WIAS.PREPRINT.2987 .
Abstract, PDF (1423 kByte)
Ensemble methods have become ubiquitous for the solution of Bayesian inference problems. Stateoftheart Langevin samplers such as the Ensemble Kalman Sampler (EKS), Affine Invariant Langevin Dynamics (ALDI) or its extension using weighted covariance estimates rely on successive evaluations of the forward model or its gradient. A main drawback of these methods hence is their vast number of required forward calls as well as their possible lack of convergence in the case of more involved posterior measures such as multimodal distributions. The goal of this paper is to address these challenges to some extend. First, several possible adaptive ensemble enrichment strategies that successively enlarge the number of particles in the underlying Langevin dynamics are discusses that in turn lead to a significant reduction of the total number of forward calls. Second, analytical consistency guarantees of the ensemble enrichment method are provided for linear forward models. Third, to address more involved target distributions, the method is extended by applying adapted Langevin dynamics based on a homotopy formalism for which convergence is proved. Finally, numerical investigations of several benchmark problems illustrates the possible gain of the proposed method, comparing it to stateoftheart Langevin samplers. 
V. Laschos, A. Mielke, Evolutionary variational inequalities on the HellingerKantorovich and spherical HellingerKantorovich spaces, Preprint no. 2973, WIAS, Berlin, 2022, DOI 10.20347/WIAS.PREPRINT.2973 .
Abstract, PDF (491 kByte)
We study the minimizing movement scheme for families of geodesically semiconvex functionals defined on either the HellingerKantorovich or the Spherical HellingerKantorovich space. By exploiting some of the finer geometric properties of those spaces, we prove that the sequence of curves, which are produced by geodesically interpolating the points generated by the minimizing movement scheme, converges to curves that satisfy the Evolutionary Variational Inequality (EVI), when the time step goes to 0. 
R. Gruhlke, M. Eigel, Lowrank Wasserstein polynomial chaos expansions in the framework of optimal transport, Preprint no. 2927, WIAS, Berlin, 2022, DOI 10.20347/WIAS.PREPRINT.2927 .
Abstract, PDF (10 MByte)
A unsupervised learning approach for the computation of an explicit functional representation of a random vector Y is presented, which only relies on a finite set of samples with unknown distribution. Motivated by recent advances with computational optimal transport for estimating Wasserstein distances, we develop a new Wasserstein multielement polynomial chaos expansion (WPCE). It relies on the minimization of a regularized empirical Wasserstein metric known as debiased Sinkhorn divergence.As a requirement for an efficient polynomial basis expansion, a suitable (minimal) stochastic coordinate system X has to be determined with the aim to identify ideally independent random variables. This approach generalizes representations through diffeomorphic transport maps to the case of noncontinuous and noninjective model classes M with different input and output dimension, yielding the relation Y=M(X) in distribution. Moreover, since the used PCE grows exponentially in the number of random coordinates of X, we introduce an appropriate lowrank format given as stacks of tensor trains, which alleviates the curse of dimensionality, leading to only linear dependence on the input dimension. By the choice of the model class M and the smooth loss function, higher order optimization schemes become possible. It is shown that the relaxation to a discontinuous model class is necessary to explain multimodal distributions. Moreover, the proposed framework is applied to a numerical upscaling task, considering a computationally challenging microscopic random nonperiodic composite material. This leads to tractable effective macroscopic random field in adopted stochastic coordinates.

R. Krawchenko, C.A. Uribe, A. Gasnikov, P. Dvurechensky, Distributed optimization with quantization for computing Wasserstein barycenters, Preprint no. 2782, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2782 .
Abstract, PDF (946 kByte)
We study the problem of the decentralized computation of entropyregularized semidiscrete Wasserstein barycenters over a network. Building upon recent primaldual approaches, we propose a sampling gradient quantization scheme that allows efficient communication and computation of approximate barycenters where the factor distributions are stored distributedly on arbitrary networks. The communication and algorithmic complexity of the proposed algorithm are shown, with explicit dependency on the size of the support, the number of distributions, and the desired accuracy. Numerical results validate our algorithmic analysis. 
P. Dupuis, V. Laschos, K. Ramanan, Large deviations for empirical measures generated by Gibbs measures with singular energy functionals, Preprint no. 2203, WIAS, Berlin, 2015, DOI 10.20347/WIAS.PREPRINT.2203 .
Abstract, PDF (448 kByte)
We establish large deviation principles (LDPs) for empirical measures associated with a sequence of Gibbs distributions on nparticle configurations, each of which is defined in terms of an inverse temperature b_{n} and an energy functional that is the sum of a (possibly singular) interaction and confining potential. Under fairly general assumptions on the potentials, we establish LDPs both with speeds (b_{n})/(n) ® ¥, in which case the rate function is expressed in terms of a functional involving the potentials, and with the speed b_{n} =n, when the rate function contains an additional entropic term. Such LDPs are motivated by questions arising in random matrix theory, sampling and simulated annealing. Our approach, which uses the weak convergence methods developed in “A weak convergence approach to the theory of large deviations", establishes large deviation principles with respect to stronger, Wassersteintype topologies, thus resolving an open question in “Firstorder global asymptotics for confined particles with singular pair repulsion". It also provides a common framework for the analysis of LDPs with all speeds, and includes cases not covered due to technical reasons in previous works.