Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.

The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of \(E: Y \longmapsto SW_2^2(\gamma_Y, \gamma_Z)\), i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support \(Y\)of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation \(E_p\) (estimating the expectation in SW using only p samples) and show convergence results on the critical points of \(E_p\)to those of \(E\), as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising \(E\)and \(E_p\)converge towards (Clarke) critical points of these energies.

PREPRINT

Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance

This paper deals with the reconstruction of a discrete measure \(\gamma_Z \)on \(R^d\)from the knowledge of its pushforward measures \(P_i\)#\(\gamma_Z\)linear applications \(P_i: R^d →R^h\) (for instance projections onto subspaces). The measure \(\gamma_Z \)being fixed, assuming that the rows of the matrices \(P_i\)are independent realizations of laws which do not give mass to hyperplanes, we show that if \(\sum_i h_i > d\), this reconstruction problem has almost certainly a unique solution. This holds for any number of points in \(\gamma_Z\). A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.

We generalise Wasserstein Barycentres, allowing the computation of barycentres between measures in different spaces. We provide efficient solvers for this new problem, and study a particular case which corresponds to a reconstruction problem. Studying the properties of this problem, we draw partial conclusions on the optima of the discrete Sliced Wasserstein Distance.