
Sinkhorn Divergence #92

Conversation

davibarreira
Member

Implemented the function sinkhorn_divergence, which is an unbiased version of sinkhorn and is also a metric on the space of probability distributions. This function is similar to ot.bregman.empirical_sinkhorn_divergence from POT.py.

The tests required the use of PyCall, because ot.bregman.empirical_sinkhorn_divergence is not supported on PythonOT.jl.
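For context, a minimal usage sketch of the proposed function, assuming the (c, μ, ν, ε) argument ordering drafted in this PR (the exact signature is still under discussion below; the distributions and ε are illustrative):

```julia
using OptimalTransport
using Distributions: DiscreteNonParametric
using Distances: sqeuclidean

N = 100
μ = DiscreteNonParametric(rand(N), ones(N) / N)
ν = DiscreteNonParametric(rand(N), ones(N) / N)

# Debiased entropic OT cost; should be ≥ 0 and (up to numerics) zero iff μ == ν.
S = sinkhorn_divergence(sqeuclidean, μ, ν, 0.1)
```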

@coveralls

coveralls commented Jun 2, 2021

Pull Request Test Coverage Report for Build 899735377

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.


  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.06%) to 96.133%

Totals Coverage Status
  • Change from base Build 897834664: +0.06%
  • Covered Lines: 348
  • Relevant Lines: 362

💛 - Coveralls

@devmotion
Member

The tests required the use of PyCall, because ot.bregman.empirical_sinkhorn_divergence is not supported on PythonOT.jl

It would be good to add it to PythonOT.

@davibarreira
Member Author

The tests required the use of PyCall, because ot.bregman.empirical_sinkhorn_divergence is not supported on PythonOT.jl

It would be good to add it to PythonOT.

I'll see how this can be done.

@davibarreira
Member Author

Submitted a PR to the PythonOT.jl package. Once that PR goes through, I'll update this PR to remove the PyCall dependency from the tests.


The Sinkhorn Divergence is computed as:
```math
S_{c,ε}(μ,ν) := OT_{c,ε}(μ,ν) - \\frac{1}{2}(OT_{c,ε}(μ,μ) + OT_{c,ε}(ν,ν)),
```
Member

Could we use \\operatorname{OT} here instead?

Member

Same for \\operatorname{S}

Member Author

Yeah, better to add the \\operatorname. Will do.
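For reference, the docstring formula with both requested changes applied would read (keeping the doubled backslashes used inside Julia docstrings):

```math
\\operatorname{S}_{c,ε}(μ,ν) := \\operatorname{OT}_{c,ε}(μ,ν) - \\frac{1}{2}(\\operatorname{OT}_{c,ε}(μ,μ) + \\operatorname{OT}_{c,ε}(ν,ν)),
```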

regularization=regularization,
kwargs...,
)
return max(0, OTμν - 0.5 * (OTμμ + OTνν))
Member

Is there a reason we take max(0, ...) here?

Member Author

To guarantee that the value is greater than or equal to 0. The same is also done in POT.py.

Member

Makes sense. It doesn't come from the 'math' though; I assume any negative values would be due to numerical issues (i.e. they would be very small)?

Member Author (@davibarreira, Jun 2, 2021)

Correct. In theory this is a "proper" metric (symmetric, zero iff x = y, triangle inequality), so it would never be negative.

See also: [`sinkhorn2`](@ref)
"""
function sinkhorn_divergence(
c,
Member

Could we stick to the calling convention of sinkhorn here, i.e. (\mu, \nu, C, \varepsilon) in that order?

Member Author

I thought we were going to revert all functions to (c,\mu,\nu ...), like the implementations of ot_cost and ot_plan. Also, I think we should use lowercase c for the cost function, and uppercase C for cost matrix.

Member (@zsteve, Jun 2, 2021)

Ah I see. Makes sense, which PR is this from? Sorry I've not kept up with all of them :P

Member Author

Not your fault at all! I thought we had settled on it in issue #63, but now I see it's not there. @devmotion and I probably discussed it in one of the multivariate normal or 1D implementation PRs (which have close to a hundred comments each). Things are moving so fast in this package that I actually don't know where it is.
I'll comment in issue #63 so we can make a decision. If I remember correctly, it was a suggestion from @devmotion, based on the convention that arguments which are functions are usually declared first (I'm guessing this is a Julia standard or something).

Member

Yes, this is a standard mentioned in the Julia documentation on the ordering of function arguments, since it allows one to use the do syntax:

map(rand(20)) do x
    return x^2
end

instead of

map(x -> x^2, rand(20))

which is very convenient for longer and more complicated functions.

See also: [`sinkhorn2`](@ref)
"""
function sinkhorn_divergence(
c,
Member

Could we implement this with the cost matrices as inputs instead? I think this would have multiple advantages:

  • it would allow reusing precomputed cost matrices, here and in other functions
  • it would be guaranteed to work on the GPU (BTW GPU tests should be added if possible), whereas the default implementation of pairwise probably doesn't for most distances and custom functions (IIRC it uses containers and indexing patterns that should be avoided on GPUs and are often disabled by users)

Member

Additionally, this would still allow defining the function implemented here by just forwarding it to the one with cost matrices, computed from the DiscreteNonParametric supports and the cost function.

Member

BTW it also seems quite restrictive to only allow univariate measures here.

Member Author

Yeah, the univariate restriction was a mistake; I forgot that DiscreteNonParametric is only for the univariate case. I could implement it with cost matrices, but it would actually require 3 matrices, so it would be quite "ugly". It would be something like:
sinkhorn_divergence(mu,nu,C1,C2,C3,eps). So I think we should stick with the cost function... I don't know much about GPU programming, but don't you think we could adapt this somehow?

Member Author

I could implement another sinkhorn_divergence that would take the cost matrices as arguments. We could perhaps indicate that this version should be used if the user is interested in using the GPU.

Member

You can just define both and forward the call from the one with the cost function to the other:

Additionally, this would still allow defining the function implemented here by just forwarding it to the one with cost matrices, computed from the DiscreteNonParametric supports and the cost function.
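A minimal sketch of the forwarding pattern being suggested, assuming the (c, μ, ν, ε) ordering drafted in this PR and reusing the sinkhorn2 and pairwise calls already shown in the diff; the cost-matrix method's name and argument order are illustrative, not final:

```julia
# Cost-function front end: builds the three cost matrices and forwards.
# (Assumes the package context: `sinkhorn2` and `pairwise` as used elsewhere in this PR.)
function sinkhorn_divergence(c, μ, ν, ε; kwargs...)
    Cμν = pairwise(c, μ.support, ν.support)
    Cμμ = pairwise(c, μ.support; symmetric=true)
    Cνν = pairwise(c, ν.support; symmetric=true)
    return sinkhorn_divergence(μ, ν, Cμν, Cμμ, Cνν, ε; kwargs...)
end

# Cost-matrix back end: reusable with precomputed matrices (e.g. on the GPU).
function sinkhorn_divergence(μ, ν, Cμν, Cμμ, Cνν, ε; kwargs...)
    OTμν = sinkhorn2(μ.p, ν.p, Cμν, ε; kwargs...)
    OTμμ = sinkhorn2(μ.p, μ.p, Cμμ, ε; kwargs...)
    OTνν = sinkhorn2(ν.p, ν.p, Cνν, ε; kwargs...)
    return max(0, OTμν - (OTμμ + OTνν) / 2)
end
```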

Member

this version should be used if the user is interested in using the GPU.

It could be useful without GPUs as well, e.g., if you evaluate sinkhorn, sinkhorn2, or other regularized OT distances with the same cost matrices.

In fact, it could also be useful to allow to specify pre-computed plans for the different sinkhorn2 calls.

Member Author

Convinced.

μ::DiscreteNonParametric,
ν::DiscreteNonParametric,
ε;
regularization=false,
Member

This can be removed since it's the default in sinkhorn2:

Suggested change:
- regularization=false,


ν.p,
pairwise(c, μ.support, ν.support),
ε;
regularization=regularization,
Member

Suggested change:
- regularization=regularization,

μ.p,
pairwise(c, μ.support; symmetric=true),
ε;
regularization=regularization,
Member

Suggested change:
- regularization=regularization,

ν.p,
pairwise(c, ν.support; symmetric=true),
ε;
regularization=regularization,
Member

Suggested change:
- regularization=regularization,

regularization=regularization,
kwargs...,
)
return max(0, OTμν - 0.5 * (OTμμ + OTνν))
Member

We should avoid unwanted promotions (e.g. when working with Float32 on GPUs):

Suggested change:
- return max(0, OTμν - 0.5 * (OTμμ + OTνν))
+ return max(0, OTμν - (OTμμ + OTνν) / 2)

Intuitively I would have thought that the divergence can be negative if the regularization terms are included (BTW does one actually want to include them)?
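On the promotion point: multiplying by the Float64 literal 0.5 promotes Float32 inputs to Float64, while dividing by the integer 2 preserves the element type. A quick check:

```julia
x = 1.5f0          # Float32
typeof(0.5 * x)    # Float64 — unwanted promotion
typeof(x / 2)      # Float32 — type preserved
```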

Member Author (@davibarreira, Jun 2, 2021)

Yeah, if the regularization terms are added it could be negative, so you are correct that we should not allow the regularization keyword. For the Sinkhorn divergence one should probably never include the regularization terms, because the goal here is to turn the entropic OT cost into a "proper" metric on the space of probability distributions. Besides, Feydy (in the paper I cited) proved some neat properties of this metric, such as the fact that it also metrizes weak convergence.
TL;DR: I don't think one would ever want to include the regularization terms in the Sinkhorn divergence.

For sinkhorn2, I saw one paper where the author suggested that the cost plus regularization behaves better than the cost alone, although it can be negative, so it loses some interpretability. I usually say "Sinkhorn loss" when I add the regularization and "Sinkhorn cost" or "Sinkhorn distance" when I remove it, but this is personal and not generally adopted in the literature (it probably should be, because people use the terms interchangeably and it becomes very confusing).
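For reference, the distinction being drawn here is between the full regularized objective and the transport cost alone; conventions for the entropic penalty Ω vary between references (Feydy et al. use KL(γ ‖ μ ⊗ ν)):

```math
\operatorname{OT}_{c,ε}(μ,ν) = \min_{γ ∈ Π(μ,ν)} \langle γ, C \rangle + ε\,Ω(γ),
```

versus ⟨γ*, C⟩ evaluated at the optimal plan γ*, i.e. the value with the regularization terms dropped.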

Member

Thanks for the detailed explanation! This confirms my intuition, so let's just remove the regularization keyword argument. Maybe we should even check that kwargs... does not contain a regularization keyword argument and otherwise either throw an error or (maybe better) drop it and display a warning with @warn. Maybe something like

....; regularization=nothing, kwargs...)
    if regularization !== nothing
        @warn "`sinkhorn_divergence` does not support the `regularization` keyword argument"
    end

test/entropic.jl
@testset "example" begin
# create distributions
N = 100
μ = DiscreteNonParametric(rand(N), ones(N) / N)
Member

Can we use random histograms?

Member Author

Yeah. I'll update.
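A hedged sketch of the "random histograms" variant: keep the random support, but draw non-uniform random weights and normalize them (names are illustrative):

```julia
using Distributions: DiscreteNonParametric

N = 100
μprobs = rand(N)
μprobs ./= sum(μprobs)                      # random histogram instead of uniform weights
μ = DiscreteNonParametric(rand(N), μprobs)
```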

@davibarreira
Member Author

davibarreira commented Jun 3, 2021

So, this current push is actually too much code for a single PR. I intend to break it down into other PRs, but I thought it would be good to submit the whole thing here so you can see how these new functions tie into the rest of the package.
Besides the sinkhorn_divergence function, I also implemented the struct FiniteDiscreteMeasure and the cost_matrix function. I'll split each one into a separate PR if we decide they are appropriate for the package.

FiniteDiscreteMeasure is a struct that lets us define our empirical measures and consistently write functions such as sinkhorn2(c, mu, nu) or sinkhorn2(C, mu, nu). Right now the package sometimes refers to mu and nu as only the support and sometimes as actual measures, so the new struct would clear things up.

The cost_matrix is just a helper function to construct cost matrices easily. It lets us create versions of functions such as sinkhorn2 that take a cost function instead of a cost matrix (similar to what I have done with sinkhorn_divergence).

@devmotion and @zsteve, what do you think of these two new additions? Should I submit a PR for each one?

I already wrote the tests for all these functions, so if you prefer, we can actually just review everything in this PR.
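For readers following along, a hypothetical sketch of the FiniteDiscreteMeasure idea described above (the actual design is left to its own PR; the discretemeasure constructor mirrors the usage that appears in the test excerpts below):

```julia
# Hypothetical sketch — not the final design.
struct FiniteDiscreteMeasure{X<:AbstractVector,P<:AbstractVector{<:Real}}
    support::X   # support points (scalars or vectors, i.e. possibly multivariate)
    p::P         # probability weights
end

function discretemeasure(
    support::AbstractVector,
    p::AbstractVector{<:Real}=fill(1 / length(support), length(support)),
)
    length(support) == length(p) ||
        throw(ArgumentError("support and probabilities must have the same length"))
    sum(p) ≈ 1 || throw(ArgumentError("probabilities must sum to 1"))
    return FiniteDiscreteMeasure(support, p)
end
```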

@devmotion
Member

I think it's a good idea to move these two additions to separate PRs, then it's easier to discuss and polish them.

I also think that probably a FiniteDiscreteMeasure is useful, I have some additional suggestions but it's easier to take this in the corresponding PR.

I am a bit less convinced by cost_matrix, it seems to introduce some type instabilities but probably this could be fixed. I don't think it should be exported and become part of the official API though, but also here it's probably best to discuss this in a separate PR 🙂

@davibarreira
Member Author

The error in the check seems to be unrelated to the PR.

Comment on lines 511 to 512
μ::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
ν::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
Member

It's not really what we support, maybe just omit the type or add two docstrings (one basically referring to the other)?

c,
μ::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
ν::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
ε; regularization=false, plan=nothing, kwargs...
Member

It would be useful if we restructured sinkhorn and introduced the different algorithms, as proposed in the other PR. Then users could also choose the stabilized algorithm or epsilon scaling here.

Comment on lines 564 to 565
μ::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
ν::Union{FiniteDiscreteMeasure, DiscreteNonParametric},
Member

Same here.

Comment on lines 211 to 218
for (ε, metrics) in Iterators.product(
[0.1, 1.0, 10.0],
[
(sqeuclidean, SqEuclidean()),
(euclidean, Euclidean()),
(totalvariation, TotalVariation()),
],
)
Member

This is a bit uncommon; usually one would just use two for loops here. One can even use the short notation

for x in xs, y in ys
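Applied to the excerpt above, the suggested rewrite would read roughly as follows (a sketch; the test body is unchanged and assumes the Distances names already imported by the tests):

```julia
costs = (
    (sqeuclidean, SqEuclidean()),
    (euclidean, Euclidean()),
    (totalvariation, TotalVariation()),
)
for ε in (0.1, 1.0, 10.0), (c, metric) in costs
    # ... test body unchanged ...
end
```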

μ = OptimalTransport.discretemeasure(μsupp, μprobs)
ν = OptimalTransport.discretemeasure(νsupp)

for (ε, metrics) in Iterators.product(
Member

Same here.

@@ -0,0 +1,55 @@
using Distributions: DiscreteNonParametric
Member

This file seems unrelated to the PR?

Co-authored-by: David Widmann <[email protected]>
@davibarreira
Member Author

Took a break from the computer for a couple of days. Made the requested changes.

@codecov-commenter

Codecov Report

Merging #92 (e261025) into master (f03b05c) will decrease coverage by 0.11%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #92      +/-   ##
==========================================
- Coverage   96.25%   96.14%   -0.12%     
==========================================
  Files          11       12       +1     
  Lines         561      570       +9     
==========================================
+ Hits          540      548       +8     
- Misses         21       22       +1     
Impacted Files Coverage Δ
src/entropic/sinkhorn_divergence.jl 88.88% <88.88%> (ø)


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f03b05c...e261025.

@davibarreira
Member Author

I think this one is finally ready to go.

@davibarreira
Member Author

@devmotion or @zsteve, whenever you guys can, please review this PR.
