add davibarreira's sinkhorn_divergence with some modifications #145

Merged
zsteve merged 28 commits into master from sinkhorn_divergence on Sep 22, 2021

Conversation

@zsteve (Member) commented Sep 13, 2021

This PR adds sinkhorn_divergence, due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now the implementation takes measures as vectors of weights, together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs.

Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.
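For illustration, a minimal usage sketch matching the description above (the argument order and call signature here are assumptions for illustration, not necessarily the merged API):

```julia
using OptimalTransport

n = 100
x = range(-1, 1; length=n)          # fixed common support
C = (x .- x') .^ 2                  # common squared-Euclidean cost matrix
μ = fill(1 / n, n)                  # source weights
ν = exp.(-(x .- 0.3) .^ 2)
ν ./= sum(ν)                        # target weights, normalized
ε = 0.01

# Hypothetical call: a single cost matrix shared by OT(μ,ν) and the correction
# terms OT(μ,μ), OT(ν,ν) enables the batched Sinkhorn computation.
sinkhorn_divergence(μ, ν, C, ε)
```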

@devmotion (Member) left a comment

Nice, I wanted to update the PR this week, but since I am a bit busy I'm happy that you took it up.

I have two major suggestions/requests though:

  1. I think we should support computing the divergence both with and without the regularization term. POT includes the regularization, as does Feydy et al., whereas Genevay et al. apparently did not (see the sketch after this list).
  2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric cases but instead the fixed-point iterations in Feydy et al., which apparently converge much faster.
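To make point 1 concrete (this is my reading of the references above, not text from the PR): with $\gamma^\star$ the optimal entropic plan, the divergence is built as

$$S_\varepsilon(\mu,\nu) = \mathrm{OT}_\varepsilon(\mu,\nu) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\mu,\mu) - \tfrac{1}{2}\mathrm{OT}_\varepsilon(\nu,\nu),$$

where, with regularization (POT, Feydy et al.),

$$\mathrm{OT}_\varepsilon(\mu,\nu) = \langle \gamma^\star, C \rangle + \varepsilon\,\mathrm{KL}(\gamma^\star \mid \mu \otimes \nu),$$

and without regularization (Genevay et al.) the $\varepsilon\,\mathrm{KL}$ term is dropped.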

@coveralls commented Sep 13, 2021

Pull Request Test Coverage Report for Build 1229692504

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 11 of 11 (100.0%) changed or added relevant lines in 4 files are covered.
  • 16 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-2.7%) to 91.697%

Files with Coverage Reduction | New Missed Lines | %
src/entropic/sinkhorn.jl      | 16               | 77.14%

Totals Coverage Status:
  Change from base Build 1227604199: -2.7%
  Covered Lines: 508
  Relevant Lines: 554

💛 - Coveralls

@codecov-commenter commented Sep 13, 2021

Codecov Report

Merging #145 (f4d6474) into master (cc3ab16) will decrease coverage by 9.14%.
The diff coverage is 30.00%.


@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
- Coverage   94.42%   85.27%   -9.15%     
==========================================
  Files          12       14       +2     
  Lines         538      618      +80     
==========================================
+ Hits          508      527      +19     
- Misses         30       91      +61     
Impacted Files                       | Coverage Δ
src/entropic/sinkhorn_divergence.jl  | 0.00% <0.00%> (ø)
src/entropic/sinkhorn_stabilized.jl  | 100.00% <ø> (ø)
src/entropic/symmetric.jl            | 0.00% <0.00%> (ø)
src/entropic/sinkhorn.jl             | 100.00% <100.00%> (ø)
src/entropic/sinkhorn_epsscaling.jl  | 95.23% <100.00%> (ø)
src/entropic/sinkhorn_gibbs.jl       | 100.00% <100.00%> (ø)
src/entropic/sinkhorn_solve.jl       | 100.00% <100.00%> (ø)
src/utils.jl                         | 93.02% <0.00%> (+4.38%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc3ab16...f4d6474.

@davibarreira (Member) commented

> This PR adds sinkhorn_divergence, due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now the implementation takes measures as vectors of weights, together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs.
>
> Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.

Thanks, @zsteve, for updating the PR. Why did you remove the use of FiniteDiscreteMeasure? I have to catch up on the recent changes you guys made.

@davibarreira (Member) commented

> Nice, I wanted to update the PR this week, but since I am a bit busy I'm happy that you took it up.
>
> I have two major suggestions/requests though:
>
> 1. I think we should support computing the divergence both with and without the regularization term. POT includes the regularization, as does Feydy et al., whereas Genevay et al. apparently did not.
> 2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric cases but instead the fixed-point iterations in Feydy et al., which apparently converge much faster.

@devmotion what are these fixed-point iterations? If I remember correctly, the Feydy et al. paper used Sinkhorn with a fixed number of iterations. Is this what you are referring to?

@devmotion (Member) commented Sep 13, 2021

No, I am not referring to a fixed number of iterations. They use a fixed-point iteration for the symmetric case that is different from the standard Sinkhorn algorithm and exploits the additional structure of the problem.

@zsteve (Member, Author) commented Sep 13, 2021

> > This PR adds sinkhorn_divergence, due to @davibarreira in PR #92, except that I have removed the use of FiniteDiscreteMeasure, DiscreteNonParametric, etc. For now the implementation takes measures as vectors of weights, together with precomputed cost matrices. A future PR could implement wrappers that take generic distributions as inputs.
> >
> > Notably, in the case where the support is fixed and there is a common cost matrix between source and target measures (as is the case on a fixed discrete domain), sinkhorn_divergence now takes advantage of the batched Sinkhorn computation.
>
> Thanks, @zsteve, for updating the PR. Why did you remove the use of FiniteDiscreteMeasure? I have to catch up on the recent changes you guys made.

I think that was either done earlier (I'm not exactly sure when, or possibly it was never merged?). For the time being it's perhaps best to keep things as-is (specify vectors of weights; for empirical measures the locations are encoded in the cost matrix). More than happy to reintroduce FiniteDiscreteMeasure though, and we can then e.g. write a wrapper API.

@davibarreira (Member) commented

My bad, I think the implementation used DiscreteNonParametric.

@zsteve (Member, Author) commented Sep 13, 2021

> My bad, I think the implementation used DiscreteNonParametric.

Ah ok, then that's from Distributions.jl, right?

@davibarreira (Member) commented

> > My bad, I think the implementation used DiscreteNonParametric.
>
> Ah ok, then that's from Distributions.jl, right?

I'm reading the code again, it has been quite some time. But for now, the way you coded it seems fine. I can submit another PR later if needed.
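For reference, a quick illustration of that type (standard Distributions.jl usage, not code from this PR):

```julia
using Distributions

# A finite discrete measure as a DiscreteNonParametric: support points + weights.
μ = DiscreteNonParametric([0.0, 1.0, 2.0], [0.2, 0.5, 0.3])
support(μ), probs(μ)    # ([0.0, 1.0, 2.0], [0.2, 0.5, 0.3])
```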

@davibarreira (Member) left a comment

I think the only issue I found was with the regularization, which it seems is not actually being used.

@zsteve (Member, Author) commented Sep 13, 2021

Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), in the variables u = exp(f/eps):

log.(u) <- (log.(u) .- log.(K * (mu .* u))) ./ 2

where K is the Gibbs kernel. This should probably (?) be implemented e.g. as a SymmetricSinkhorn problem through the solve! API. What do you think @devmotion @davibarreira?
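A minimal sketch of that iteration (my own illustration, not the PR's SymmetricSinkhorn code; the convergence check is omitted):

```julia
# Symmetric Sinkhorn fixed-point iteration in the scaling variable u = exp(f / eps).
function symmetric_sinkhorn_sketch(mu, C, eps; maxiter=100)
    K = exp.(-C ./ eps)          # Gibbs kernel
    logu = zero(mu)              # start from f = 0
    for _ in 1:maxiter
        # averaged update: log u <- (log u - log(K * (mu .* u))) / 2
        logu = (logu .- log.(K * (mu .* exp.(logu)))) ./ 2
    end
    return exp.(logu)
end
```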

@zsteve (Member, Author) commented Sep 13, 2021

P.S. As part of testing autodiff with sinkhorn_divergence, I added a short example that demonstrates Figure 1 of Feydy et al.

@davibarreira (Member) commented

> P.S. As part of testing autodiff with sinkhorn_divergence, I added a short example that demonstrates Figure 1 of Feydy et al.

Just to make sure we are on the same page, we are talking about the paper "Interpolating between optimal transport and MMD using Sinkhorn divergences", correct?

@zsteve (Member, Author) commented Sep 13, 2021

Yes, this one: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf
P.S. By "demonstrates" I mean that it shows the same kind of result, not that it reproduces the exact figure.

@devmotion (Member) commented

> Ok, I think I've worked out the update for the symmetric Sinkhorn (Eq. 25 of Feydy et al.), in the variables u = exp(f/eps):
>
> log.(u) <- (log.(u) .- log.(K * (mu .* u))) ./ 2
>
> where K is the Gibbs kernel. This should probably (?) be implemented e.g. as a SymmetricSinkhorn problem through the solve! API. What do you think @devmotion @davibarreira?

I think SymmetricSinkhorn might be a good approach, but I'd suggest not exporting it (I don't think it's very useful as a standalone algorithm). I would assume it is numerically more stable to perform the calculations in log space, i.e., to work with f (or f / eps) instead of u (if this is what you suggested), i.e. to use

f <- (f .- eps .* vec(logsumexp(log.(mu) .+ f ./ eps .- C' ./ eps; dims=1))) ./ 2

or

g <- (g .- vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1))) ./ 2

where g = f / eps.
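A self-contained sketch of that log-domain update (illustrative only; logsumexp here is the one from LogExpFunctions):

```julia
using LogExpFunctions: logsumexp

# One log-domain symmetric update for the scaled potential g = f / eps,
# exactly as in the formula above; mu are the weights and C the cost matrix.
function symmetric_update(g, mu, C, eps)
    return (g .- vec(logsumexp(log.(mu) .+ g .- C' ./ eps; dims=1))) ./ 2
end
```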

@zsteve (Member, Author) commented Sep 13, 2021

Agree with not exporting. As for the fixed-point iteration: the current sinkhorn_gibbs algorithm doesn't do its computation in log space. I guess the upshot of that approach is that you can use BLAS for the multiplication with the kernel matrix K. Probably doesn't hurt to implement both, though.

@zsteve (Member, Author) commented Sep 14, 2021

I've implemented SymmetricSinkhorn now and changed sinkhorn_divergence to take advantage of it for the correction terms. Running a few tests with debug output shown reveals that SymmetricSinkhornGibbs indeed converges in an order of magnitude fewer iterations than SinkhornGibbs, so thanks @devmotion for pointing that out!

The scheme derived in Feydy et al. uses the reference measure μ ⊗ ν, whereas the reference measure used by sinkhorn2, etc., is 𝟙 ⊗ 𝟙. By Proposition 4.2 of Computational Optimal Transport (G. Peyré), the choice of reference measure is irrelevant and just amounts to a choice of scaling. In the Sinkhorn divergence formula, switching from one reference measure to another produces terms that cancel out.

When u = v are the dual variables used by the standard SinkhornGibbs, the symmetric-case fixed-point iteration scheme of Feydy should be

u <- sqrt.(u .* mu ./ (K * u))

where u = exp.(f ./ eps) and f is as in Feydy et al. I've done some pen-and-paper checking of the math behind this, but please feel free to verify it for yourself.

Another thing I notice is that sinkhorn2 with regularization = true in fact computes the full transport plan γ. When we have the dual variables (u, v), one can compute the loss (= cost + regularizer) using only matrix-vector products:

sinkhorn_loss = eps * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))

When regularization = true, this should be a more efficient computation since it avoids allocating the transport plan matrix. As currently implemented, we still incur a vector allocation from K' * u: although K' * u is computed throughout the Sinkhorn loop, its output is not stored in a temporary variable. I think it's fine to leave it as-is, since the alternative would be to preallocate another temporary that is used only once.
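As a sanity check, here is a small sketch of that identity (my own illustration, not the package code):

```julia
using LinearAlgebra: dot

# Entropic loss (cost + eps * regularizer) from the dual scalings (u, v) and
# the Gibbs kernel K, via matrix-vector products only (no n×n plan allocated).
sinkhorn_loss(u, v, K, eps) = eps * (dot(u .* log.(u), K * v) + dot(v .* log.(v), K' * u))

# Equivalent dense computation through the full plan, useful for checking:
function sinkhorn_loss_dense(u, v, K, C, eps)
    γ = u .* K .* v'                         # plan diag(u) * K * diag(v)
    return dot(γ, C) + eps * dot(γ, log.(γ))
end
```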

@zsteve (Member, Author) commented Sep 14, 2021

> Nice, I wanted to update the PR this week, but since I am a bit busy I'm happy that you took it up.
>
> I have two major suggestions/requests though:
>
> 1. I think we should support computing the divergence both with and without the regularization term. POT includes the regularization, as does Feydy et al., whereas Genevay et al. apparently did not.

This is now addressed by the regularization argument, which is no longer ignored. The unit tests check both cases, regularization in (true, false).

> 2. Probably we don't want to use the standard Sinkhorn algorithm for the symmetric cases but instead the fixed-point iterations in Feydy et al., which apparently converge much faster.

This is now addressed by the SymmetricSinkhorn algorithm.

@zsteve (Member, Author) commented Sep 21, 2021

If everything looks good so far, I will bump the version and merge once I get a chance.
@devmotion any idea what is going on with some of the failing actions? E.g. CI Julia 1/Ubuntu x64 fails with the following:

Run coverallsapp/github-action@master
Using lcov file: lcov.info
Error: Bad response: 405 <html>
<head><title>405 Not Allowed</title></head>

I'll look into it in the next few days since I'm a little busy right now, but I'm wondering if there's an easy fix.

@devmotion (Member) commented

I think Coveralls was down for maintenance.

@zsteve merged commit 803274d into master on Sep 22, 2021
@zsteve deleted the sinkhorn_divergence branch on September 22, 2021, 06:27