
Deprecating and renaming additional functions #63

Closed
zsteve opened this issue May 22, 2021 · 12 comments

Comments

@zsteve
Member

zsteve commented May 22, 2021

Similar to #45, a lot of functions are currently named either to mirror Python OT's naming choices or ad hoc as they were originally added. (For the record, I agree with ot_cost and ot_plan for unregularised OT as per #45.)

  • Along those lines, for the sinkhorn routines we should use sinkhorn_cost and sinkhorn_plan, etc.
  • What to rename quadreg to? This solves OT regularised with an L2 cost on the transport plan. Currently it's solved using one particular algorithm, but there are other approaches, so I'd prefer not to refer to any particular algorithm in the function name.
    One possibility is to have a general ot_reg_cost and ot_reg_plan wrapper for generalised regularisations. Then the user can specify the regularisation functional and algorithm upon calling. So something like
    ot_reg_cost(mu, nu, C, 0.05; reg_func = "L2", method = "lorenz")
@devmotion
Member

I agree, I think a more concise and more easily accessible API would be nice. E.g., I guess (actually similar to sinkhorn and sinkhorn2 in POT) it would be good if there were only one sinkhorn_cost and sinkhorn_plan function instead of one each for the standard algorithm, the stabilized version, and the stabilized version with epsilon scaling.

I am not sure if the API for all regularized optimal transport problems should be unified; maybe it could become too confusing with all the keyword arguments - not all algorithms would support them, and their meaning might change depending on the algorithm. Maybe instead of ot_reg_cost(mu, nu, C, 0.05; reg_func = "L2", method = "lorenz") one could use ot_regularized_cost(mu, nu, C, 0.05; regularization=:L2, method=Lorenz(algorithm_specific_options...), general_options...)? Here, general_options could contain e.g. plan. Additionally, maybe one could even make the regularization parameter a mandatory keyword argument, i.e., one without a default value, to keep the API closer to the one for exact OT problems.

A different idea would be to adopt the CommonSolve interface used by SciML and also e.g. Roots. Mainly, it would be something along

abstract type AbstractOTProblem end

struct OTProblem{C,M,N} <: AbstractOTProblem
    cost::C
    mu::M
    nu::N
end

struct RegularizedOTProblem{C,M,N,R,P} <: AbstractOTProblem
    cost::C
    mu::M
    nu::N
    regularization::R
    regularization_parameter::P
end

struct L2Regularization end
struct EntropyRegularization end

function EntropicOTProblem(cost, mu, nu, epsilon)
    return RegularizedOTProblem(cost, mu, nu, EntropyRegularization(), epsilon)
end
...

struct StabilizedSinkhorn{S}
    scaling_factor::S
    scaling_steps::Int
    ...
end

prob = EntropicOTProblem(cost, mu, nu, 0.01)

solve(prob, StabilizedSinkhorn(); ...)

I am just not completely sure yet at which point one should be able to specify that only the cost or plan should be computed. As an option for solve maybe?
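Purely as a sketch of that option (the keyword name and values are hypothetical, nothing like this exists yet):

```julia
# Hypothetical `output` keyword on solve, selecting what is returned:
solve(prob, StabilizedSinkhorn(); output = :cost)  # only the optimal cost
solve(prob, StabilizedSinkhorn(); output = :plan)  # only the transport plan
```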

BTW I noticed a bug in the epsilon scaling and think regardless of any additional simplifications of the API it should be merged with the standard stabilized version. I'll prepare a PR.

@zsteve
Member Author

zsteve commented May 23, 2021

Thanks for the input. I agree that the approach in #66 is problematic for not being extensible (I guess my original thinking was more of a band-aid solution).

Maybe instead of ot_reg_cost(mu, nu, C, 0.05; reg_func = "L2", method = "lorenz") one could use ot_regularized_cost(mu, nu, C, 0.05; regularization=:L2, method=Lorenz(algorithm_specific_options...), general_options...)?

Yes, it would be good to separate the problem parameters from the solver details (instead of leaving them in the kwargs). That's something I can try to incorporate in the branch from the existing PR.

Regarding the CommonSolve framework, would you say this should be an additional API to what we have currently (let's say it's PythonOT-esque), or should it replace the current API? I would say it might be good to have the additional API for users who want to work on their own extensions, but keep the simple functionality for standard usage. Another thing I have in mind is differentiability of the output. This isn't possible for a general solver, but it's possible at least for the Sinkhorn algorithm. Hence it would be nice to keep sinkhorn(2) as-is.

Also I wonder how general the API should be -- for instance, should we try to have a common API that can handle 2-marginal but also multimarginal/barycenter problems? Or should these problems have a separate API?

Keen to hear your thoughts on this.

@devmotion
Member

Regarding the CommonSolve framework, would you say this should be an additional API to what we have currently (let's say it's PythonOT-esque), or should it replace the current API? I would say it might be good to have the additional API for users who want to work on their own extensions, but keep the simple functionality for standard usage. Another thing I have in mind is differentiability of the output. This isn't possible for a general solver, but it's possible at least for the Sinkhorn algorithm. Hence it would be nice to keep sinkhorn(2) as-is.

I would assume that the main API would be the solve(...) interface since it is more powerful than the current approach and can be extended - you can just add a new problem type and/or algorithm if needed. However, it is still possible to keep convenient functions such as sinkhorn. It would be defined just as

sinkhorn(mu, nu, C, eps) = solve(EntropicOTProblem(C, mu, nu, eps), Sinkhorn())

(or however we want to change the signature of sinkhorn).

In SciML, users differentiate through solve all the time (that's the main point of packages such as DiffEqSensitivity and DiffEqFlux), so this does not depend on the API.

Also I wonder how general the API should be -- for instance, should we try to have a common API that can handle 2-marginal but also multimarginal/barycenter problems? Or should these problems have a separate API?

With the solve interface probably it could just be a special problem such as EnsembleOTProblem or BatchOTProblem or a special algorithm such as EnsembleAlgorithm(algorithm). If there is an internal implementation that works for both cases, with one marginals and multiple marginals, both solve calls could dispatch to it.
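For illustration, a multi-marginal problem could be just another problem type under the AbstractOTProblem abstract type sketched above (names and fields hypothetical):

```julia
# Hypothetical sketch: a multi-marginal OT problem reusing the same
# AbstractOTProblem / solve interface as the 2-marginal case.
struct MultiMarginalOTProblem{C,M} <: AbstractOTProblem
    cost::C        # k-dimensional cost array
    marginals::M   # tuple of k marginal weight vectors
end
```

Dispatching `solve` on the problem type would then let both cases share a common internal implementation where one exists.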

@davibarreira
Member

I like the way POT.py does things. In my view, the more direct the better. I'm not really a fan of separating the solve.

@devmotion
Member

I don't think we have to adopt the solve approach - there might be better choices and, as mentioned above, even if we do one can still keep convenient functions such as sinkhorn etc. However, now that we've moved the Python and POT parts to a separate package, I think the design should be driven mainly by what the best approach is in Julia and not only depend on POT. In particular, we should definitely exploit multiple dispatch which just does not exist (at least not properly) in Python.

@davibarreira
Member

I just realized that we haven't settled on some standards regarding the order of arguments. The new function sinkhorn_divergence(c, mu, nu, epsilon) uses the same order as ot_cost(c, mu, nu), but emd2(mu, nu, C) and sinkhorn2(mu, nu, C, epsilon) start with the distributions instead of the cost. I guess @devmotion and I talked about it in one of the PRs, but I can't find it.

Anyway, my suggestion is that we settle on the cost coming first, so that the new versions of sinkhorn2 and emd2 (as well as all the others) would become, for example, sinkhorn2(C, mu, nu, epsilon). I also suggest that we use C when it's a cost matrix and c when it's a cost function.

Another issue is that mu and nu are sometimes distributions (as in the 1D case) and sometimes only the support (as in the sinkhorn functions). This is confusing, and for consistency we should rename the support case to something such as mu_supp and leave mu for the case when it is a distribution (measure).

Finally, I've realized that Distributions.jl does not have a generic type for empirical distributions: DiscreteNonParametric only works for univariate distributions. So perhaps we can create a type EmpiricalDistributions to add consistency (I actually suggested this on the Distributions.jl GitHub, but I don't know if they are interested).

@devmotion
Member

Finally, I've realized that Distributions.jl does not have a generic type for Empirical Distributions. The DiscreteNonParametric only works for univariate distributions. So perhaps we can create a type EmpiricalDistributions to add consistency (I actually suggested this in the Distributions.jl GitHub, but don't know if they are interested).

IIRC I defined such a type in StochasticOptimalTransport. Since I assume we only need the probabilities and the support rather than the full Distributions interface, I don't think it's worth adding a proper Distribution and/or maintaining such a package currently.
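A minimal sketch of what such a wrapper can look like (hypothetical, not the actual StochasticOptimalTransport definition):

```julia
# Minimal discrete measure wrapper: just support points and probabilities,
# with a length check; no cdf/logpdf/rand machinery.
struct DiscreteMeasure{X<:AbstractVector,P<:AbstractVector{<:Real}}
    support::X
    p::P
    function DiscreteMeasure(support::X, p::P) where {X<:AbstractVector,P<:AbstractVector{<:Real}}
        length(support) == length(p) ||
            throw(ArgumentError("support and probabilities must have equal length"))
        return new{X,P}(support, p)
    end
end

# e.g. a uniform measure on four 2D points:
mu = DiscreteMeasure([randn(2) for _ in 1:4], fill(0.25, 4))
```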

@davibarreira
Member

davibarreira commented Jun 2, 2021

Another option would be to take a look at MeasureTheory.jl. Maybe it covers our needs, but I haven't actually used it. I see that it appeared in the issue you mentioned.

@devmotion
Member

devmotion commented Jun 2, 2021

While I like some of the ideas and motivations behind MeasureTheory, I think very strongly that we should not switch to it. Mostly because it is not thoroughly tested, it is not well established, it is not well documented, it requires Julia 1.5, it has a huge list of dependencies, and, most importantly, I don't see the need for sophisticated measures or distributions, not even from Distributions. We just need something that wraps the support and the probabilities such as https://github.com/JuliaOptimalTransport/StochasticOptimalTransport.jl/blob/1f6ee6ad45950423d55dff4f40b4fe334a56547d/src/utils.jl#L1-L10, but we don't need any sophisticated cdf, logpdf, rand, etc. definitions. For 1D discrete measures, it is nice that we can use DiscreteNonParametric, whose support is already sorted, but IMO we really don't have to operate with Distributions and can use any structure that fits.

@davibarreira
Member

davibarreira commented Jun 2, 2021

Awesome. I think we should go with your implementation, although we should make some small changes, such as adding a dimension property and incorporating DiscreteNonParametric inside, so we can use one larger type and dispatch accordingly. Also, we should maybe rename it from DiscreteMeasure to EmpiricalMeasure or FiniteDiscreteMeasure, because other distributions such as Poisson are discrete but are not supported by algorithms such as Sinkhorn.
P.S.: I find it very odd that Distributions.jl does not have this type of distribution, because it seems very ubiquitous and useful.

@zsteve
Member Author

zsteve commented Jun 4, 2021

I'd comment that in most settings, for the purposes of computing OT we don't even need to know the support, just the probabilities and the cost matrix.

Of course, one would then need to precompute the cost matrix (as we currently do), and that is left up to the user. At least for the basic functionality like (un)regularised OT cost/plan computation, I'd say we don't need the supports.

We could use multiple dispatch, though, so that something like sinkhorn(mu::DiscreteMeasure, nu::DiscreteMeasure) handles the cost matrix computation for the user.
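Such a method could be a thin layer that builds the cost matrix and forwards to the existing matrix-based implementation. A sketch, assuming a DiscreteMeasure type with `support` and `p` fields and an existing `sinkhorn(muprobs, nuprobs, C, eps)` method (both assumptions, since neither exists yet):

```julia
# Hypothetical convenience method: compute the pairwise cost matrix from
# the supports and a cost function c, then forward to the matrix method.
function sinkhorn(mu::DiscreteMeasure, nu::DiscreteMeasure, c, eps; kwargs...)
    C = [c(x, y) for x in mu.support, y in nu.support]
    return sinkhorn(mu.p, nu.p, C, eps; kwargs...)
end
```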

@davibarreira
Member

davibarreira commented Jun 4, 2021

In ML (which is mostly where I apply OT), we usually have the empirical distribution (the support is the data instances in a batch, all with equal probability) and not the cost matrix. Besides coding the package, I also use it: in most cases I have the samples, and I have to create the cost matrix. So I'd really like to have a cost_matrix function ready to go :) Although I can see how this should perhaps be left to the user.
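A hedged sketch of what such a helper could look like (the name cost_matrix and the columns-as-samples convention are assumptions, nothing is settled):

```julia
# Hypothetical cost_matrix helper: pairwise costs between the columns of
# xs and ys under a cost function c, returned as a matrix.
function cost_matrix(c, xs::AbstractMatrix, ys::AbstractMatrix)
    return [c(view(xs, :, i), view(ys, :, j)) for i in axes(xs, 2), j in axes(ys, 2)]
end

# e.g. squared Euclidean cost between two batches of 2D samples:
C = cost_matrix((x, y) -> sum(abs2, x - y), randn(2, 5), randn(2, 7))
```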

"We could multiple dispatch though, so something like sinkhorn(mu::DiscreteMeasure, nu::DiscreteMeasure) handles the cost matrix computation for the user."
Yeah, I totally agree here. We could dispatch a version where the user only provides the probabilities and the cost matrix, but I think we should then rename the arguments, e.g. sinkhorn(C, muprobs, nuprobs), and leave mu and nu for when they are actual measures.
