Rework `cat` implementation #18

lkdvos · 2025-02-04T13:10:20Z

I cooked up a minimal version of the parts of a small adaptation to the Base implementation of cat with more well-defined entry-points for customization, and which should be friendly on both inference and invalidations because the dispatching is happening in steps.

Let me know how you feel about this and we can think about the most convenient way of integrating this into the rest of the framework.

codecov · 2025-02-04T13:11:42Z

Codecov Report

Attention: Patch coverage is 86.27451% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.56%. Comparing base (8371a5e) to head (11f2013).

Files with missing lines	Patch %	Lines
src/concatenate.jl	85.71%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   72.89%   81.56%   +8.67%     
==========================================
  Files           9       10       +1     
  Lines         321      293      -28     
==========================================
+ Hits          234      239       +5     
+ Misses         87       54      -33

Flag	Coverage Δ
docs	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mtfishman · 2025-02-04T15:01:02Z

I think it is ok, I see that you tried to stick closer to the implementation of Base.cat. However, I think I would prefer a strategy more like the design of Base.Broadcast.Broadcasted, which as you know stores the expression and a style based on combining the styles of the arguments. Having to overload cat_similar(::Type{T}, shape, args...) to get the destination object isn't ideal since you still have the issue of having many potential combinations of arguments to deal with.

I think we can have some compromise between the two approaches, for example:

function cat(dims, args...)
  # `instantiate` reduces over the `args` and determines a common `CatStyle`,
  # returning a new `Catted` with the `CatStyle` stored.
  return copy(instantiate(Catted(dims, args...)))
end

function Base.copy(c::Catted)
  dest = similar(c)
  copyto!(dest, c)
  return dest
end

function Base.copyto!(dest::AbstractArray, c::Catted)
  cat!(dest, axes(c), c.dims, c.args...)
  return dest
end

function cat!(dest, shape, catdims, args...)
  # Implementation of `cat!`.
end

lkdvos · 2025-02-05T13:00:19Z

I initially started with a design similar to Base.Broadcast, but quickly realized that it looks like a huge overkill for this specific situation.
Given that we really only want need to make sure the destination has the right type, my current approach is the minimal amount of changes (and thus code to test, document, understand and maintain) I could come up with to make that happen.
In particular, I like that this could reuse the interfaces for defining an implementation of @interface cat_similar, so we don't have to introduce a new style, new combination rules, etc.

I agree that sure, the Broadcast approach is more general, more customizable and more flexible, but it's also more complicated and I am not convinced we really need that. There's this quote about "every line of code you write is a potential bug", which I think could be applicable here :).

Concretely, I would suggest we avoid the Catted object and keep the current structure, but have a default implementation for cat_similar(::Type{T}, shape, args...) = cat_similar(interface(args...), T, shape, args...). This keeps the simplicity and still has the flexibility?

mtfishman · 2025-02-05T13:24:51Z

Concretely, I would suggest we avoid the Catted object and keep the current structure, but have a default implementation for cat_similar(::Type{T}, shape, args...) = cat_similar(interface(args...), T, shape, args...). This keeps the simplicity and still has the flexibility?

I'm a bit concerned about reusing interface for this purpose, does that really output the correct type or give the right level of customization in general (say when mixing different kinds of sparse types)?

lkdvos · 2025-02-05T14:00:10Z

It's possible, but then we can still specialize on the types. Also, since we currently don't have that many different sparse types in place, I would much rather have a simple solution that works for now, which we can fix later if need be, instead of a more complicated one that may account for future options, but may also still have to be fixed later because we cannot foresee all possibilities without actual examples.

mtfishman · 2025-02-05T19:19:43Z

I still personally prefer a solution based around a Catted object (like the current implementation in the package, though of course it doesn't have to follow that code exactly), I don't think it is so complicated to implement and is a way to help organize the code. For now we can forgo defining CatStyle to keep things simpler, for example use interface as a customization point as you suggested (Catted could store the result of interface(args...), for example).

lkdvos · 2025-02-11T16:03:06Z

I reworked the implementation to include a Concatenated object (this somehow felt like a better name than Catted, but I'm happy to switch it out). Let me know how you feel about the general implementation here, and I'll try to hook it into the rest of the machinery?

mtfishman · 2025-02-11T16:16:05Z

I like the new names, Concatenated definitely looks better than Catted.

What do you have in mind for how to incorporate this into the rest of the package? I'm curious what you think of: #12 (comment). It seems like those broader questions around what this package should be and how opinionated it should be has implications for what we decide to do with these cat methods. If we go the route of having this package be very unopinionated and not really having any implementations, it might then make sense for this PR to be put in a separate package Concatenated.jl, more analogous to https://github.com/ITensor/MapBroadcast.jl, and then other packages like SparseArraysBase.jl can opt-in to that implementation of cat, say by forwarding Base.cat/Base._cat to Concatenated.concatenate or @interface ConcatenatedArrayInterface() Base.cat (which could be defined as Concatenated.concatenate).

lkdvos · 2025-02-11T16:55:05Z

I think I would lean towards having defaults that aren't opinionated, while providing tools for customization.
It seems fair to me to have this machinery as an opt-in option, indeed by having Base._cat(dims, args...) = concatenate(args...) and then using the concatenation-specific implementation. The annoying part is mostly that there is no way to fix the Base.cat(::Array1, ::Array2) ambiguities that arise from definitions like cat(::Array1, ::AbstractArray), except for manually dispatching the offending signatures to concatenate.
In some sense, it would almost be more convenient to start using concatenate instead of cat, but that might be too much to ask too...

I'm not very enthusiastic about splitting this off into a separate package. This has the drawback that it would then require us to also add the logic of having interfaces and interface promotion rules there, or simply depend on this package anyways. I fear that if we need to have Interface, CatStyle, BroadcastStyle, and support rules for combining them, that becomes a bit much, and if we need to depend on this package anyways, then I don't think there's that much to gain from splitting it off.

Another somewhat unfortunate factor is that these kinds of solutions don't follow the same format as the other functions, and it's a bit hard to try and fit it into the @derive signature. This doesn't need to be a problem, I'm definitely okay with just manually writing Base._cat(dims, args...) = DerivableInterfaces.concatenate(args...; dims) (In fact, I even prefer the more verbose way), since this makes it very clear what is going on and where.

mtfishman · 2025-02-11T17:13:12Z

I think I would lean towards having defaults that aren't opinionated, while providing tools for customization. It seems fair to me to have this machinery as an opt-in option, indeed by having Base._cat(dims, args...) = concatenate(args...) and then using the concatenation-specific implementation. The annoying part is mostly that there is no way to fix the Base.cat(::Array1, ::Array2) ambiguities that arise from definitions like cat(::Array1, ::AbstractArray), except for manually dispatching the offending signatures to concatenate. In some sense, it would almost be more convenient to start using concatenate instead of cat, but that might be too much to ask too...

Yeah, I think that would be a bit too much (it would be unfortunate for someone to call Base.cat on a SparseArrayDOK and have it use a bad or broken Base implementation...), but we can think of it as a proposal for a potential new version of Base.cat, like how ArrayLayouts.jl started out as a proposed rewrite of parts of LinearAlgebra.jl.

I'm not very enthusiastic about splitting this off into a separate package. This has the drawback that it would then require us to also add the logic of having interfaces and interface promotion rules there, or simply depend on this package anyways. I fear that if we need to have Interface, CatStyle, BroadcastStyle, and support rules for combining them, that becomes a bit much, and if we need to depend on this package anyways, then I don't think there's that much to gain from splitting it off.

Right, for now I was picturing that if we moved it to a separate package we could depend on this package and the interface mechanism for inferring the output interface/type. I think the gain is conceptual/organizational, I just don't want this package growing into a "Base alternative", which you already identified it was doing a bit too much already.

Another somewhat unfortunate factor is that these kinds of solutions don't follow the same format as the other functions, and it's a bit hard to try and fit it into the @derive signature. This doesn't need to be a problem, I'm definitely okay with just manually writing Base._cat(dims, args...) = DerivableInterfaces.concatenate(args...; dims) (In fact, I even prefer the more verbose way), since this makes it very clear what is going on and where.

As you say, I think that is just the nature of the fact that Base.cat isn't good to overload, and is unavoidable right now (unless we define an @interface version of Base._cat but I'm not a big fan of that approach...).

I'm ok keeping it in this package right now, but lets keep it in mind going forward. I would want to have some "philosophy" or guiding principle for what goes in this package vs. in a separate package. Maybe the principle is that if it implements functionality that is close to functionality in Base but provides better customization points based around interface, it can go in this package.

src/cat.jl

mtfishman · 2025-02-12T00:17:10Z

@lkdvos this looks good to me, what are the next steps here?

lkdvos · 2025-02-12T15:23:43Z

Since this one is breaking, I would suggest either merging but not yet tagging, or slightly holding off on merging, because maybe we can simultaneously alter the AbstractArrayInterface having too many different interpretations, as mentioned in #12 .
Concretely, I would suggest changing the following breaking changes before tagging a release:

remove AbstractArrayOps
new concatenate API
AbstractArrayInterface rework

mtfishman · 2025-02-12T15:28:40Z

Sure, that sounds reasonable to do all of those in a single breaking release. So, up to you if you want to merge this and then work on those next, or keep this unmerged and building off of it (I think I prefer not merging this yet since if there are non-breaking things we want to fix in the meantime it would make that easier).

lkdvos · 2025-02-12T15:54:51Z

I'm okay with keeping this open, the other changes should not depend on this anyways

src/concatenate.jl

test/SparseArrayDOKs.jl

mtfishman · 2025-02-18T21:55:49Z

src/concatenate.jl

+# copy_or_fill!(A, inds, x) = fill!(view(A, inds...), x)
+# copy_or_fill!(A, inds, x::AbstractArray) = (A[inds...] = x)
+
+zero!(x::AbstractArray) = fill!(x, zero(eltype(x)))


It seems like we are going to end up with a few zero! definitions in various places. Should we define DerivableInterfaces.zero! which is used here and in SparseArraysBase.jl?

As we've discussed, SparseArraysBase.jl currently uses ArrayLayouts.zero!, but maybe it would be better to have DerivableInterfaces.zero! which we use in our own code, and then types that want to use ArrayLayouts.jl can overload ArrayLayouts.zero! (which could just forward to DerivableInterfaces.zero!).

I definitely think that's a good idea, it's strange to me that it doesn't exist in Base anyways.

mtfishman assigned lkdvos Feb 5, 2025

mtfishman mentioned this pull request Feb 7, 2025

Docstring and skeleton updates #12

Open

mtfishman reviewed Feb 11, 2025

View reviewed changes

src/cat.jl Outdated Show resolved Hide resolved

lkdvos force-pushed the cat-2 branch from 30aa206 to bfdd4a0 Compare February 11, 2025 18:47

lkdvos added 12 commits February 18, 2025 07:35

Add cat proposition

4a7a52c

Formatter

efd288c

Rework cat to include Concatenated object

399b82f

Fix some docstrings

e29b271

simplify some functions

3ad0b84

remove out-of-date comments

13e97a9

rename and include file

4363c3c

small simplifications

1889807

remove previous implementation

00c30a2

remove arraytype in favor of similar(interface, T, ax)

eff68ce

Various fixes

0293917

formatter

d074033

lkdvos force-pushed the cat-2 branch from a278432 to d074033 Compare February 18, 2025 12:36

mtfishman reviewed Feb 18, 2025

View reviewed changes

src/concatenate.jl Outdated Show resolved Hide resolved

mtfishman reviewed Feb 18, 2025

View reviewed changes

src/concatenate.jl Outdated Show resolved Hide resolved

mtfishman reviewed Feb 18, 2025

View reviewed changes

src/concatenate.jl Outdated Show resolved Hide resolved

Avoid Base._copy_or_fill!

11f2013

mtfishman reviewed Feb 18, 2025

View reviewed changes

test/SparseArrayDOKs.jl Show resolved Hide resolved

lkdvos added 3 commits February 18, 2025 11:24

rename concatenate(!) to cat(!)

4c70818

Add comment about overloading Base._cat

af8c0ed

remove specialization points and simplify

e4f0d89

lkdvos force-pushed the cat-2 branch from 4e2a68c to e4f0d89 Compare February 18, 2025 19:00

lkdvos mentioned this pull request Feb 18, 2025

[WIP] v0.3 refactor #28

Draft

4 tasks

mtfishman reviewed Feb 18, 2025

View reviewed changes

Add zero!

363b7b1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework `cat` implementation #18

Rework `cat` implementation #18

lkdvos commented Feb 4, 2025

codecov bot commented Feb 4, 2025 •

edited

Loading

mtfishman commented Feb 4, 2025

lkdvos commented Feb 5, 2025

mtfishman commented Feb 5, 2025

lkdvos commented Feb 5, 2025

mtfishman commented Feb 5, 2025

lkdvos commented Feb 11, 2025

mtfishman commented Feb 11, 2025 •

edited

Loading

lkdvos commented Feb 11, 2025

mtfishman commented Feb 11, 2025

mtfishman commented Feb 12, 2025

lkdvos commented Feb 12, 2025

mtfishman commented Feb 12, 2025

lkdvos commented Feb 12, 2025

mtfishman Feb 18, 2025

lkdvos Feb 18, 2025

Rework cat implementation #18

Are you sure you want to change the base?

Rework cat implementation #18

Conversation

lkdvos commented Feb 4, 2025

codecov bot commented Feb 4, 2025 • edited Loading

Codecov Report

mtfishman commented Feb 4, 2025

lkdvos commented Feb 5, 2025

mtfishman commented Feb 5, 2025

lkdvos commented Feb 5, 2025

mtfishman commented Feb 5, 2025

lkdvos commented Feb 11, 2025

mtfishman commented Feb 11, 2025 • edited Loading

lkdvos commented Feb 11, 2025

mtfishman commented Feb 11, 2025

mtfishman commented Feb 12, 2025

lkdvos commented Feb 12, 2025

mtfishman commented Feb 12, 2025

lkdvos commented Feb 12, 2025

mtfishman Feb 18, 2025

Choose a reason for hiding this comment

lkdvos Feb 18, 2025

Choose a reason for hiding this comment

Rework `cat` implementation #18

Rework `cat` implementation #18

codecov bot commented Feb 4, 2025 •

edited

Loading

mtfishman commented Feb 11, 2025 •

edited

Loading