Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map accepting row functions #290

Closed
davibarreira opened this issue Sep 17, 2024 · 6 comments
Closed

Map accepting row functions #290

davibarreira opened this issue Sep 17, 2024 · 6 comments
Labels
discussion help wanted Extra attention is needed refactor Refactor existing code.

Comments

@davibarreira
Copy link

davibarreira commented Sep 17, 2024

It seems that the Map transform does not accept row functions, for example:

Map(row->row.a + row.b => :NewColumnName)
Map(row->sum(row) => :NewColumnName)

Is this by design? I've taken a look at the implementation of how Map is applied, and I noticed that the data is first transformed into column form, and then a generator is produced. Isn't the Map transform always applied row-wise? Is there a reason to parsing the table column-wise instead?

Using the ColumnSelectors.AllSelector type, one can place an if and apply the function row-wise, without modifying much of the current implementation. But this is not a perfect solution, as it produces errors if the user actually pass an AllSelector and writes the function with (a,b,c,d) containing all columns.

@juliohm
Copy link
Member

juliohm commented Sep 17, 2024

Good question. We are effectively mapping a function to all rows of the table, so we could potentially consider the proposed syntax. However, not all row objects support the syntax row.a so we need to make sure that the table type is wrapped into a new table type with this guarantee.

@eliascarv do you have any comment regarding this proposal? At a quick glance, it can drastically simplify the Map syntax.

@juliohm juliohm added help wanted Extra attention is needed discussion refactor Refactor existing code. labels Sep 17, 2024
@davibarreira
Copy link
Author

davibarreira commented Sep 17, 2024

If you don't want to change the core of the code, here is a simple solution I mentioned. Inside the applyfeat for map.jl, you can add the following:

  mapped = map(selectors, funs, targets) do selector, fun, target
    snames = selector(names)
    newname = isnothing(target) ? _makename(snames, fun) : target
    
    #####################
    if selector isa AllSelector
      newcolumn = map(fun, Tables.rows(cols))
      return newname => newcolumn
    end
    #####################
    
    scolumns = (Tables.getcolumn(cols, nm) for nm in snames)
    newcolumn = map(fun, scolumns...)
    newname => newcolumn
  end

Then, create some more dispatches for the _extract in the map.jl:

# utility types
const TargetName = Union{Symbol,AbstractString}
const PairWithTarget = Pair{<:Any,<:Pair{<:Function,<:TargetName}}
const PairWithoutTarget = Pair{<:Any,<:Function}
const PairFunctionTarget = Pair{<:Function,<:TargetName}
const MapPair = Union{PairWithTarget,PairWithoutTarget,PairFunctionTarget,Function}

# utility functions
_extract(p::PairWithTarget) = selector(first(p)), first(last(p)), Symbol(last(last(p)))
_extract(p::PairWithoutTarget) = selector(first(p)), last(p), nothing
_extract(p::PairFunctionTarget) = AllSelector(), first(p), Symbol(last(p))
_extract(p::Function) = AllSelector(), p, nothing

With this, both examples below work:

Map((row->row.a + row.b) => :NewColumnName)
Map(row->sum(row) => :NewColumnName)

The issue would be that Map(((a,b,c,d)->a + b + c +d) => :NewColumnName) would error.

@eliascarv
Copy link
Member

@davibarreira, after discussion, perhaps the best decision would be to change the current syntax to this:

Map(fun, ...)
Map(fun => target, ....)

Where fun is a function that receives a row and returns a value: fun(row) -> value.

For example, the following code:

Map(:a => sin, "b" => cos => :cos_b)

Would be written like this:

Map((row -> sin(row.a)), (row -> cos(row.b)) => :cos_b)

Can you do a PR to investigate this please?

@juliohm
Copy link
Member

juliohm commented Sep 17, 2024

To clarify, this PR would be investigative. We can compare the pros and cons of the two approaches and decide later if the breaking change is worth.

@davibarreira
Copy link
Author

I've implemented the solution I mentioned and submitted the PR. You guys can check if you think it properly addresses the issue, or if it is better to keep as it is now.

@juliohm
Copy link
Member

juliohm commented Sep 23, 2024

Thank you @davibarreira for experimenting with this idea. Closing the issue.

@juliohm juliohm closed this as completed Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion help wanted Extra attention is needed refactor Refactor existing code.
Projects
None yet
Development

No branches or pull requests

3 participants