Unique() inconsistencies #621

ClaudiaComito · 2020-07-07T06:04:03Z

Description
While looking into #564, I found a number of inconsistencies in ht.unique().

#564 (Add clarification to documentation of unique()) is not a documentation issue.
- The kwarg sorted defaults to False in ht.unique(), but to True in torch.unique(). Currently, ht.unique(a) returns a sorted DNDarray if a.split is None (pure torch implementation: ht.unique(a) = torch.unique(a._DNDarray__array)), whereas if a.split is not None, the result is not sorted (heat implementation).
- Replacing ht.unique(a) = torch.unique(a._DNDarray__array) with
  ht.unique(a) = torch.unique(a._DNDarray__array, sorted=sorted) doesn't help, because sorted=False means different things for heat and torch:
  - heat interpretation: leave result unsorted;
  - torch interpretation: leave result REVERSE SORTED. See the discussion on #564 (Add clarification to documentation of unique()) for an example.

  I propose setting sorted=True by default in ht.unique(), as at the moment it's the only way to prevent inconsistencies with torch, although I'm aware that the sorting comes with significant overhead. Incidentally, numpy.unique() returns the "sorted unique elements of an array"; not sorting is not even an option.
- If return_inverse=True, ht.unique() by design returns a list of one DNDarray (the unique elements) and one torch tensor (the inverse indices). It should return two DNDarrays.
- It is currently not possible to run ht.unique(a, sorted=True, axis=axis) if axis != split. Error message:

Sorting with axis != split is not supported yet. See vectorized sorting #363

This needs to be followed up.
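
For reference, the torch side of the points above can be sketched as follows. This is a minimal illustration using plain torch only, not heat; the input tensor and variable names are made up for the example. It does not assert any particular order for sorted=False, since that order may depend on the torch version and device.

```python
import torch

# Illustrative input with duplicates (hypothetical data).
a = torch.tensor([3, 1, 2, 3, 1, 0])

# torch.unique sorts in ascending order by default (sorted=True).
u_sorted = torch.unique(a)

# With sorted=False, do not rely on the order of the result;
# compare only the set of values.
u_unsorted = torch.unique(a, sorted=False)
same_values = set(u_unsorted.tolist()) == set(u_sorted.tolist())

# return_inverse=True returns a tuple (unique values, inverse indices).
# Indexing the unique values with the inverse indices restores the input.
values, inverse = torch.unique(a, sorted=True, return_inverse=True)
reconstructed = values[inverse]

print(u_sorted.tolist())
print(same_values)
print(torch.equal(reconstructed, a))
```

A consistent ht.unique() would presumably have to satisfy the same reconstruction property, with both returned objects being DNDarrays.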

To Reproduce
Steps to reproduce the behavior:

Which module/class/function is affected?
manipulations.unique()
What are the circumstances under which the bug appears?
see above
What is the exact error message / erroneous behavior?
see above

Version Info
current main branch

ClaudiaComito · 2023-08-21T09:31:39Z

Still open and will be fixed with #749

Reviewed within #1109

github-actions · 2023-10-09T09:06:23Z

Branch bugs/621-Unique_inconsistencies created!

ClaudiaComito added the bug Something isn't working label Jul 7, 2020

ClaudiaComito self-assigned this Jul 7, 2020

ClaudiaComito linked a pull request Mar 24, 2021 that will close this issue

Features/unique sort distributed #749

Draft

4 tasks

ClaudiaComito linked a pull request Mar 30, 2021 that will close this issue

Features/unique sort distributed #749

Draft

4 tasks

ClaudiaComito mentioned this issue Feb 8, 2022

Address distributed non-ordered indexing #914

Open

ClaudiaComito added this to the Repo Clean-Up milestone Jul 31, 2023

ClaudiaComito removed this from the Repo Clean-Up milestone Aug 21, 2023

ClaudiaComito added the manipulations label Aug 21, 2023

ClaudiaComito assigned ClaudiaComito and unassigned ClaudiaComito Oct 9, 2023
