Disallow operations on scalar tensors that are no-ops #794

Open
reillyeon opened this issue Nov 23, 2024 · 9 comments

@reillyeon (Contributor) commented Nov 23, 2024

There are a number of operators, such as transpose() and reshape(), which are no-ops when applied to a scalar tensor. In addition, there are operators like gather() which expect indices, but a scalar doesn't have indices.

While implementations can work around this (typically by treating a scalar as a single-element 1D tensor), it seems better to simply disallow these cases, which add complexity without adding any expressivity.
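
For illustration only, here is how these scalar cases behave in NumPy (NumPy is used as a stand-in here; these are not WebNN calls):

import numpy as np

s = np.array(42.0)               # a rank-0 (scalar) tensor
print(np.transpose(s).shape)     # () -- nothing to permute, value unchanged
print(s.reshape([]).shape)       # () -- reshape to the same (empty) shape

# The typical implementation workaround: view the scalar as a
# single-element 1-D tensor instead.
print(s.reshape([1]).shape)      # (1,)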

@reillyeon (Contributor, Author) commented:

As an additional data point, both LiteRT and Core ML do not directly support applying their transpose operators to a scalar, and implementations based on these frameworks will require workarounds. DirectML only supports this operation because transposition is implemented by adjusting the dimensions and strides of a tensor as it is passed between other nodes, which is a no-op for a scalar tensor.

@reillyeon (Contributor, Author) commented:

We recall that @fdwr had a reason why supporting this was useful for developers, but if so, then we should include a more complete discussion of tensor operations on scalars in the specification. Right now there's not much mention that a tensor can have a rank of 0.

@fdwr (Collaborator) commented Nov 26, 2024

I have no qualms about disallowing this for gather indices, and I can contentedly let go of transpose (because one must balance "what logically should happen to follow consistent semantics and is mathematically sensible" against "what is practical to implement in the real world"), but reshape, as a more fundamental operation, I really feel should follow the axiom that you can always reshape a tensor to its own shape. So if you have an input of sizes = [4,3], then it should be legal to reshape it to sizes = [4,3]; and if you have sizes = [], then you should be able to reshape it to sizes = [] - a consistent rule, orthogonal to whether the tensor is a scalar. Granted, it seems useless at first, but these kinds of cases do come up at higher levels in generic routines.

I don't have easy access to CoreML or TFLiteRT to try, but I'd be surprised if reshape of a scalar didn't work as expected for them, as TF, PT, and NumPy all agree (code at bottom).

> Right now there's not much mention that a tensor can have a rank of 0.

Yeah, we could add more mention. reshape is documented to support rank N (thanks to @inexorabletash's #657), which means 0 through the maximum backend limit ...

| operand | allowed data types | allowed ranks   |
| ------- | ------------------ | --------------- |
| input   | any                | N               |
| output  | same as input      | newShape's size |

...but we could clarify the existing wording "allowed ranks ... are given as an explicit rank (e.g. 1), or N to allow any dimensionality" to explicitly include 0 when defining N. Note the general expectation is that most operators support scalars by default, except for those operators that don't (rather than treating scalars as a special case that only applies to a subset of operators).

Reshape reference code

# pip freeze tensorflow==2.11.0
import tensorflow as tf

x = tf.constant(42, dtype=tf.float32)
y = tf.reshape(x, [])

print("value:", y)
print("shape:", y.shape)

# value: 42
# shape: []


# pip freeze torch==1.11.0+cpu
import torch

x = torch.tensor(42, dtype=torch.float32)
print("x value:", x)
print("x shape:", x.shape)
print("x dtype:", x.dtype)

y = x.reshape(())
print("y value:", y)
print("y shape:", y.shape)
print("y dtype:", y.dtype)

# x value: tensor(42.)
# x shape: torch.Size([])
# x dtype: torch.float32
# y value: tensor(42.)
# y shape: torch.Size([])
# y dtype: torch.float32


# pip freeze numpy==1.21.6
import numpy
x = numpy.array(42, dtype=numpy.float32)
y = x.reshape([])

print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)

# value: 42.0
# shape: ()
# dtype: float32

@reillyeon (Contributor, Author) commented:

I agree with the argument in favor of reshape(). It was incorrect to refer to it as a no-op, since its purpose is to change the shape, rather than the value, of the tensor, and changing the shape from a scalar to a single-element tensor is valid.

I'm less sure about other operators. For example, gather() probably makes sense if you require that indices is not a scalar, as otherwise the output rank calculation would return -1. pad() only makes sense if you implicitly reshape the input to a single-element tensor first.
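
To make the rank arithmetic concrete, here is a small NumPy illustration (numpy.take is used only as an analogue of gather; none of this is WebNN API code):

import numpy as np

# gather's output rank is input.rank - 1 + indices.rank, since the output
# shape is input.shape[:axis] + indices.shape + input.shape[axis+1:].
x = np.array([10.0, 20.0, 30.0])     # input.rank == 1
i = np.array([[0, 2], [1, 1]])       # indices.rank == 2
y = np.take(x, i, axis=0)            # output.rank == 1 - 1 + 2 == 2
print(y.shape)                       # (2, 2)

# For a rank-0 input there is no valid axis at all, and with rank-0 indices
# the same formula would give 0 - 1 + 0 == -1.
s = np.array(42.0)
# np.take(s, 0, axis=0) raises an AxisError (axis 0 is out of bounds for a 0-d array).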

What I'd like to see are examples of where silently tolerating scalars is useful for developers. This might end up in a similar place as #391.

To start we should go through each case where we allow a rank of N and clarify whether 0 is a valid value of N.

@inexorabletash (Member) commented:

> we could clarify the existing wording "allowed ranks ... are given as an explicit rank (e.g. 1), or N to allow any dimensionality" to explicitly include 0 when defining N

+1 - if this issue turns into a PR we should include that.

@fdwr (Collaborator) commented Jan 16, 2025

Next steps:

  • Enumerate all operators, and determine which:
    • ✅ support scalars
      • any simple elementwise operators should consistently support scalars (note WebNN has a lot of elementwise operators {like hardSwish, relu, softplus...} that were not put under the arbitrary "elementwise" bucket, but they functionally are)
    • ❌ definitely do not
      • require a rank greater than 0
      • include an axis parameter (and since axis must be within 0 to N-1, 0D cannot satisfy that)
    • ❔ are in question here
      • notably any that have parallel fields equal to the input rank (permutations, slice window, new shape, padding values)
  • For ❔, determine which backends already support scalars, which do not, and whether it's worth extra code to wrap that operator in a temporary reshape of 0D to 1D (this already happens in the Chromium DML backend because the tensor description is massaged to 1D before calling DML anyway, but other backends may require actual explicit reshape calls); a minimal sketch of this wrapping appears at the end of this comment.
    • Note that frameworks, converters, and model generators might generate some of these constructs as a by-product, even if a user directly calling the API would probably avoid doing so, and so that should be considered too.
    • Shape manipulation operators (like reshape and expand) have higher priority to me to support scalars, since those are more likely to occur as generation byproducts.
| Operator | Verdict of scalar support |
| --- | --- |
| argMin/argMax operations | ❌ requires axis |
| batchNormalization | ❌ requires axis |
| cast | ✅ elementwise |
| clamp | ✅ elementwise |
| concat | ❌ requires axis |
| conv2d | ❌ input.rank == 4 |
| convTranspose2d | ❌ input.rank == 4 |
| Element-wise binary operations | ✅ elementwise |
| Element-wise logical operations | ✅ elementwise |
| Element-wise unary operations | ✅ elementwise |
| elu | ✅ elementwise |
| expand | ❔ a reshape operator |
| gather | ❌ requires axis |
| gelu | ✅ elementwise |
| gemm | ❌ input.rank == 2 |
| gru | ❌ input.rank == 3 |
| gruCell | ❌ input.rank == 2 |
| hardSigmoid | ✅ elementwise |
| hardSwish | ✅ elementwise |
| instanceNormalization | ❌ input.rank == 4 |
| layerNormalization | ✅ input.rank == 0 to N |
| leakyRelu | ✅ elementwise |
| linear | ✅ elementwise |
| lstm | ❌ input.rank == 3 |
| lstmCell | ❌ input.rank == 2 |
| matmul | ❌ input.rank >= 2 |
| pad | |
| Pooling operations | ❌ input.rank == 4 |
| prelu | ✅ elementwise |
| Reduction operations | ✅ see noop_with_empty_axes for nop cases |
| relu | ✅ elementwise |
| resample2d | ❌ input.rank == 4 |
| reshape | |
| sigmoid | ✅ elementwise |
| slice | |
| softmax | ❌ requires axis |
| softplus | ✅ elementwise |
| softsign | ✅ elementwise |
| split | ❌ requires axis |
| tanh | ✅ elementwise |
| transpose | |
| triangular | ❌ input.rank >= 2 |
| where | ✅ tri-elementwise |

That leaves:

| Operator | Verdict of scalar support |
| --- | --- |
| expand | ❔ dwayner - should be consistent with reshape |
| pad | |
| reshape | ❔ dwayner - my POV is yes ✅ |
| slice | |
| transpose | ❔ dwayner - although logically the backends should just treat an empty permutation as a no-axis nop, it may not be worth the added code to enable that in the WebNN calling layer, especially given the unlikeliness of occurrence |
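
A minimal sketch of that temporary 0D-to-1D wrapping, using NumPy purely for illustration (the helper name and the NumPy calls are assumptions, not WebNN or backend API):

import numpy as np

def transpose_allowing_scalars(x, permutation=None):
    # Transpose that tolerates 0-D inputs by temporarily viewing them as 1-D.
    if x.ndim == 0:
        # Wrap: view the scalar as a single-element 1-D tensor, transpose it
        # (a no-op for one dimension), then restore the original 0-D shape.
        return np.transpose(x.reshape([1]), axes=[0]).reshape([])
    return np.transpose(x, axes=permutation)

print(transpose_allowing_scalars(np.array(42.0)).shape)               # ()
print(transpose_allowing_scalars(np.arange(6).reshape(2, 3)).shape)   # (3, 2)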

@huningxin (Contributor) commented:

Re Chromium prototype:

  • expand supports scalar.
  • gather doesn't support scalar. It has MLGatherOptions.axis.
  • pad doesn't support scalar. It may be able to support scalar if empty beginningPadding and endingPadding mean nop.
  • reshape supports scalar.
  • slice doesn't support scalar. It may be able to support scalar if empty starts, ends and strides mean nop.
  • transpose allows scalar, empty permutation is a nop.

@fdwr (Collaborator) commented Jan 17, 2025

> gather doesn't support scalar. It has MLGatherOptions.axis.

@huningxin: TY, missed that when filling in the table. Updated.

@huningxin (Contributor) commented Jan 22, 2025

For other wave 3 gather/scatter ops, according to the Chromium prototype:

  • gatherElements doesn't support scalar input and indices. It requires axis and indices.rank == input.rank.
  • gatherND doesn't support scalar input and indices. According to its definition, the last dimension of indices contains indices of elements or slices in the input tensor. Both should be at least 1D.
  • scatterElements doesn't support scalar input and indices (same as gatherElements). updates can't be scalar, because it requires updates.shape == indices.shape.
  • scatterND doesn't support scalar input and indices (same as gatherND). updates might be scalar, as updates.rank = indices.rank - 1 + input.rank - indices.shape[-1].
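
A small worked example of that scatterND rank arithmetic (plain Python evaluating the formula above; the concrete shapes here are made up for illustration):

# updates.rank = indices.rank - 1 + input.rank - indices.shape[-1]
input_rank = 3            # e.g. input shape [5, 6, 7]
indices_shape = [4, 2]    # indices.rank == 2, indices.shape[-1] == 2
updates_rank = len(indices_shape) - 1 + input_rank - indices_shape[-1]
print(updates_rank)       # 2 (updates shape would be [4, 7])

# updates becomes a scalar when the formula evaluates to 0, e.g. a rank-1
# input with indices of shape [1]: 1 - 1 + 1 - 1 == 0.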
