Is there a problem with the current inplace kernel computation logic? #70853

USTCKAY · 2025-01-16T03:00:51Z

Please ask your question

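In #67009, a temporary shallow copy of the input DenseTensor is made before InferMeta. This prevents the InferMeta stage from mistakenly modifying the input's meta information. The code example given there is: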

  Tensor& api_output = x;  // <---- under the inplace mechanism, out is x itself, so InferMeta overwrites x's meta
  auto kernel_out = SetKernelOutput(&api_output);

  phi::RecordEvent *infer_shape_record_event = nullptr;
  if(phi::RecordEvent::IsEnabled()){
    infer_shape_record_event = new phi::RecordEvent("reshape infer_meta", phi::TracerEventType::OperatorInner, 1);
  }

  auto origin_input_x = *input_x;     // <----- shallow copy of the input Tensor here
  phi::MetaTensor meta_out(kernel_out, kernel_result.is_stride_kernel);

  phi::ReshapeInferMeta(MakeMetaTensor(origin_input_x), shape, &meta_out); // <--- InferMeta runs on origin_input_x

  if(infer_shape_record_event != nullptr){
    delete infer_shape_record_event;
  }
  using kernel_signature = void(*)(const phi::DeviceContext&, const phi::DenseTensor&, const phi::IntArray&, phi::DenseTensor*);
  auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>();
  phi::RecordEvent* kernel_record_event = nullptr;
  if(phi::RecordEvent::IsEnabled()){
    kernel_record_event = new phi::RecordEvent("reshape compute", phi::TracerEventType::OperatorInner, 1);
  }
  (*kernel_fn)(*dev_ctx, origin_input_x, phi::IntArray(shape), kernel_out);
  if(kernel_record_event != nullptr){
    delete kernel_record_event;
  }

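However, this seems to cause a problem. The kernel call (*kernel_fn)(*dev_ctx, origin_input_x, phi::IntArray(shape), kernel_out); passes the shallow-copied tensor origin_input_x as the input and kernel_out, a pointer to the original tensor, as the output. As a result, the input and output are not actually the same tensor object, even though they share the same allocation. This complicates inplace checks inside the kernel, such as checking whether the input and output are the same object. Should the kernel instead be called with the original tensor as the input, i.e. (*kernel_fn)(*dev_ctx, *input_x, phi::IntArray(shape), kernel_out);? @Aurelius84, could you please help answer this? Thanks!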
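To make the concern concrete, here is a minimal C++ sketch. It uses hypothetical stand-ins (MockAllocation, MockDenseTensor) rather than Paddle's real phi types, and it assumes a tensor is just a shared allocation holder plus meta. Under that assumption, a shallow copy such as origin_input_x shares the holder with kernel_out but is a different object. An object-identity check therefore fails, while an allocation-level check still succeeds. That allocation-level check is written in the spirit of DenseTensor::IsSharedWith() but is not Paddle's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for phi types (not Paddle's API).
struct MockAllocation {
  std::vector<float> data;
};

struct MockDenseTensor {
  std::shared_ptr<MockAllocation> holder;  // a shallow copy shares this pointer
  std::vector<std::int64_t> dims;          // a shallow copy gets its own dims
};

// Object-identity check: are input and output literally the same tensor?
bool SameTensorObject(const MockDenseTensor& x, const MockDenseTensor* out) {
  return &x == out;
}

// Allocation-level check: do both tensors point at the same non-null holder?
bool SharesAllocation(const MockDenseTensor& x, const MockDenseTensor* out) {
  return x.holder != nullptr && x.holder == out->holder;
}
```

When origin_input_x is passed, SameTensorObject returns false and SharesAllocation returns true. Passing x itself makes both return true.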


USTCKAY · 2025-01-16T03:04:00Z

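DenseTensor::IsSharedWith() can check whether the underlying Allocation is the same, and that could replace the original inplace check. However, the input and output would still be different tensor objects, which might cause other problems.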

wanghuancoder · 2025-01-17T02:05:56Z

auto origin_input_x = *input_x;
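This pattern was written by the person who first implemented inplace, and the original design rationale can no longer be traced. My guess at the reason is as follows.
For an inplace operation, both the InferMeta stage and the kernel execution stage make many modifications to the output. These may change the meta or the contents of the Allocation, or they may even give the output a new Allocation, for example: x = paddle.rand([2,2],dtype="float16"); x.cast_("float32").
Whenever a kernel reads the input's meta or Allocation during computation, it intends to read the old data. If inplace has already replaced that with new data and the kernel then treats it as the old data, the computation goes wrong. That is why the original designer used auto origin_input_x = *input_x; as a backup of the old data.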
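Below is a minimal sketch of the cast_ scenario above. It again uses hypothetical mock types; MockDType, MockAllocation, MockDenseTensor, MockCastInferMeta and MockCastKernel are illustrative names, not Paddle APIs. The sketch assumes that InferMeta only rewrites the output's dtype and that the kernel gives the output a new, larger allocation. Under these assumptions, the snapshot taken before InferMeta still carries the float16 meta and keeps the old buffer alive. The aliased tensor, by contrast, already claims float32 for a buffer that still holds float16 data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-ins for phi types (not Paddle's API).
enum class MockDType { kFloat16, kFloat32 };

struct MockAllocation {
  std::vector<std::uint8_t> bytes;
};

struct MockDenseTensor {
  std::shared_ptr<MockAllocation> holder;  // shared between shallow copies
  MockDType dtype;                         // meta: copied by value
  std::int64_t numel;                      // meta: copied by value
};

std::size_t SizeOf(MockDType t) { return t == MockDType::kFloat16 ? 2 : 4; }

// Hypothetical cast InferMeta: only the output's meta changes.
void MockCastInferMeta(MockDType out_dtype, MockDenseTensor* out) {
  out->dtype = out_dtype;
}

// Number of bytes a kernel expects behind `t`, judging from its meta alone.
std::size_t BytesImpliedByMeta(const MockDenseTensor& t) {
  return static_cast<std::size_t>(t.numel) * SizeOf(t.dtype);
}

// Hypothetical cast kernel. The output needs a larger element size, so it is
// given a brand-new allocation; the input must still describe the old buffer.
void MockCastKernel(const MockDenseTensor& x, MockDenseTensor* out) {
  // Fires if x's meta was already rewritten (e.g. x aliases out after InferMeta).
  assert(BytesImpliedByMeta(x) == x.holder->bytes.size());
  auto fresh = std::make_shared<MockAllocation>();
  fresh->bytes.resize(BytesImpliedByMeta(*out));
  // A real kernel would decode x.holder->bytes as x.dtype and write the
  // converted values into fresh->bytes; this sketch only sizes the buffers.
  out->holder = fresh;  // if x is a separate shallow copy, it keeps the old buffer alive
}
```

With origin_input_x as the kernel input, the float16 meta and buffer stay consistent. If the aliased x were passed instead, the assertion inside the kernel would fire.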

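I don't think this design can be changed for now. You will need another way to determine whether the kernel is running in inplace mode. The best approach is to check, right at kernel entry, whether the Allocation pointers are equal.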
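The suggested entry-time check could look roughly like the sketch below. It assumes that comparing holder pointers is an adequate proxy for comparing Allocation pointers. MockScaleKernel and the mock types are hypothetical. The point is that the kernel decides between inplace and out-of-place once, on entry, from the allocation. It therefore behaves the same whether it receives x itself or a shallow copy of x.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for phi types (not Paddle's API).
struct MockAllocation {
  std::vector<float> data;
};

struct MockDenseTensor {
  std::shared_ptr<MockAllocation> holder;
};

// Hypothetical elementwise scale kernel. Inplace mode is detected on entry by
// comparing allocation pointers rather than tensor object addresses, so it
// also works when the framework passes a shallow copy of the input.
// Returns true if the call ran inplace. Assumes x.holder is non-null.
bool MockScaleKernel(const MockDenseTensor& x, float factor, MockDenseTensor* out) {
  const bool inplace = x.holder != nullptr && x.holder == out->holder;
  if (!inplace) {
    // Out-of-place: give the output its own buffer of the right size.
    out->holder = std::make_shared<MockAllocation>();
    out->holder->data.resize(x.holder->data.size());
  }
  const std::vector<float>& src = x.holder->data;  // same vector as dst when inplace
  std::vector<float>& dst = out->holder->data;
  for (std::size_t i = 0; i < src.size(); ++i) {
    dst[i] = src[i] * factor;  // element i is read before it is written
  }
  return inplace;
}
```

Each element is read before it is written, so this particular kernel tolerates aliasing. A kernel that overwrites input elements it still needs would have to branch on the flag more carefully.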

USTCKAY · 2025-01-17T02:13:44Z


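I understand the intent of preserving the original meta information, which is why I did not suggest using the original tensor for InferMeta as well. My suggestion is to keep using origin_input_x for InferMeta, but pass the original tensor *input_x when calling the kernel. Would that work? By the time the kernel is called, the meta is no longer modified. So it should be fine to use *input_x, which shares the same Allocation as origin_input_x and differs from it only in meta, shouldn't it?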
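The proposal can be examined with another simplified sketch. It uses hypothetical mock types, and the reshape InferMeta is reduced to assigning the new dims. Because api_output is x itself, InferMeta has already written the output meta into *input_x by the time the kernel runs. Passing *input_x therefore restores object identity between input and output. However, the kernel would then see the post-InferMeta dims as its input dims, whereas origin_input_x still reports the original ones. Whether that matters depends on whether a given kernel reads its input's meta.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for phi types (not Paddle's API).
struct MockAllocation {
  std::vector<float> data;
};

struct MockDenseTensor {
  std::shared_ptr<MockAllocation> holder;
  std::vector<std::int64_t> dims;
};

// What a kernel would observe under each calling convention.
struct KernelView {
  bool same_object;                      // &input == out
  std::vector<std::int64_t> input_dims;  // input meta as seen by the kernel
};

// Hypothetical driver: InferMeta always runs on the snapshot (as in #67009);
// `pass_original` selects whether the kernel gets *input_x (the proposal)
// or origin_input_x (the current generated code).
KernelView CallReshape(MockDenseTensor* input_x,
                       const std::vector<std::int64_t>& shape,
                       bool pass_original) {
  MockDenseTensor* kernel_out = input_x;      // inplace: output aliases input
  MockDenseTensor origin_input_x = *input_x;  // shallow copy before InferMeta
  kernel_out->dims = shape;                   // stand-in for ReshapeInferMeta
  const MockDenseTensor& kernel_in = pass_original ? *input_x : origin_input_x;
  return KernelView{&kernel_in == kernel_out, kernel_in.dims};
}
```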

USTCKAY added the status/new-issue (New issue) and type/question (User question) labels on Jan 16, 2025

paddle-bot (bot) assigned wangguan1995 on Jan 22, 2025
