[WIP] Add paddle support #206

Open
wants to merge 16 commits into
base: main

@HydrogenSulfate HydrogenSulfate commented Nov 26, 2024

@rgommers
Member

Cool, thanks for working on this @HydrogenSulfate!

I am curious to learn a bit more about Paddle. In particular conceptually what is supported - https://www.paddlepaddle.org.cn/documentation/docs/en/guides/jit/index_en.html and a few other guides tell me a bit, but not quite what I was most interested in. A few questions if you don't mind:

  • Is the default execution model eager or lazy/graph-based?
  • It looks like there is a JIT compiler, what's the syntax and does it work similar to, for example, jax.jit or torch.compile?
  • Is item and slice assignment supported via __setitem__? And indexing with boolean mask?
  • Is mixed integer and floating-point type promotion supported?
  • I see it has CPU and NVIDIA GPU support, plus some other vendors of accelerators that I don't immediately recognize. Are those all GPUs as well? And ROCm and Intel XPUs are not supported (now or in the near future)?

@HydrogenSulfate
Author

HydrogenSulfate commented Nov 26, 2024

Thanks very much for the reply and for your attention to this PR.

  1. Paddle uses eager execution by default (eager Tensors running on a dynamic graph). It can be switched manually to static-graph mode (lazy Tensors running in a static computational Program) with model = paddle.jit.to_static(model).

  2. paddle.jit.to_static is used in much the same way as torch.compile and jax.jit. When designing these interfaces, we referred to influential tools such as PyTorch and JAX. The usage of paddle.jit.* is roughly as follows:

    1. First, users write and train their models with dynamic graphs.
    2. If needed, users can convert the model to a static-graph model with one line of code, model = paddle.jit.to_static(model), with no other modifications, and then start training. Thanks to the advantages of static graphs, this usually gives a small performance improvement. The conversion has been tested extensively on our existing models, with a success rate close to 100%.
    3. If higher performance is required, users can enable the CINN compiler in jit.to_static (see the modulus-sym code for an example). CINN can capture the entire computation graph, including the forward pass, the backward pass, and even double-backward (or higher-order) passes, and accelerate the program further. We have tested it on 40+ models in the NVIDIA/modulus-sym suite and achieved IPS that exceeds PyTorch by about 70% with the CINN compiler enabled (this is partly because PyTorch does not seem to support capturing and compiling higher-order backward passes).
    4. After training, the computational program of the model can be saved with paddle.jit.save(model, output_path) to get a deployable model (like a TensorFlow .pb file).
  3. Item and slice assignment are supported, with broadcasting, as shown below:

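    import paddle

    x = paddle.randn([4, 3, 2])
    v = paddle.randn([3, 2])

    # Item assignment: the scalar is broadcast over the two elements of x[0, 1]
    x[0, 1] = 3.0
    print(x)

    # Slice assignment: v of shape [3, 2] is broadcast along the leading axis
    x[:] = v
    print(x)

    # Boolean-mask assignment: the mask selects rows 0 and 2 along axis 0
    mask = paddle.to_tensor([True, False, True, False])
    x[mask] = paddle.zeros([3, 2])
    print(x)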
  4. Our implicit type promotion supports fp32/fp64 and c32/c64 promotion, but does not support mixed integer and bool types. The purpose is to avoid hidden conversions that users can easily overlook and that can lead to unexpected model results. The detailed table can be checked at this URL:
    image

  5. We support XPU and ROCm; I will add these device types in subsequent commits.

@rgommers
Member

Thanks for the detailed explanation @HydrogenSulfate, much appreciated.

5. We support XPU and ROCM, I will supplement these devices type in subsequent commits

I'll note that I tried inferring supported devices from this Install page, where ROCm/XPU aren't yet present:

image

@HydrogenSulfate
Author

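Thanks for the detailed explanation @HydrogenSulfate, much appreciated.

  5. We support XPU and ROCm; I will add these device types in subsequent commits

I'll note that I tried inferring supported devices from this Install page, where ROCm/XPU aren't yet present:

image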

Embarrassingly, our English documents are somewhat outdated, so you can use the browser's translation feature to translate the Chinese documents into English.

ROCm is used in HYGON:
image
image

XPU is used in KUNLUNXIN:
image

@rgommers
Member

Embarrassingly, our English documents are somewhat outdated

Not embarrassing at all - we still haven't even deployed our Chinese translations on https://numpy.org/ (they're coming though!).

Thanks for the tips. Once this is ready, I'll try giving Paddle + SciPy a spin.

Labels
enhancement New feature or request