Issues: ROCm/TransformerEngine
- #82 [Out of Box Experience]: ROCm Transformer Engine Should Be Included in AMD Pytorch Images (opened Oct 18, 2024 by OrenLeung)
- #79 [FSDP 8xMI300X] Llama3 8B FP8 is 21% slower than BF16 & OOMs on the same batch size (opened Oct 15, 2024 by OrenLeung)
- #76 [DDP 8xMI300X] GPT2-1.5B FP8 is 25% slower than BF16 & OOMs on the same batch size (opened Oct 15, 2024 by OrenLeung)
- #72 [1xMI300X] GPT-2 XL 1.5B FP8 Training ~30% slower than H100 FP8 (opened Oct 13, 2024 by OrenLeung)
- #34 [TE] Investigate parallelism implementation in Transformer Engine (opened Apr 23, 2024 by wangye805, 4 of 5 tasks)