run sdpa with dtensor #180
Conversation
[ghstack-poisoned]
ghstack-source-id: 33d3d0b6a19c747269aab1a95589bb61bf9c1f51 Pull Request resolved: #180
This PR gets rid of the manual adjustment of the number of heads in attention layers by using DTensor outputs of `wq`, `wk`, `wv`, so that SDPA is aware of the distributedness. [ghstack-poisoned]
ghstack-source-id: 43941c1ca0dfc7a04589a7513a110b877c217917 Pull Request resolved: #180
"attention.wq": col_parallel_strategy(), | ||
"attention.wk": col_parallel_strategy(), | ||
"attention.wv": col_parallel_strategy(), | ||
"attention.wq": col_parallel_strategy(use_local_output=False), |
🤔 I thought we needed to replicate the freq_cis, but here it seems we don't need to?
Just curious, is this going to land soon, or does it have some risk or unfinished business? Also, it looks like this could use a rebase. I got a little confused applying it on my branch because some of the sharding config seems to have changed (attention.wo and attention_norm).
It hasn't been landed because there is a very strange bug (#267) associated with (but seemingly not caused by) multiplication using DTensor. It would be triggered in the rotary embedding computation if this PR lands. I will work on the bug soon, since it will also benefit PP (IIUC). @wconstab
Oh, is this related to dispatching for complex numbers, by any chance?
@wconstab Possibly, we don't know.
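For reference, a hedged sketch of the llama-style rotary embedding step being discussed; names and shapes follow the common reference implementation rather than this repo, and `freqs_cis` is assumed to be pre-reshaped for broadcasting. The complex multiplication `xq_ * freqs_cis` is the spot where a DTensor input would exercise the complex-number dispatch path mentioned above:

```python
import torch

def apply_rotary_emb(xq: torch.Tensor, xk: torch.Tensor, freqs_cis: torch.Tensor):
    # View the last dim as complex pairs: (..., head_dim) -> (..., head_dim // 2).
    xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2))
    xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2))
    # The complex multiplication applies the rotation; a DTensor input here would
    # hit the complex-number dispatch path discussed in this thread.
    xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3)
    xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3)
    return xq_out.type_as(xq), xk_out.type_as(xk)

# Toy usage with made-up shapes; freqs_cis is reshaped to broadcast against
# (bs, seqlen, n_heads, head_dim // 2).
bs, seqlen, n_heads, head_dim = 2, 16, 8, 64
xq = torch.randn(bs, seqlen, n_heads, head_dim)
xk = torch.randn(bs, seqlen, n_heads, head_dim)
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(seqlen).float(), inv_freq)
freqs_cis = torch.polar(torch.ones_like(freqs), freqs).reshape(1, seqlen, 1, head_dim // 2)
xq_rot, xk_rot = apply_rotary_emb(xq, xk, freqs_cis)
```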
ghstack-source-id: 58ba72163a4b03d77f4b2ba7c97cef7e7e8b3096 Pull Request resolved: #180
ghstack-source-id: a18a3cb1ba48fb751f437a5ee44f186ff9a26e9a Pull Request resolved: #180
ghstack-source-id: b8b2b58ffc72fcb8bfc88f4ba2a3455e3cc92c0a Pull Request resolved: #180
ghstack-source-id: 55bb9e1ba289c212f4af58e19d9bede2ad0246a8 Pull Request resolved: #180
Force-pushed from 9d45a6c to e773b75
Force-pushed from fe1f241 to a28e74e
Stack from ghstack (oldest at bottom):
This PR gets rid of the manual adjustment of the number of heads in attention layers by using DTensor outputs of `wq`, `wk`, `wv`, so that SDPA is aware of the distributedness.
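As a rough illustration of the intended effect, not code from this PR: `wq`/`wk`/`wv` produce DTensors sharded on the head dimension, SDPA is called directly on them, and each rank attends over its local heads with no manual `n_heads // tp_degree` bookkeeping. The import path, mesh size, dtype, and shapes below are assumptions (the public `torch.distributed.tensor` module path is from newer PyTorch releases), and the sketch needs a distributed launch such as `torchrun --nproc_per_node=8`:

```python
import torch
import torch.nn.functional as F
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import Shard, distribute_tensor

mesh = init_device_mesh("cuda", (8,))            # assumed 8-way TP mesh
bs, n_heads, seqlen, head_dim = 2, 32, 256, 128  # made-up shapes

def shard_heads(t: torch.Tensor):
    # Shard(1): split along the head dimension across the mesh.
    return distribute_tensor(t, mesh, [Shard(1)])

q = shard_heads(torch.randn(bs, n_heads, seqlen, head_dim, dtype=torch.bfloat16, device="cuda"))
k = shard_heads(torch.randn(bs, n_heads, seqlen, head_dim, dtype=torch.bfloat16, device="cuda"))
v = shard_heads(torch.randn(bs, n_heads, seqlen, head_dim, dtype=torch.bfloat16, device="cuda"))

# DTensor's sharding propagation handles the head-sharded SDPA; the output
# comes back as a DTensor that is also sharded on the head dimension.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.placements)
```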