
[Draft] Add support for seq split in Domino #961

Draft · wants to merge 1 commit into base: master
Conversation

duanhx1037

No description provided.

@duanhx1037 duanhx1037 requested a review from tjruwase as a code owner March 4, 2025 20:23
@GuanhuaWang GuanhuaWang marked this pull request as draft March 4, 2025 22:24
@GuanhuaWang

@duanhx1037 ,

Same as deepspeedai/DeepSpeed#7111: I've changed this to a draft, thank you!

@GuanhuaWang GuanhuaWang changed the title Add support for seq split in Domino [Draft] Add support for seq split in Domino Mar 4, 2025
@@ -206,6 +206,8 @@ def parse_args():
help='Report loss and timing interval.')
parser.add_argument('--save-interval', type=int, default=None,
help='Number of iterations between checkpoint saves.')
parser.add_argument('--input-split-dim', type=str, default='batch',
help='Dimension for input split.')


Suggested help text: `Dimension for input split. ['batch', 'seq']`
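The suggestion above (listing the two valid values in the help text) can also be enforced at parse time with argparse's `choices`. A minimal sketch; the standalone `parse_args` below is illustrative, not Domino's actual parser:

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Input-split flag sketch')
    # choices= makes argparse reject anything other than 'batch' or 'seq',
    # and documents the valid values in --help output automatically.
    parser.add_argument('--input-split-dim', type=str, default='batch',
                        choices=['batch', 'seq'],
                        help="Dimension for input split. ['batch', 'seq']")
    return parser.parse_args(argv)
```

With `choices`, a typo such as `--input-split-dim sequence` fails immediately with a usage error instead of silently falling through later dispatch code.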

@@ -127,6 +127,7 @@ def __init__(self,
self.init_method = config.init_method
self.encoder_attn_mask_type = encoder_attn_mask_type
self.encoder_hidden_state = None
self.input_split_dim = config.input_split_dim
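The new config field selects which dimension each input is chunked along. A pure-Python sketch of that dispatch, using a nested `[batch][seq]` list as a stand-in for the real tensors (the helper name `split_input` and the layout are illustrative, not Domino's implementation):

```python
def split_input(x, input_split_dim, num_chunks=2):
    """Split a [batch][seq] nested list along the chosen dimension.

    'batch' chunks the outer (batch) axis, as Domino does by default;
    'seq' instead chunks the inner (sequence) axis of every sample.
    """
    if input_split_dim == 'batch':
        step = len(x) // num_chunks
        return [x[i * step:(i + 1) * step] for i in range(num_chunks)]
    if input_split_dim == 'seq':
        step = len(x[0]) // num_chunks
        return [[row[i * step:(i + 1) * step] for row in x]
                for i in range(num_chunks)]
    raise ValueError(f'unknown input_split_dim: {input_split_dim!r}')
```

For `x = [[1, 2, 3, 4], [5, 6, 7, 8]]` (batch=2, seq=4), a batch split yields one sample per chunk, while a seq split keeps both samples in every chunk but halves their sequence length, which is exactly why position-dependent state (like RoPE, discussed below) needs care.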

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we need to verify: if the split dim is seq, can we still initialize the same RoPE embedding (rotary position embedding, a very popular positional encoding)? Because for RoPE:

        if self.use_rotary_position_embeddings:
            self.seq_length = args.seq_length

our `self.seq_length` may not be correct (ours might be half of the original seq_length).
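The concern can be made concrete with a tiny sketch of the rotary angle table (standard RoPE math, `theta[pos][i] = pos * base**(-2i/dim)`; the helper `rope_angles` is illustrative, not DeepSpeed's implementation). If a seq-split chunk re-initializes RoPE with the halved `seq_length`, the second chunk's positions restart at 0 instead of continuing at the correct offset:

```python
import math


def rope_angles(seq_length, dim, base=10000.0, offset=0):
    """Rotary angles: theta[pos][i] = (pos + offset) * base**(-2i/dim)."""
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    return [[(pos + offset) * f for f in inv_freq]
            for pos in range(seq_length)]


full = rope_angles(seq_length=8, dim=4)             # reference: full sequence
naive = rope_angles(seq_length=4, dim=4)            # halved length, no offset
fixed = rope_angles(seq_length=4, dim=4, offset=4)  # second chunk, offset by 4

# The second half of the full table matches only the offset version:
# full[4:] == fixed, while full[4:] != naive.
```

So simply halving `seq_length` is not enough; the second sequence chunk would need a position offset (or the full-length table sliced per chunk) for its rotary embeddings to match the unsplit baseline.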
