-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Add support for seq split in Domino #961
base: master
Are you sure you want to change the base?
Conversation
same thing here as deepspeedai/DeepSpeed#7111, I just changed it to draft branch, thank you! |
@@ -206,6 +206,8 @@ def parse_args(): | |||
help='Report loss and timing interval.') | |||
parser.add_argument('--save-interval', type=int, default=None, | |||
help='Number of iterations between checkpoint saves.') | |||
parser.add_argument('--input-split-dim', type=str, default='batch', | |||
help='Dimension for input split.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dimension for input split. ['batch', 'seq']
@@ -127,6 +127,7 @@ def __init__(self, | |||
self.init_method = config.init_method | |||
self.encoder_attn_mask_type = encoder_attn_mask_type | |||
self.encoder_hidden_state = None | |||
self.input_split_dim = config.input_split_dim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one possible thing we need to verify and check is if split dim is seq, can we still initialize the same rope embedding, which is a very popular position embedding function. because for rope:
if self.use_rotary_position_embeddings:
self.seq_length = args.seq_length
our self.seq_length may not be correct ( ours might be half of original seq_length).
No description provided.