Merge pull request #254 from Modalities/warmstart_infrastructure_switch
Warmstart infrastructure switch
le1nux authored Sep 17, 2024
2 parents dace200 + 9a3ff8c commit 8158de7
Showing 97 changed files with 3,911 additions and 949 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -160,5 +160,5 @@ pyenv*
noteboks/*

tests/tmp/*
*wandb_storage*
.coverage/*
wandb_storage/
21 changes: 13 additions & 8 deletions README.md
@@ -23,7 +23,7 @@ We successfully scaled Modalities up to 2048 GPUs on two HPC centers, namely [Le
Besides its scalability, Modalities allows you to seamlessly integrate new components and features, such as custom attention mechanisms, loss functions, optimizers, or models. We provide a series of tutorials to help you get started with training and evaluating models using Modalities. We achieve this level of extensibility through clear interfaces for each component type (e.g., model, optimizer, etc.) that a component must implement to be registered within Modalities at runtime.

## Getting Started
For training and evaluation of a model, feel free to check out [this](https://github.com/Modalities/modalities/blob/main/examples/getting_started/README.md) getting started tutorial, in which we train a small, 60M-parameter GPT model on a tiny subset of the Redpajama V2 dataset.
For training and evaluation of a model, feel free to check out [this](https://github.com/Modalities/modalities/blob/main/tutorials/getting_started/README.md) getting started tutorial, in which we train a small, 60M-parameter GPT model on a tiny subset of the Redpajama V2 dataset.

## Installation

@@ -108,7 +108,7 @@ Explanation:

* `$(which modalities) run`: This part dynamically finds the path to the Modalities executable and runs it. The `run` command triggers the main process to start the training.

* `--config_file_path configs/pretraining_config.yaml`: The `--config_file_path` argument provides the path to the configuration file for the training job. In the example above, it is given by `configs/pretraining_config.yaml`. A configuration file contains an exhaustive parameterization of all the training components (e.g., dataset, model, optimizer, etc.), making training fully reproducible. An example configuration file can be found [here](examples/getting_started/example_config.yaml), and a complete list of components available in Modalities is provided [here](docs/components/components.md).
* `--config_file_path configs/pretraining_config.yaml`: The `--config_file_path` argument provides the path to the configuration file for the training job. In the example above, it is given by `configs/pretraining_config.yaml`. A configuration file contains an exhaustive parameterization of all the training components (e.g., dataset, model, optimizer, etc.), making training fully reproducible. An example configuration file can be found [here](tutorials/getting_started/example_config.yaml), and a complete list of components available in Modalities is provided [here](docs/components/components.md).
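Putting the two pieces together, a full single-node launch could look like the sketch below. Only `$(which modalities) run --config_file_path ...` is taken from the README excerpt above; the `torchrun` wrapper, GPU count, and device list are illustrative assumptions.

```sh
# Hypothetical single-node launch on 4 GPUs; the torchrun wrapper and flag
# values are assumptions for illustration, not taken from this diff.
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --nproc_per_node 4 \
  $(which modalities) run --config_file_path configs/pretraining_config.yaml
```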

If you are a VSCode user, you may want to add this to your `launch.json`:
```json
@@ -155,7 +155,7 @@ The `modalities data create_raw_index` command triggers the process of creating

### Raw Training Dataset Tokenization

Tokenization is the process of converting raw text data into a sequence of tokens that can be used as input to the model. Tokenization requires a configuration file that fully describes the tokenization process, making it reproducible. An example tokenization config can be found [here](examples/getting_started/example_dataset_config_train.yaml).
Tokenization is the process of converting raw text data into a sequence of tokens that can be used as input to the model. Tokenization requires a configuration file that fully describes the tokenization process, making it reproducible. An example tokenization config can be found [here](tutorials/getting_started/example_dataset_config_train.yaml).

Example:
```sh
@@ -164,7 +164,7 @@ modalities data pack_encoded_data configs/tokenization_config.yaml

### Inference

For inference on a model checkpoint, we have to pass a configuration file that specifies the full inference setup. An example inference config can be found [here](examples/getting_started/example_text_generation_config.yaml).
For inference on a model checkpoint, we have to pass a configuration file that specifies the full inference setup. An example inference config can be found [here](tutorials/getting_started/example_text_generation_config.yaml).

Example:

@@ -176,14 +176,19 @@ modalities generate_text --config_file_path example_text_generation_config.yaml
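For reference, text generation from a checkpoint boils down to the command shown in the hunk context above, pointed at the inference config:

```sh
# Command taken from the hunk context above; the config path is the example
# file referenced in the Inference section.
modalities generate_text --config_file_path example_text_generation_config.yaml
```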
## Tutorials
Even though Modalities significantly simplifies LLM training, there is still some technical complexity left. We provide a series of tutorials to help you get started with training and evaluating models using Modalities.

- [Getting Started](examples/getting_started/README.md)</br>
- [Modalities in 15mins](tutorials/modalities_in_15_mins/README.md) </br>
Train a dense model with Modalities in 15 minutes

- [Getting Started](tutorials/getting_started/README.md)</br>
Brief overview on how to get started with Modalities by training a small GPT model on a tiny subset of the Redpajama V2 dataset.

- [Library Usage](examples/library_usage/README.md)</br>
- [Warmstart](tutorials/warmstart/README.md) </br>
Continue the training from a checkpoint, e.g., after the training was interrupted or crashed.

- [Library Usage](tutorials/library_usage/README.md)</br>
How to use Modalities as a library and register custom components with Modalities.

- [Modalities in 15mins] </br>
Jupyter notebook will be added soon



## Supported Features
114 changes: 65 additions & 49 deletions config_files/training/config_example_coca.yaml
@@ -4,27 +4,53 @@ settings:
referencing_keys:
sample_key: input_ids
target_key: target_ids
training:
training_log_interval_in_steps: 2
checkpointing_interval_in_steps: 2
evaluation_interval_in_steps: 2
global_num_seen_tokens: 0
activation_checkpointing_modules: []
gradient_acc_steps: 1
local_train_micro_batch_size: 3
sequence_length: 256
prediction_key: logits
cuda_env:
local_rank: ${cuda_env:LOCAL_RANK}
global_rank: ${cuda_env:RANK}
world_size: ${cuda_env:WORLD_SIZE}
paths:
checkpointing_path: data/checkpoints

tokenizer:
component_key: tokenizer
variant_key: gpt2_tokenizer_fast
config:
tokenizer_file: data/tokenizer/tokenizer_gpt2.json
checkpoint_saving_path: data/checkpoints
train_dataset_path: ./data/lorem_ipsum.pbin
intervals:
training_log_interval_in_steps: 2
checkpointing_interval_in_steps: 2
evaluation_interval_in_steps: 2
consistency_enforcement:
enforce_tokens_per_step_consistency: true
enforce_last_step_logged: false
enforce_last_step_evaluated: false
enforce_last_step_checkpointed: false
step_profile:
gradient_accumulation_steps: 1
local_train_micro_batch_size: 1
sequence_length: 256
training_target:
num_target_tokens:
component_key: number_conversion
variant_key: num_tokens_from_num_steps
config:
num_steps: ${settings.training_target.num_target_steps}
num_ranks: ${settings.cuda_env.world_size}
local_micro_batch_size: ${settings.step_profile.local_train_micro_batch_size}
sequence_length: ${settings.step_profile.sequence_length}
gradient_accumulation_steps: ${settings.step_profile.gradient_accumulation_steps}
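# Note (assumption): num_tokens_from_num_steps above presumably resolves to
# num_steps * num_ranks * local_micro_batch_size * sequence_length * gradient_accumulation_steps.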
num_target_steps: # for the batch progress subscriber
component_key: number_conversion
variant_key: num_steps_from_num_samples
config:
num_ranks: ${settings.cuda_env.world_size}
local_micro_batch_size: ${settings.step_profile.local_train_micro_batch_size}
global_num_samples: ${settings.coca_example_settings.train_num_samples}
gradient_accumulation_steps: ${settings.step_profile.gradient_accumulation_steps}
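# Note (assumption): num_steps_from_num_samples above presumably resolves to
# global_num_samples / (num_ranks * local_micro_batch_size * gradient_accumulation_steps).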
training_progress:
global_num_seen_tokens: 0
num_seen_steps: 0
local_num_seen_batches: 0
last_step: -1
coca_example_settings:
train_num_samples: 64
val_num_samples: 32

collate_fn:
component_key: collate_fn
@@ -41,7 +67,7 @@ train_dataset:
component_key: dataset
variant_key: dummy_dataset
config:
num_samples: 64
num_samples: ${settings.coca_example_settings.train_num_samples}
sample_definition:
- sample_key: images
sample_shape: [3, 224, 224]
@@ -54,7 +80,7 @@ val_dataset:
component_key: dataset
variant_key: dummy_dataset
config:
num_samples: 32
num_samples: ${settings.coca_example_settings.val_num_samples}
sample_definition:
- sample_key: images
sample_shape: [3, 224, 224]
@@ -69,23 +95,26 @@ train_dataloader:
config:
num_workers: 2
pin_memory: true
shuffle: false
dataloader_tag: "train"
dataloader_tag: train
skip_num_batches: ${settings.training_progress.local_num_seen_batches}
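# Note (assumption): skip_num_batches presumably lets a warm start resume mid-epoch by
# skipping the batches already consumed locally (tracked via settings.training_progress).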
dataset:
instance_key: train_dataset
pass_type: BY_REFERENCE
batch_sampler:
component_key: batch_sampler
variant_key: default
config:
batch_size: ${settings.training.local_train_micro_batch_size}
batch_size: ${settings.step_profile.local_train_micro_batch_size}
drop_last: true
sampler:
component_key: sampler
variant_key: distributed_sampler
config:
rank: ${settings.cuda_env.global_rank}
num_replicas: ${settings.cuda_env.world_size}
shuffle: true
drop_last: true
seed: 42
dataset:
instance_key: train_dataset
pass_type: BY_REFERENCE
@@ -99,23 +128,25 @@ val_dataloader:
config:
num_workers: 2
pin_memory: true
shuffle: false
dataloader_tag: "val"
dataloader_tag: val
dataset:
instance_key: val_dataset
pass_type: BY_REFERENCE
batch_sampler:
component_key: batch_sampler
variant_key: default
config:
batch_size: ${settings.training.local_train_micro_batch_size}
batch_size: ${settings.step_profile.local_train_micro_batch_size}
drop_last: true

sampler:
component_key: sampler
variant_key: distributed_sampler
config:
rank: ${settings.cuda_env.global_rank}
num_replicas: ${settings.cuda_env.world_size}
shuffle: false
drop_last: true
dataset:
instance_key: train_dataset
pass_type: BY_REFERENCE
@@ -140,22 +171,16 @@ checkpoint_saving:
component_key: checkpoint_saving_execution
variant_key: fsdp
config:
checkpoint_path: ${settings.paths.checkpointing_path}
checkpoint_path: ${settings.paths.checkpoint_saving_path}
global_rank: ${settings.cuda_env.global_rank}
experiment_id: ${settings.experiment_id}
get_num_tokens_from_num_steps_callable:
component_key: number_conversion
variant_key: num_tokens_from_num_steps_callable
config:
num_ranks: ${settings.cuda_env.world_size}
local_micro_batch_size: ${settings.training.local_train_micro_batch_size}
sequence_length: ${settings.training.sequence_length}

loss_fn:
component_key: loss
variant_key: clm_cross_entropy_loss
config:
target_key: ${settings.referencing_keys.target_key}
prediction_key: logits
prediction_key: ${settings.referencing_keys.prediction_key}

wrapped_model:
component_key: model
@@ -169,7 +194,7 @@ wrapped_model:
sharding_strategy: FULL_SHARD
block_names: [TransformerBlock, VisionTransformerBlock]

model:
model:
component_key: model
variant_key: model_initialized
config:
@@ -241,9 +266,10 @@ scheduler:
max_lr: 6e-4
div_factor: 10
final_div_factor: 1
total_steps: 64
total_steps: ${settings.training_target.num_target_steps}
pct_start: 0.01
anneal_strategy: cos
last_epoch: ${settings.training_progress.last_step}
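# Note (assumption): last_epoch presumably lets the (OneCycle-style) schedule resume from
# the recorded step on a warm start instead of restarting from step 0.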

optimizer:
component_key: optimizer
@@ -267,24 +293,14 @@ gradient_clipper:
pass_type: BY_REFERENCE
norm_type: P2_NORM


batch_progress_subscriber:
progress_subscriber:
component_key: progress_subscriber
variant_key: rich
config:
global_rank: ${settings.cuda_env.global_rank}
global_num_seen_steps:
component_key: number_conversion
variant_key: num_steps_from_num_tokens
config:
num_ranks: ${settings.cuda_env.world_size}
local_micro_batch_size: ${settings.training.local_train_micro_batch_size}
global_num_tokens: ${settings.training.global_num_seen_tokens}
sequence_length: ${settings.training.sequence_length}
gradient_acc_steps: ${settings.training.gradient_acc_steps}
train_dataloader:
instance_key: train_dataloader
pass_type: BY_REFERENCE
num_seen_steps: ${settings.training_progress.num_seen_steps}
num_target_steps: ${settings.training_target.num_target_steps}
train_dataloader_tag: ${train_dataloader.config.dataloader_tag}
eval_dataloaders:
instance_key: eval_dataloaders
pass_type: BY_REFERENCE