Feat/coca #263

sthoduka · 2024-10-11T13:36:07Z

What does this PR do?

This PR updates the CoCa model so that it can be trained jointly on images, audio and video paired with text. It also adds a webdataset-based dataset and data loader.
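In the webdataset format, each sample is stored in a tar shard as a group of files that share a key and differ only in extension (for example `000001.jpg` and `000001.txt`). The standard-library sketch below shows that grouping for image-text, audio-text or video-text pairs. The helpers `make_shard`, `split_key_ext` and `group_samples` and the file names are hypothetical and are not the MultimodalWebDataset API. Also, the real webdataset library streams a shard and groups consecutive members, while this simplified version reads the whole shard into memory first.

```python
import io
import tarfile


def make_shard(files: dict[str, bytes]) -> bytes:
    """Build an in-memory tar shard from {member name: content} (for illustration only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def split_key_ext(name: str) -> tuple[str, str]:
    """Split 'dir/000001.txt' into ('dir/000001', 'txt') at the first dot of the basename."""
    dirname, _, basename = name.rpartition("/")
    stem, _, ext = basename.partition(".")
    key = f"{dirname}/{stem}" if dirname else stem
    return key, ext


def group_samples(tar_bytes: bytes) -> list[tuple[str, dict[str, bytes]]]:
    """Group tar members that share a key into one sample, keyed by file extension."""
    samples: dict[str, dict[str, bytes]] = {}  # dicts keep insertion order
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key_ext(member.name)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return list(samples.items())


shard = make_shard({
    "000001.jpg": b"<image bytes>",
    "000001.txt": b"a dog on a beach",
    "000002.wav": b"<audio bytes>",
    "000002.txt": b"a dog barking",
})
for key, sample in group_samples(shard):
    print(key, sorted(sample))
# 000001 ['jpg', 'txt']
# 000002 ['txt', 'wav']
```

Each grouped sample contains one modality file and its caption. A modality-aware dataset can then decode the modality file by extension and tokenize the text.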

General Changes

- add the AudioTransformer model
- update the VisionTransformer model to support video
- add the MultimodalWebDataset dataset for loading audio-text, image-text and video-text data in the webdataset format
- add a multi-loss function for specifying a weighted sum of different losses
- update the CoCa model to include encoders for video and audio
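A multi-loss function combines several losses into one scalar: the total loss is the sum of `w_i * L_i` over all configured losses. CoCa, for instance, combines a weighted contrastive loss and a weighted captioning loss. The framework-agnostic sketch below uses plain floats so it runs without PyTorch. With PyTorch tensors, the same arithmetic yields a differentiable scalar. The class name `WeightedMultiLoss` and its dict-of-outputs interface are hypothetical and do not reflect the actual loss API added in this PR.

```python
from collections.abc import Callable, Mapping, Sequence
from dataclasses import dataclass
from typing import Any

# A loss function maps the model outputs (a dict) to a scalar (float or tensor).
LossFn = Callable[[Mapping[str, Any]], Any]


@dataclass
class WeightedMultiLoss:
    """Weighted sum of several loss functions (hypothetical sketch)."""

    losses: Sequence[LossFn]
    weights: Sequence[float]

    def __post_init__(self) -> None:
        if len(self.losses) != len(self.weights):
            raise ValueError("expected one weight per loss function")

    def __call__(self, outputs: Mapping[str, Any]) -> Any:
        total = 0.0
        for weight, loss_fn in zip(self.weights, self.losses):
            total = total + weight * loss_fn(outputs)
        return total


def contrastive(outputs: Mapping[str, Any]) -> Any:
    return outputs["contrastive"]


def captioning(outputs: Mapping[str, Any]) -> Any:
    return outputs["captioning"]


loss = WeightedMultiLoss(losses=[contrastive, captioning], weights=[1.0, 2.0])
print(loss({"contrastive": 0.5, "captioning": 0.25}))  # 1.0
```

Keeping the weights in the config, not in the model, allows loss terms to be rebalanced or disabled (weight 0) without code changes.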

Breaking Changes

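- the LLMDataLoader now holds a PyTorch DataLoader object as a member variable instead of inheriting from it (composition instead of inheritance), so that both LLMDataLoader and WebDataLoader inherit only from DataLoaderIF.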
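The switch from inheritance to composition can be sketched as follows. This is a simplified, hypothetical illustration. `DataLoaderIF` is reduced to an abstract base class with `__iter__` and `__len__`, the attribute name `data_loader` is an assumption, and any iterable stands in for `torch.utils.data.DataLoader` so the snippet runs without PyTorch.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator, Sized
from typing import Any


class DataLoaderIF(ABC):
    """Common interface shared by all loaders (simplified, hypothetical)."""

    @abstractmethod
    def __iter__(self) -> Iterator[Any]: ...

    @abstractmethod
    def __len__(self) -> int: ...


class LLMDataLoader(DataLoaderIF):
    """Holds the underlying loader as a member instead of subclassing it."""

    def __init__(self, data_loader: Iterable[Any]) -> None:
        # In the real code this would be a torch.utils.data.DataLoader instance.
        self.data_loader = data_loader

    def __iter__(self) -> Iterator[Any]:
        # Delegate iteration to the wrapped loader.
        return iter(self.data_loader)

    def __len__(self) -> int:
        if not isinstance(self.data_loader, Sized):
            raise TypeError("wrapped loader has no length")
        return len(self.data_loader)


batches = [[0, 1], [2, 3], [4]]  # stand-in for a torch DataLoader
loader = LLMDataLoader(batches)
print(isinstance(loader, DataLoaderIF), len(loader), list(loader))
# True 3 [[0, 1], [2, 3], [4]]
```

Since the wrapper no longer is-a `DataLoader`, any code that relied on `isinstance` checks against `DataLoader`, or that read `DataLoader` attributes directly from an `LLMDataLoader`, presumably needs to go through the wrapped object instead. This is likely why the change is listed as breaking.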

Checklist before submitting final PR

- My PR is minimal and addresses one issue in isolation
- I have merged the latest version of the target branch into this feature branch
- I have reviewed my own code w.r.t. correct implementation, missing type hints, proper documentation, etc.
- I have run a sample config for model training
- I have checked that all tests run through (python tests/tests.py) (some tests related to the MFU calculation were failing, but I think those are unrelated to this PR)
- I have updated the internal changelog (CHANGELOG_DEV.md)


spravil added 30 commits May 7, 2024 13:01

- feat: add basic webdataset (d909ae0)
- fix: dim of cls token (e233676)
- feat: simple console logging (9986691)
- fix: add attention mask to cross entropy loss (c47b6c1)
- feat: allow multiple loss functions (70823e1)
- fix: register nce loss (0c87d91)
- feat: add dataloader for webdataset (b652a7d)
- chore: add config (b0e933a)
- feat: add nicer logging to wandb (d8d5a5f)
- fix: hardcoded batches in web loader (4ea65c8)
- chore: update coca config (dfe88c9)
- fix: rebase (5a3e844)
- fix: print only on main rank in component factory (043384d)
- fix: total loss average logging (3cd9244)
- fix: cuda env and run script (b09d20e)
- chore: update coca config (e09745d)
- fix: print parameters and done only on main rank (5a74dee)
- chore: update coca wds config (f7b725c)
- fix: tokenizer config of coca (c7308e2)
- fix: add multinode splitter to webdataset (32d0b19)
- fix: webdataset slow loading (55c039f)
- fix: add batching (966d237)
- fix: add more options to webloader (dacf639)
- fix: webloader (b40ecd5)
- fix: dataset factory (a9ce132)
- fix: webdataset (63ef47c)
- fix: loss accumulation (d6d84dc)
- fix: loss average for eval (3d04f78)
- refactor: remove unused code from coca collator (e65a3cd)
- fix: coca collator (622570d)

sthoduka and others added 25 commits September 23, 2024 10:38

- fix: update path for coca tokenizer (3b89853)
- chore: use built-in types (8f7a114)
- chore: refactor loss related items to match main (3eb77a3)
- docs: add docstrings and type hints for audio-related classes and functions (3366bc9)
- test: update coca model test and add coca collator test (61cfe51)
- refactor: verify correctness of coca model config (56ea087)
- feat: multiple loss functions (8f5aea4)
- Merge branch 'feat/multiple_loss_functions' into feat/coca (f1f0fe5)
- test: add more tests for loss functions (c291fcc)
- revert: add back default values for NCELoss (f1dbe91)
- refactor: use composition to wrap the pytorch DataLoader using LLMDataLoader, so that both LLMDataLoader and WebLoader inherit only from DataLoaderIF (146682f)
- refactor: rename WebLoader to WebDataLoader (b985b9b)
- docs: update docs for WebDataLoader and MultimodalWebDataset (9113f8a)
- docs: update docs for coca model (2a1cfa1)
- docs: update docs for vision transforms and make video transform parameters configurable (b9bfaea)
- test: add test for webdataset dataset and dataloader (d6da583)
- Merge branch 'main' into feat/coca (8312ed6)
- docs: add docstring to WebDataloader, typehints to _init_modality (152ebf2)
- fix: mask creation for audio inputs (0353969)
- test: add tests for audio_transformer (a12ff8a)
- docs: misc. docstrings and type hints for VideoTransform, web dataset builder etc. (79c6c6b)
- fix: update simple progress subscriber config (23267e8)
- refactor: rename norm layers for easier regex for weight initialization and decay groups (29352b7)
- test: fix weight initialization and weight decay tests for coca (2aa4fc0)
- fix: update directory name for getting started example (c84bbda)

sthoduka requested review from le1nux and mali-git October 11, 2024 13:36

sthoduka and others added 3 commits October 11, 2024 15:36

- docs: update changelog with info about CoCa PR (e33076f)
- chore: fix linting (71a5bd1)
- docs: fix minor docstring inconsistencies (de9baab)
