MetaCAT BERT update to tutorials 4.1 and 4.2 #26
Conversation
Adding the BERT implementation for MetaCAT to the tutorials.
4.1 looks good. Don't think there needs to be more in here for that.
4.2
- The MetaCAT configuration and the Train MetaCAT (sub)sections are now duplicated. It would be good to have these appear once; otherwise we'll end up having to maintain two copies of the same text/code. The two MetaCAT instances (BERT vs BiLSTM) should provide the exact same interface anyway; the only difference is that some config entries don't affect one or the other (see the sketch after this list).
- Perhaps we can trim the included output a little? I know it's already long in the existing one, but just doing this comparison it was pretty annoying to have to scroll around the sizeable chunks of output.
- There is no description of the model_variant config option in the tutorial.
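To illustrate the shared-interface point above, a minimal sketch (field names follow how the tutorials use ConfigMetaCAT; the values are only illustrative):

```python
from medcat.config_meta_cat import ConfigMetaCAT

# Both variants are configured through the same ConfigMetaCAT interface; only the
# values differ, and entries such as model_variant simply have no effect on the BiLSTM.
bilstm_config = ConfigMetaCAT()
bilstm_config.model.model_name = 'lstm'

bert_config = ConfigMetaCAT()
bert_config.model.model_name = 'bert'
bert_config.model.model_variant = 'bert-base-uncased'  # illustrative checkpoint name
```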
EDIT:
Also, regarding 4.2: the changes to the config in the first code section of the For BERT section are the reason I was talking about creating a static method for getting a BERT-based MetaCAT config with defaults.
I.e. in ConfigMetaCAT we could have something along the lines of:
```python
@classmethod
def get_default_bert_config(cls, category_name: str, model_name: str = 'bert', nclasses: int = 2) -> 'ConfigMetaCAT':
    # Build a fresh config instance and set the BERT-relevant defaults on it
    config = cls()
    config.model.model_name = model_name
    config.model.nclasses = nclasses
    config.general.category_name = category_name
    return config
```
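For context, using the proposed helper would then be a one-liner (the helper is only a proposal, and the category name here is just an example):

```python
config_metacat = ConfigMetaCAT.get_default_bert_config(category_name='Status')
```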
For 4.2:
- Since model_variant isn't commonly changed, I haven't added it in the tutorial. There is a link to all config variables in the section.
But you're using it in the tutorial - passing it to TokenizerWrapperBERT.load. Someone going through the tutorial won't have any idea what this is or what value it holds.
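Roughly what the tutorial does with it, as I understand it (the import path, directory and variant name below are placeholders/assumptions, not the tutorial's exact code):

```python
from medcat.tokenizers.meta_cat_tokenizers import TokenizerWrapperBERT

save_dir_path = 'meta_status'         # placeholder directory
model_variant = 'bert-base-uncased'   # which Hugging Face BERT checkpoint/tokenizer to use
tokenizer = TokenizerWrapperBERT.load(save_dir_path, model_variant)
```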
In 4.2 I don't think we need 2 different (almost identical) mc.train_from_json code blocks. We could just have 1 and get the suffix for the save file from config_metacat.model.model_name automatically.
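Something along these lines, as a rough sketch (the export file name and the save_dir_path keyword are assumptions, not the tutorial's exact code):

```python
# One shared training block for both variants: derive the save location from the
# configured model name ('lstm' or 'bert') instead of hard-coding it per variant.
# `mc` is the MetaCAT instance and `config_metacat` its config, set up earlier in the tutorial.
suffix = config_metacat.model.model_name
save_dir_path = f"meta_{config_metacat.general.category_name}_{suffix}"
mc.train_from_json('MedCAT_Export.json', save_dir_path=save_dir_path)
```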
Other than that, I think this looks good.
Fixed both!
Thought I already approved yesterday, but I think I was waiting for the GHA workflow just in case.
Looks good to me.
Updates to Tutorials 4.1 and 4.2: Integration of BERT Implementation
Tutorial 4.1: No changes were necessary since BERT utilizes its own pre-trained tokenizer. A note has been added advising users to proceed directly to Tutorial 4.2 if using BERT.
Tutorial 4.2: The BERT implementation has been added, showing training on the given dataset. Unlike the BiLSTM, which fine-tunes a previously trained model pack, the BERT-based MetaCAT is trained from scratch.
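For orientation, the BERT path through 4.2 looks roughly like this (a sketch assuming the usual MedCAT classes and import paths; the checkpoint, category name, export file and save directory are placeholders):

```python
from transformers import BertTokenizerFast
from medcat.config_meta_cat import ConfigMetaCAT
from medcat.meta_cat import MetaCAT
from medcat.tokenizers.meta_cat_tokenizers import TokenizerWrapperBERT

config_metacat = ConfigMetaCAT()
config_metacat.general.category_name = 'Status'            # placeholder meta-annotation task
config_metacat.model.model_name = 'bert'                   # select the BERT meta-model
config_metacat.model.model_variant = 'bert-base-uncased'   # placeholder checkpoint

# BERT brings its own pre-trained tokenizer, so the tokenizer training from 4.1 is not needed.
tokenizer = TokenizerWrapperBERT(BertTokenizerFast.from_pretrained(config_metacat.model.model_variant))

mc = MetaCAT(tokenizer=tokenizer, embeddings=None, config=config_metacat)
mc.train_from_json('MedCAT_Export.json', save_dir_path='status_bert')
```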