
Transformer based NER model #2

Open

walter-hernandez opened this issue Jun 25, 2021 · 4 comments

@walter-hernandez
Discussion 🗣

Hello all!

Continuing a discussion with @EasonC13 and @niallroche, I am opening this thread to keep track of the different approaches to fine-tuning a transformer model (BERT or a variant such as ALBERT) and using it with Snorkel.

Context

@EasonC13 has already done some work on generating a dataset with multiple NER labels using Keras: https://github.com/accordproject/labs-cicero-classify/blob/dev/Practice/keras/keras_decompose_NER_model.ipynb

Options for replicating and building on this are outlined below.

Detailed Description

If we go with spaCy, Snorkel is compatible with it out of the box. However, that support is limited to spaCy v2; depending on our needs, we could open a pull request to add spaCy v3 support to Snorkel, although we can proceed without doing so.

Either way, we can wrap our fine-tuned transformer model in a custom labelling function while using Snorkel's spaCy integration for the preprocessing needed.

Another thing to consider, with the fine-tuned transformer model being used as a labelling function, is how to do inference, keeping in mind the high run-time cost of a transformer model in production.
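
A rough sketch of what that wiring could look like (the checkpoint, label ids, and entity mapping below are placeholders, not decisions):

```python
# Hedged sketch: wrap a transformer NER model as a Snorkel labelling function.
# Model name, label ids, and the PER -> PARTY mapping are illustrative only.
from snorkel.labeling import labeling_function
from snorkel.preprocess.nlp import SpacyPreprocessor
from transformers import pipeline

ABSTAIN, PARTY = -1, 0  # hypothetical label ids for the label model

# Snorkel's built-in spaCy preprocessor (spaCy v2 today) attaches x.doc.
spacy_pre = SpacyPreprocessor(text_field="text", doc_field="doc", memoize=True)

# Any fine-tuned token-classification checkpoint could go here.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

@labeling_function(pre=[spacy_pre])
def lf_transformer_party(x):
    # The transformer runs once per example, which is exactly the run-time
    # cost to watch in production; x.doc is available for cheaper heuristics.
    for ent in ner(x.text):
        if ent["entity_group"] == "PER":
            return PARTY
    return ABSTAIN
```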

@EasonC13
Collaborator

EasonC13 commented Jun 26, 2021

Let me fill in the previous context.

On 6/24, I ran into a challenge.
It is easy to edit a BERT NER model to use custom labels, so that the model can predict a single label, e.g. "Eason" is a person.
However, I am still looking into how to give the NER model multiple custom labels: I want the model to know that "Eason" is not only a person but also a party and a string. In the NER examples I have seen so far, BERT turns the labels into a 128-dim sequence of label ids, not the one-hot encoding a classic classification model uses.
Even though the PyTorch model prints an n-dim output, the actual input data does not seem to be shaped like that.

So I asked my mentors for help and turned to Keras, because I am more familiar with it.

6/25

I successfully changed the model output to categorical mode via Keras, as I wanted.
https://github.com/accordproject/labs-cicero-classify/blob/dev/Practice/keras/keras_decompose_NER_model.ipynb

I first referenced this tutorial:
https://apoorvnandan.github.io/2020/08/02/bert-ner/

Then I changed the classic NER model's loss to CategoricalCrossentropy and transformed train_y into one-hot format.
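
Roughly, the change looks like this (toy model and placeholder sizes, not the notebook's exact code):

```python
# Sketch of switching a Keras token-classification model to categorical mode:
# CategoricalCrossentropy loss plus one-hot targets. Placeholder sizes.
import tensorflow as tf

num_labels, seq_len, vocab_size = 10, 128, 30522  # placeholders

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),  # stand-in for the BERT encoder
    tf.keras.layers.Dense(num_labels),          # one logit per label, per token
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
)

# train_y as integer ids (batch, seq_len) -> one-hot (batch, seq_len, num_labels)
train_y = tf.random.uniform((8, seq_len), maxval=num_labels, dtype=tf.int32)
train_y_onehot = tf.one_hot(train_y, depth=num_labels)
```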

Now the model is training! I can't wait to validate it and create a multi-label dataset to verify whether this method works.

I think I can do it, since I first learned deep learning via Keras. Now I need to figure out how to do the same thing in PyTorch, so I will start reading its documentation.

If you have any clue about how to do the same setup in PyTorch, please let me know. I hope to build one in PyTorch because Keras is CUDA-version-specific and eats all my memory, so it is not a good tool for long-term production use.
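
For the record, here is a rough sketch of what I think the PyTorch equivalent could look like, using BCEWithLogitsLoss so each label gets its own sigmoid and one token can carry several labels at once (model name and label count are placeholders):

```python
# Sketch of multi-label token classification in PyTorch: BCEWithLogitsLoss
# gives each label an independent sigmoid, unlike softmax's single pick.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiLabelNER(nn.Module):
    def __init__(self, num_labels, base="bert-base-cased"):  # placeholder base
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # logits: (batch, seq_len, num_labels)

model = MultiLabelNER(num_labels=3)  # e.g. person / party / string
loss_fn = nn.BCEWithLogitsLoss()

input_ids = torch.randint(0, model.encoder.config.vocab_size, (2, 16))
mask = torch.ones_like(input_ids)
targets = torch.zeros(2, 16, 3)      # multi-hot: several 1s allowed per token
loss = loss_fn(model(input_ids, mask), targets)
loss.backward()
```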

And feel free to comment about this approach (and the code).

6/26

I keep learning and looking into how to implement one in PyTorch.

I will also take a look at spaCy.
I think we can discuss whether to use PyTorch or spaCy v3, as @walter-hernandez mentioned in Monday's meeting.

@walter-hernandez
Author

Hello @EasonC13

Using the same entity for multiple labels is something that I think may not be possible within a single BERT model, or at least I am not fully aware of a clean way to do it. However, you can try some hacky approaches like:

  1. Have one label be "person+party+string" and distinguish it from the label "person" when training the model.
  2. Train multiple NER models for different sets of labels, so that one model would tag "Eason" as a person, another would tag it as a party, and a third would tag it as a string.

If you go for the second approach, you can try:

  • Lightweight versions of BERT, like ALBERT
  • Adapter tuning

If you do adapter tuning, you would have to train a model for each set of labels that identifies "Eason":

  • One model for a group of labels where "Eason" is a PERSON
  • One model for a group of labels where "Eason" is a STRING

Each individual model keeps BERT's pretrained weights frozen, which means "adapter-based tuning requires training two orders of magnitude fewer parameters compared to fine-tuning, while attaining similar performance".

Each individual model, or adapter, can then be combined using AdapterFusion:

"The AdapterFusion component takes as input the representations of multiple adapters trained on different tasks and learns a parameterized mixer of the encoded information", which means sharing information across multiple tasks.

Check the documentation on how to do adapter tuning and use AdapterFusion: https://docs.adapterhub.ml/
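
A rough sketch of what this could look like with the adapter-transformers package that AdapterHub documents (the exact API varies between versions, and the adapter names and label counts below are placeholders):

```python
# Sketch using AdapterHub's adapter-transformers fork (API varies by version).
# One adapter per label group; BERT's pretrained weights stay frozen.
from transformers import AutoModelWithHeads
from transformers.adapters.composition import Fuse

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

for name, n in [("person_ner", 3), ("string_ner", 3)]:  # hypothetical groups
    model.add_adapter(name)
    model.add_tagging_head(name, num_labels=n)  # token-classification head

# Train one adapter at a time: this freezes BERT and the other adapters.
model.train_adapter("person_ner")
# ... run the usual training loop here, then repeat for "string_ner" ...

# AdapterFusion: learn a parameterized mixer over the trained adapters.
model.add_adapter_fusion(Fuse("person_ner", "string_ner"))
model.set_active_adapters(Fuse("person_ner", "string_ner"))
model.train_adapter_fusion(Fuse("person_ner", "string_ner"))
```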

@EasonC13
Collaborator

Hi @walter-hernandez, I think I can use N adapters for N labels, each adapter specializing in one datatype. Then, when a user makes a correction, it is easy to retrain that adapter without interrupting the others.

Now I'm trying to implement it.

However, it seems an adapter can only be trained against a single output label, so setting up an efficient training pipeline is a bit of a challenge; otherwise I would have to train N times for N adapters. I have come up with a plan for efficient training: train all the desired adapters together on one label (e.g. 1), whether or not it is the correct label, but only apply gradient descent to the adapters for which that label is correct; then train on the opposite label (e.g. 0) and apply gradient descent to the others.

First, though, I will build a prototype with N adapters for N labels on a basic NER dataset, to prove that adapters can do this job.
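
A rough sketch of the shape I have in mind, again assuming adapter-transformers (names and data are placeholders); the key property is that train_adapter only unfreezes one adapter, so retraining one label leaves the others untouched:

```python
# Sketch: one binary (0/1) tagging adapter per label; each can be retrained
# independently because only the active adapter's weights are unfrozen.
import torch
from transformers import AutoModelWithHeads, AutoTokenizer

model = AutoModelWithHeads.from_pretrained("bert-base-uncased")
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

for label in ["person", "party", "string"]:  # hypothetical label set
    model.add_adapter(f"{label}_ner")
    model.add_tagging_head(f"{label}_ner", num_labels=2)  # is-label / is-not

def retrain(label, texts):
    """Retrain a single adapter; BERT and the other adapters stay frozen."""
    model.train_adapter(f"{label}_ner")
    model.set_active_adapters(f"{label}_ner")
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    batch = tok(texts, return_tensors="pt", padding=True)
    tags = torch.zeros_like(batch["input_ids"])  # dummy 0/1 targets per token
    out = model(**batch, labels=tags)            # head computes the loss
    out.loss.backward()
    optim.step(); optim.zero_grad()

# After a user correction, only the affected adapter is retrained:
retrain("party", ["Eason signs the agreement."])
```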

What do you think about it?

@rjurney

rjurney commented Oct 13, 2021

@EasonC13 how did this go?
