Transformer based NER model #2
Let me fill in the previous context.

6/24: I ran into a challenge, so I asked my mentors for help and switched to Keras because I am more familiar with it.

6/25: I successfully changed the model output to categorical mode via Keras. I first referenced this tutorial, then changed the classical NER model's loss to CategoricalCrossentropy and transformed train_y into one-hot encoding format. Now the model is training! I can't wait to validate it and create a multi-label dataset to verify whether this method works. I think I can do it since I first learned deep learning via Keras. Next, I need to figure out how to do the same thing in PyTorch, so I will start reading its documentation. If you have any clue about how to do the same setup in PyTorch, please let me know. I hope I can build one in PyTorch because Keras is CUDA version-specific and eats all my memory, so it is not a good tool for long-term production use. Feel free to comment on this approach (and the code).

6/26: I keep learning and looking into how to implement this in PyTorch. I will also take a look at spaCy. |
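Not the notebook code itself, but a minimal sketch of the loss/label change described above, assuming a toy token-classification model (the layer sizes, `num_tags`, and `vocab_size` are placeholders; the actual notebook fine-tunes a transformer rather than a BiLSTM):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_tags = 10       # assumption: number of NER tag classes
vocab_size = 20000  # assumption: tokenizer vocabulary size
max_len = 128

# Toy token-classification model: one softmax over tags per token position.
inputs = keras.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, 64)(inputs)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
outputs = layers.Dense(num_tags, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# CategoricalCrossentropy expects one-hot targets of shape (batch, max_len, num_tags).
# (In PyTorch, torch.nn.CrossEntropyLoss takes integer class ids instead of one-hot.)
model.compile(
    optimizer="adam",
    loss=keras.losses.CategoricalCrossentropy(),
    metrics=["accuracy"],
)

# train_y holds integer tag ids of shape (num_samples, max_len);
# convert to one-hot before fitting, e.g.:
# train_y_onehot = tf.keras.utils.to_categorical(train_y, num_classes=num_tags)
# model.fit(train_x, train_y_onehot, batch_size=32, epochs=3)
```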
Hello @EasonC13. Using the same entity for multiple labels is something that I think may not be possible within a single BERT model, or at least I am not aware of clean ways to do it. However, you can try some hack-style approaches like:
If you go for the second approach, you can try using:
If you do adapter tuning, it means that you would have to:
Each individual model keeps BERT's pretrained weights frozen, which means "adapter-based tuning requires training two orders of magnitude fewer parameters compared to fine-tuning, while attaining similar performance". The individual models or adapters can then be combined using AdapterFusion:
-- Check the documentation on how to do adapter tuning and use AdapterFusion: https://docs.adapterhub.ml/
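A hedged sketch of what adapter tuning plus AdapterFusion could look like with the `adapter-transformers` library (the label names are hypothetical, and the exact API differs between library versions):

```python
from transformers import AutoModelWithHeads
from transformers.adapters.composition import Fuse

model = AutoModelWithHeads.from_pretrained("bert-base-cased")

# One adapter + tagging head per label type (label names are hypothetical).
for name in ["party", "date", "amount"]:
    model.add_adapter(name)
    model.add_tagging_head(name, num_labels=2)

# Train one adapter at a time: BERT's pretrained weights (and the other
# adapters) stay frozen; only the "party" adapter and its head are updated.
model.train_adapter("party")
model.set_active_adapters("party")
# ... run a normal Hugging Face training loop / Trainer here ...

# Later, the trained adapters can be combined with AdapterFusion.
model.add_adapter_fusion(Fuse("party", "date", "amount"))
model.set_active_adapters(Fuse("party", "date", "amount"))
model.train_adapter_fusion(Fuse("party", "date", "amount"))
```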
Hi @walter-hernandez, I think I can use N adapters for N labels, with each adapter specializing in one data type. That way, when a user makes a correction, it is easy to retrain one adapter without interrupting the others. I'm trying to implement it now. However, it seems adapters can only be trained with the same output labels, so it is a bit of a challenge to set up an efficient training pipeline; otherwise I would have to train N times for N adapters. I have come up with a plan for efficient training: train all the wanted adapters together with one label (e.g. 1), whether or not it is the correct label, but only do gradient descent on the adapters for which that label is correct; then train with the opposite label (e.g. 0) and do gradient descent on the others. First, though, I will build a prototype with N adapters for N labels on a basic NER dataset, to prove that adapters will work for this job (a rough sketch follows below). What do you think about it? |
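A rough sketch of the one-adapter-per-label idea under the same assumptions as above (hypothetical label names; tag-to-wordpiece alignment and batching are omitted, and the `adapter-transformers` API may differ by version). Each adapter is a binary tagger for its own label, so a user correction only triggers retraining of that one adapter:

```python
import torch
from transformers import AutoTokenizer, AutoModelWithHeads

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelWithHeads.from_pretrained("bert-base-cased")

for name in ["party", "date"]:                      # hypothetical label types
    model.add_adapter(name)
    model.add_tagging_head(name, num_labels=2)      # 1 = token has this label, 0 = it does not

def retrain_adapter(name, texts, binary_tags, epochs=1, lr=1e-4):
    """Fine-tune one adapter on its own binary tags; the other adapters stay untouched."""
    model.train_adapter(name)                       # freezes BERT and the other adapters
    model.set_active_adapters(name)
    optim = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    model.train()
    for _ in range(epochs):
        for text, tags in zip(texts, binary_tags):
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            labels = torch.tensor([tags])           # assumes tags already aligned to word pieces
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optim.step()
            optim.zero_grad()
```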
@EasonC13 how did this go?
Discussion 🗣
Hello all!
Continuing a discussion with @EasonC13 and @niallroche, I am opening this thread to keep track of the different approaches to fine-tune a transformer model (BERT or some variation of it like ALBERT) and its usage with Snorkel.
Context
@EasonC13 already did some work to generate a dataset with multiple NER labels using Keras here: https://github.com/accordproject/labs-cicero-classify/blob/dev/Practice/keras/keras_decompose_NER_model.ipynb
To replicate the above, we can:
Detailed Description
If we go with spaCy, Snorkel has compatibility with it out of the box. However, that integration is limited to spaCy v2; depending on our needs, we could open a pull request to support spaCy v3 in Snorkel, although we can also proceed without doing so.
Either way, we can wrap our fine-tuned transformer model as a custom labelling function while using Snorkel's built-in spaCy preprocessor for the preprocessing needed (see the sketch below).
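A hedged sketch of what such labelling functions could look like with Snorkel's spaCy preprocessor; the label ids, the `PARTY` class, the `ORG` heuristic, and `ner_pipeline` are illustrative placeholders, not existing project code:

```python
from snorkel.labeling import labeling_function
from snorkel.preprocess.nlp import SpacyPreprocessor

ABSTAIN, PARTY = -1, 0  # hypothetical label ids

# Snorkel's built-in spaCy preprocessor (tied to spaCy v2, as noted above).
spacy_pre = SpacyPreprocessor(text_field="text", doc_field="doc", memoize=True)

@labeling_function(pre=[spacy_pre])
def lf_spacy_org_is_party(x):
    """Vote PARTY if spaCy tags any ORG entity in the clause text."""
    return PARTY if any(ent.label_ == "ORG" for ent in x.doc.ents) else ABSTAIN

# A fine-tuned transformer can be wrapped the same way; ner_pipeline is a
# hypothetical Hugging Face token-classification pipeline loaded elsewhere, e.g.:
# ner_pipeline = transformers.pipeline(
#     "token-classification", model="path/to/fine-tuned-model",
#     aggregation_strategy="simple")
@labeling_function()
def lf_transformer_party(x):
    preds = ner_pipeline(x.text)
    return PARTY if any(p["entity_group"] == "PARTY" for p in preds) else ABSTAIN
```

Because the transformer-based labelling function is much more expensive per call than the spaCy heuristic, it is worth memoizing or batching its predictions when applying labelling functions over a large corpus.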
Another thing to consider is how to do inference, keeping in mind the high run-time cost of a transformer model in production when the fine-tuned transformer model is used as a labelling function: