mean vs identity pooling? #15
Comments
Yes if you fix the number of chunks.
On Mon, Nov 2, 2020, 10:27 AM Vipula Rawte ***@***.***> wrote:
Hi,
The papers describe four pooling functions: 1. Mean, 2. Identity, 3.
Transformer, and 4. LSTM.
I am confused between mean and identity. I follow that mean means simply
averaging all the [CLS] embeddings for all the chunks, which would result in
a final [768]-dimensional vector. In this way, how would the identity
function work? Does it mean concatenating all [CLS] vectors and if so,
wouldn't it turn into a very long vector like: number of chunks x 768?
Any help in understanding this concept would be appreciated!
Thanks!
So in your case, would it be 20 x 768 (since max # chunks = 20)?
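To make the shape arithmetic in this exchange concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names `mean_pool` and `identity_pool` are hypothetical, and zero-padding short documents up to the fixed chunk count is an assumption of this sketch, since the thread does not say how shorter documents are handled.

```python
from typing import List

HIDDEN = 768      # size of one [CLS] embedding, as in the thread
MAX_CHUNKS = 20   # fixed number of chunks, as in the thread


def mean_pool(chunks: List[List[float]]) -> List[float]:
    """Average the chunk embeddings dimension by dimension."""
    n = len(chunks)
    return [sum(column) / n for column in zip(*chunks)]


def identity_pool(chunks: List[List[float]],
                  max_chunks: int = MAX_CHUNKS,
                  hidden: int = HIDDEN) -> List[float]:
    """Concatenate chunk embeddings into one long vector.

    The document is truncated or zero-padded to exactly `max_chunks`
    chunks first (padding with zeros is an assumption of this sketch),
    so the output length is always max_chunks * hidden.
    """
    fixed = chunks[:max_chunks]
    fixed = fixed + [[0.0] * hidden for _ in range(max_chunks - len(fixed))]
    return [value for chunk in fixed for value in chunk]


# A made-up document with 3 chunks; chunk i is filled with the value i.
doc = [[float(i)] * HIDDEN for i in range(3)]

mean_vec = mean_pool(doc)
identity_vec = identity_pool(doc)

print(len(mean_vec))      # 768, regardless of the number of chunks
print(len(identity_vec))  # 15360 = 20 * 768
```

With mean pooling the output size never depends on document length, whereas concatenation only gives a fixed-size vector once the number of chunks is fixed, which is the condition stated in the reply above.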
I'm also confused. This paper references this repo as its source code and refers to the four models noted above: However, in the code, I see It looks like the But @AndriyMulyar thank you very much for providing your code. I'm a novice here -- is there something I'm missing? EDIT: I think it's also worth noting that the
Hi, the public codebase just hasn't been updated. To replicate the results stated in the paper, change the pooling from max to mean in the implementation. Cheers
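As a rough illustration of what switching from max to mean pooling changes, the sketch below applies both to the same toy chunk embeddings. The helper names and the 3-dimensional vectors are made up for the example; this is not the repository's implementation, which operates on 768-dimensional [CLS] embeddings.

```python
from typing import List


def max_pool(chunks: List[List[float]]) -> List[float]:
    """Keep the largest value in each dimension across all chunks."""
    return [max(column) for column in zip(*chunks)]


def mean_pool(chunks: List[List[float]]) -> List[float]:
    """Average each dimension across all chunks."""
    return [sum(column) / len(column) for column in zip(*chunks)]


# Two made-up chunk embeddings with 3 dimensions each.
chunk_embeddings = [
    [1.0, -2.0, 0.5],
    [3.0, 0.0, 0.5],
]

max_vec = max_pool(chunk_embeddings)
mean_vec = mean_pool(chunk_embeddings)

print(max_vec)   # [3.0, 0.0, 0.5]
print(mean_vec)  # [2.0, -1.0, 0.5]
```

Both reductions return a vector of the same size as a single chunk embedding, so in principle swapping one for the other leaves the downstream layer shapes unchanged. In PyTorch the corresponding reductions over the chunk dimension would be `tensor.max(dim=0).values` and `tensor.mean(dim=0)`.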
Hi,
The paper describes four pooling functions: 1. Mean, 2. Identity, 3. Transformer, and 4. LSTM.
I am confused between mean and identity. I follow that mean means simply average all the [CLS] embeddings for all the chunks which would result in a final [768]-dimensional vector. In this way, how would identity function work? Does it mean concatenating all [CLS] vectors and if so, wouldn't it turn into a very long vector like: number of chunks x 768?
Any help in understanding this concept would be appreciated!
Thanks!