
mean vs identity pooling? #15

Open
vr25 opened this issue Nov 2, 2020 · 4 comments

vr25 commented Nov 2, 2020

Hi,

The paper describes four pooling functions: 1. Mean, 2. Identity, 3. Transformer, and 4. LSTM.

I am confused about the difference between mean and identity. I understand that mean simply averages the [CLS] embeddings across all chunks, which yields a final 768-dimensional vector. How would the identity function work, then? Does it mean concatenating all the [CLS] vectors, and if so, wouldn't that produce a very long vector of size number of chunks x 768?
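
For concreteness, here is a minimal PyTorch sketch of the two options as I understand them (shapes and variable names are illustrative, not taken from the repo):

```python
import torch

# Illustrative shapes: a document split into n_chunks chunks,
# with one 768-dim [CLS] embedding per chunk.
n_chunks, hidden = 20, 768
cls_embeddings = torch.randn(1, n_chunks, hidden)  # (batch, n_chunks, 768)

# Mean pooling: dimension-wise average over the chunk axis -> (batch, 768)
mean_pooled = cls_embeddings.mean(dim=1)

# Identity: no pooling; the chunk [CLS] vectors are concatenated
# into one long vector -> (batch, n_chunks * 768)
identity_pooled = cls_embeddings.view(cls_embeddings.shape[0], -1)

print(mean_pooled.shape)      # torch.Size([1, 768])
print(identity_pooled.shape)  # torch.Size([1, 15360])
```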

Any help in understanding this concept would be appreciated!

Thanks!

AndriyMulyar (Owner) commented Nov 2, 2020 via email

vr25 (Author) commented Nov 2, 2020

So in your case, it would be 20 x 768 (since the maximum number of chunks is 20)?

n8henrie commented Apr 29, 2021

I'm also confused.

This paper references this repo as its source code and refers to the four models noted above: 1. Mean, 2. Identity, 3. Transformer, and 4. LSTM.

However, in the code, I see LSTM and Transformer, but instead of Mean and Identity, I see Linear and MaxPool: https://github.com/AndriyMulyar/bert_document_classification/blob/572883204cb1aca50d346979319905f698ad7049/bert_document_classification/document_bert_architectures.py

It looks like the Linear model matches the description of Identity (it reshapes the output into a concatenation of the [CLS] embeddings via `bert_output.view(bert_output.shape[0], -1)`).

But MaxPool looks to be doing something entirely different from what is described: `bert_output.max(dim=1)[0]` versus a dimension-wise mean over all [CLS] embeddings.
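
To make the difference concrete, here is a small sketch (the tensor shape is my assumption about the per-chunk [CLS] output; only the max expression is quoted from the linked file):

```python
import torch

# Assumed shape: one [CLS] embedding per chunk, (batch, n_chunks, 768)
bert_output = torch.randn(1, 20, 768)

# What the MaxPool class does: element-wise max over the chunk dim.
# max along a dim returns (values, indices); [0] keeps the values.
max_pooled = bert_output.max(dim=1)[0]   # (1, 768)

# What the paper describes for the Mean model: dimension-wise average.
mean_pooled = bert_output.mean(dim=1)    # (1, 768)
```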

@AndriyMulyar thank you very much for providing your code. I'm a novice here -- is there something I'm missing?

EDIT: I think it's also worth noting that the Mean model was the top performer in the paper, so it seems odd for the implementation here to differ from what is described.

AndriyMulyar (Owner) commented
Hi, the public codebase just hasn't been updated. You can change the pooling from max to mean in the implementation to reproduce the results stated in the paper.
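
For reference, a sketch of that change (the `pooled` name is illustrative; `bert_output` is the per-chunk [CLS] tensor, and only the max expression is quoted from the linked file):

```python
# before: element-wise max over the chunk dimension (current repo code)
pooled = bert_output.max(dim=1)[0]

# after: dimension-wise mean over the chunk dimension, as in the paper
pooled = bert_output.mean(dim=1)
```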

Cheers
