Is there a problem in the function "remove_overlapping(sorted_spans)"? #21

Open
DtYXs opened this issue Jul 30, 2020 · 5 comments

@DtYXs

DtYXs commented Jul 30, 2020

Hey,
Is there a problem in utils.py, in the "def remove_overlapping(sorted_spans):" function at line 98?
I think we want to accept "span i" when "si.i1 < sj.i1 <= si.i2 < sj.i2 OR sj.i1 < si.i1 <= sj.i2 < si.i2".
But "if len(set(taken)) == 1 or (taken[0] == taken[-1] == False):" seems to do the opposite.
For example, if seen = [2, 3, 4, 5, 6] and sj.i1 = 3, sj.i2 = 5, then taken = [True, True, True], so len(set(taken)) == 1 and the span is appended to the nonoverlapping list.
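For reference, here is a minimal sketch of the check being quoted (the helper name and the plain (i1, i2) tuple / set-of-indices representation are my own simplification, not the repository's code):

```python
# Sketch of the quoted acceptance test; only the condition itself is taken from utils.py.
def accepted_by_current_check(span, seen):
    i1, i2 = span                      # span as a plain (i1, i2) tuple (my assumption)
    taken = [i in seen for i in range(i1, i2 + 1)]
    return len(set(taken)) == 1 or (taken[0] == taken[-1] == False)

seen = {2, 3, 4, 5, 6}                          # indices of a previously accepted span
print(accepted_by_current_check((3, 5), seen))  # True: the span is appended to nonoverlapping
```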

@DtYXs
Author

DtYXs commented Jul 30, 2020

When START(i) < START(j) <= END(i) < END(j) or START(j) < START(i) <= END(j) < END(i):
if seen = [2, 3, 4, 5, 6] and START(j) == 3, END(j) == 8, then taken = [True, True, True, True, False, False], so len(set(taken)) == 2 and taken[0] == True. So span j will not be accepted.
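For concreteness, here is a small sketch of that condition written as a predicate (the function name and the plain (start, end) tuples are my own; only the inequality comes from the condition above):

```python
# Crossing test between two spans, each represented as a (start, end) tuple.
def crosses(si, sj):
    return (si[0] < sj[0] <= si[1] < sj[1]) or (sj[0] < si[0] <= sj[1] < si[1])

print(crosses((3, 8), (2, 6)))  # True: span (3, 8) crosses the previously seen span (2, 6)
```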

@grig-guz

grig-guz commented Aug 1, 2020

I agree that this function is wrong, but I don't think spans satisfying "si.i1 < sj.i1 <= si.i2 < sj.i2 OR sj.i1 < si.i1 <= sj.i2 < si.i2" should be accepted - the paper says they should be rejected.
If we have previously accepted spans [1, 2, 3] and [4, 5, 6], then when we consider span si = [2, 3, 4, 5], it will be wrongfully accepted.
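A quick trace of that case against the quoted check (the set-of-indices representation is my own simplification):

```python
# Spans [1, 2, 3] and [4, 5, 6] have already been accepted.
seen = {1, 2, 3} | {4, 5, 6}
taken = [i in seen for i in range(2, 5 + 1)]  # candidate span si = [2, 3, 4, 5]
print(taken)                 # [True, True, True, True]
print(len(set(taken)) == 1)  # True, so the quoted check accepts this crossing span
```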

@DtYXs
Author

DtYXs commented Aug 4, 2020

> I agree that this function is wrong, but I don't think spans satisfying "si.i1 < sj.i1 <= si.i2 < sj.i2 OR sj.i1 < si.i1 <= sj.i2 < si.i2" should be accepted - the paper says they should be rejected.
> If we have previously accepted spans [1, 2, 3] and [4, 5, 6], then when we consider span si = [2, 3, 4, 5], it will be wrongfully accepted.

I wonder: if we have accepted span [3, 4, 5, 6], will span [4, 5] and span [2, 3, 4, 5, 6, 7] be accepted or not? I think the code will accept them.
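A quick trace of the quoted check supports this (the set-of-indices representation is my own simplification):

```python
seen = {3, 4, 5, 6}                           # indices of the accepted span [3, 4, 5, 6]

taken = [i in seen for i in range(4, 5 + 1)]  # candidate span [4, 5]
print(len(set(taken)) == 1)                   # True -> accepted (fully nested)

taken = [i in seen for i in range(2, 7 + 1)]  # candidate span [2, 3, 4, 5, 6, 7]
print(taken[0] == taken[-1] == False)         # True -> accepted (contains the seen span)
```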
Besides, do you get reasonable training results? I ran the code, but the precision is always 0.

@grig-guz

grig-guz commented Aug 4, 2020

Yes, these spans should be accepted. For example:
"Mister Smith and his wife Missus Smith..."
Here, "Mister Smith", "Missus Smith" and "Mister Smith and his wife Missus Smith" are all valid entities - later in the text they can be referred to as "he", "she" or "they". So they should be accepted regardless of the order in which the remove_overlapping function receives them.
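For what it's worth, here is a rough sketch of the pruning I would expect if only crossing spans are rejected and nested spans are kept (my own rewrite, assuming spans expose the i1/i2 attributes used above - not the repository's code):

```python
def remove_overlapping_sketch(sorted_spans):
    """Accept spans in score order, rejecting a span only if it crosses an
    already-accepted span; nested spans are kept. Sketch only."""
    def crosses(si, sj):
        return (si.i1 < sj.i1 <= si.i2 < sj.i2) or (sj.i1 < si.i1 <= sj.i2 < si.i2)

    nonoverlapping = []
    for span in sorted_spans:
        if not any(crosses(span, accepted) for accepted in nonoverlapping):
            nonoverlapping.append(span)
    return nonoverlapping
```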
Regarding the performance, I had a precision of 0 initially, but after fixing a few bugs I now get a precision of around 45. I'm still working on finding other bugs because the results are still very far from what is reported in the paper.

@DtYXs
Author

DtYXs commented Aug 6, 2020

> Yes, these spans should be accepted. For example:
> "Mister Smith and his wife Missus Smith..."
> Here, "Mister Smith", "Missus Smith" and "Mister Smith and his wife Missus Smith" are all valid entities - later in the text they can be referred to as "he", "she" or "they". So they should be accepted regardless of the order in which the remove_overlapping function receives them.
> Regarding the performance, I had a precision of 0 initially, but after fixing a few bugs I now get a precision of around 45. I'm still working on finding other bugs because the results are still very far from what is reported in the paper.

Thanks a lot for your answer.
As for the performance, is your precision of around 45 the training precision? I find that the way precision is calculated does not seem to match the paper. The paper evaluates with MUC, B^3 and CEAF, but the code in 'def train_doc' seems to just compute a mean. In other words, the precision reported by the training code is not the Avg. F1 from the paper.
Besides, in 'loader.py', 'class Document', 'def truncate', the sentences are truncated to a maximum of 50, as in the paper. I think that in 'coref.py', 'def train_epoch', 'mentions_found' and 'corefs_found' should then be smaller than they would be without truncation, but 'total_mentions' and 'total_corefs' seem to be computed over the whole document batch. When I set MAX in 'doc = document.truncate(MAX)' to a smaller value, the training precision seems to get worse.
I think 'def evaluate' can calculate precision correctly, but when I run 'trainer.evaluate(val_corpus)', I get an error like 'IndexError: list index out of range'. I haven't fixed this bug yet.
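For reference, if I understand the paper correctly, its Avg. F1 is just the unweighted mean of the MUC, B^3 and CEAF F1 scores. A trivial sketch (the numbers below are placeholders, not real results):

```python
# Placeholder F1 values for illustration only; not real results.
muc_f1, b_cubed_f1, ceaf_f1 = 0.70, 0.60, 0.55
avg_f1 = (muc_f1 + b_cubed_f1 + ceaf_f1) / 3
print(round(avg_f1, 3))  # 0.617
```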
