getting error with vbert #41

Open
somnath-banerjee opened this issue Sep 1, 2021 · 4 comments
@somnath-banerjee

While using vbert, I am getting the following error. Please help.

vbert = onir_pt.reranker('vanilla_transformer', 'bert', text_field='abstract', vocab_config={'train': True})
vbert_pipeline = (
    pt.BatchRetrieve(index, wmodel='BM25', metadata=["docno", "text"]) % 1000
    >> pt.text.get_text(index, "text")
    >> vbert
)
df_res= vbert_pipeline.search("can vitamin d cure covid 19")

[2021-09-02 01:10:08,346][onir_pt][DEBUG] using GPU (deterministic)
[2021-09-02 01:10:11,481][onir_pt][DEBUG] [starting] batches
[2021-09-02 01:10:11,485][onir][CRITICAL] Uncaught exception
Traceback (most recent call last):
  File "vbert_baseline.py", line 123, in <module>
    df_res= vbert_pipeline.search("can vitamin d cure covid 19")
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/pyterrier/transformer.py", line 177, in search
    rtr = self.transform(queryDf)
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/pyterrier/transformer.py", line 807, in transform
    topics = m.transform(topics)
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir_pt/__init__.py", line 277, in transform
    for count, batch in _logger.pbar(batches, desc='batches', tqdm=pyterrier.tqdm, total=math.ceil(len(dataframe) / self.config['batch_size'])):
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir/log.py", line 110, in pbar
    yield from pbar
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/home/sbanerjee/miniconda3/envs/mytorch/lib/python3.8/site-packages/onir_pt/__init__.py", line 417, in _iter_batches
    batch[f].append(len(doc_tok))
TypeError: object of type 'NoneType' has no len()

@seanmacavaney
Contributor

Hi @somnath-banerjee,

Sorry for the delay. It looks like the vbert model is trying to re-rank based on the "abstract" field (text_field='abstract'), whereas only a "text" field is available (metadata=["docno", "text"]). I think switching to text_field='text' should resolve your problem!
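
A field mismatch like this can be checked before the re-ranker runs. The following is a minimal sketch, assuming the retrieved results are available as a list of dicts; `missing_text_field` is a hypothetical helper written for illustration and is not part of onir_pt or PyTerrier. A missing field here is consistent with the `len(doc_tok)` call in the traceback failing on `None`.

```python
def missing_text_field(rows, text_field):
    """Return the docnos of rows whose `text_field` is absent or None."""
    return [row.get("docno") for row in rows if row.get(text_field) is None]


# Rows shaped like the retrieval output in this issue: only "docno" and "text".
# The document contents are made up for illustration.
rows = [
    {"docno": "d1", "text": "vitamin d and covid 19"},
    {"docno": "d2", "text": "bm25 ranking function"},
]

print(missing_text_field(rows, "abstract"))  # every row lacks "abstract"
print(missing_text_field(rows, "text"))      # no row lacks "text"
```

If the first call returns any docnos, the re-ranker's text_field does not match the fields that the retrieval stage provides.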

@somnath-banerjee
Author

Hi @seanmacavaney,
Thanks. It worked after changing to text_field='text'.
Some of the scores I am getting are negative. I am new to IR. Could you kindly explain how I should interpret these from a theoretical point of view?
Thanks in advance.

@seanmacavaney
Contributor

Yes, so the query-document relevance scores produced by the model are only meaningful relative to other query-document relevance scores. In other words, the only thing that matters is whether document A's score is greater or less than document B's -- this determines the order of the two documents in the rankings.

Some other models make stronger claims about the meaning of the scores they produce. For instance, probabilistic models frame the scores as probabilities.
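
To illustrate why only the order matters, here is a small sketch with made-up scores (the document ids and values are hypothetical and do not come from vbert). It shows that shifting all scores by a constant, or passing them through a strictly increasing function such as a sigmoid, leaves the ranking unchanged. It does not claim that vbert's scores are logits or probabilities.

```python
import math


def rank(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores for three documents; two of them are negative.
scores = {"docA": 1.7, "docB": -0.4, "docC": -2.1}

# Adding the same constant to every score makes them all positive.
shifted = {doc: s + 10.0 for doc, s in scores.items()}

# A sigmoid maps every score into (0, 1); it is strictly increasing.
squashed = {doc: 1.0 / (1.0 + math.exp(-s)) for doc, s in scores.items()}

print(rank(scores))
print(rank(shifted) == rank(scores))
print(rank(squashed) == rank(scores))
```

In this sketch, the sign of a score carries no information on its own: docB is ranked above docC because -0.4 is greater than -2.1, not because of where either sits relative to zero.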

@somnath-banerjee
Author

somnath-banerjee commented Sep 9, 2021

Thanks a lot for your answer.
But if the vbert model produces a negative score for a query-document pair, what does this mean? How does it differ from a query-document pair for which it gives a positive score?
