KeyError: '1' when I ran "make_train_from_ranking.py" #2

Open
XY2323819551 opened this issue May 10, 2022 · 4 comments

Comments

@XY2323819551

Hello, thanks for your amazing work. I would really like to reproduce it, but I hit an error when running the code. Could you help me?

command line:
python make_train_from_ranking.py --ranking-file /home/zhangxy/QA/ANCE-PRF/pyserini/runs/run.msmarco-passage.ance.bf.tsv --model-type ANCE --query-file /home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/queries.train.tsv --collection-file ./data/msmarco_passage/collection/collection.tsv --pair-file /home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/qrels.train.tsv --output data/hard/negative.result --encoder /home/zhangxy/QA/pyserini_for_ance-prf/pyserini/encoders/ance-msmarco-passage

processing:
Load Query: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 808731/808731 [00:00<00:00, 1140903.16it/s]
Load Collection: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 8841823/8841823 [00:16<00:00, 521248.96it/s]
Load Q-D Pair: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 532761/532761 [00:00<00:00, 989247.88it/s]
Load Ranking: 0%| | 0/808731000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "make_train_from_ranking.py", line 94, in <module>
    rankings, topk = read_ranking(args.ranking_file, pair, args.prf_k, args.from_top)
  File "make_train_from_ranking.py", line 35, in read_ranking
    targets = pair[qid].keys()
KeyError: '1'

@hanglics
Member

hanglics commented May 10, 2022

Hi, sorry about this. For training, you need to pass this file, train_query_passage_pair.tsv, as the --pair-file arg.

@XY2323819551
Author

/home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/qrels.train.tsv

I tried the new pair file, but it still failed. I noticed that the "queries.train.tsv" file I used for the "--query-file" arg has 808731 examples, whereas "train_query_passage_pair.tsv" for the "--pair-file" arg has only 532751. I suspect the error is caused by this mismatch between the two files. Could you also provide the file to use for the "--query-file" arg? Thank you very much!
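One way to check the suspected mismatch is to compare the query ids in the two files directly. The snippet below is only a minimal sketch: it assumes both files are tab-separated with the query id in the first column, which this thread does not confirm for train_query_passage_pair.tsv, and the helper names `first_column_ids` and `unmatched_query_ids` are hypothetical, not part of the repository.

```python
def first_column_ids(path):
    """Collect the first tab-separated field of every non-empty line."""
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                # Assumption: the query id is the first column.
                ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_query_ids(query_path, pair_path):
    """Return query ids present in the query file but absent from the pair file."""
    return first_column_ids(query_path) - first_column_ids(pair_path)
```

Calling `unmatched_query_ids` with the paths passed to `--query-file` and `--pair-file` returns the ids that have no entry in the pair file. If '1' is in that set, a lookup such as `pair[qid]` for that id would raise the KeyError shown in the traceback.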

@XY2323819551
Author

> Hi sorry about this, you need this file for training. train_query_passage_pair.tsv for the --pair-file arg.

I ran into this problem earlier in this issue. I mistakenly thought I had found the correct file, but it seems I hadn't.

@hanglics
Member

For the --query-file arg, please use this file: train_query_judged.tsv
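For reference, the failing line in the traceback, `targets = pair[qid].keys()`, raises KeyError whenever a query id from the ranking file has no entry in the pair dictionary. The following is a hedged sketch of a guarded lookup, not the repository's code: `targets_for` is a hypothetical helper, and the ids in the toy dictionary are made up for illustration.

```python
def targets_for(pair, qid):
    """Return the judged passage ids for qid, or an empty set if qid is unjudged."""
    # dict.get avoids the KeyError that pair[qid] raises for a missing qid.
    return set(pair.get(qid, {}).keys())


# Toy data with made-up ids: one judged query mapped to one relevant passage.
pair = {"q42": {"p7": 1}}

judged = targets_for(pair, "q42")    # query with an entry in the pair dict
unjudged = targets_for(pair, "1")    # query without an entry; no exception
```

Skipping unjudged queries this way only hides the mismatch. Using matching query and pair files, as suggested above, addresses the cause.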
